Can we machine-learn a programming language semantics?

In this post I will talk about whether denotational semantics of programming languages can be machine-learned and why that could be useful.

But let's set some background first.

Programming language semantics is a collection of mathematical theories about representing the meaning of programming languages in an abstract, machine-independent ways. The semantics of a language can be given either operationally, by a set of syntactic transformations which, loosely speaking, show how a program is evaluated, or can be given denotationally, by mapping any piece of code into a mathematical object, its "meaning". Operational specifications can be given to realistic languages and, as a method, is the workhorse of formal aspects of programming languages from verification to compilation. Denotational interpretations, on the other hand, are mathematically much more mathematically challenging to develop but are also more powerful. In particular, denotational models can be used to reason about equivalence of sub-programs by reducing it to the equality of their denotations. But perhaps the right phrasing is that denotational models would be useful, because they haven't been developed yet for actual programming languages, just for idealised versions of some programming languages.

When the operational and denotational models coincide, i.e. equivalence in one model coincides with equality in the other, the situation is said to be fully abstract, and it's generally considered a good thing. Full abstraction is a technically fascinating problem, but the notion of equivalence it models can be seen as somewhat contrived. I wrote about these issues earlier and I won't reprise the argument. But even with these caveats, it would certainly be neat to be able to say whether two program fragments are equivalent. This could have some nice application in program verification, in particular compiler optimisation verification.

But can we derive a denotational model of a programming language automatically, via some kind of machine learning? This question was posed to me by Martín Abadi, whom I had the pleasure to run into at a very nice research conference in Berlin a couple of years ago. Martín has been a leading light of programming language research for many years, and is now involved with the TensorFlow project at Google. He asked me this question for a particular reason. My research gravitates around a particular kind of denotational semantics called game semantics. I wrote about game semantics earlier, and I recommend these posts for an in-depth informal introduction.

What makes game semantics appealing from the point of view of machine learning is that it is mathematically elementary. Its basic notion is that of a move, i.e. a way in which a sub-program can interact with its broader context. More precisely, function calls and returns, either from the sub-program into the context or the other way: in total, four kinds. Moves are sequenced into plays, which represent the way a sub-program interacts with its larger context during a particular program execution. Finally, all these plays are grouped into sets called strategies, which then model that particular sub-program. So in the end, each sub-program is represented by a set of sequences -- a language. And we know that languages can be, to some extent, machine learned.

Luckily, we can conduct a controlled experiment using one of the numerous idealised languages for which game semantic models (fully abstract nonetheless) have been created. We can generate random plays (sequences) using the model and see whether they can be learned. To make a long story short, it turns out that yes we can learn these sets, for rather complex languages (higher-order functions, state, concurrency). And we can learn them rather well and rather efficiently, using LSTMs and TensorFlow. The full story is in the paper On the Learnability of Programming Language Semantics, which has recently been presented at the 10th Interaction and Concurrency Experience (June 21-22, 2017, Neuchâtel, Switzerland), in collaboration with Khulood Alyahya [PDF].

The work of creating such sequences for production languages is harder, but not impossible. Any sub-term to be modeled in a larger program must be instrumented to record its interactions with the context, in a way similar to a profiler, then the program must be executed on random inputs and the interaction data collected. And then other sub-terms in other programs must be instrumented in the same way. It is a lot of work but it all can be automated. And, considering our proof-of-concept experiment, there is a good chance that the resulting model is going to be accurate enough to be useful!


Popular posts from this blog

What else are we getting wrong?

Comparing apples with oranges: OCaml vs Java in teaching

Welcome to my blog