### Can we machine-learn a programming language semantics?

In this post I will discuss whether the denotational semantics of a programming language can be machine-learned, and why that could be useful.

But let's set some background first.

*Programming language semantics* is a collection of mathematical theories for representing the meaning of programming languages in an abstract, machine-independent way. The semantics of a language can be given either *operationally*, by a set of syntactic transformations which, loosely speaking, show how a program is *evaluated*, or *denotationally*, by mapping each piece of code to a mathematical object, its "meaning". Operational specifications can be given to realistic languages and, as a method, they are the workhorse of the formal side of programming languages, from verification to compilation. Denotational interpretations, on the other hand, are mathematically much more challenging to develop, but they are also more powerful. In particular, denotational models can be used to reason about the *equivalence* of sub-programs by reducing it to the *equality* of their denotations. But perhaps the right phrasing is that denotational models *would* be useful, because they haven't been developed yet for actual programming languages, only for idealised versions of some programming languages.

When the operational and denotational models coincide, i.e. equivalence in one model coincides with equality in the other, the model is said to be *fully abstract*, and that is generally considered a good thing. Full abstraction is a technically fascinating problem, but the notion of equivalence it models can be seen as somewhat contrived. I wrote about these issues earlier and I won't reprise the argument. But even with these caveats, it would certainly be neat to be able to say whether two program fragments are equivalent. This could have some nice applications in program verification, in particular the verification of compiler optimisations.

But can we derive a denotational model of a programming language automatically, via some kind of machine learning? This question was posed to me by Martín Abadi, whom I had the pleasure to run into at a very nice research conference in Berlin a couple of years ago. Martín has been a leading light of programming language research for many years, and is now involved with the TensorFlow project at Google. He asked me this question for a particular reason. My research gravitates around a particular kind of denotational semantics called *game semantics*. I wrote about game semantics earlier, and I recommend those posts for an in-depth informal introduction.

What makes game semantics appealing from the point of view of machine learning is that it is mathematically elementary. Its basic notion is that of a *move*, i.e. a way in which a sub-program can interact with its broader context. More precisely, moves are function calls and returns, either from the sub-program into the context or the other way around: four kinds in total. Moves are sequenced into *plays*, which represent the way a sub-program interacts with its larger context during a particular program execution. Finally, all these plays are grouped into sets called *strategies*, which then model that particular sub-program. So in the end, each sub-program is represented by a set of sequences -- a *language*. And we know that languages can, to some extent, be machine-learned.

Luckily, we can conduct a controlled experiment using one of the numerous idealised languages for which game semantic models (fully abstract, no less) have been created. We can generate random plays (sequences) using the model and see whether they can be learned. To make a long story short, it turns out that *yes*, we can learn these sets, for rather complex languages (with higher-order functions, state, and concurrency). And we can learn them rather well and rather efficiently, using LSTMs and TensorFlow. The full story is in the paper *On the Learnability of Programming Language Semantics*, written in collaboration with Khulood Alyahya, which was recently presented at the *10th Interaction and Concurrency Experience* (June 21-22, 2017, Neuchâtel, Switzerland) [PDF].

The work of creating such sequences for production languages is harder, but not impossible. Any sub-term to be modelled in a larger program must be instrumented to record its interactions with the context, much like a profiler does; then the program must be executed on random inputs and the interaction data collected. Then other sub-terms in other programs must be instrumented in the same way. It is a lot of work, but it can all be automated. And, considering our proof-of-concept experiment, there is a good chance that the resulting model will be accurate enough to be useful!
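For a language like Python, the instrumentation described above can be sketched as a wrapper that records the four kinds of moves (calls and returns crossing the boundary between sub-program and context) as a play. This is only a rough illustration: the `instrument` helper and the move labels are made up for this post, not part of any library.

```python
import functools

def instrument(fn, trace):
    """Record a function's interactions with its context as a play:
    calls into fn (context -> sub-program) and returns out of fn
    (sub-program -> context). Callable arguments are wrapped too, so
    calls from fn back into the context are also recorded."""
    @functools.wraps(fn)
    def wrapped(*args):
        def wrap_arg(a):
            if not callable(a):
                return a
            def back(*bargs):               # sub-program calls the context
                trace.append(("P_call", bargs))
                r = a(*bargs)
                trace.append(("O_return", r))
                return r
            return back
        trace.append(("O_call", args))      # context calls the sub-program
        result = fn(*(wrap_arg(a) for a in args))
        trace.append(("P_return", result))  # sub-program returns to context
        return result
    return wrapped

trace = []
g = instrument(lambda f: f(0) + 1, trace)   # a toy higher-order sub-term
g(lambda x: x + 41)                         # a toy context for it
# trace now holds one play, e.g.:
# [("O_call", ...), ("P_call", (0,)), ("O_return", 41), ("P_return", 42)]
```

Running the instrumented term against more random contexts and inputs accumulates more plays, and the set of all recorded plays approximates the strategy of the sub-term.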

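As a toy stand-in for the learnability experiment, the sketch below generates random plays from a made-up strategy and fits a bigram model over move kinds; well-formed plays then score higher than ill-formed ones. The actual experiments in the paper use LSTMs; everything here, including the example term, is invented purely for illustration.

```python
import random
from collections import Counter

def random_play(rng):
    """Plays of a made-up sub-term that calls into its context
    one to three times before returning."""
    play = ["O_call"]
    for _ in range(rng.randrange(1, 4)):
        play += ["P_call", "O_return"]
    play.append("P_return")
    return play

def train_bigrams(plays):
    """Count move-to-move transitions over a set of plays."""
    counts, totals = Counter(), Counter()
    for p in plays:
        for a, b in zip(p, p[1:]):
            counts[(a, b)] += 1
            totals[a] += 1
    return counts, totals

def score(play, counts, totals):
    """Product of smoothed bigram probabilities: a crude sequence model."""
    s = 1.0
    for a, b in zip(play, play[1:]):
        s *= (counts[(a, b)] + 1e-9) / (totals[a] + 1.0)
    return s

rng = random.Random(0)
counts, totals = train_bigrams(random_play(rng) for _ in range(1000))

valid = ["O_call", "P_call", "O_return", "P_return"]
corrupt = ["O_call", "O_return", "P_call", "P_return"]  # moves out of order
assert score(valid, counts, totals) > score(corrupt, counts, totals)
```

Even this crude model separates plausible from implausible interaction sequences; the point of the paper is that LSTMs do far better, on far richer languages.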