Sunday, 22 October 2017

The logic of machine learning: inductive and abductive reasoning

"The division of all inference into Abduction, Deduction, and Induction may almost be said to be the Key of Logic." C.S. Peirce

Deduction, Induction, Abduction

C.S. Peirce, in his celebrated Illustrations of the Logic of Science, re-introduced three kinds of Aristotelian reasoning: deductive, inductive, and abductive. Deduction and induction are widely used in mathematics and computer science, and they have been thoroughly studied by philosophers of science and knowledge. Abduction, on the other hand, is more mysterious. Even the name “abduction” is controversial. Peirce claims that the word is a mis-translation of a corrupted text of Aristotle (“απαγωγη´”), and sometimes used “retroduction” or “hypothesis” to refer to it. But the name “abduction” seems to be the most common, so we will use it. According to Peirce the essence of deduction is the syllogism known as “Barbara”:

Rule: All men are mortal.
Case: Socrates is a man.
Result: Socrates is a mortal.

Peirce calls all deduction "analytic" reasoning, the application of general rules to particular cases. Deduction always results in "apodeictic" knowledge: incontrovertible knowledge you can believe as strongly as you believe the premises. Peirce’s interesting formal experiment was to then permute the Rule, the Case, and the Result from this syllogism, resulting in two new patterns of inference which, he claims, play a key role in the logic of scientific discovery. The first one is induction:

Case: Socrates is a man.
Result: Socrates is a mortal.
Rule: All men are mortal.

Here, from the specific we infer the general. Of course, as stated above the generalisation seems hasty, as only one specific case-study is generalised into a rule. But consider

Case: Socrates and Plato and Aristotle and Thales and Solon are men.
Result: Socrates and Plato and Aristotle and Thales and Solon mortal.
Rule: All men are mortal.

The Case and Result could be extended to a list of billions, which would be quite convincing as an inductive argument. However, no matter how extensive the evidence, induction always involves a loss of certainty. According to Peirce, induction is an example of a synthetic and "ampliative" rule which generates new but uncertain knowledge. If a deduction can be believed, an inductively derived rule can only be presumed. (Also, one must not confuse this 'logical' induction, which is unsound, with 'mathematical' induction which is actually a sound deductive rule for introducing a universal quantifier.)

The other permutation of the statements is the rule of abductive inference or, has Peirce originally called it, “hypothesis”:

Result: Socrates is a mortal.
Rule: All men are mortal.
Case: Socrates is a man.

This seems prima facie unsound and, indeed, Peirce acknowledges abduction as the weakest form of (synthetic) inference, and he gives a more convincing instance of abduction in a different example:

Result: Fossils of fish are found inland.
Rule: Fish live in the sea.
Case: The inland used to be covered by the sea.

We can see that in the case of abduction the inference is clearly ampliative and the resulting knowledge has a serious question mark next to it. It is unwise to believe it, but we can surmise it. This is the word Peirce uses to describe the correct epistemological attitude regarding abductive inference. Unlike analytic inference, where conclusions can be believed a priori, synthetic inference gives us conclusions that can only be believed a posteriori, and even then always tentatively. This is why experiments play such a key role in science. They are the analytic test of a synthetic statement.

But the philosophical importance of abduction is greater still. Consider the following instance of abductive reasoning:

Result: The thermometer reads 20C.
Rule: If the temperature is 20C then the thermometer reads 20C.
Case: The temperature is 20C.

Peirce’s philosophy was directly opposed to Descartes’s extreme scepticism, and abductive reasoning is really the only way out of the quagmire of Cartesian doubt. We can never be totally sure whether the thermometer is working properly. Any instance of trusting our senses or instruments is an instance of abductive reasoning, and this is why we can only generally surmise the reality behind our perceptions. Whereas Descartes was paralysed by the fact that believing our senses can be questioned, Peirce just took it for what it was and moved on.

A computational interpretation of abduction: machine learning

Formally, the three rules of inference could be written as:

A ⋀ (A → B) ⊢ B (Deduction)
A ⋀ B ⊢ A → B (Induction)
B ⋀ (A → B) ⊢ A (Abduction)

Using the Curry-Howard correspondence as a language design guide, we will arrive at some programming language constructs corresponding to these rules. 

Deduction corresponds to producing B-data from A-data using a function A → B. Induction would correspond to creating a A → B function when we have some A-data and some B-data. And indeed, computationally we can (subject to some assumptions we will not dwell on in this informal discussion) create a look-up table from As to the Bs, which maybe will produce some default or approximate or interpolated/extrapolated value(s) when some new A-data is input. The process is clearly both ampliative, as new knowledge is created in the form of new input-output mappings, and tentative as those mappings may or may not be correct. 

Abduction by contrast assumes the existence of some facts B and a mechanism of producing these facts A → B. As far as we are aware there is no established consensus as to what the As represent, so we make a proposal: the As are the parameters of the mechanism A → B of producing Bs, and abduction is a general process of choosing the “best” As to justify some given Bs. This is a machine-learning situation. Abduction has been often considered as “inference to the best explanation”, and our interpretation is consistent with this view if we consider the values of the parameters as the “explanation” of the facts.

Let us consider a simple example written in a generic functional syntax where the model is a linear map with parameters a and b. Compared to the rule above, the parameters are A = float × float and the “facts” are a model B = float → float

f : (float × float) → (float → float)
f (a, b) x = a ∗ x + b

A set of reference facts can be given as a look-up table data : B. The machine-learning situation
involves the production of an “optimal” set of parameters, relative to a pre-determined error (e.g.
least-squares) and using a generic optimisation algorithm (e.g. gradient descent):

(a0, b0) = abduct f data
f0 = f (a0, b0)

Note that a concrete, optimal model f0 can be now synthesised deductively from the parametrised model f and experimentally tested for accuracy. Since the optimisation algorithm is generic any model can be tuned via abduction, from the simplest (linear regression, as above) to models with thousands of parameters as found in a neural network.

From abductive programming to programmable abduction

Since abduction can be carried out using generic search or optimisation algorithms, having a fixed built-in such procedure can be rather inconvenient and restrictive. Prolog is an example of a language in which an abduction-like algorithm is fixed. The idea is to make abduction itself programmable.

We propose a new functional programming construct the type of which is inspired by the abductive logical rule above. The rule is realised by identifying all provisional constants (parameters) in a term, collecting them, and producing a version of the term which is now explicitly parameterised. Concretely, we write let f @ p = t in t′ to mean that the term t is “abductively decoupled” into a function f and a collection of parameters p. Identifiers f and p are bound in t′.

We illustrate abductive programming with a typical machine-learning program. Provisional constants (parameters) are syntactically indicated using braces {−}. The model defines upper and lower boundaries for a set of points in the plain, taking any two arbitrary functions as boundaries. Concretely, it will use a line and a parabola. The parameters need to be tuned in order to best fit some reference data set:

let model low high x = (low x, high x) 
let linr x = {1} ∗ x + {0} 
let quad x = {4} ∗ x * x + {3} ∗ x + {2} 
let m @ p = model linr quad
let p′ = optimise m loss data p 
let model′ = m p′

The decoupling will extract a parameter vector p with values [0;1;2;3;4] and create a parameterised model m. Improved parameters p′ can be computed using the parameterised model and an elsewhere-defined generic optimisation and loss functions (e.g. gradient descent and least squares), reference data and the original parameter values. The improved concrete model is model′ which can be used as a standard function in the sequel. We leave it as a reader exercise to reimplement this program fragment in conventional (functional) styles and note the unpleasant accruing of complex boilerplate code required to manage parameters.

The idea we propose here is not entirely original. The transparent update of parameters via generic optimisation algorithms is one of the features that makes Google's TensorFlow so easy to use. What we wanted is to consider this feature as a first-class language feature, rather than an embedded DSL as is TensorFlow presented, under a Curry-Howard-like correspondence with logical inference. The abductive explanation seems to us plausible (and cute). 

A longer discussion of abduction, its relation with machine learning, and more examples are given in our paper Abductive functional programming, a semantic approach [arXiv:1710.03984]. The paper has a link to a visualiser [link] which allows a step-by-step execution of an abductive functional program using a semantics in the style of the "Geometry of Interaction". 

Friday, 4 August 2017

Can we machine-learn a programming language semantics?

In this post I will talk about whether denotational semantics of programming languages can be machine-learned and why that could be useful.

But let's set some background first.

Programming language semantics is a collection of mathematical theories about representing the meaning of programming languages in an abstract, machine-independent ways. The semantics of a language can be given either operationally, by a set of syntactic transformations which, loosely speaking, show how a program is evaluated, or can be given denotationally, by mapping any piece of code into a mathematical object, its "meaning". Operational specifications can be given to realistic languages and, as a method, is the workhorse of formal aspects of programming languages from verification to compilation. Denotational interpretations, on the other hand, are mathematically much more mathematically challenging to develop but are also more powerful. In particular, denotational models can be used to reason about equivalence of sub-programs by reducing it to the equality of their denotations. But perhaps the right phrasing is that denotational models would be useful, because they haven't been developed yet for actual programming languages, just for idealised versions of some programming languages.

When the operational and denotational models coincide, i.e. equivalence in one model coincides with equality in the other, the situation is said to be fully abstract, and it's generally considered a good thing. Full abstraction is a technically fascinating problem, but the notion of equivalence it models can be seen as somewhat contrived. I wrote about these issues earlier and I won't reprise the argument. But even with these caveats, it would certainly be neat to be able to say whether two program fragments are equivalent. This could have some nice application in program verification, in particular compiler optimisation verification.

But can we derive a denotational model of a programming language automatically, via some kind of machine learning? This question was posed to me by Martín Abadi, whom I had the pleasure to run into at a very nice research conference in Berlin a couple of years ago. Martín has been a leading light of programming language research for many years, and is now involved with the TensorFlow project at Google. He asked me this question for a particular reason. My research gravitates around a particular kind of denotational semantics called game semantics. I wrote about game semantics earlier, and I recommend these posts for an in-depth informal introduction.

What makes game semantics appealing from the point of view of machine learning is that it is mathematically elementary. Its basic notion is that of a move, i.e. a way in which a sub-program can interact with its broader context. More precisely, function calls and returns, either from the sub-program into the context or the other way: in total, four kinds. Moves are sequenced into plays, which represent the way a sub-program interacts with its larger context during a particular program execution. Finally, all these plays are grouped into sets called strategies, which then model that particular sub-program. So in the end, each sub-program is represented by a set of sequences -- a language. And we know that languages can be, to some extent, machine learned.

Luckily, we can conduct a controlled experiment using one of the numerous idealised languages for which game semantic models (fully abstract nonetheless) have been created. We can generate random plays (sequences) using the model and see whether they can be learned. To make a long story short, it turns out that yes we can learn these sets, for rather complex languages (higher-order functions, state, concurrency). And we can learn them rather well and rather efficiently, using LSTMs and TensorFlow. The full story is in the paper On the Learnability of Programming Language Semantics, which has recently been presented at the 10th Interaction and Concurrency Experience (June 21-22, 2017, Neuchâtel, Switzerland), in collaboration with Khulood Alyahya [PDF].

The work of creating such sequences for production languages is harder, but not impossible. Any sub-term to be modeled in a larger program must be instrumented to record its interactions with the context, in a way similar to a profiler, then the program must be executed on random inputs and the interaction data collected. And then other sub-terms in other programs must be instrumented in the same way. It is a lot of work but it all can be automated. And, considering our proof-of-concept experiment, there is a good chance that the resulting model is going to be accurate enough to be useful!

Sunday, 23 July 2017

Logical Inference in Computer Science: deduction, induction, abduction

Logic acknowledges three kinds of inference: deductive, inductive, and abductive. What kind of impact did they have on computer science and programming languages in particular?

Deduction is in some sense the golden standard of logical inference. It represents a set of fixed rule through which general statements are made more particular in a way which can be guaranteed to be sound, that is to preserve truth. Aristotle's syllogisms are an example of deductive inference:

All men are mortal.
Socrates is a man.
Therefore Socrates is mortal.

In programming language theory the impact of deductive thinking is best captured by the celebrated Curry-Howard correspondence between propositions (in a logic) and types (in a language). This correspondence is more than a technical result, although it can be precisely formulated. It is a guiding methodological principle which drives the design of functional programming languages. 
Besides type systems, programming logics are usually given a deductive formulation (axioms and rules of inference), and they can be used to provide extremely strong guarantees of correctness for software, at a pretty high cost (i.e. many PhD students working for many years on a medium-sized piece of code). Types and logics can come together, as in the extremely elegant form of dependent types, which lie at the foundation of powerful proof assistants such as Coq or Agda. Coq in particular has made some headway in proving the correctness of significant software projects such as an optimising C compiler (the CompCert project).
Induction is a form of inference in which the premises of an argument are generalised into universal statements. Induction in general must not be confused with mathematical induction, which is still a deductive argument for introducing universal quantifiers. Whereas mathematical induction provides absolute certainty as to the result, in general induction does not. It provides a measure of truth which is merely probable, informally speaking. Induction was long considered at the root of scientific discovery: you observe a bunch of facts, then you generalise your observations into a theory. This is what the pioneers of the scientific method (Galileo, Bacon) proposed. Empiricist philosophers like Hume and JS MIll looked into this way of doing science more critically, but it is perhaps fair to say that the current prevailing view is Popper's who, somewhat dogmatically, says that induction has no place in science. He holds that no amount of data can confirm a theory, but experimental data could contradict a theory, insofar as that theory is formulated in a logically reasonable way ("falsifiable"). 
The most common Computer Science manifestation of inductivism is software testing, or system testing more generally speaking. Usually it is impractical to validate the correctness of a system deductively for the sheer amount of work involved. The developer runs tests, a collection of instances in which, if successful, the system behaves as supposed to. The tests can be extensive and systematic, but even so they do not provide evidence on par with that of deductive reasoning. This is typical inductive reasoning. It is perhaps fair to say that in the practice of verification the Popperian view that inductive arguments play no role is strongly rejected -- rightly or wrongly, I don't know. But it seems to be generally accepted that a thoroughly tested piece of software is more likely to be correct than an untested one. 
A more spectacular instance of inductivism in Computer Science is machine learning. ML is nothing but inductivism at its most unrestrained: the theory is created directly from the data, by-passing the creation of a deductive framework or the requirement of explanatory powers. This is incredibly fashionable now, of course, and it solves many problems in a fully automated way. The problem with it is, well, that it's inductive. Which means that it works until it doesn't. Just like in the case of testing, the predictive power is limited to instances similar to what has already been tried out. 
Abduction seems to be more mysterious, even dangerous. However, it merely describes the logical process of analysing a situation and extracting (abducting) the best possible explanation. Looking at science from a logical perspective, the inductivists (Galileo) thought that universal statements arise naturally, whereas the deductivists (Popper) thought that they are mere guesses. The abductivists think, on the contrary, that the process of formulating "best explanations" can be made systematic and are at the root of how scientific discovery progresses. A persuasive proponent of this view is Lakatos who, despite not using the term "abduction" explicitly, gives a wonderful argument for the use of abduction in mathematical discovery in his book Proofs and Refutations.
In Computer Science I would say that the most interesting manifestation of abductive reasoning is program analysis: starting with a piece of code, extract logical conditions as to when the code does not crash, and logical descriptions as to what the code achieves. The term "abduction" is, as far as I know, only used in a paper describing the logic behind the verification tool Infer by Facebook. The methodology equally applies to other, more common, analyses such as type inference. Because what kind of "inference" are we talking about here? Clearly not deductive, since the type has not been established yet, and obviously not inductive. 
To conclude this brief tutorial, all three forms of logical inference are richly reflected in computer science and programming languages. The one area in which they seem to come together is that of program verification. And this is not accidental. Programming languages are strongly related to logics, and their verification is as hard as it is necessary. We need to use every trick in the book.

Monday, 22 May 2017

Reasoning about digital circuits using string diagrams

Hardware and software, two worlds apart

As I drifted between the worlds of programming languages and hardware design over the last ten years or so, I was quite struck by one significant and maybe difficult to explain difference when it comes to modelling and semantics. 

In the world of software the workhorse of formal verification and validation is operational semantics: programming languages come with a set of syntactical rules showing how the program and other components of its configuration (e.g. the state) evolves during execution.  Let's take, for example, a simple language of commands. We may want to model a program (c) by showing how it changes the state (s), which is a function mapping memory location (addresses) into values. A configuration is a program together with its state, (c, s). We show the execution of a program in a state to a new state as run (c, s) = s' or, using fancier mathy notation (c, s⇓ s'. Here are some sample rules (this list is quite boring, you can skip it if you don't care much about details):
  • skip, the trivial program, does nothing: if we run skip in some state it produces the same state. We write this using a formal notation as run (skip, s) = s or (skip, s⇓ s.
  • sequencing is described by a composite rule: if the first command (c) runs in the initial state (s) and produces some new state (s'), and if the second command (c') runs in that state and produces another updated state (s") it is as if we run the composite command (c;c') in the initial state (s), as the same final state results (s"), or, more formally, if run (c, s) = s' and run (c', s') = s" then run (c;c', s) = s". This can be written more compactly as run  (c;c's) = run (c', run (c, s)). Or, using a more formal logical deductive notation:
      (c, s⇓ s'      (c', s') ⇓ s" 
                (c;c's) ⇓ s"
  • if statements: If in some state (s) the expression b is true, and if running the then-branch in that state produces a new state (s'), the same result is achieved by running the if-then-else statement in the initial state.
          (b, s)  true     (c, s)  s'     
     (if b then c else c', s) ⇓ s'
    Note that the behaviour of the else-branch is irrelevant here since it never gets executed. Conversely, if the guard is false, the behaviour is:
          (b, s)  false    (c', s)  s'      
     (if b then c else c', s) ⇓ s'
  • loops, which describe recursive behaviour, are modeled by recursive rules. Evaluating a while-loop in some state is the same thing as evaluating the one-step unfolding of the while-rule:
    (if b then {c; while b do c}, s) ⇓ s'
                   (while b do c, s) ⇓ s'
These rules are concrete enough to allow the definition of a (inefficient but effective) interpreter for the language. Virtually all work done on formal certification of programs is using this kind of operational-style semantics. 

Operational semantics is not the only game in town when it comes to modelling programming languages, it is just the most successful game in town. A much researched alternative is denotational semantics. A denotational semantics works by translating code -- a bit like a compiler -- into a mathematical semantic domain. So you can elevate yourself above the messy irregular world of code, into the pure and abstract world of Platonic mathematical objects. Denotational semantics sheds some much needed light on some foundational aspects of programming languages, in particular recursion via domain theory, but for most practical ends it just doesn't help that much. This is not to say that denotational semantics is a bad idea, on the contrary, it is a brilliant idea. It's just that making it work for realistic languages is quite hard. The world of ideals is not easily accessible. 

When you cross over to the world of hardware something weird happens: you discover that digital circuits have no syntactic-style operational semantics but the workhorse of validation and verification is denotational semantics. Circuits are "compiled" into a semantic domain of functions on streams of data (waveforms). So what doesn't work for programming languages seems to work fine for circuits, and vice versa.

The morass of hardware semantics

When you start teaching yourself the basics of circuit design one of the first things that you learn is that there are two kinds of circuits: synchronous and asynchronous. In the synchronous world circuits come with clocks, and time is sliced by the clock into discrete helpings. If two signals occur within a time slice they can be assumed to be simultaneous, or synchronous -- hence the name. Most electronics such as computers and mobile phones work like this. Asynchronous circuits don't make this assumption and, as a result, are much harder to use to build useful and exciting stuff. This delights academics, who find asynchronous circuits an interesting and fruitful area of research, but industry not so much. Of the various battery-powered devices strewn about your house, the remote controls are likely to contain asynchronous circuitry, but not much else. So let's focus on the more useful synchronous circuits. 

When you delve a bit deeper into the world of synchronous circuit design, and are not content merely to build useful and fascinating stuff, but try to understand what is going on mathematically, you quickly realise that in the synchronous world the greatest challenge is presented by asynchronous effects: the order in which signals happen within a clock slice. These effects are abstracted away by the semantic model but this abstraction is not compositionally sound: when you plug together digital circuits these asynchronous effects must be, disappointingly, taken into account. And there is no easy way to do that because they break the abstraction. 

One common way in which the abstraction is broken is via combinational feedback loops, i.e. feedback which doesn't go through clocked elements. If you are a seasoned digital designer you may instinctively think of circuits with combinational feedback as broken, when in fact it is the semantic model which is broken. This circuit for example has a perfectly well defined behaviour, z=x:

However, most digital design tools for simulation and verification will reject it because of the feedback loop. This is sad because combinatorial loops are what makes memory elements possible:

In the practice of digital design such circuits must be dealt with as "black boxes" because they break the abstraction. If you talk to digital design engineers from the trenches they will not acknowledge this as a problem, and if you insist they will point at your mobile phone and explain how this is what they build, successfully, so if it is a problem it is not a serious one. And I agree. But theoretically it's not entirely satisfactory.  

Going back to compositionality, what is the connection to the problem of combinational feedback? Compositionality means that the behaviour of a composite circuit can be derived from the behaviour of its components. When we feedback over a circuit, taken as a black box

there are several situations, illustrated by the following internal structures of the box:

The first one (A) represents a "trivial" feedback in which the result of the feedback is actually no feedback at all: the wire can be yanked straight. The second one (B) is a legal feedback where there is actually a memory element on the fed-back wire. Finally, the third one (C) represents the kind of circuit we genuinely want to avoid because it would be unstable. But just from looking at the input-output behaviour of a box we cannot tell in which of these situations we will end up. This is not good!

So what is the way out of the morass? Following the lead of programming language semantics we can try a syntactic, operational approach. This is the topic of a recent paper I wrote with my colleague Achim Jung, and with Aliaume Lopez from École normale supérieure Cachan: Diagrammatic Semantics for Digital Circuits.

Hardware semantics from string diagrams

To reason syntactically we first need a syntax. What is the right syntax for circuits? Conventional hardware languages such as VHDL or Verilog have a "structural" fragment which can be used to describe circuits as "netlists". Ironically, netlists are devoid of any apparent structure. They are simply a dump of the incidence relation of the circuit seen as a (hyper)graph. All components and their ports are labeled, and are connected by wires. Syntactic manipulation at this level is clearly unappealing. 

A more natural starting point is the graph of the circuit itself. If programming languages can be defined by operational semantics acting on terms, it makes sense to model circuits by acting on graphs instead. Graph rewriting is a well-studied area of research with many powerful conceptual tools so lets go there. The first observation is that rewriting a circuit-as-graph by propagating values through gates is fun and easy. We can easily define a system which works something like this:

You can see some and and or gates and some logical values (H/L) and how they can propagate through the circuit via rewriting. You can easily prove results such as confluence. However, the system wouldn't be very useful because it wouldn't tell you how to handle feedback. In the case of feedback, the input comes from the output, which is not known. So what do you do?

A feedback loop in a circuit looks a bit like recursion in a programming language, and we have seen such a rule earlier, for while-loops. What is remarkable about that rule above is that in order to model the while loop we need another copy of the while loop (in red above)! The same thing happens in the case of the Y-combinator, which is how recursion happened in the original ur-programming language, the lambda calculus. If g is some function then: 

Y g = g (Y g) = g (g (Y g)) = ...

Note again the need to copy the function g in order to evaluate it recursively. Can we do something like that for circuits though in the case of feedback? The answer is not obvious but I will summarise it below. Consider this circuit, taken as a black box; it has m input and n outputs, and we are feeding back k of the outputs:

What is the correct unfolding of such a loop? It turns out that in certain conditions (which we shall discuss below) the unfolding of this circuit diagram can be made to be this:

First of all we need to be able to define such a diagram, which requires some special componentry, highlighted below:

We need to be able to fork wires and to leave wires dangling. The forking operation is something that takes one input and produces two outputs, and the dangling operation is something that takes one input and produces zero outputs. They are "co-operations" (by contrast with operations) which take zero or more inputs and produce one single output, so have co-arities two and zero, respectively. The fork is a co-commutative co-monoid, with the dangling wire being the co-unit. Diagrammatically,  the equations they must satisfy are these:

These are just the obvious properties that you would expect of any "wire splitter":

Most importantly for us, a point which we shall revisit, is that you don't need to know the semantic model for the wire splitter in order to know that it will satisfy the equations above. What kind of current flows through it? What direction? It doesn't matter (so long as it's within some operational parameters which we shall take for granted at this level)! In some sense you can think of them as the "natural laws" of the wire-splitter! 

Having the wire-splitter-come-co-monoid structure we can construct the unfolded circuit, but is it ok to do that? Are the folded and unfolded circuit the same, in terms of behaviour? In general, for a box with wires sticking in and out this is not the case, but for many kinds of digital circuits this is the case! The reasons go back to work in pure algebra done by two Romanian mathematicians, Cãzãnescu and Stefãnescu in the 1980s, and they were also independently discovered and reformulated in terms of traced monoidal categories by Hasegawa in the 1990s. The idea is that diagrams can be unfolded only if there is a notion of Cartesian product of such diagrams. The axioms of the Cartesian product can also be neatly formulated diagrammatically: 

Informally, it means that sharing the outputs of some circuit f is the same as copying the circuit f and sharing its inputs. The other axiom says that ignoring all the outputs of some circuit f is the same as ignoring the whole circuit. That's it! In an earlier paper, Categorical semantics of digital circuits, co-authored with my colleague Achim Jung, we show that for a large class of digital circuits that is indeed the case, so digital circuits can be safely unfolded. The unfolding rule together with the simple rewrites mentioned above can be used together to evaluate circuits syntactically directly at the level of diagrams. 

Still, what's the deal with string diagrams? The key difference between string diagrams and vanilla graphs is that string diagrams have a well-defined interface, so composition ("plugging together") is a well defined operation. This little bit of extra machinery makes all the reasoning above possible -- and more. 

There are some other interesting axioms in the categorical and diagrammatic models which are worth discussing, but for now I will stop here. Because the key point was made: all the results have been derived axiomatically, from basic properties of circuits and components of circuits which are in some sense "obvious", i.e. they can be directly tested in the lab! They are model-agnostic and in fact hold for both circuits made of Boolean gates and for circuits made of saturation-mode transistors. These two have very different semantic models, yet they have the same diagrammatic semantics because they satisfy the same key axioms.


Lets look now at this circuit again:

A string diagram representation of it (noting the input and output interfaces) is this:
Plugging true, a sequence of obvious rewrites leads to:
Note that the last (trivial) circuit has a spurious loop. The only way we can get rid of loops is by unfolding them, even if they are spurious. Of course, if we were obsessed with efficiency then this would be done differently, but strictly following the diagrammatic cookbook we get:
From the co-monoid rule and the second Cartesian equation we then obtain simply the constant true.


The content of this post relies on two papers I have co-authored:
For lots of fascinating stuff on string diagrams Pawel Sobocinsky's blog is a must-read. In particular, the application of string diagrams to quantum mechanics is worked out in exquisite detail in a book by Coecke and Kissinger, Picturing Quantum Processes.

To read more about the connections between category theory and string diagrams Peter Selinger's has a superb survey of graphical languages for monoidal categories. Specifically on rules for diagrams with feedback, Masahito Hasegawa's doctoral dissertation is an excellent reference. The other relevant paper is Virgil Emil Cazanescu, Gheorghe Stefanescu, A note on axiomatizing flowchart schemes (Acta Cybern. 9(4): 349-359. 1990).

Circuits with combinational loops have been discussed in a classic paper by S. Malik, Analysis of cyclic combinational circuits, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (Volume: 13, Issue: 7, Jul 1994). Semantic models for such circuits have been developed by Berry, Mendler and Shiple in Constructive Boolean circuits and the exactness of timed ternary simulation, Formal Methods in System Design (Volume 40 Issue 3, June 2012, Pages 283-329). 

Finally, I will talk about the material in this post as an invited tutorial speaker at the Workshop on String Diagrams in Computation, Logic, and Physics in Oxford on 8-9 September 2017 [link]. 

Monday, 16 January 2017

Comparing apples with oranges: OCaml vs Java in teaching

Can we compare programming languages? Can we ever say that subject to some criteria one language is better than the alternatives? Very hard, in general. But this year I was in a position to do this comparison in a way which even though far from perfect is at least vaguely meaningful.

We teach our freshmen two programming languages simultaneously in Year 1 Term 1: Java and OCaml. We don't assume the incoming students have programming experience although, of course, some of them do. But in terms of backgrounds this is as clean and as homogeneous as you can hope for. In Week 5 and Week 6 two of the assignments are shared between OCaml and Java. In total this means 6 short algorithmic programming exercises, administered via HackerRank (and tested for correctness and efficiency). The assignments were completed by about 180 students each.
  1. zipping three lists (arrays in Java).
  2. unifying two strings with constants and variables.
  3. finding the best trade in a historical sequence of prices.
  4. running average of consecutive elements in a list (array for Java).
  5. running average of k consecutive elements in a list (array for Java).
  6. finding longest increasing sequence in a list. 
Here are the question-by-question statistics collected by HackerRank:

median time median score
assignment OCaml Java OCaml Java
1 1 hr 17 min 1 hr 3 min 3/3 5/5
2 1 hr 35 min 2 hrs 45 min 3/3 5/5
3 1 hr 29 min 0 hrs 32 min 0/3 5/5
4 0 hrs 44 min 0 hrs 22 min 4/4 5/5
5 0 hrs 23 min 0 hrs 34 min 3/3 5/5
6 2 hrs 13 min 1 hr 39 min 2/3 5/5

One very important caveat is that all students tried each assignment in OCaml first, then in Java in the following week. They were therefore familiar with the algorithmic solution when attempting it in Java, whereas in OCaml they saw the problem as new. However, some of the list-based algorithms are quite different than their array-based counterparts, so some element of novelty remains.

The only thing that stands out from this comparison is that questions (3) and (6) had efficient solutions in which keeping track of an array index was much easier than expressing the problem recursively using lists (try it!), which seems to have made the OCaml version harder than the Java.

Following this exercise in comparative programming (on simple algorithmic problems) I also gave students a survey to gage their perception of the comparative advantages and disadvantages of the two approaches, asking them to agree or disagree with the common claims made when comparing functional and imperative or OO programming. These claims are almost never, to my knowledge, backed by data but supported by arguments which, however thoughtful, are largely speculative.

I found that:

  • only 19% of students were bothered by off-by-one array errors
  • 54% of students found functional programs more concise and enjoyed that
  • only 20% appreciated using the OCaml interpreter for rapid prototyping or quick testing
  • 57% students prefered OO-style to OCaml-style polymorphism
  • 40% of students considered that pattern matching in OCaml improved their productivity
  • opinions were evenly divided on whether type inference and the stricter discipline of OCaml led to bug avoidance
  • 35% thought OCaml's type system got in the way and reduced their productivity
  • only 30% thought immutable data was helpful in avoiding programming errors
  • whereas 40% thought immutable data was inconvenient to work with 
  • 59% thought structural recursion (on a data structure, using pattern matching) was helpful
  • only 14% thought OCaml programs are more readable than Java programs
  • the opinions were evenly split on whether programming in OCaml was enjoyable 
  • 78% liked programming in Java. 

This post is loosely related to my previous post, in which I was worrying about the fact that language comparisons, from the point of view of programmer productivity, are a rather neglected area of research. This is not a proper study, certainly, but there are some surprises in there for me. I personally much favour OCaml over Java, and the fact that the putative advantages of functional programming are not immediately apparent to the novice programmer is disappointing. In comparison with Java, the initial appeal seems questionable, for most students at least. On the other hand, certain features of functional programming (e.g. patterns) seem to be broadly liked. 

Maybe I will have time to conduct this study more thoroughly next year. Any suggestions for how to make it better would be warmly appreciated.  

Making OCaml more like Excel

Excel, spreadsheet programming in general, is broadly considered a very accessible style of programming. Could we make functional programmin...