Saturday 4 May 2019

The essence of (im)pure computation

This blog post is an informal introduction to a forthcoming research paper in collaboration with my students Koko Muroya and Todd Waugh Ambridge. This is the second post of dedicated to the SPARTAN calculus. The first post was How to make a programming language out of an algebra.

For the impatient reader I will summarise the point of the previous post:
algebra + thunking + copying = pure functional programming
The "algebra" part are the operations of the language. They include not only what we commonly think of as operations, such as arithmetic, but also function definition and application or recursion. The "thunking" is a way of postponing computations and "copying" is self explanatory. Together, thunking and copying are a means of providing control over the resources used in the process of computation, both time and space. A program that illustrates all these concepts put together is the factorial, which you can explore using the Spartan online visualiser.

How about "impure" features of programming languages? For example state and assignment. The first step is to add a new feature to our core calculus: sharing. In pure computation variables are "referentially transparent", which is a fancy way of saying that any term bind x to u in v can be equivalently rewritten by substituting u for x in v. This means that assignment cannot be naturally realised as an operation unless we add a notion of name or reference to the Spartan infrastructure:
new a = u in v
Contrast this with variable binding. The expression bind x to 0 in ASG(x, ADD(DEREF(x), 1)), which is Spartan syntax for something like int x=0; x=x+1, should be equivalent by referential transparency with 0=0; 0=0+1, which is clearly nonsense. On the other hand names are not copied, so new a = 0 in ASG(a, ADD(DEREF(a), 1)) is not subject to any substitutional simplification. To emphasise the fact that names are "like constants" rather than "like variables" we use a, b, ... to denote them.

In terms of diagrammatic representation, names (new) are represented as a new kind of node, an undecorated dot. Here are the two terms mentioned above, diagrammatically:

 

The (I)-labelled nodes play no active role, they are merely "adaptors", like between an European and an American socket. The first one is not a well-structured program and will block at run-time when it reaches this stage, which corresponds to ASG(0, ADD(DEREF(0), 1)):

The lighting bolt indicates that DEREF(0) (written as !0) is about to be evaluated -- but there is no rule for that.

On the other hand, the key stages for the second term are:
 to  to  to  to 

The evaluation of dereferencing, the (!) node, resolves in the argument of (+) being linked to whatever the (unlabelled) name node is linked to, namely the constant 0, which is duly copied. The addition (+) is computed in the obvious way. Finally, the assignment itself produces a dummy value (), and changes the term linked to by the name node to (1). At the same time this node becomes unreachable, i.e. "garbage".

The name can connect to anything, data ("reference") or code ("label"), storing functions or reference points in a program. The only feature of the name is that it is never copied, always shared. We can thus conclude that
Pure languages copy, impure languages share. Copying is done through variables, sharing is done through names.  
The issue of names and sharing is foundationally interesting, but not essential as far as expressiveness goes. Indeed, stateful-like behaviour can be encoded into (pure) higher-order functions using (for example) monads. Why do we then bother with adding native or primitive state, as we do? Simply because of the cost involved. The overhead required by encoding is significant.

In a language such as OCaml, which has both native state and higher-order functions, we can experiment by running the same computation both natively and using the state monad. On the computer I am writing this blog now the overhead of the simulated monadic state is approximately 3,200%. But this is contingent on many factors, which usually make benchmarks unreliable for such analyses.

In contrast, an abstract machine allows an abstract, platform-independent evaluation of performance by counting steps and associating costs with the various actions performed at each particular step. In the case of Spartan, the monadic simulation takes 3,498 steps (see Example 10) whereas the native state implementation takes 76 steps (see Example 9). The 46X overhead exhibited by the abstract machine is surprisingly close to the 32X overhead observed in the benchmark. Part of the execution (monadic simulation takes too long...) is shown in this side-by-side video:


The Spartan framework is flexible and general enough that we could express any known operations we could think of. Here is for example what can happen when you have a language combining call/cc, a very powerful control operator, with state. Such a combination is called "stored continuation" and it is so expressive that you can, for example, evaluate both branches of an if statement!. The code (see Example 6 for proper Spartan syntax and simulation) in generic functional syntax is:
let state = () in    
var saveState = 位x.callcc(位c.c:=state; x) in    
let loadState = !state in   
if saveState false then 1 else loadState true
A visualisation of the execution is given below:


Spartan-like graph rewriting machines can go even far beyond, modelling more exotic operations such as found in data-flow like languages as can be seen in this separate on-line visualiser which shows the execution of typical examples such as finite-input-response filters.

To summarise what we have learned so far, the main ideas are:

  • Pure functional programming languages are algebras with resource management, achieved via thunking (time) and copying (space). The syntactic mechanism for managing copying are variables.
  • Impure languages are languages which are also capable of sharing. The syntactic mechanism for managing sharing are names.
  • The Spartan Universal machine can represent a very broad range of operations while keep track of costs accurately. 
In the next post I will talk about reasoning about properties of programs using the Spartan calculus. 

Thursday 2 May 2019

How to make a programming language out of an algebra

This blog post is an informal introduction to a forthcoming research paper in collaboration with my students Koko Muroya and Todd Waugh Ambridge. This is the first post of a series dedicated to the SPARTAN calculus. 

Algebra, invented by the Babylonians in Antiquity and perfected by Muhammad al-Khwarizmi in the Middle Ages, is a formal language for performing calculations algorithmically. It is not a coincidence that 'algorithm' is derived from 'al-Khwarizmi' and 'algebra' from the word 'al-jabr' introduced by al-Khwarizmi, meaning "reunion of broken parts", or what we might prosaically call 'reduction'.

Yet, if algebra is the language of algorithmic calculations, then how is it different from a programming language? This blog post will sketch out the beginning of an answer to this question. First we look at functional programming languages, then we will progress to non-functional (dis-functional?) features in a follow-on post.

Managing time


To illustrate by example, let us start with a simple algebraic expression:

(1+2)+(1+2)

How is it different from the same expression written in some programming language? 

First of all, algebra has nothing to say regarding the order in which sub-expressions (1+2) should be evaluated. In contrast, programming languages (the vast majority) have a fixed order of evaluation, usually left-to-right. This means that the expression above, seen as a an expression in a programming language, will evaluate like this:
(1+2)+(1+2) =
3+(1+2) =
3+3 =
6
This is in contrast with this alternative, and reasonable, calculation of the algebraic expression:

(1+2)+(1+2) =
(1+2)+3 =
3+3 =
6
Both these sequences are valid for the algebraic expression, but only the former is valid as an execution of the program corresponding to the expression (presuming a set order of evaluation). 

With this in mind, we can now formulate the first and the most obvious distinction between an algebra and a programming language:
In a programming language the order of evaluation should be determined.
The order of evaluation may also involve "skipping" evaluating some sub-expressions altogether, as is the case of the if statement. The programming expression if 0=0 then 1+2 else 3+4 should never require the evaluation of both branches.  The guard 0=0 should be evaluated, then, because it is true, only 1+2. The 3+4 branch should be discarded. We can indicate this graphically by putting expressions with deferred evaluation in boxes. The execution of if can be represented graphically as (ignoring for now the annotations in red):
= 
=
= 3.

So by "order of evaluation" we must understand more than a mere policy of traversal of the expression syntax tree, but also the possibility of deferring some evaluation altogether. 

Managing copying


But what if we name a subexpression in an algebraic expression? In informal mathematics the expression (1+2)+(1+2) is in all respects the same as let x be 1+2 in x+x. However, in a programming language the cost of the two phrases might be different (compiler optimisations aside):

  • if addition is an expensive operations relative to other execution overheads, then the program let x be 1+2 in x+x would run faster than (1+2)+(1+2) because it requires two additions, whereas the latter requires three additions
  • the in-memory size of the two programs is also different. 
For this very simple example the time and size cost differences may be negligible, but for large programs they can be strikingly different. 

Let us represent the two expressions graphically. 

(1+2)+(1+2) is

whereas let x be 1+2 in x+x is 

The node ⊗ corresponds to a "sharing" of sub-expressions, so the syntax tree of the expression is actually a directed acyclic graph. This brings us to our second observation, that
A programming language should have a means of control over sharing of resources. 
This is in fact the main role of variables in a programming language: allowing us to specify how code is to be shared. 

In terms of evaluation, sharing corresponds to copying. For example, let x be 1 in x+x would first become 1+1 and then 2, i.e.
 becomes  which becomes 2.

In more complex expressions there are options in terms of how copying is interleaved with other evaluations. For example let x be 1+2 in x+x could first evaluate the shared sub-expression, going to let x=3 in x+x or copy the shared sub-expression wholesale, going to (1+2)+(1+2). Which brings us to the next key point:
A programming language should have a defined policy for interleaving evaluation and copying.

SPARTAN: a core calculus

Let us now define a core calculus that starts from an algebra and takes into account sharing and the schedule of evaluation. The underlying algebra will have operations 饾湏 of some arity n along with equations to be used in the course of evaluation. We will provide the arguments to an operation as 饾湏(x1, x2, ... ; t1, t2, ...) in the understanding that the xi arguments are always going to be evaluated left-to-right and ti will be deferred. For example, addition is ADD(x, y; -) and if is IF(x ; t, f). We call the arguments before the semicolon "eager" and the rest "thunks". Variables are introduced using the syntax bind x to u in v.

The copying policy is minimalistic: copy only as little as possible. For example, the evaluation steps of bind x to 1+2 in x+x is:


(copy the (+) node) =

 (copy the (1) node) =


(copy the (2) node) =

which eventually produces the final value 6.

Functions as operations


The thunks in the definition of if are quite simple, as the deferment of execution does not involve any variables. This is not the case when we need to deal with functions.  Because when we think of a functional programming language we don't just think expressions, but also mechanisms to define and apply functions. We can handle abstraction and application as two special operations LAMBDA(-; x.t) and APP(u, v; -). Function definition (LAMBDA) has no eager arguments and a thunk t with a special bound variable x. The equation for LAMBDA and APP is the one you would expect:

APP(LAMBDA (-; x.t), v; -) = bind x to v in t

Note that it not redundant to provide all these: LAMBDA and x.t and bind x to v in t. They are each subtly different and serve different purposes:
  • LAMBDA is the operation of function definition
  • x.t indicates that x must be instantiated when thunk t is forced (evaluated)
  • bind x to v in t indicates that sub-term v is to be shared in t via x
A simple example of using the constructs above differently are operations for defining and applying functions with two arguments, LAMBDA2(- ; xy.t) and APP2(t, u, v ; -) with equation 

APP2(LAMBDA2(-; xy.t), u, v) = bind x to u in bind y to v in t

The astute and knowing reader will be perhaps slightly puzzled here. Why define a separate LAMBDA2 and APP2 combination of operations instead of just writing

APP(APP(LAMBDA(-;x.(LAMBDA(-;y.t))), u), v)?

After all, isn't (位xy.t)uv the same as ((位x.位y.t)u)v)? Yes and no. The two are the same in terms of the result, but they are not the same in terms of the cost of execution. The APP2/LAMBDA2 combo requires 4 elementary operations whereas the APP/LAMBDA encoding requires 13 basic operations. Since functional programming requires a lot of abstractions and applications the cost difference in the execution of a program can add up to a lot. And in programming we care about the cost; otherwise we would not be doing programming but algebra. 

Just because some Y can be encoded into X it does not make it redundant. Of course, we know that ultimately everything can be encoded in the lambda calculus, yet we never use Church-encoding numerals in actual programming. All encodings have a cost, and encoding into higher-order functions is expensive.

The same consideration applies to recursion. We can provide an operation REC(-;x.t) with an equation describing the unfolding of recursion:

REC(-;x.t) = bind x to REC(-; x.t) in t

The story of recursion in functional languages is quite amuzing. Recursion can be encoded, like anything else, into the lambda calculus (e.g. the famous Y combinator) but in this case it is not only more expensive than a built-in native recursor but also it interferes with the type structure. Types, of course, were introduced precisely to rule out recursion from the simply-typed lambda calculus! So recursors must be snuck into the language by one syntactic trick or another. 

Now let's put it all together and write the most popular functional program, the factorial, using our core calculus of operations/equations, sharing/copying, and scheduled evaluation:

bind fact = REC (
   ; f.LAMBDA (
      ; x.IF (EQUALS (x, 1)
         ; 1                            
         , bind y = APP (f, MINUS(x, 1)) 
           in TIMES (x, y))))
in APP(fact, 3)                   

You can execute the program step-by-step in the on-line interactive visualiser at https://tnttodda.github.io/Spartan-Visualiser/?4 (preloaded as Example 4). You will see: 
  • the initial syntax tree of the expression, evolving as it is reduced
  • the sharing of sub-expressions via variables, which makes the tree into a directed acyclic graph
  • the thunks marked as boxes, which for the purpose of evaluation and copying are treated as single nodes.
Additionally you will see a pointer (or token) which roams the program graph and which has three states:
  • when it is ? it is traversing the graph (depth-first search, left-to-right)
  • when it is  it is about to apply an equational rewrite
  • when it is ✔︎ it has recognised an operation with no rewrites (i.e. a value) and is returning towards the root of the program. 
If the factorial is confusing then try examples 1-3 first, which are simpler. 

The execution of the factorial is screen-captured in this video (https://youtu.be/b2AEz02eenY):  


In a follow-up post I will discuss adding native effects to this calculus. 

Understanding the issue of equality in Homotopy Type Theory (HoTT) is easier if you are a programmer

We programmers know something that mathematicians don't really appreciate: equality is a tricky concept. Lets illustrate this with a str...