Monday 4 November 2019

Making OCaml more like Excel

Excel, spreadsheet programming in general, is broadly considered a very accessible style of programming. Could we make functional programming more like Excel programming? Or, rather, how can we extract what is appealing about spreadsheet programming and graft it onto a functional programming language?

I will give an answer here. This is an answer rather than the answer because whether you accept it or not is contingent on whether you agree or not with what I will identify as appealing when it comes to Excel.

I will start with a concrete example, and I will conclude with some brief general remarks. The concrete example is running a simple (but not trivial) simulation in Excel, then showing the same simulation in TSD, our new OCaml language extension.

For the Excel simulation I used an existing (and popular) tutorial from SpreadsheetWEB. It is quite important that you at least browse this tutorial before moving on to the rest of this post.

The idea is to construct a "profit estimator" based on historical sales data, then use a Monte Carlo simulation to predict a range of possible future profits. This latter part is the most non-trivial part, relying on a feature called Data Table, part of the What-if Analysis toolset of Excel. As far as I can tell these features are not available in Office Libre or Mac OS's Numbers. I found it rather hacky, but it gets the job done. Another way of running Monte Carlo simulations is using VBA, as seen is this tutorial.

The simulation is conceptually organised as follows:



We have cells containing Cost and Price data, along with Historical Sales, which are a list of numbers. From the Historical Sales we compute Average and Standard Deviation, which are used to generate a Random Sale, using a Gaussian distribution with said average and standard deviation. Together with cost and price this will compute a Profit, as a random variable. Running this experiment 1,000 times we process the data and display it as a histogram. 

Creating a model


In TSD we create cells storing the Cost, Price, and Historical sales as follows:

let cost  = cell [%dfg  8.]
let price = cell [%dfg 12.]
let hs    = lift [122.; 138.; 163.; 161.; 139.; ...  
let historical_sales = cell [%dfg hs]

The let keyword introduces a variable declaration. The cell function creates a cell (think an Excel cell, but identified by name rather than coordinates). The %dfg tag indicates a "data-flow graph", i.e. the data stored in cells. Unlike the native data of the host programming language, data-flow consists not only of values but also of dependencies on other cells. Finally, the lift function creates a data-flow out of ordinary data -- in this case a list.

Up to this point the key difference between TSD and Excel is that cells are identified by names, and that any data type can be stored in a cell, including lists. This is different than Excel, where a list must be stored in a range of cells.

Next, we set up dependencies between cells:

let average_hs = [%dfg average historical_sales]
let std_dev_hs = [%dfg std_dev historical_sales]
let random_hs  = cell [%dfg gauss average_hs std_dev_hs]            
let profit     = [%dfg random_hs * (price - cost)]   

The random sale based on the Gaussian distribution with historical average and standard deviation is placed in a cell, but the average and the standard deviation themselves do not need to be placed in cells. They establish data-dependencies between the cell holding the historical data and the randomly generated sale, but do not require storage in a separate cell, although we could have. The functions average, std_dev, and gauss have standard implementations (not shown).

Using the model


We now have the model above, except for the Profit Histogram cell, which we will implement later. And we can experiment with it, Excel-style. Except that instead of the Excel GUI we use the OCaml interactive top-level environment (REPL). To load the TSD code and the PPX extension:

OCaml version 4.07.1

Findlib has been successfully loaded. Additional directives:
  #require "package";;      to load a package
  #list;;                   to list the available packages
  #camlp4o;;                to load camlp4 (standard syntax)
  #camlp4r;;                to load camlp4 (revised syntax)
  #predicates "p,q,...";;   to set these predicates
  Topfind.reset();;         to force that packages will be reloaded
  #thread;;                 to enable threads

# #ppx "tsd_ext";;
# #require "tsd";;
/Users/danghica/.opam/default/lib/tsd: added to search path
/Users/danghica/.opam/default/lib/tsd/tsd.cma: loaded
# open Tsd;;

After this we load the model:

# #use "tutorial.ml" ;;
val average : (float list -> float) Tsd.graph = <abstr>
val std_dev : (float list -> float) Tsd.graph = <abstr>
val gauss : (float -> float -> float) Tsd.graph = <abstr>
val cost : float Tsd.graph = <abstr>
val price : float Tsd.graph = <abstr>
val hs : float list Tsd.graph = <abstr>
val historical_sales : float list Tsd.graph = <abstr>
val average_hs : float Tsd.graph = <abstr>
val std_dev_hs : float Tsd.graph = <abstr>
val random_hs : float Tsd.graph = <abstr>
val profit : float Tsd.graph = <abstr>

And we are ready to experiment. First, let's use the peek function to inspect what are the computed values in various cells:

#  peek average_hs;;
- : float = 148.95
# peek std_dev_hs;;
- : float = 25.443024584353175
# peek random_hs;;
- : float = 137.839327531968763
# peek profit;;
- : float = 551.357310127875

Now we can recalculate the randomly generated data using the command step, which recomputes the data-flow graph (DFG):

# step ();;
- : bool = true
# peek average_hs;;
- : float = 148.95
# peek std_dev_hs;;
- : float = 25.443024584353175
# peek random_hs;;
- : float = 154.58015036351415
# peek profit;;
- : float = 618.3206014540566

The function step returns true, which means some data was re-computed. This is because the random function call, buried inside the gauss cell generated a new value whenever the DFG is recomputed. As you would hope, the average and standard deviation stay the same, but the random sales and the profit are recomputed.

We can also change some cells and recompute:

# set price 120.;;
- : unit = ()
# step ();;
- : bool = true
# peek profit;;
- : float = 20345.062816955171

Note the function set which raised the Price cell to 120, leading to a much higher Profit value.


Monte Carlo simulation



This is where our idiom diverges from Excel's. We can take advantage of two useful features that Excel does not have: storing lists in cells and creating circular dependencies between cells:

let profit_series = let s = cell [%dfg []] in 
                    link s [%dfg hd profit s]; s

We first create a cell s which is initialised to the empty list, written as '[]' in OCaml. We then create a dependency by linking cell s to the DFG hd profit s. Which can be read as "store in s the list obtained from putting profit at the head of s". This is not quite the same as imperative assignment, because the command link does not simply change s. What it does it creates a dependency between it and an arbitrary DFG.

Now whenever we run our DFG a new value will be prepended to profit_series:

# peek profit_series ;;
- : float list = []
# step();;
- : bool = true
# peek profit_series ;;
- : float list = [363.304693159913768]
# step();;
- : bool = true
# peek profit_series ;;
- : float list = [206.299286780092984; 363.304693159913768]
# step();;
- : bool = true
# peek profit_series ;;
- : float list =

[335.831250389538866; 206.299286780092984; 363.304693159913768]

Finally, we can put it all together using a function that runs n experiments and presents the results as a histogram:

let experiment n =
  assign profit_series [];
  for i = 1 to n do
    step ()
  done;
  profit_series |> peek |> histogram 100. 10.

The assign command instantly changes the value of a cell but not its dependencies, resetting the experiment. Finally histogram x0 n is a function that presents the results as a histogram starting at x0 in buckets of size n. The x |> f syntax in OCaml is a pipeline-style operator for function application, the same as f(x)

The result is:

# experiment 1000;;
- : int list =
[0; 0; 0; 0; 0; 0; 1; 3; 3; 11; 12; 18; 36; 30; 38; 44; 51; 69; 83; 83; 93; 60; 55; 71; 49; 47; 34; 22; 18; 13; 12; 3; 1; 0; 0; 2; 1]

Conclusions


So what are the nice features of Excel we added to OCaml?

The first one is the use of a modal programming style. TSD has two modes: constructing the DFG and running the DFG. The second one is related to the "running" mode: it is transparent. Changes made to some cells are automatically propagated along the DFG.

There are other programming languages that use this idiom, such as TensorFlow or Zélus. TSD is somewhere on that spectrum. It is (a lot more) functional and safe than TensorFlow, but more imperative and less stream-oriented than Zélus. Another OCaml extension which is also somewhat similar is Incremental, with the caveat that it does not allow cycles in the DFG.


Wednesday 4 September 2019

Discovering Category Theory with primary school children


A few years ago I ran a maths circle for primary school children experimenting with teaching category theory. The meetings were documented in a series of blog posts which were recently published as a series of articles by Mathematics Teaching, the journal of the Association of Teachers of Mathematics (issues 264-268).

I am posting a copy of the articles (PDF) with the permission of the ATM: 

[Link to PDF]

Tuesday 16 July 2019

This is not a Turing Machine

The Wikipedia page on Turing Machines shows the following picture as an implementation of a Turing Machine:


I didn't see any problem with that picture until I got into an intricate discussion with my colleague and friend Paul Levy about 'realistic' models for costs of computation. One of the basic premises for such a model is that individual atomic operations have fixed and constant costs in time, space, energy, etc.

This is why the machine above cannot be a proper TM! As the machine is supplied with more and more tape, the spools will grow larger and larger, and their mass will make the (fixed) motor move the tape gradually slower and slower. The cost of computation (time and energy) in the early stages of computation would be lower than in the later stages of computation when the spools of tape are more massive.

This is why the common descriptions of TMs, (see for example the one in the SEP) present the tape fixed, more like a track, with the head gliding left and right along it. With this design it no longer matters how long the tape is, since it is the fixed-size head which is moving.

It is amusing to notice that Turing himself seems to have made this mistake, implying that the head of what he called an a-machine is fixed and the tape is moving:


I have little doubt that in the current reviewing climate such an oversight would have earned his "On Computable Numbers, with an Application to the Entscheidungsproblem" a strong reject.

Joking aside, I think such details are important when it comes to analysing cost of computation. For example, the moving-head-style TM makes it much more complicated to reduce a two-tape TM to a one-tape TM. With mobile tapes the task is trivial -- but mobile tape TMs are not right. With mobile head the task is more subtle since the communication overhead between the two heads and wherever the machine state is stored is no longer fixed! And this problem does not seem very easy to solve, at least not in a straightforward or clean way.

Before I conclude this pedantic rant, this is not the only thing about computability theory that puzzles me. Ever since I was un undergrad learning about the asymptotic complexity of sorting algorithms I couldn't understand how an array element access can be considered to be performable in constant time. To do that we need to read the index in constant time. But how can you read an arbitrarily large natural number, in whatever representation, in constant time? I still can't pretend I understand this.

Tuesday 2 July 2019

A note on homogeneous integration

This is a follow-on to my earlier post on automatic application modernisation

In application modernisation the goal is to reduce a large monolithic application to a modular structure which can be deployed in containers and managed separately, e.g. with Kubernetes. The conventional approach relies on technologies such as web APIs or microservices to integrate the components. In the blog post I showed how laborious this process is, and proposed an automatic approach (research in collaboration with Queen Mary University of London). 

In this post I want to enumerate other of the advantages of the automatic approach, noting that application modernisation is not the only instance when integration is an issue. Another situation when a number of disparate application need to be integrated into a coherent whole: Internet of Things (IoT). I am particularly thinking "smart manufacturing". Of course in application modernisation we start with a homogeneous code base, so a language-specific methodology is possible. In IoT this is not always possible. However, JVMs are quite commonly available on "smart devices" so a JVM-specific approach should be often possible. Even if only some devices are JVM-enabled, integrating them should be much easier using the automatic approach rather than conventional (e.g. microservices) techniques. 

But integration is integration, whether we talk about containerised applications or about communicating smart devices. The layered approach which involves creating, sending, receiving, and decoding messages is neatly subsumed by method calls. Method calls are not only more concise, as they represent a higher level of integration, but are also more secure because of language syntax and type checking. 

In fact let us go through all the criticisms of microservice-based integration and see how mokapot solves that problem:

  • Information barriers: The automatic approach is seamless, so no barriers appear
  • Testing and development: With automatic integration, the behaviour of a multi-device integrated application is consistent to running the entire application on a single node (e.g. an integration server). So development and testing can be carried out on a single node, as if the application was monolithic.
  • Centralised logging and metrics: It is generally desirable that operational data is collected centrally rather than locally on devices. Mokapot has this facility built in. This is used to enable the next bullet point. 
  • Moving responsibilities between services: Deciding what functionality is assigned to what service is a major upfront commitment, which is almost irreversible. The automatic approach doesn't just make this a non issue, but we provide an automatic tool (filtr) which can suggest optimal deployment based on collected runtime data (profiling). 
  • Development and support of many services is more challenging if they are built with different tools and technologies: A homogeneous, automatic, language-based approach makes this again a non-issue.
  • Resilience and fault tolerance: The communication protocols used by Mokapot is flexible, robust, and configurable. Unrecoverable errors are propagated as runtime exceptions, that can be handled programmatically by other components or by an integration server.
  • Packaging and deployment: With Mokapot these steps are greatly simplified. In particular the deployment of software updates is much more secure, as inadvertent interface changes would lead directly to compilation errors, rather than costlier runtime errors.  

Friday 28 June 2019

An Automatic Application Modernisation Toolkit for the JVM

Many large corporations have legacy enterprise applications which are web-enabled, sometimes migrated to the cloud, which are monolithic giants. To fully take advantage of what the cloud can offer, these applications need to be modernised, i.e. broken up into smaller apps which can be then containerised and managed using Kubernetes or related technologies.

There are a variety of technologies for modernising legacy apps, but they all share one feature: they are laborious. Even a monolithic application is only run-time monolithic. The code itself usually is not, but is divided into Classes (in the case of the JVM) or other language-specific logical and compilation units (modules, libraries, etc). The way source code is most commonly glued together in one whole app is via method calls and via shared state. Shared state does not mean global variables, which are best avoided, but also passing objects as method arguments. In Java this means that only a reference to the object is passed, which is aliased to the original reference.

When the application is modernised the method-call composition mechanism is no longer available. The various runtime components no longer communicate by calling each other, but they need to send each other messages via a (web) API or microservices. Moreover, they can no longer share state: objects need to be serialised, sent wholesale, then de-serialised. But aliasing no longer works, because mutations do not propagate across the component boundaries. This means that the application often requires deep re-engineering to replace method calls with message passing and to eliminate all state sharing.

What our research group (Alex Smith and me at the University of Birmingham, Thomas David Cuvillier and Nikos Tzevelekos at Queen Mary University of London) has developped over the last few years is an automatic modernisation toolkit which includes:
  • a runtime library called mokapot which creates a unified state and control space across any number of JVM containers; 
  • a static analysis tool called millr which automatically instruments the bytecode of a JVM app (if needed) to ensure compatibility with mokapot;
  • a profiler and optimiser called filtr which, relying on mokapot, splits a JVM-based app into components in a way that minimises communication overheads (serialisation, deserialisation and network costs). 
Let us illustrate automatic application modernisation with a very simple application, a reminders server. The server is just a priority queue storing events. There are trivial wrapper methods to submit and extract events.
The client is similarly straightforward, submitting some events then retrieving them and printing them to the console:

Let us now see how we modernise this application so that the "client" and the "server" can be placed in two distinct containers. 

First we look at the mokapot-based approach, which is automatic. What we want to do is to instruct the client to create the server on a different node. The software server is now a different process, possibly on a different (physical or virtual) server. Server creation (line 4 above, with the call to ReadyReminderServer constructor) will become instead:

Note that the call to the constructor now is passed as an argument to the method runRemotely of a communicator object. This method will create the object at the stipulated serverAddress but otherwise treat it exactly as if it were a local object. The client code is otherwise unchanged, except for code to create the communicator, start it up, and retrieve its address (in the prelude) and shutting it down (in the postlude):
No change at all is required to the server code. The whole modernisation process takes a few minutes and consists of the addition of a small number (half-dozen or so) of LLOCs.

Let us now consider a more conventional modernisation approach using web APIs. First of all we note that there is no single approach, but a dizzying array of competing technologies and frameworks. Let us pick something that seems commensurate to the simplicity of the example, using directly the HttpURLConnection Java API. The client can be reimplemented by wrapping-up each call in a HTTP request method, so that the new version of the client looks close enough to the original:


What we add is a constant (also half-dozen or so) LLOCs in the prelude of the file. However, the wrappers themselves need to be implemented. Even something as simple as a pair of strings needs to be formatted properly to create a correct URL, create the connection, send the request, and make sure that the request was correctly processed by the server. This [external link] shows a minimalistic implementation of such functionality on top of HTTP. This is already 100+ LLOCs, so too long to inline. One could ask whether this is the most economical way, but the answer is not a simple one. Any framework would require a certain amount of configuring and setting up, not to mention training and learning. (We are happy to record better, alternative web API-based solutions if any.)

On the server side things are more complicated. Whereas in the old code the client simply calls methods on the server, in the "modernised" code the client calls have become HTTP requests which need to be handled very differently. The original ReadyReminderServer class needs to be wrapped in code which handles the requests. The code is given below:

Note that this entire class serves as a wrapper around the original server class.

In addition, we still need the code that processes the HTTP request, parsing it into key/value pairs. This code is algorithmically straightforward but it is over 300 LOCs (including comments) and can be seen following this [external link].

So here you have it. The web API approach to modernising this simple app will take a couple of solid days of full time work from a talented developer. In addition, the web API approach will lead you into the standard hazards of stepping outside the language. The mokapot approach is not only automatic but also stays within language, so many trivial mistakes (e.g. typos) will be caught up by the compiler. In contrast, all the encodings to-and-from text that the web API requires may lead to difficult to debug runtime errors.

The one advantage of the conventional approach is that it results in an "open" web API which can be accessed from any language or from command line (e.g. via curl), if that is desirable. In contrast the mokapot communicators are reserved just for its own use.

The example we chose is actually not so hard to modernise using web APIs, just laborious and error-prone. But imagine a situation when the messages contain pointers to shared structures. Because through a web API we cannot normally send object pointers the refactoring would be significantly harder, going beyond wrappers of the original code. If those shared structures are mutable then the problem of modernisation becomes almost intractable. The mokapot framework not only handles access to shared data transparently and efficiently (by eliminating the need to replicate it) but it also preserves all single-JVM behaviour, including garbage collection of useless objects.

Let us illustrate one such situation now, which is trivial to modernise via mokapot because of automatisation, but becomes virtually intractable via web APIs. Suppose that we want to send a reminder from the server to the client when an event is ready. In the monolithic application we do it by associating a call-back function with the event (lines 7-8 and 10-11):

Because mokapot creates a virtual memory and control space across all nodes, the same instrumentation as before


enabled by the same boilerplate as before will allow the code to just work across node boundaries:


No changes needed to the server.

In contrast, modernising this version of the application, which uses callbacks for reminders, using web APIs is a significant challenge, which can no longer follow the standard recipe. The application needs to be deeply reengineered and recoded. This is because callback functions contain code, which makes it impossible for them to be serialised as pure data, and sent over the net. The automatic approach works because the memory space is virtualised and includes both nodes, making any kind of method call possible in a transparent way. So we leave this as an exercise to the reader. 🙃

Some final notes and comments of additional benefits of the automatic approach:

  • If the intention is to expose some of the functionality of an app component as a web API, this is compatible with the automatic approach. Nothing prevents a mixed automatic and manual (conventional, API-based) approach to modernisation.
  • The automatic approach is non-commital. It will not lock you in any particular technology, including mokapot itself, since the re-engineering and re-factoring effort is minimal. It will also not lock you into any particular architecture, since the decisions of how a monolithic application is split into component require little re-engineering and re-factoring, compared to the manual approach. 
  • The built-in serialisation and de-serialisation of objects is very fast in mokapot, surpassing standard Json libraries. 
  • The mokapot communicators which handle remote objects can be used in much more sophisticated ways than the boilerplate-style indicated in the example, making the node-selection algorithmic (e.g. for load-balancing), all seamlessly integrated into the same virtual memory and control space of the application. 
  • Besides what we already mentioned, mokapot performs other advanced operations for managing the virtual memory space, for example automatically migrating objects at runtime between nodes in order to maintain a high level of performance. 
  • The software works both on server and mobile (Android) JVMs. On mobile it handles connections robustly, for example allowing the mobile component to switch seamlessly between WiFi and the mobile (cell) network without service disruption. 
  • Error management (node failure, permanent connection failure) is managed programmatically via special mokapot exceptions, which can be handled programmatically, for example to migrate objects between nodes at runtime. 
  • The mokapot crypto is two-factor. The code shows a hard-coded password, but a P12 file is also required along it. 

Note: All code courtesy of Alex Smith. Comments and import statements have been removed for brevity.

Saturday 4 May 2019

The essence of (im)pure computation

This blog post is an informal introduction to a forthcoming research paper in collaboration with my students Koko Muroya and Todd Waugh Ambridge. This is the second post of dedicated to the SPARTAN calculus. The first post was How to make a programming language out of an algebra.

For the impatient reader I will summarise the point of the previous post:
algebra + thunking + copying = pure functional programming
The "algebra" part are the operations of the language. They include not only what we commonly think of as operations, such as arithmetic, but also function definition and application or recursion. The "thunking" is a way of postponing computations and "copying" is self explanatory. Together, thunking and copying are a means of providing control over the resources used in the process of computation, both time and space. A program that illustrates all these concepts put together is the factorial, which you can explore using the Spartan online visualiser.

How about "impure" features of programming languages? For example state and assignment. The first step is to add a new feature to our core calculus: sharing. In pure computation variables are "referentially transparent", which is a fancy way of saying that any term bind x to u in v can be equivalently rewritten by substituting u for x in v. This means that assignment cannot be naturally realised as an operation unless we add a notion of name or reference to the Spartan infrastructure:
new a = u in v
Contrast this with variable binding. The expression bind x to 0 in ASG(x, ADD(DEREF(x), 1)), which is Spartan syntax for something like int x=0; x=x+1, should be equivalent by referential transparency with 0=0; 0=0+1, which is clearly nonsense. On the other hand names are not copied, so new a = 0 in ASG(a, ADD(DEREF(a), 1)) is not subject to any substitutional simplification. To emphasise the fact that names are "like constants" rather than "like variables" we use a, b, ... to denote them.

In terms of diagrammatic representation, names (new) are represented as a new kind of node, an undecorated dot. Here are the two terms mentioned above, diagrammatically:

 

The (I)-labelled nodes play no active role, they are merely "adaptors", like between an European and an American socket. The first one is not a well-structured program and will block at run-time when it reaches this stage, which corresponds to ASG(0, ADD(DEREF(0), 1)):

The lighting bolt indicates that DEREF(0) (written as !0) is about to be evaluated -- but there is no rule for that.

On the other hand, the key stages for the second term are:
 to  to  to  to 

The evaluation of dereferencing, the (!) node, resolves in the argument of (+) being linked to whatever the (unlabelled) name node is linked to, namely the constant 0, which is duly copied. The addition (+) is computed in the obvious way. Finally, the assignment itself produces a dummy value (), and changes the term linked to by the name node to (1). At the same time this node becomes unreachable, i.e. "garbage".

The name can connect to anything, data ("reference") or code ("label"), storing functions or reference points in a program. The only feature of the name is that it is never copied, always shared. We can thus conclude that
Pure languages copy, impure languages share. Copying is done through variables, sharing is done through names.  
The issue of names and sharing is foundationally interesting, but not essential as far as expressiveness goes. Indeed, stateful-like behaviour can be encoded into (pure) higher-order functions using (for example) monads. Why do we then bother with adding native or primitive state, as we do? Simply because of the cost involved. The overhead required by encoding is significant.

In a language such as OCaml, which has both native state and higher-order functions, we can experiment by running the same computation both natively and using the state monad. On the computer I am writing this blog now the overhead of the simulated monadic state is approximately 3,200%. But this is contingent on many factors, which usually make benchmarks unreliable for such analyses.

In contrast, an abstract machine allows an abstract, platform-independent evaluation of performance by counting steps and associating costs with the various actions performed at each particular step. In the case of Spartan, the monadic simulation takes 3,498 steps (see Example 10) whereas the native state implementation takes 76 steps (see Example 9). The 46X overhead exhibited by the abstract machine is surprisingly close to the 32X overhead observed in the benchmark. Part of the execution (monadic simulation takes too long...) is shown in this side-by-side video:


The Spartan framework is flexible and general enough that we could express any known operations we could think of. Here is for example what can happen when you have a language combining call/cc, a very powerful control operator, with state. Such a combination is called "stored continuation" and it is so expressive that you can, for example, evaluate both branches of an if statement!. The code (see Example 6 for proper Spartan syntax and simulation) in generic functional syntax is:
let state = () in    
var saveState = λx.callcc(λc.c:=state; x) in    
let loadState = !state in   
if saveState false then 1 else loadState true
A visualisation of the execution is given below:


Spartan-like graph rewriting machines can go even far beyond, modelling more exotic operations such as found in data-flow like languages as can be seen in this separate on-line visualiser which shows the execution of typical examples such as finite-input-response filters.

To summarise what we have learned so far, the main ideas are:

  • Pure functional programming languages are algebras with resource management, achieved via thunking (time) and copying (space). The syntactic mechanism for managing copying are variables.
  • Impure languages are languages which are also capable of sharing. The syntactic mechanism for managing sharing are names.
  • The Spartan Universal machine can represent a very broad range of operations while keep track of costs accurately. 
In the next post I will talk about reasoning about properties of programs using the Spartan calculus. 

Thursday 2 May 2019

How to make a programming language out of an algebra

This blog post is an informal introduction to a forthcoming research paper in collaboration with my students Koko Muroya and Todd Waugh Ambridge. This is the first post of a series dedicated to the SPARTAN calculus. 

Algebra, invented by the Babylonians in Antiquity and perfected by Muhammad al-Khwarizmi in the Middle Ages, is a formal language for performing calculations algorithmically. It is not a coincidence that 'algorithm' is derived from 'al-Khwarizmi' and 'algebra' from the word 'al-jabr' introduced by al-Khwarizmi, meaning "reunion of broken parts", or what we might prosaically call 'reduction'.

Yet, if algebra is the language of algorithmic calculations, then how is it different from a programming language? This blog post will sketch out the beginning of an answer to this question. First we look at functional programming languages, then we will progress to non-functional (dis-functional?) features in a follow-on post.

Managing time


To illustrate by example, let us start with a simple algebraic expression:

(1+2)+(1+2)

How is it different from the same expression written in some programming language? 

First of all, algebra has nothing to say regarding the order in which sub-expressions (1+2) should be evaluated. In contrast, programming languages (the vast majority) have a fixed order of evaluation, usually left-to-right. This means that the expression above, seen as a an expression in a programming language, will evaluate like this:
(1+2)+(1+2) =
3+(1+2) =
3+3 =
6
This is in contrast with this alternative, and reasonable, calculation of the algebraic expression:

(1+2)+(1+2) =
(1+2)+3 =
3+3 =
6
Both these sequences are valid for the algebraic expression, but only the former is valid as an execution of the program corresponding to the expression (presuming a set order of evaluation). 

With this in mind, we can now formulate the first and the most obvious distinction between an algebra and a programming language:
In a programming language the order of evaluation should be determined.
The order of evaluation may also involve "skipping" evaluating some sub-expressions altogether, as is the case of the if statement. The programming expression if 0=0 then 1+2 else 3+4 should never require the evaluation of both branches.  The guard 0=0 should be evaluated, then, because it is true, only 1+2. The 3+4 branch should be discarded. We can indicate this graphically by putting expressions with deferred evaluation in boxes. The execution of if can be represented graphically as (ignoring for now the annotations in red):
= 
=
= 3.

So by "order of evaluation" we must understand more than a mere policy of traversal of the expression syntax tree, but also the possibility of deferring some evaluation altogether. 

Managing copying


But what if we name a subexpression in an algebraic expression? In informal mathematics the expression (1+2)+(1+2) is in all respects the same as let x be 1+2 in x+x. However, in a programming language the cost of the two phrases might be different (compiler optimisations aside):

  • if addition is an expensive operations relative to other execution overheads, then the program let x be 1+2 in x+x would run faster than (1+2)+(1+2) because it requires two additions, whereas the latter requires three additions
  • the in-memory size of the two programs is also different. 
For this very simple example the time and size cost differences may be negligible, but for large programs they can be strikingly different. 

Let us represent the two expressions graphically. 

(1+2)+(1+2) is

whereas let x be 1+2 in x+x is 

The node ⊗ corresponds to a "sharing" of sub-expressions, so the syntax tree of the expression is actually a directed acyclic graph. This brings us to our second observation, that
A programming language should have a means of control over sharing of resources. 
This is in fact the main role of variables in a programming language: allowing us to specify how code is to be shared. 

In terms of evaluation, sharing corresponds to copying. For example, let x be 1 in x+x would first become 1+1 and then 2, i.e.
 becomes  which becomes 2.

In more complex expressions there are options in terms of how copying is interleaved with other evaluations. For example let x be 1+2 in x+x could first evaluate the shared sub-expression, going to let x=3 in x+x or copy the shared sub-expression wholesale, going to (1+2)+(1+2). Which brings us to the next key point:
A programming language should have a defined policy for interleaving evaluation and copying.

SPARTAN: a core calculus

Let us now define a core calculus that starts from an algebra and takes into account sharing and the schedule of evaluation. The underlying algebra will have operations 𝜓 of some arity n along with equations to be used in the course of evaluation. We will provide the arguments to an operation as 𝜓(x1, x2, ... ; t1, t2, ...) in the understanding that the xi arguments are always going to be evaluated left-to-right and ti will be deferred. For example, addition is ADD(x, y; -) and if is IF(x ; t, f). We call the arguments before the semicolon "eager" and the rest "thunks". Variables are introduced using the syntax bind x to u in v.

The copying policy is minimalistic: copy only as little as possible. For example, the evaluation steps of bind x to 1+2 in x+x is:


(copy the (+) node) =

 (copy the (1) node) =


(copy the (2) node) =

which eventually produces the final value 6.

Functions as operations


The thunks in the definition of if are quite simple, as the deferment of execution does not involve any variables. This is not the case when we need to deal with functions.  Because when we think of a functional programming language we don't just think expressions, but also mechanisms to define and apply functions. We can handle abstraction and application as two special operations LAMBDA(-; x.t) and APP(u, v; -). Function definition (LAMBDA) has no eager arguments and a thunk t with a special bound variable x. The equation for LAMBDA and APP is the one you would expect:

APP(LAMBDA (-; x.t), v; -) = bind x to v in t

Note that it not redundant to provide all these: LAMBDA and x.t and bind x to v in t. They are each subtly different and serve different purposes:
  • LAMBDA is the operation of function definition
  • x.t indicates that x must be instantiated when thunk t is forced (evaluated)
  • bind x to v in t indicates that sub-term v is to be shared in t via x
A simple example of using the constructs above differently are operations for defining and applying functions with two arguments, LAMBDA2(- ; xy.t) and APP2(t, u, v ; -) with equation 

APP2(LAMBDA2(-; xy.t), u, v) = bind x to u in bind y to v in t

The astute and knowing reader will be perhaps slightly puzzled here. Why define a separate LAMBDA2 and APP2 combination of operations instead of just writing

APP(APP(LAMBDA(-;x.(LAMBDA(-;y.t))), u), v)?

After all, isn't (λxy.t)uv the same as ((λx.λy.t)u)v)? Yes and no. The two are the same in terms of the result, but they are not the same in terms of the cost of execution. The APP2/LAMBDA2 combo requires 4 elementary operations whereas the APP/LAMBDA encoding requires 13 basic operations. Since functional programming requires a lot of abstractions and applications the cost difference in the execution of a program can add up to a lot. And in programming we care about the cost; otherwise we would not be doing programming but algebra. 

Just because some Y can be encoded into X it does not make it redundant. Of course, we know that ultimately everything can be encoded in the lambda calculus, yet we never use Church-encoding numerals in actual programming. All encodings have a cost, and encoding into higher-order functions is expensive.

The same consideration applies to recursion. We can provide an operation REC(-;x.t) with an equation describing the unfolding of recursion:

REC(-;x.t) = bind x to REC(-; x.t) in t

The story of recursion in functional languages is quite amuzing. Recursion can be encoded, like anything else, into the lambda calculus (e.g. the famous Y combinator) but in this case it is not only more expensive than a built-in native recursor but also it interferes with the type structure. Types, of course, were introduced precisely to rule out recursion from the simply-typed lambda calculus! So recursors must be snuck into the language by one syntactic trick or another. 

Now let's put it all together and write the most popular functional program, the factorial, using our core calculus of operations/equations, sharing/copying, and scheduled evaluation:

bind fact = REC (
   ; f.LAMBDA (
      ; x.IF (EQUALS (x, 1)
         ; 1                            
         , bind y = APP (f, MINUS(x, 1)) 
           in TIMES (x, y))))
in APP(fact, 3)                   

You can execute the program step-by-step in the on-line interactive visualiser at https://tnttodda.github.io/Spartan-Visualiser/?4 (preloaded as Example 4). You will see: 
  • the initial syntax tree of the expression, evolving as it is reduced
  • the sharing of sub-expressions via variables, which makes the tree into a directed acyclic graph
  • the thunks marked as boxes, which for the purpose of evaluation and copying are treated as single nodes.
Additionally you will see a pointer (or token) which roams the program graph and which has three states:
  • when it is ? it is traversing the graph (depth-first search, left-to-right)
  • when it is  it is about to apply an equational rewrite
  • when it is ✔︎ it has recognised an operation with no rewrites (i.e. a value) and is returning towards the root of the program. 
If the factorial is confusing then try examples 1-3 first, which are simpler. 

The execution of the factorial is screen-captured in this video (https://youtu.be/b2AEz02eenY):  


In a follow-up post I will discuss adding native effects to this calculus. 

Tuesday 9 April 2019

Andromeda and vegetarian mathematics

This post was inspired by a Twitter dialogue between two mathematicians whom I admire.

Andrej Bauer tweeted to John Baez:
John, you should try programming in a language that guarantees your program is correct. It will feel like proving theorems all the way through. (Agda is a good candidate.) 
To which John replied:
Nothing will ever guarantee my programs are correct, because I don't have the patience to work through lots of details of syntax.  It's just not interesting enough to be worthwhile to me.   
I prefer to just think about stuff until I understand it clearly, then explain it.
This exchange motivated me to write down some thoughts on proof assistants, particularly since a wonderful graduate school dedicated to "univalent mathematics" just finished at our university, so the topic is fresh in my mind.


Computer proof assistants have been around since the 1970s and yet the term "assistant" is still almost a misnomer. You don't really feel much "assisted" when using them. I say this even though I like them enough to have introduced them in my university's undergraduate teaching curriculum. Or maybe I don't like them, I like the idea of them.

This is because the main purpose of a proof assistant is not to help you prove a statement, but to prevent you from making a mistake when proving a statement. They are more airbag than sat-nav. More seatbelt than gas pedal. This is why most people feel about proof assistants as they feel about vegetarianism. They respect it. They know the world would be a better place if everyone did it. But they like hamburgers too much.

I think that some in the proof assistant community are beginning to understand that the obsession with error avoidance could be a bit misguided. Andrej Bauer's talk on his new proof assistant Andromeda was truly illuminating. The new proof assistant seems very appealing. The idea can be summarised like this: Errors in proofs could happen because mathematicians make silly mistakes or because the underlying theory is unsound. The former is a practical concern, the latter a foundational concern. Lets focus mainly on former. 

The foundational concern has obsessed logicians and philosophers like Russell and Frege, and we have learned a great deal about how to avoid the most obvious pitfalls. But avoiding them comes at a cost. We need to restrict ourselves to a small set of syntactic primitives. In particular, we need to have as few axioms as possible, since too many axioms may lead to inconsistency. If axioms are the meat of mathematics, then type theory is the ultimate vegetarian cookbook.

The problem, and perhaps this is what John is alluding to, is that in this setting a proof about X often may not feel like a proof about X but like a proof about some Y, the encoding of X into the heavily regulated formal system. There could be a cold remoteness to such an encoding. Compare the concrete beauty of Euclidean geometry to its analytic version -- all symbols, no figures. Some people may like that, there is no accounting for taste, but others would rather just do geometry when they do geometry, not algebra. Lakatos, in his wonderful Proofs and Refutations, works this example in detail and in the process transforms a flawed but intuitively appealing geometrical proof into a flawless but impenetrable algebraic proof. He calls this premium on correctness at the expense of everything else deductivism. He advocates instead heuristic proofs, which have intrinsic value as a mathematical thought process. The most extreme instance of a deductivist proof is the purely computer-generated proof, which confirms that a sentence is true outside of any need for human insight.

An aside: The foundationalist preoccupation with encoding concepts into other concepts is quite pervasive in computer science. The Church-Turing hypothesis, the birth certificate of our discipline, is all about encodings. Its extreme reductionism is sensible in context, but the idea of encoding concepts into other concepts may not always be sensible. It is true that all x86 instructions can be reduced to MOV and it is true that many computational effects can be encoded into the lambda calculus via monads. But this is only interesting if you ignore the cost. In the case of programming the cost is space and time. In the case of mathematics the cost is understanding and insight. 

The practical concern -- mathematicians making mistakes -- is however a very reasonable one. Andrej's Andromeda prover discounts foundational worries and focusses on avoiding mistakes. The mathematician is not prevented from formulating possibly inconsistent axiom systems. This starts to smell like hamburger. This is a tool that lets you get things done with a minimum of fuss. This could be mathematical fast food -- sinful and delicious. Curiously, in some sense it is a throwback to old-fashioned systems such as LCF or ACL, and a departure from more modern systems like Agda or Coq. But it smells like hamburger.

I had my own personal encounter with such foundational issues when working on categorical semantics for digital circuits (paper in CSLinvited talk at Applied Category Theory 2018). A categorical model is ultimately a bunch of axioms. There is a danger of specifying a degenerate category in which the only models are trivial (e.g. everything is equal). To show that this is not the case it is usual to provide one or more interesting models -- in the case of digital circuits, a category of streams and functions on streams. The categorical model had a lot of meat (I mean axioms) so it is a reasonable worry that put together they would amount to a degenerate category. Or is it?

I always felt slightly annoyed about having to provide such a concrete model. It is technically difficult and I felt it was unnecessary. Luckily my colleague Achim Jung worked it out.

When modelling an aspect of physical reality (e.g. digital circuits) there is another non-mathematical possibility of validating the axioms: by experiment. Each axiom can be seen as a (natural) law. And in this case the 'model' of our theory is reality itself. On the other hand, the concrete mathematical model is not something we could validate experimentally, just mathematically. Moreover, the concrete model has more properties than the categorical model, some of them impossible to validate experimentally. So do we really need such a model? If the basis of truth is the reality of digital circuits, could a model of them be experimentally validated yet trivial/inconsistent? Does it even matter?

The categorical model is perhaps not exactly "hamburger", but it does have a lot of meaty axioms. And we can actually compute the properties of some tricky circuits with it! The concrete streams-and-functions models is a requirement only from a foundationalist perspective.

Of course, satisfying the practical and the foundational requirements is a best-case scenario. Having a concrete non-degenerate model or a consistent mathematical theory cannot hurt. On the contrary. Maybe a kind of healthy, environmental, lab-synthesised meat will be possible in the future. Until then, there are difficult choices to be made.

Understanding the issue of equality in Homotopy Type Theory (HoTT) is easier if you are a programmer

We programmers know something that mathematicians don't really appreciate: equality is a tricky concept. Lets illustrate this with a str...