Monday, 30 May 2022

Understanding the issue of equality in Homotopy Type Theory (HoTT) is easier if you are a programmer

We programmers know something that mathematicians don't really appreciate: equality is a tricky concept. Let's illustrate this with a straightforward example in some generic OO programming language. I am trying to keep this language-independent, neither using any exotic concepts nor banning any common ones, but simply using whatever makes the issue easiest to illustrate. 

interface Real;
interface Complex: getReal(); getImg(); getAbs(); getArg(); ... 

class Cartesian implements Complex: 
    Real x, y 
    Cartesian(x', y'): x = x'; y = y'
    getReal(): x
    getImg(): y
    getAbs(): sqrt(x*x + y*y) 
    getArg(): arctan(y/x)
    ...
 
class Polar implements Complex: 
    Real r, t    
    Polar(r', t'): r = r'; t = t' 
    getReal(): r * cos t
    getImg(): r * sin t
    getAbs(): r
    getArg(): t
    ...

We have the well-trodden example of complex numbers with two implementations, Cartesian and Polar. 

Now, how can two Complex numbers c1 and c2 be equal?

Most superficially, if the language allows reference testing, c1 and c2 are equal if they are the same reference, i.e. the same memory location, i.e. the "same thing" in an almost physical way. If not, they are in some sense "not equal", as there is an operation distinguishing them, namely reference-equality testing. But this is pretty much the only operation that can distinguish them, and a rather particular one. It is not usually what we want. 

A deeper comparison is to compare the data, but that can only happen if the two Complex numbers are both Cartesian or both Polar. In this case, if their x, y members (or, respectively, their r, t members) are equal then the two Complex numbers are also equal. 

If the type of Reals is a built-in type, for example that of floating point numbers, then they have a standard notion of equality, so we are done. But what if the type of Reals is itself an interface? There are, for example, several ways in which arbitrary precision Reals can be implemented in order to avoid the rounding errors that are so difficult to manage in the case of floating point. 

To compare Reals as an abstract data type (interface) we are back to the same situation: compare them as references (not good) or compare whatever data members they have, provided they belong to the same class implementing the interface. 

So the issue of equality is pretty complicated to begin with. But what about comparing Complex numbers from different classes? What about the Cartesian with x=y=0 and the Polar with r=t=0? They both represent the same (abstract) Complex number, i.e. the same point in the complex plane, no matter what system of coordinates we use, Cartesian or Polar -- namely 0. The fact that x=y=r=t=0 is a coincidence, as the same could be said about x=0, y=1 vs r=1, t=pi/2, which both represent the complex number i, or indeed about any complex number and its two implementations. 

This forces us to ask a deeper question: the two representations of the complex numbers 0 or i are clearly not "equal" if we peek under the covers, since they rely on different implementations. However, if we have no access to the implementation, we can be sure that any program written only against the Complex interface will be unable to differentiate between them. We simply cannot write a function that returns different results when given the two implementations, Cartesian and Polar, of the same complex number. This makes the two implementations of the same complex number intersubstitutable, hence equal. Note that certain low-level primitives in real-life languages (e.g. hash functions) could still distinguish them and break this equality. 
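
To see what "intersubstitutable" means concretely, here is a minimal sketch in OCaml (the language used elsewhere on this blog); the object type, method names and rounding tolerance are my own choices, not part of the pseudocode above. An equality test written purely against the interface cannot tell two representations of i apart, whereas reference equality can.

(* The interface, as an OCaml object type: only getReal and getImg are visible. *)
type complex = < getReal : float; getImg : float >

(* Observational equality: compare only what the interface exposes,
   up to a rounding tolerance. *)
let complex_equal (c1 : complex) (c2 : complex) : bool =
  let eps = 1e-9 in
  Float.abs (c1#getReal -. c2#getReal) < eps
  && Float.abs (c1#getImg -. c2#getImg) < eps

(* Two implementations of the complex number i. *)
let cartesian_i : complex = object
  method getReal = 0.
  method getImg  = 1.
end

let polar_i : complex = object
  val r = 1.
  val t = Float.pi /. 2.
  method getReal = r *. cos t
  method getImg  = r *. sin t
end

(* complex_equal cartesian_i polar_i is true, even though the reference
   comparison cartesian_i == polar_i is false. *)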

Mathematicians do not usually have to struggle with the issue of multiple implementations of the same concept. This is why they tend to be happy enough with the notion of equality that says two things are equal if their representations are equal. This is how equality is handled in the most common foundation of mathematics, set theory. However, some mathematicians, such as Grothendieck, noticed that sometimes this is not good enough. These slides [PDF] are a very accessible introduction to the problem. (And in fact they were the motivation for this blog post.)

HoTT is a newer foundation of mathematics, much less settled than set theory, which takes a more refined view of equality. This makes it in some sense more complicated at the outset, when the objects are simple, but the expectation is that this higher initial investment pays off in the long term, as we need to handle more complex mathematical objects. I am not going to say more about HoTT here, but hopefully you now understand why equality is such a sophisticated concept in it, unlike in good old-fashioned set theory. 

Especially if you are a programmer, you will hopefully relate. 

Monday, 14 February 2022

MGS 2022: String diagrams for monoidal closed categories

Between 10 and 13 April 2022 I will teach an advanced course at the Midlands Graduate School on the topic of string diagrams for monoidal closed categories. 

String diagrams are a graphical formalism for category theory which has recently found widespread applications in areas as diverse as quantum computing and computational linguistics. These languages are based on compact closed categories, a mathematical framework that generalises the concept of relations. 

In contrast, programming languages are based on monoidal closed categories, a mathematical framework that generalises the concept of function spaces. String diagrams for these categories have been studied less, yet they are useful in bridging the gap between the syntax of programming languages and the principled extraction (or distillation) of efficient abstract machines. They are also useful in clarifying complex syntactic transformations such as closure conversion or automatic differentiation. 

Attending students will be assumed to have some familiarity with basic category theory. An understanding of compilers is not required, but it will provide additional motivation. 

Sunday, 19 April 2020

Teaching Compilers

For the first time in my 15 years as a lecturer I got to teach Compilers. I have written compilers before and I have published a bunch of research papers on the topic, so I considered myself reasonably qualified to teach it. I had also taught Compilers Workshop before, a hands-on, project-oriented course on building a working compiler. So, from my own experience and from observing students, I had a pretty good idea of where the challenges lie, and I wanted to teach a course aimed squarely at those challenges. I looked around quite hard for a textbook that would give some emphasis to the topics I deemed crucial, but I failed to find one.

Here are some random musings on these topics.

Parsing

Compiler books spend an inordinate amount of time on the intricacies of parsing. As a subject in itself, parsing is fascinating both from a theoretical and an algorithmic perspective. As a conceptual hurdle in building a parser for a compiler, it is not a significant one. If the aim is to build a compiler, rather than a parser, then the reasonable thing to do is to use a parser generator. To use a parser generator, the required concepts are that of a grammar and that of a conflict. The intricacies of the parsing algorithm itself are irrelevant, as they help neither with designing elegant grammars nor with resolving conflicts. Like grammar design, resolving conflicts is an art, because there are no algorithms for it. I suspect the reason parsing is such a popular element of compiler courses is that it is very testable. Putting a grammar into a particular form so it becomes parsable by a particular parser is the kind of technical drudgery that the student will forget as soon as they write their final exam, but it makes the job of writing a final exam easy. Parsing could be a valid stand-alone course for those interested in that kind of thing.

Operational Semantics

The absence of OS from compiler textbooks is, to me, baffling. Any language needs to be specified before it is implemented, and OS is a reasonably straightforward way to achieve that. Instead, conventional compiler books only show how to compile the basic language features you find in a C compiler, whose behaviour is "obvious". Even closures, essential for functional languages, are often ignored. But if you want to write a new compiler, presumably your language has some unique features. How do you specify those new features? Operational Semantics.
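
To be concrete, here is the kind of thing I mean, as a small OCaml sketch of my own (not taken from any textbook or from my course notes): a small-step operational semantics for a toy language of arithmetic expressions. The specification and an evaluator come out of the same definition.

(* A toy language of arithmetic expressions. *)
type exp = Lit of int | Add of exp * exp | Mul of exp * exp

(* One small-step reduction: reduce the left subexpression first,
   then the right; return None when the expression is a value. *)
let rec step (e : exp) : exp option =
  match e with
  | Lit _ -> None
  | Add (Lit m, Lit n) -> Some (Lit (m + n))
  | Mul (Lit m, Lit n) -> Some (Lit (m * n))
  | Add (e1, e2) ->
    (match step e1 with
     | Some e1' -> Some (Add (e1', e2))
     | None -> Option.map (fun e2' -> Add (e1, e2')) (step e2))
  | Mul (e1, e2) ->
    (match step e1 with
     | Some e1' -> Some (Mul (e1', e2))
     | None -> Option.map (fun e2' -> Mul (e1, e2')) (step e2))

(* Evaluation is the repeated application of step. *)
let rec eval (e : exp) : exp =
  match step e with Some e' -> eval e' | None -> e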

Abstract Machines

They are the nexus between OS and executable code. There is some beautiful (and reasonably elementary and accessible) theory about how AMs can be derived from an OS. In the case of functional programming in particular, AMs are unavoidable. But even in the case of OO languages, which have a complex method dispatch protocol, an AM is the best way to understand the OS in an execution-oriented way.
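
Again as a sketch of my own rather than course material, here is a tiny abstract machine for the toy expression language above, in the style of a CK machine: the expression in focus plus a stack of pending work. Each transition does a constant amount of work, which is precisely what makes an AM the bridge from the semantics towards executable code.

(* Stack frames record work still to be done. *)
type frame = AddR of exp | AddL of int | MulR of exp | MulL of int

(* A configuration is the expression in focus plus the stack (continuation). *)
let rec run (e : exp) (k : frame list) : int =
  match e, k with
  | Lit n, []           -> n
  | Lit n, AddR e2 :: k -> run e2 (AddL n :: k)
  | Lit n, AddL m :: k  -> run (Lit (m + n)) k
  | Lit n, MulR e2 :: k -> run e2 (MulL n :: k)
  | Lit n, MulL m :: k  -> run (Lit (m * n)) k
  | Add (e1, e2), k     -> run e1 (AddR e2 :: k)
  | Mul (e1, e2), k     -> run e1 (MulR e2 :: k)

(* run e [] computes the same result as the small-step evaluator above. *)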

Types

Modern programming languages are all about types, yet compiler textbooks spend a trivial amount of time on what a type system is and on type inference algorithms. Even the terminology for type inference and type checking enshrined in compiler textbooks is bizarre: "semantic analysis". Students must be aware of the powerful combination of OS and type systems used to guarantee various soundness-style runtime properties, even though the details of such proofs do not belong in the compilers course (but in a Programming Languages course). These concepts, however, must be understood by any compiler designer and implementer. 
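
For concreteness, here is a minimal illustration of mine (a toy example, not textbook material) of what a type system is: extend a toy language with booleans and a conditional, and the type checker rejects exactly the programs that would get stuck at runtime. The soundness proof belongs in a Programming Languages course, but its statement is what the compiler writer relies on.

type ty = TInt | TBool

type term =
  | Int of int | Bool of bool
  | Plus of term * term
  | If of term * term * term

exception Type_error

(* Type checking: a direct transcription of the typing rules. *)
let rec type_of (t : term) : ty =
  match t with
  | Int _  -> TInt
  | Bool _ -> TBool
  | Plus (t1, t2) ->
    if type_of t1 = TInt && type_of t2 = TInt then TInt else raise Type_error
  | If (c, t1, t2) ->
    let ty1 = type_of t1 in
    if type_of c = TBool && ty1 = type_of t2 then ty1 else raise Type_error

(* Soundness: a well-typed term never gets stuck under the operational
   semantics -- e.g. Plus (Int 1, Bool true) is rejected here rather than
   failing at runtime. *)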

Code generation

Instead of OS and AMs, the conventional compiler book will jump straight into code generation, presented as a series of recipes for compiling this feature and that feature. Code generation is usually baffling because there is little understanding of where those recipes come from (absent an AM model) and of how we know they are correct (absent an OS). They are taken as obvious, I suppose, but they are not.

Front-end optimisations

So long as the code is still represented as some kind of syntax tree, most compiler books do a good job of presenting optimisation. Things like constant folding and constant propagation are a good mix of interesting concepts and clever algorithms. Particularly interesting, both conceptually and algorithmically, is Static Single Assignment (SSA) and the optimisations deriving from it. However, absent an OS it is difficult to justify that such transformations are meaning-preserving, and absent an AM with an execution-cost model it is difficult to justify that a particular transformation reduces the cost of execution. Both aspects need to be taken as obviously correct, but it is not entirely clear that they always are.
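
As a small illustration (mine, reusing the toy expression type from the operational semantics sketch above), constant folding is a source-to-source rewrite; the claim that it is meaning-preserving is exactly the claim that eval (fold e) = eval e, which only makes sense once an OS has defined eval.

(* Constant folding over the toy expression type. *)
let rec fold (e : exp) : exp =
  match e with
  | Lit n -> Lit n
  | Add (e1, e2) ->
    (match fold e1, fold e2 with
     | Lit m, Lit n -> Lit (m + n)
     | e1', e2'     -> Add (e1', e2'))
  | Mul (e1, e2) ->
    (match fold e1, fold e2 with
     | Lit m, Lit n -> Lit (m * n)
     | e1', e2'     -> Mul (e1', e2'))

(* Meaning preservation: eval (fold e) = eval e.
   Cost reduction: fold e never contains more operations than e. *)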

Back-end optimisations

After code generation, optimisation becomes an extremely complex exercise. Things like register allocation and other memory and instruction management optimisations are so tied to the machine model that the algorithms are fragile. We either give Mickey Mouse algorithms based on greatly simplified machine models or we drown in a sea of processor and operating/runtime system details that change continuously due to evolving technologies. I think that at this level the student should only be made aware of the concepts, yet the compiler textbooks insist on technical detail -- register allocation via graph colouring is the darling here. I think this is also the kind of stuff that the student will instantly forget, so there is no great reason to teach it, except that it makes for easy-to-grade exam questions.

Exotic architectures

Nowadays multicore, cloud, GPUs and FPGAs are no longer rare. They all raise interesting and pertinent challenges for compilers, yet they are all largely ignored. The technical details are too complicated to teach, but the concepts can be taught in a way that at least makes students aware of the challenges they will face once they leave the enchanted land of the compiler course and its vanilla von Neumann architecture.


In my own course I tried to address all these issues by:
  • teaching only basic concepts of parsing along with some hands-on skills on parser generators
  • introducing operational semantics, for higher-order functions and state
  • introducing types, type inference, and the basic idea of runtime safety
  • introducing abstract machines derived from the operational semantics and showing how they lead to efficient interpreters
  • deriving code from the abstract machines
  • conventional presentation of optimisations (both front-end and back-end)
  • a tour of exotic architectures and programming languages of the future
It was an incredible amount of work, and the course notes are still quite rough -- too rough for public consumption -- but very satisfying. Most satisfying was the very positive reaction of my students, who rated the course very highly. 

Monday, 4 November 2019

Making OCaml more like Excel

Excel, and spreadsheet programming in general, is broadly considered a very accessible style of programming. Could we make functional programming more like Excel programming? Or, rather, how can we extract what is appealing about spreadsheet programming and graft it onto a functional programming language?

I will give an answer here. This is an answer rather than the answer, because whether you accept it is contingent on whether you agree with what I identify as appealing about Excel.

I will start with a concrete example, and I will conclude with some brief general remarks. The concrete example is running a simple (but not trivial) simulation in Excel, then showing the same simulation in TSD, our new OCaml language extension.

For the Excel simulation I used an existing (and popular) tutorial from SpreadsheetWEB. It is quite important that you at least browse this tutorial before moving on to the rest of this post.

The idea is to construct a "profit estimator" based on historical sales data, then use a Monte Carlo simulation to predict a range of possible future profits. The latter is the most non-trivial part, relying on a feature called Data Table, part of the What-If Analysis toolset of Excel. As far as I can tell these features are not available in LibreOffice or in macOS's Numbers. I found it rather hacky, but it gets the job done. Another way of running Monte Carlo simulations is using VBA, as seen in this tutorial.

The simulation is conceptually organised as follows:



We have cells containing Cost and Price data, along with Historical Sales, which is a list of numbers. From the Historical Sales we compute the Average and Standard Deviation, which are used to generate a Random Sale from a Gaussian distribution with that average and standard deviation. Together with the cost and price, this gives a Profit, as a random variable. Running this experiment 1,000 times, we process the data and display it as a histogram. 

Creating a model


In TSD we create cells storing the Cost, Price, and Historical sales as follows:

let cost  = cell [%dfg  8.]
let price = cell [%dfg 12.]
let hs    = lift [122.; 138.; 163.; 161.; 139.; ...  
let historical_sales = cell [%dfg hs]

The let keyword introduces a variable declaration. The cell function creates a cell (think an Excel cell, but identified by name rather than coordinates). The %dfg tag indicates a "data-flow graph", i.e. the data stored in cells. Unlike the native data of the host programming language, data-flow consists not only of values but also of dependencies on other cells. Finally, the lift function creates a data-flow out of ordinary data -- in this case a list.

Up to this point the key differences between TSD and Excel are that cells are identified by names, and that any data type can be stored in a cell, including lists. This is different from Excel, where a list must be stored in a range of cells.

Next, we set up dependencies between cells:

let average_hs = [%dfg average historical_sales]
let std_dev_hs = [%dfg std_dev historical_sales]
let random_hs  = cell [%dfg gauss average_hs std_dev_hs]            
let profit     = [%dfg random_hs * (price - cost)]   

The random sale, based on the Gaussian distribution with the historical average and standard deviation, is placed in a cell, but the average and the standard deviation themselves do not need to be placed in cells. They establish data dependencies between the cell holding the historical data and the randomly generated sale, but do not require storage in separate cells, although we could have stored them that way. The functions average, std_dev, and gauss have standard implementations (not shown).
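
Since those implementations are not shown, here is a guess at what plain OCaml versions might look like (my assumptions, not the actual TSD code; in TSD they are presumably lifted into the data-flow graph, which is why they appear with Tsd.graph types in the REPL session below):

(* Assumed implementations, for illustration only. *)
let average xs =
  List.fold_left (+.) 0. xs /. float_of_int (List.length xs)

let std_dev xs =
  let m = average xs in
  sqrt (average (List.map (fun x -> (x -. m) ** 2.) xs))

(* A Gaussian sample via the Box-Muller transform; using 1. -. Random.float 1.
   keeps the argument of log strictly positive. *)
let gauss mu sigma =
  let u1 = 1. -. Random.float 1. and u2 = Random.float 1. in
  mu +. sigma *. sqrt ((-2.) *. log u1) *. cos (2. *. Float.pi *. u2)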

Using the model


We now have the model above, except for the Profit Histogram cell, which we will implement later. We can experiment with it, Excel-style, except that instead of the Excel GUI we use the OCaml interactive top-level environment (REPL). To load the TSD code and the PPX extension:

OCaml version 4.07.1

Findlib has been successfully loaded. Additional directives:
  #require "package";;      to load a package
  #list;;                   to list the available packages
  #camlp4o;;                to load camlp4 (standard syntax)
  #camlp4r;;                to load camlp4 (revised syntax)
  #predicates "p,q,...";;   to set these predicates
  Topfind.reset();;         to force that packages will be reloaded
  #thread;;                 to enable threads

# #ppx "tsd_ext";;
# #require "tsd";;
/Users/danghica/.opam/default/lib/tsd: added to search path
/Users/danghica/.opam/default/lib/tsd/tsd.cma: loaded
# open Tsd;;

After this we load the model:

# #use "tutorial.ml" ;;
val average : (float list -> float) Tsd.graph = <abstr>
val std_dev : (float list -> float) Tsd.graph = <abstr>
val gauss : (float -> float -> float) Tsd.graph = <abstr>
val cost : float Tsd.graph = <abstr>
val price : float Tsd.graph = <abstr>
val hs : float list Tsd.graph = <abstr>
val historical_sales : float list Tsd.graph = <abstr>
val average_hs : float Tsd.graph = <abstr>
val std_dev_hs : float Tsd.graph = <abstr>
val random_hs : float Tsd.graph = <abstr>
val profit : float Tsd.graph = <abstr>

And we are ready to experiment. First, let's use the peek function to inspect the computed values in various cells:

#  peek average_hs;;
- : float = 148.95
# peek std_dev_hs;;
- : float = 25.443024584353175
# peek random_hs;;
- : float = 137.839327531968763
# peek profit;;
- : float = 551.357310127875

Now we can recalculate the randomly generated data using the command step, which recomputes the data-flow graph (DFG):

# step ();;
- : bool = true
# peek average_hs;;
- : float = 148.95
# peek std_dev_hs;;
- : float = 25.443024584353175
# peek random_hs;;
- : float = 154.58015036351415
# peek profit;;
- : float = 618.3206014540566

The function step returns true, which means some data was recomputed. This is because the random function call buried inside the gauss cell generates a new value whenever the DFG is recomputed. As you would hope, the average and standard deviation stay the same, but the random sale and the profit are recomputed.

We can also change some cells and recompute:

# set price 120.;;
- : unit = ()
# step ();;
- : bool = true
# peek profit;;
- : float = 20345.062816955171

Note the function set, which raised the Price cell to 120, leading to a much higher Profit value.


Monte Carlo simulation



This is where our idiom diverges from Excel's. We can take advantage of two useful features that Excel does not have: storing lists in cells and creating circular dependencies between cells:

let profit_series = let s = cell [%dfg []] in 
                    link s [%dfg hd profit s]; s

We first create a cell s which is initialised to the empty list, written as [] in OCaml. We then create a dependency by linking cell s to the DFG hd profit s, which can be read as "store in s the list obtained by putting profit at the head of s". This is not quite the same as imperative assignment, because the command link does not simply change s: what it does is create a dependency between it and an arbitrary DFG.

Now whenever we run our DFG a new value will be prepended to profit_series:

# peek profit_series ;;
- : float list = []
# step();;
- : bool = true
# peek profit_series ;;
- : float list = [363.304693159913768]
# step();;
- : bool = true
# peek profit_series ;;
- : float list = [206.299286780092984; 363.304693159913768]
# step();;
- : bool = true
# peek profit_series ;;
- : float list =

[335.831250389538866; 206.299286780092984; 363.304693159913768]

Finally, we can put it all together using a function that runs n experiments and presents the results as a histogram:

let experiment n =
  assign profit_series [];
  for i = 1 to n do
    step ()
  done;
  profit_series |> peek |> histogram 100. 10.

The assign command instantly changes the value of a cell, but not its dependencies, resetting the experiment. Finally, histogram x0 n is a function that presents the results as a histogram starting at x0 in buckets of size n. The x |> f syntax in OCaml is a pipeline-style operator for function application, the same as f(x).
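
The histogram function is also not shown in the post; a plausible plain OCaml version (an assumption of mine, which ignores values below x0) is:

(* histogram x0 w xs: count how many values of xs fall in each bucket
   [x0 + i*w, x0 + (i+1)*w). *)
let histogram x0 w xs =
  let bucket x = int_of_float ((x -. x0) /. w) in
  let xs = List.filter (fun x -> x >= x0) xs in
  let top = List.fold_left (fun acc x -> max acc (bucket x)) 0 xs in
  List.init (top + 1) (fun i ->
    List.length (List.filter (fun x -> bucket x = i) xs))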

The result is:

# experiment 1000;;
- : int list =
[0; 0; 0; 0; 0; 0; 1; 3; 3; 11; 12; 18; 36; 30; 38; 44; 51; 69; 83; 83; 93; 60; 55; 71; 49; 47; 34; 22; 18; 13; 12; 3; 1; 0; 0; 2; 1]

Conclusions


So what are the nice features of Excel we added to OCaml?

The first one is the use of a modal programming style: TSD has two modes, constructing the DFG and running the DFG. The second one is related to the "running" mode: it is transparent. Changes made to some cells are automatically propagated along the DFG.

There are other programming languages that use this idiom, such as TensorFlow or Zélus, and TSD is somewhere on that spectrum. It is (a lot) more functional and safe than TensorFlow, but more imperative and less stream-oriented than Zélus. Another OCaml extension which is somewhat similar is Incremental, with the caveat that it does not allow cycles in the DFG.


Wednesday, 4 September 2019

Discovering Category Theory with primary school children


A few years ago I ran a maths circle for primary school children, experimenting with teaching category theory. The meetings were documented in a series of blog posts which were recently published as a series of articles in Mathematics Teaching, the journal of the Association of Teachers of Mathematics (issues 264-268).

I am posting a copy of the articles (PDF) with the permission of the ATM: 

[Link to PDF]

Tuesday, 16 July 2019

This is not a Turing Machine

The Wikipedia page on Turing Machines shows the following picture as an implementation of a Turing Machine:


I didn't see any problem with that picture until I got into an intricate discussion with my colleague and friend Paul Levy about 'realistic' models for the cost of computation. One of the basic premises of such a model is that individual atomic operations have fixed and constant costs in time, space, energy, etc.

This is why the machine above cannot be a proper TM! As the machine is supplied with more and more tape, the spools will grow larger and larger, and their mass will make the (fixed) motor move the tape ever more slowly. The cost of computation (time and energy) in the early stages of a computation would be lower than in the later stages, when the spools of tape are more massive.

This is why the common descriptions of TMs (see for example the one in the SEP) present the tape as fixed, more like a track, with the head gliding left and right along it. With this design it no longer matters how long the tape is, since it is the fixed-size head that is moving.

It is amusing to notice that Turing himself seems to have made this mistake, implying that the head of what he called an a-machine is fixed and the tape is moving:


I have little doubt that in the current reviewing climate such an oversight would have earned his "On Computable Numbers, with an Application to the Entscheidungsproblem" a strong reject.

Joking aside, I think such details are important when it comes to analysing the cost of computation. For example, the moving-head-style TM makes it much more complicated to reduce a two-tape TM to a one-tape TM. With mobile tapes the task is trivial -- but mobile-tape TMs are not right. With a mobile head the task is more subtle, since the communication overhead between the two heads and wherever the machine state is stored is no longer fixed! And this problem does not seem easy to solve, at least not in a straightforward or clean way.

Before I conclude this pedantic rant, this is not the only thing about computability theory that puzzles me. Ever since I was an undergrad learning about the asymptotic complexity of sorting algorithms, I have never understood how an array element access can be considered performable in constant time. To do that we need to read the index in constant time. But how can you read an arbitrarily large natural number, in whatever representation, in constant time? I still can't pretend I understand this.

Tuesday, 2 July 2019

A note on homogeneous integration

This is a follow-on to my earlier post on automatic application modernisation.

In application modernisation the goal is to reduce a large monolithic application to a modular structure which can be deployed in containers and managed separately, e.g. with Kubernetes. The conventional approach relies on technologies such as web APIs or microservices to integrate the components. In that blog post I showed how laborious this process is, and proposed an automatic approach (research in collaboration with Queen Mary University of London). 

In this post I want to enumerate some other advantages of the automatic approach, noting that application modernisation is not the only situation in which integration is an issue. Another situation in which a number of disparate applications need to be integrated into a coherent whole is the Internet of Things (IoT); I am thinking particularly of "smart manufacturing". Of course, in application modernisation we start with a homogeneous code base, so a language-specific methodology is possible. In IoT this is not always the case. However, JVMs are quite commonly available on "smart devices", so a JVM-specific approach should often be possible. Even if only some devices are JVM-enabled, integrating them should be much easier using the automatic approach than using conventional (e.g. microservices) techniques. 

But integration is integration, whether we talk about containerised applications or about communicating smart devices. The layered approach, which involves creating, sending, receiving, and decoding messages, is neatly subsumed by method calls. Method calls are not only more concise, as they represent a higher level of integration, but also more secure, because of language syntax and type checking. 

In fact, let us go through the criticisms of microservice-based integration and see how mokapot solves each problem:

  • Information barriers: The automatic approach is seamless, so no barriers appear
  • Testing and development: With automatic integration, the behaviour of a multi-device integrated application is consistent with running the entire application on a single node (e.g. an integration server). So development and testing can be carried out on a single node, as if the application were monolithic.
  • Centralised logging and metrics: It is generally desirable that operational data is collected centrally rather than locally on devices. Mokapot has this facility built in; it is used to enable the next bullet point. 
  • Moving responsibilities between services: Deciding what functionality is assigned to what service is a major upfront commitment, which is almost irreversible. The automatic approach doesn't just make this a non-issue: we also provide an automatic tool (filtr) which can suggest an optimal deployment based on collected runtime data (profiling). 
  • Development and support of many services is more challenging if they are built with different tools and technologies: A homogeneous, automatic, language-based approach makes this again a non-issue.
  • Resilience and fault tolerance: The communication protocols used by Mokapot are flexible, robust, and configurable. Unrecoverable errors are propagated as runtime exceptions, which can be handled programmatically by other components or by an integration server.
  • Packaging and deployment: With Mokapot these steps are greatly simplified. In particular, the deployment of software updates is much more secure, as inadvertent interface changes lead directly to compilation errors rather than costlier runtime errors.  
