I will give an answer here. This is

*an answer*rather than

*the answer*because whether you accept it or not is contingent on whether you agree or not with what I will identify as appealing when it comes to Excel.

I will start with a concrete example, and I will conclude with some brief general remarks. The concrete example is running a simple (but not trivial) simulation in Excel, then showing the same simulation in TSD, our new OCaml language extension.

For the Excel simulation I used an existing (and popular) tutorial from SpreadsheetWEB. It is quite important that you at least browse this tutorial before moving on to the rest of this post.

The idea is to construct a "profit estimator" based on historical sales data, then use a Monte Carlo simulation to predict a range of possible future profits. This latter part is the most non-trivial part, relying on a feature called

*Data Table*, part of the

*What-if Analysis*toolset of Excel. As far as I can tell these features are not available in

*Office Libre*or Mac OS's

*Numbers*. I found it rather hacky, but it gets the job done. Another way of running Monte Carlo simulations is using VBA, as seen is this tutorial.

The simulation is conceptually organised as follows:

We have cells containing Cost and Price data, along with Historical Sales, which are a list of numbers. From the Historical Sales we compute Average and Standard Deviation, which are used to generate a Random Sale, using a Gaussian distribution with said average and standard deviation. Together with cost and price this will compute a Profit, as a random variable. Running this experiment 1,000 times we process the data and display it as a histogram.

## Creating a model

In TSD we create

*cells*storing the Cost, Price, and Historical sales as follows:
let cost = cell [%dfg 8.]

let price = cell [%dfg 12.]

let hs = lift [122.; 138.; 163.; 161.; 139.; ...

let historical_sales = cell [%dfg hs]

*let*keyword introduces a variable declaration. The

*cell*function creates a cell (think an Excel cell, but identified by name rather than coordinates). The

*%dfg*tag indicates a "data-flow graph", i.e. the data stored in cells. Unlike the native data of the host programming language, data-flow consists not only of values but also of

*dependencies*on other cells. Finally, the

*lift*function creates a data-flow out of ordinary data -- in this case a list.

Up to this point the key difference between TSD and Excel is that cells are identified by names, and that any data type can be stored in a cell, including lists. This is different than Excel, where a list must be stored in a

*range*of cells.

Next, we set up dependencies between cells:

let average_hs = [%dfg average historical_sales]

let std_dev_hs = [%dfg std_dev historical_sales]

let random_hs = cell [%dfg gauss average_hs std_dev_hs]

let profit = [%dfg random_hs * (price - cost)]

The random sale based on the Gaussian distribution with historical average and standard deviation is placed in a cell, but the average and the standard deviation themselves do not need to be placed in cells. They establish data-dependencies between the cell holding the historical data and the randomly generated sale, but do not require storage in a separate cell, although we could have. The functions

*average, std_dev,*and

*gauss*have standard implementations (not shown).

## Using the model

We now have the

*model*above, except for the Profit Histogram cell, which we will implement later. And we can

*experiment*with it, Excel-style. Except that instead of the Excel GUI we use the OCaml interactive top-level environment (REPL). To load the TSD code and the PPX extension:

OCaml version 4.07.1

Findlib has been successfully loaded. Additional directives:

#require "package";; to load a package

#list;; to list the available packages

#camlp4o;; to load camlp4 (standard syntax)

#camlp4r;; to load camlp4 (revised syntax)

#predicates "p,q,...";; to set these predicates

Topfind.reset();; to force that packages will be reloaded

#thread;; to enable threads

# #ppx "tsd_ext";;

# #require "tsd";;

/Users/danghica/.opam/default/lib/tsd: added to search path

/Users/danghica/.opam/default/lib/tsd/tsd.cma: loaded

# open Tsd;;

After this we load the model:

# #use "tutorial.ml" ;;

val average : (float list -> float) Tsd.graph = <abstr>

val std_dev : (float list -> float) Tsd.graph = <abstr>

val gauss : (float -> float -> float) Tsd.graph = <abstr>

val cost : float Tsd.graph = <abstr>

val price : float Tsd.graph = <abstr>

val hs : float list Tsd.graph = <abstr>

val historical_sales : float list Tsd.graph = <abstr>

val average_hs : float Tsd.graph = <abstr>

val std_dev_hs : float Tsd.graph = <abstr>

val random_hs : float Tsd.graph = <abstr>

val profit : float Tsd.graph = <abstr>

And we are ready to experiment. First, let's use the

*peek*function to inspect what are the computed values in various cells:

# peek average_hs;;

- : float = 148.95

# peek std_dev_hs;;

- : float = 25.443024584353175

# peek random_hs;;

- : float = 137.839327531968763

# peek profit;;

- : float = 551.357310127875

Now we can recalculate the randomly generated data using the command

*step*, which recomputes the data-flow graph (DFG):

# step ();;

- : bool = true

# peek average_hs;;

- : float = 148.95

# peek std_dev_hs;;

- : float = 25.443024584353175

# peek random_hs;;

- : float = 154.58015036351415

# peek profit;;

- : float = 618.3206014540566

The function

*step*returns

*true,*which means some data was re-computed. This is because the

*random*function call, buried inside the

*gauss*cell generated a new value whenever the DFG is recomputed. As you would hope, the average and standard deviation stay the same, but the random sales and the profit are recomputed.

We can also change some cells and recompute:

# set price 120.;;

- : unit = ()

# step ();;

- : bool = true

# peek profit;;

- : float = 20345.062816955171

Note the function

*set*which raised the Price cell to 120, leading to a much higher Profit value.

## Monte Carlo simulation

This is where our idiom diverges from Excel's. We can take advantage of two useful features that Excel does not have: storing lists in cells and creating circular dependencies between cells:

let profit_series = let s = cell [%dfg []] in

link s [%dfg hd profit s]; s

*s*which is initialised to the empty list, written as '[]' in OCaml. We then create a

*dependency*by

*linking*cell

*s*to the DFG

*hd profit s.*Which can be read as "

*store in*s

*the list obtained from putting*profit

*at the head of*s". This is not quite the same as imperative assignment, because the command

*link*does not simply change

*s.*What it does it creates a

*dependency*between it and an arbitrary DFG.

Now whenever we run our DFG a new value will be prepended to

*profit_series:*

*# peek profit_series ;;*

- : float list = []

# step();;

- : bool = true

# peek profit_series ;;

- : float list = [363.304693159913768]

# step();;

- : bool = true

# peek profit_series ;;

- : float list = [206.299286780092984; 363.304693159913768]

# step();;

- : bool = true

# peek profit_series ;;

- : float list =

[335.831250389538866; 206.299286780092984; 363.304693159913768]

Finally, we can put it all together using a function that runs

*n*experiments and presents the results as a histogram:

let experiment n =

assign profit_series [];

for i = 1 to n do

step ()

done;

profit_series |> peek |> histogram 100. 10.

The

*assign*command instantly changes the value of a cell but not its dependencies, resetting the experiment. Finally

*histogram x0 n*is a function that presents the results as a histogram starting at

*x0*in buckets of size

*n.*The x |> f syntax in OCaml is a pipeline-style operator for function application, the same as f(x).

*The result is:*

# experiment 1000;;

- : int list =

[0; 0; 0; 0; 0; 0; 1; 3; 3; 11; 12; 18; 36; 30; 38; 44; 51; 69; 83; 83; 93; 60; 55; 71; 49; 47; 34; 22; 18; 13; 12; 3; 1; 0; 0; 2; 1]

## Conclusions

So what are the nice features of Excel we added to OCaml?

The first one is the use of a

*modal*programming style. TSD has two modes:

*constructing*the DFG and

*running*the DFG. The second one is related to the "running" mode: it is transparent. Changes made to some cells are automatically propagated along the DFG.

There are other programming languages that use this idiom, such as TensorFlow or Zélus. TSD is somewhere on that spectrum. It is (a lot more) functional and safe than TensorFlow, but more imperative and less stream-oriented than Zélus. Another OCaml extension which is also somewhat similar is Incremental, with the caveat that it does not allow cycles in the DFG.

## No comments:

## Post a Comment