Friday, 28 June 2019

An Automatic Application Modernisation Toolkit for the JVM

Many large corporations have legacy enterprise applications that are web-enabled, and sometimes migrated to the cloud, but remain monolithic giants. To take full advantage of what the cloud can offer, these applications need to be modernised, i.e. broken up into smaller apps which can then be containerised and managed using Kubernetes or related technologies.

There are a variety of technologies for modernising legacy apps, but they all share one feature: they are laborious. Even a monolithic application is only monolithic at run time. The code itself usually is not: it is divided into classes (in the case of the JVM) or other language-specific logical and compilation units (modules, libraries, etc.). Source code is most commonly glued together into one whole app via method calls and via shared state. Shared state does not only mean global variables, which are best avoided, but also passing objects as method arguments. In Java this means that only a reference to the object is passed, which aliases the original reference.
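To make the aliasing point concrete, here is a minimal sketch in plain Java. The class and method names are made up for illustration and are not taken from the reminders application.

```java
import java.util.ArrayList;
import java.util.List;

class AliasingDemo {
    // The callee receives a copy of the reference, so it mutates the caller's list.
    static void addReminder(List<String> reminders, String text) {
        reminders.add(text);
    }

    public static void main(String[] args) {
        List<String> reminders = new ArrayList<>();
        addReminder(reminders, "submit report");
        // The caller observes the mutation made inside the callee.
        System.out.println(reminders.size()); // prints 1
    }
}
```

Inside a single JVM this kind of sharing is free, which is why it is so common in monolithic code.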

When the application is modernised, the method-call composition mechanism is no longer available. The various runtime components no longer communicate by calling each other; instead they need to send each other messages via a (web) API or microservices. Moreover, they can no longer share state: objects need to be serialised, sent wholesale, then de-serialised. Aliasing no longer works either, because mutations do not propagate across component boundaries. This means that the application often requires deep re-engineering to replace method calls with message passing and to eliminate all state sharing.
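The loss of aliasing can be reproduced without any network at all. The sketch below uses standard Java serialisation as a stand-in for sending an object to another component. The helper name sendAndReceive is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;

class CopyDemo {
    // Round-trips a list through Java serialisation, standing in for a network hop.
    @SuppressWarnings("unchecked")
    static ArrayList<String> sendAndReceive(ArrayList<String> original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ArrayList<String>) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ArrayList<String> local = new ArrayList<>();
        ArrayList<String> remote = sendAndReceive(local);
        remote.add("submit report");       // mutation on the "remote" copy
        System.out.println(local.size());  // prints 0
        System.out.println(remote.size()); // prints 1
    }
}
```

The receiving side holds an independent copy, so any code that relied on both sides seeing the same object has to be redesigned.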

What our research group (Alex Smith and me at the University of Birmingham, Thomas David Cuvillier and Nikos Tzevelekos at Queen Mary University of London) has developed over the last few years is an automatic modernisation toolkit which includes:
  • a runtime library called mokapot which creates a unified state and control space across any number of JVM containers; 
  • a static analysis tool called millr which automatically instruments the bytecode of a JVM app (if needed) to ensure compatibility with mokapot;
  • a profiler and optimiser called filtr which, relying on mokapot, splits a JVM-based app into components in a way that minimises communication overheads (serialisation, deserialisation and network costs). 
Let us illustrate automatic application modernisation with a very simple application, a reminders server. The server is just a priority queue storing events. There are trivial wrapper methods to submit and extract events.
The client is similarly straightforward, submitting some events then retrieving them and printing them to the console:
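The original listings are not reproduced here. As a rough stand-in, the following sketch shows what a priority-queue reminders server with submit and extract wrappers, plus a tiny client, might look like. All names in it (ReminderQueueSketch, Event, submit, extractEarliest) are assumptions made for illustration, not the original code.

```java
import java.util.PriorityQueue;

class ReminderQueueSketch {
    // An event is a description plus the time at which it becomes due.
    static final class Event implements Comparable<Event> {
        final String text;
        final long dueAt;

        Event(String text, long dueAt) {
            this.text = text;
            this.dueAt = dueAt;
        }

        @Override
        public int compareTo(Event other) {
            return Long.compare(dueAt, other.dueAt);
        }
    }

    private final PriorityQueue<Event> queue = new PriorityQueue<>();

    // Trivial wrapper: store an event in the queue.
    void submit(String text, long dueAt) {
        queue.add(new Event(text, dueAt));
    }

    // Trivial wrapper: remove the earliest event, or return null if none is left.
    String extractEarliest() {
        Event e = queue.poll();
        return e == null ? null : e.text;
    }

    // The "client": submit some events, then retrieve and print them.
    public static void main(String[] args) {
        ReminderQueueSketch server = new ReminderQueueSketch();
        server.submit("dentist", 30);
        server.submit("stand-up", 10);
        System.out.println(server.extractEarliest()); // prints stand-up
        System.out.println(server.extractEarliest()); // prints dentist
    }
}
```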

Let us now see how we modernise this application so that the "client" and the "server" can be placed in two distinct containers. 

First we look at the mokapot-based approach, which is automatic. What we want to do is to instruct the client to create the server on a different node. The software server is now a different process, possibly on a different (physical or virtual) server. Server creation (line 4 above, with the call to the ReadyReminderServer constructor) instead becomes:

Note that the call to the constructor is now passed as an argument to the method runRemotely of a communicator object. This method will create the object at the stipulated serverAddress but otherwise treat it exactly as if it were a local object. The client code is otherwise unchanged, except for code to create the communicator, start it up, and retrieve its address (in the prelude) and to shut it down (in the postlude):
No change at all is required to the server code. The whole modernisation process takes a few minutes and consists of adding a small number (half a dozen or so) of logical lines of code (LLOCs).
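For readers unfamiliar with the idea of treating a remote object as if it were local, here is a toy illustration using the JDK's standard dynamic proxies. It is emphatically not mokapot's API or implementation: the interface, the classes and the forwardTo helper are all hypothetical, and the "remote" object lives in the same JVM. It only shows that caller code can stay unchanged while every call is intercepted and forwarded.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

class ForwardingProxySketch {
    interface Reminders {
        void submit(String text);
        int count();
    }

    static final class LocalReminders implements Reminders {
        private final List<String> items = new ArrayList<>();
        public void submit(String text) { items.add(text); }
        public int count() { return items.size(); }
    }

    // Wraps the target in a proxy that records each call before forwarding it.
    // A real remoting layer would send the call over the network at this point.
    static Reminders forwardTo(Reminders target, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.add(method.getName());
            return method.invoke(target, args);
        };
        return (Reminders) Proxy.newProxyInstance(
                Reminders.class.getClassLoader(),
                new Class<?>[] { Reminders.class },
                handler);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Reminders reminders = forwardTo(new LocalReminders(), log);
        reminders.submit("stand-up");           // caller code looks purely local
        System.out.println(reminders.count());  // prints 1
        System.out.println(log);                // prints [submit, count]
    }
}
```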

Let us now consider a more conventional modernisation approach using web APIs. First of all we note that there is no single approach, but a dizzying array of competing technologies and frameworks. Let us pick something that seems commensurate with the simplicity of the example, directly using the HttpURLConnection Java API. The client can be reimplemented by wrapping up each call in an HTTP request method, so that the new version of the client looks close enough to the original:


What we add is a constant number (also half a dozen or so) of LLOCs in the prelude of the file. However, the wrappers themselves need to be implemented. Even for something as simple as a pair of strings, the wrapper has to format the data properly into a correct URL, create the connection, send the request, and make sure that the request was correctly processed by the server. This [external link] shows a minimalistic implementation of such functionality on top of HTTP. This is already 100+ LLOCs, so too long to inline. One could ask whether this is the most economical way, but the answer is not a simple one. Any framework would require a certain amount of configuring and setting up, not to mention training and learning. (We are happy to record better, alternative web API-based solutions if any.)
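To give a flavour of what such a wrapper involves, here is a heavily reduced sketch using only the standard library. It is not the linked implementation: the /submit path, the text and dueAt parameter names, and the helper names are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;

class RequestUrlSketch {
    // Percent-encodes a value so it can be embedded safely in a query string.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Builds the request URL for a hypothetical "submit" endpoint.
    static String submitUrl(String base, String text, long dueAt) {
        return base + "/submit?text=" + encode(text) + "&dueAt=" + dueAt;
    }

    // Sends the request and fails loudly unless the server answers 200 OK.
    static void send(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("POST");
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("server replied " + status);
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        System.out.println(submitUrl("http://localhost:8080", "team lunch & demo", 42));
        // prints http://localhost:8080/submit?text=team+lunch+%26+demo&dueAt=42
    }
}
```

Even this fragment ignores response bodies, timeouts and retries, each of which adds more hand-written code.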

On the server side things are more complicated. Whereas in the old code the client simply calls methods on the server, in the "modernised" code the client calls have become HTTP requests which need to be handled very differently. The original ReadyReminderServer class needs to be wrapped in code which handles the requests. The code is given below:

Note that this entire class serves as a wrapper around the original server class.

In addition, we still need the code that processes the HTTP request, parsing it into key/value pairs. This code is algorithmically straightforward but it is over 300 LOCs (including comments) and can be seen following this [external link].
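The core of such request parsing can be sketched in a few lines. This is a simplified illustration with hypothetical names, not the linked code, and it handles only a URL query string rather than a full HTTP request.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryParserSketch {
    // Splits "k1=v1&k2=v2" into decoded key/value pairs, keeping their order.
    static Map<String, String> parse(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(key), decode(value));
        }
        return params;
    }

    // Reverses percent-encoding; a malformed escape throws IllegalArgumentException.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("text=team+lunch+%26+demo&dueAt=42"));
        // prints {text=team lunch & demo, dueAt=42}
    }
}
```

Note that every value arrives as a string, so numbers such as dueAt still have to be parsed and validated at run time, outside the reach of the compiler.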

So here you have it. The web API approach to modernising this simple app will take a couple of solid days of full-time work from a talented developer. In addition, the web API approach will lead you into the standard hazards of stepping outside the language. The mokapot approach is not only automatic but also stays within the language, so many trivial mistakes (e.g. typos) will be caught by the compiler. In contrast, all the encodings to and from text that the web API requires may lead to difficult-to-debug runtime errors.

The one advantage of the conventional approach is that it results in an "open" web API which can be accessed from any language or from the command line (e.g. via curl), if that is desirable. In contrast, the mokapot communicators are reserved for mokapot's own use.

The example we chose is actually not so hard to modernise using web APIs, just laborious and error-prone. But imagine a situation in which the messages contain pointers to shared structures. Because we cannot normally send object pointers through a web API, the refactoring would be significantly harder, going beyond wrappers of the original code. If those shared structures are mutable then the problem of modernisation becomes almost intractable. The mokapot framework not only handles access to shared data transparently and efficiently (by eliminating the need to replicate it) but it also preserves all single-JVM behaviour, including garbage collection of useless objects.

Let us illustrate one such situation now, which is trivial to modernise via mokapot because of automation, but becomes virtually intractable via web APIs. Suppose that we want to send a reminder from the server to the client when an event is ready. In the monolithic application we do it by associating a call-back function with the event (lines 7-8 and 10-11):

Because mokapot creates a virtual memory and control space across all nodes, the same instrumentation as before


enabled by the same boilerplate as before will allow the code to just work across node boundaries:


No changes needed to the server.

In contrast, modernising this version of the application, which uses callbacks for reminders, using web APIs is a significant challenge, which can no longer follow the standard recipe. The application needs to be deeply reengineered and recoded. This is because callback functions contain code, which makes it impossible for them to be serialised as pure data, and sent over the net. The automatic approach works because the memory space is virtualised and includes both nodes, making any kind of method call possible in a transparent way. So we leave this as an exercise to the reader. 🙃
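The serialisation obstacle can be observed in isolation with standard Java serialisation. In the sketch below (hypothetical names throughout), a string is accepted as data while an ordinary lambda callback is rejected. Java does allow a lambda to be declared serialisable, but even then the receiving JVM needs the same code loaded, so the callback still cannot travel as pure data.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;

class CallbackSerialisationDemo {
    // Returns true if Java serialisation accepts the object, false if it rejects it.
    static boolean canSerialise(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Runnable callback = () -> System.out.println("event is ready");
        System.out.println(canSerialise("plain data")); // prints true
        System.out.println(canSerialise(callback));     // prints false
    }
}
```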

Some final notes and comments on additional benefits of the automatic approach:

  • If the intention is to expose some of the functionality of an app component as a web API, this is compatible with the automatic approach. Nothing prevents a mixed automatic and manual (conventional, API-based) approach to modernisation.
  • The automatic approach is non-committal. It will not lock you into any particular technology, including mokapot itself, since the re-engineering and re-factoring effort is minimal. It will also not lock you into any particular architecture, since the decisions of how a monolithic application is split into components require little re-engineering and re-factoring, compared to the manual approach. 
  • The built-in serialisation and de-serialisation of objects is very fast in mokapot, surpassing standard JSON libraries. 
  • The mokapot communicators which handle remote objects can be used in much more sophisticated ways than the boilerplate-style indicated in the example, making the node-selection algorithmic (e.g. for load-balancing), all seamlessly integrated into the same virtual memory and control space of the application. 
  • Besides what we already mentioned, mokapot performs other advanced operations for managing the virtual memory space, for example automatically migrating objects at runtime between nodes in order to maintain a high level of performance. 
  • The software works both on server and mobile (Android) JVMs. On mobile it handles connections robustly, for example allowing the mobile component to switch seamlessly between WiFi and the mobile (cell) network without service disruption. 
  • Errors (node failure, permanent connection failure) are reported via special mokapot exceptions, which can be handled programmatically, for example to migrate objects between nodes at runtime. 
  • The mokapot crypto is two-factor. The code shows a hard-coded password, but a P12 file is also required alongside it. 

Note: All code courtesy of Alex Smith. Comments and import statements have been removed for brevity.

3 comments:

  2. It sounds rather like Java RMI, but with fewer constraints.

    JRMI has a bunch of restrictions about which methods can be proxied (https://docs.oracle.com/javase/8/docs/platform/rmi/spec/rmi-objmodel5.html), which means you could not apply it to an existing application without making potentially significant changes.

    On the other hand, without explicitly marking methods as able to be proxied, it might be difficult to predict the performance and failure modes of the application, since potentially any method call could be executed remotely and might also fail.

    (I posted this previously but was unable to edit a spelling mistake, so I deleted and reposted).

    Replies
    1. Yes, you can think of this as a semantics-preserving RMI.

      The point about performance is a valid one; this is why we provide special instrumentation for performance monitoring. But transparency is your friend! We can provide both runtime migration and optimal static deployment (based on profiling data) to minimise remoting costs. These would not be possible without transparency.

      Regarding failure modes, these can be handled programmatically via exceptions.
