Monday, 26 September 2016

What else are we getting wrong?

I like reading the history of science and, as a working academic, I am amused by theories that were once widely accepted yet false, such as the miasmatic theory of disease or the phlogiston theory of combustion. The examples are numerous enough to have their own Wikipedia category. It makes me wonder which of the current orthodoxies are just as spectacularly misguided (this is my personal bet). But the most challenging exercise is to turn to my own discipline, the theory of programming languages, and ask whether any of its (sometimes unstated) dogmas could be utterly wrong.

I think asking ourselves this question is a useful exercise in critical thinking, even though it sounds like trolling. First of all, it is justified if we consider the historical trajectory of mainstream programming languages in terms of improving programmer productivity. I don't have a precise definition of what programmer productivity means, but I certainly don't mean KLOC/week or something silly like that. Let's just define it informally as the ability to complete a task up to a reasonable level of correctness, where the precise interpretation of "task", "reasonable" and "correctness" is left to the reader.

Programming started with raw codes introduced directly into the working memory of a computer via switches, relays and buttons. Moving on to reading the code off cards must have brought about a tremendous leap forward in productivity. Further progress from codes to assembly languages with symbolic labels must have been at least as productivity-enhancing. The next step, to FORTRAN, must also have boosted programmer productivity tremendously. The syntax was nicer -- especially the ability to write expressions and functions -- but the killer feature must have been code portability, afforded by machine independence. This covers progress in the 1940s and 1950s, the early days of programming. Programming languages have obviously evolved greatly since then. But did any new programming language idea lead to improvements in programmer productivity comparable to those of assembly or FORTRAN? Is any current language 100 times better than FORTRAN for programmer productivity, just as FORTRAN was compared to machine-code programming?

Probably not. If a civil engineer wants to build a bridge, there will be some choices, from a simple beam bridge to an extremely complex suspension bridge, and they will roughly know what each choice involves in terms of cost, strength, maximum span, construction time and so on. Similarly, or even more so, in programming the engineer will face a broad array of languages and frameworks for building any project. Choices will need to be made. Unlike the case of bridges, though, the choice will be informed by hardly any quantitative factors, but mostly by taste, ideology and other social or psychological contingencies. When should one choose, for example, functional over object-oriented programming? The question sounds like flame bait, but it is a very reasonable one. First-year students will ask it all the time until we educate them that such questions are impolite. It is like asking which football team, rock group or religion is better. It is dogma, and it is widely known and accepted to be dogma.
Some aspects of programming languages are scientifically solid. Compilers for a given language do get measurably better, and the measures of improvement are quantitative: faster code, less memory, faster compile times. Software verification and testing can also get measurably better: more kinds of programs can be verified automatically, verification gets faster, there are fewer false positives and negatives. But programming language design is rarely dealt with in a similarly scientific and quantitative fashion.

There are scientific questions we can ask about a programming language design or feature. The first one, and this is a question usually asked reasonably well, is whether it can be compiled efficiently. This question is answered constructively, by building a compiler and benchmarking it against other languages and compilers on measures such as those I mention above. The second one, and this is a question rarely asked well, is whether it improves some measure of programmer productivity. This question is of course asked a lot, but not scientifically. Because this important question is not asked scientifically, researchers do what all people do -- they use whatever heuristics are at hand. They use a community-driven sense of "what is right", of "what makes sense", of "what is elegant" to determine what a "good" versus a "bad" language is. But to anyone familiar with the history of science, this is a treacherous path. This is the way false dogmas are generated: ideas that can be beautiful yet wrong. They are not necessarily wrong, but they could be. We just don't know. We do no science on them, just speculation and debate.

Is there any such prevalent orthodoxy in (academic) programming language design, seductive but not scientifically validated? The drive towards ever more sophisticated type systems seems to me potentially so. The Curry-Howard-Lambek correspondence is a beautiful paradigm which brings together aspects of language, logic, mathematics, and even physics and systems theory. Phil Wadler has a beautiful talk on the topic [video] and Baez and Stay's Rosetta Stone paper is equally fascinating [pdf]. My own research has been heavily influenced by these ideas, which I embrace. They are persuasive. I am a fan. But I know of no scientific studies actually backing up the (sometimes implicit) claim that this types-first methodology leads to better languages from the point of view of programmer productivity. It might -- but we don't know.
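
To make the Curry-Howard reading concrete, here is a minimal sketch in Haskell (my own illustration, not taken from the talk or the paper): each polymorphic type below can be read as a propositional tautology, and the program inhabiting it as a proof of that tautology.

    -- Modus ponens: "if A implies B, and A holds, then B holds".
    modusPonens :: (a -> b) -> a -> b
    modusPonens f x = f x

    -- Distributivity: "A and (B or C)" implies "(A and B) or (A and C)".
    -- Pairs play the role of conjunction, Either that of disjunction.
    distrib :: (a, Either b c) -> Either (a, b) (a, c)
    distrib (x, Left  y) = Left  (x, y)
    distrib (x, Right z) = Right (x, z)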

The ideology of sophisticated type systems is not merely academic; it has a growing impact on language design. There are languages such as Agda and Idris, which are embodiments of type theory at its most sophisticated. The more popular Haskell also has a type system which may surprise you with its amazing powers; this wonderful talk by Stephanie Weirich is a great introduction [video]. Finally, Rust is another new language, this time let's say "production-oriented", with a tremendously sophisticated type system. I instinctively agree with the ideology of types, and I instinctively like Agda, Idris, Haskell, Rust, and all their cousins and friends.
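
To give a flavour of what "sophisticated" means here, a small sketch in Haskell (my own illustration, using GHC extensions in the spirit of the features that talk covers): vectors whose length is tracked in their type, so that taking the head of an empty vector is rejected at compile time rather than failing at run time.

    {-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

    -- Type-level natural numbers.
    data Nat = Zero | Succ Nat

    -- Vectors indexed by their length.
    data Vec (n :: Nat) a where
      Nil  :: Vec 'Zero a
      Cons :: a -> Vec n a -> Vec ('Succ n) a

    -- Safe head: the type only accepts non-empty vectors, so
    -- "vhead Nil" is a type error, not a runtime crash.
    vhead :: Vec ('Succ n) a -> a
    vhead (Cons x _) = x

Agda and Idris take this much further, allowing arbitrary values and proofs in types, but the flavour is the same.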

But we should worry a little about relying on aesthetics with no supporting evidence. We should worry, for instance, when Python, a language that breaks all academic language design tenets, is one of the most popular languages in the world. We should worry, and we should try to understand what is going on with programming languages not just mathematically but also from a human point of view, psychologically. A couple of years ago I had breakfast with Guido van Rossum -- he almost left the table when I told him I was an academic PL researcher -- and I was quite impressed by the way he dismissed all mathematical aspects of the study of programming languages and focussed almost exclusively on "soft" aspects of usability such as concrete syntax. I was not convinced by his arguments, but I found them intriguing.
Evidence-based usability research in programming languages is, sadly, a fringe activity. I could find no papers on this topic in our main conferences (POPL, PLDI, ICFP, etc.). There is a small community of usability researchers in programming languages (the Psychology of Programming Interest Group) and some of their work is promising. But I think much more evidence-based research is needed in PL design, and this work should be much more encouraged and promoted by our community. For programming languages are not the abstract Platonic objects of mathematics; they are not even edifices bringing together form and function in a way that needs to be aesthetically pleasing. They are tools which we use to accomplish tasks. Programming languages are the hammer and the plow of today's economy. They need not be beautiful, unless beauty itself is part of their function. They just need to get the job done.

Yaron Minsky, a man of tremendous accomplishments whom I hugely respect, seems to think that such empirical validation is impossible [blog]. He says "there is no pile of sophomores high enough to prove anything", in reference to the (very few) academic studies of programming language usability, which tend to use students as subjects. I think this opinion is broadly shared in our community, and that makes me quite unhappy. I don't think we should dismiss science, facts, evidence and experiments, and rely solely on aesthetics instead. Science has worked out pretty well for humanity for the last thousand years or so, whereas aesthetics-driven progress has often been questionable. Giving up on science is simply giving up.

16 comments:

  1. Have you seen this work? https://lmeyerov.github.io/projects/socioplt/paper0413.pdf

    It's very difficult to do the work you suggest for many reasons, some due to the work and some due to the community. For one, the work is hard. Designing good experiments with human subjects is very hard. Measuring and interpreting the data is hard. In addition to these activities being objectively hard, they are also activities that theoretical computer scientists are not trained in at all.

    This is partly why the community (academic PL) doesn't encourage and somewhat discourages this kind of work: they aren't in a good position to evaluate it or interpret it. PL theory is *theory*, the community wants to see logical frameworks, semantics, and proofs, not data models, logistic regressions, and p-values.

    Worse still, what happens when, as someone that works in PL theory, you discover an experiment that says that the way you build languages isn't what programmers want to work with? What do you do with that information?

    Replies
    1. You are completely right. The only thing I can say is that it's early days for PL research, historically speaking, and hopefully some ideas we cannot imagine now will come along. It happens all the time in science.

  2. Have you dived into the history of things like architecture? Was there a time when such disciplines resembled the current state of programming, and what happened to get out of that?

    Replies
    1. I think architecture is different, because it is driven by aesthetics. Perhaps the history of ergonomics or industrial design would be more relevant?

  3. When I started programming, 4th-generation languages were just coming out. The idea was that productivity is constant in lines of code, so a language that could do a lot in a few lines of code would be more productive. Squirrel. We got distracted by C++, object-oriented programming and the internet.

  4. "Is any current language 100 times better than FORTRAN for programmer productivity" – Yes! Rust, modern C++, even Objective-C and – when performance is not critical – the nicer scripting languages (Python, Ruby, etc.) are tremendously better than Fortran. Fortran was basically assembly with – whoo hooo! – infix operators. Being able to define one's own data types, access to sophisticated compile-time computations, dynamism and reflection are vital and systems requiring any sort of scalability would suffer seriously of the lack thereof. You can write numeric code in Fortran faster than you could in assembly, but… woe betide you if you try building a database engine, a graphical user interface or a compiler in Fortran.

  5. Y-combinator comments here: https://news.ycombinator.com/item?id=12831452

  6. Some of this was tackled in "No Silver Bullet" by Fred Brooks. Brooks put early high level languages at about 10x assembly code rather than 100x, and forecast (in the late 80s) that the next 10 years would not bring a comparable improvement.

    He was right, in the strict sense that by 2000 the languages in common use did not show such an improvement. However I do think that functional languages, and especially Haskell, are now 10x more productive than Fortran and Algol were. There was not a single step that led to this point; rather this is the result of incremental improvement. The biggest single step, IMO, was garbage collection. Of course GC has existed since the 60s, but Java was the first mainstream language to incorporate it.

    Incidentally, another reason that experiments in software engineering are hard is that conclusions drawn from an experiment that takes 1 programmer 10 hours do not generalise to a project that takes 100 programmers 5 years. Different language issues dominate at different scales, and we understand these scaling issues poorly. Perhaps that is where we need to direct our attention.

  7. I was silly enough to try to find useful research on the topic some time ago. I wasted too much time and got almost nothing out of it. There is one paper with something of an interesting preliminary result (the old claim that errors/LOC and time/LOC are constant): Prechelt, L. (2000), "An Empirical Comparison of Seven Programming Languages", Computer 33(10):

    http://ieeexplore.ieee.org/document/876288/?arnumber=876288&tag=1

    This paper is cited, but I could not find a single good paper that tried to reproduce the result or build upon it more rigorously.

    Then you have the concept of "Affordance", as described here:
    https://en.wikipedia.org/wiki/Affordance

    My gut feeling tells me that the affordances of existing programming languages would be a great field of experimental research. For some reason no one does it (to my knowledge).

    Somewhat connected is this paper about natural language metaphors: Thibodeau PH and Boroditsky L, Natural Language Metaphors Covertly Influence Reasoning, PLoS ONE 2013 8(1)

    http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0052961

    It is about natural language, but the methods it employs are definitely applicable to programming languages, especially in the context of affordance.

    Altogether, reading what you wrote, I am getting a bit optimistic that there are, after all, people who are starting to ask the interesting questions. It is only a matter of time until someone starts to put more effort into answering them.

  8. Martin Escardo sent me this private comment which I quite liked (even if I don't quite agree):

    I think that Computer Science (as opposed to Computer Engineering) is more like Mathematics than like Physics or Chemistry. One example in mathematics of oxygen-versus-phlogiston was category theory vs Bourbaki's theory of structures. Eventually category theory "won" in the sense of giving a better understanding of structures with more applications. In the case of computer science, I would say that one thing we are trying to understand (and making very good progress on as a community) is type systems for both programming languages and proof assistants. My bet is that dependent type theories and the treatment of equality via identity types as in univalent foundations will be like oxygen. Of course, history may prove in the future that I was actually working with phlogiston in a pointless way. But currently I don't think so (otherwise I wouldn't be working on that). Also, what I say above in my previous two comments does not apply to programming languages from the point of view of Computer Engineering. This needs considerations which often are not of intellectual interest, but that are very important from a practical point of view. Luckily, in Computing we have Scientists, Engineers, and a few people with both inclinations too.

  9. Also, good discussions on Reddit: https://www.reddit.com/r/programming/comments/5abb0m/what_if_anything_are_we_getting_wrong_about/

  10. I have the feeling that your comments section swallowed my comment when I pressed "Preview".

  11. I wrote an article recently that was inspired by similar thinking: http://tomasp.net/blog/2016/thinking-unthinkable/

    The overall theme is quite similar, though I expect we'll disagree on what it is that we are "not getting right" - I think that possibly the biggest assumption that we make and never question is how effective mathematics can be for understanding programming. It might be effective - but it is surprising how hard it is to question the assumption. It is so deeply ingrained in our thinking that it is virtually impossible to try not to follow it (even if just to see what would happen!)

    The Python case is a good example - perhaps we should not just try to understand why Python became popular, but also what it is about programming language research that makes us say that "Python breaks all academic language design tenets". Are our tenets somehow irrelevant for talking about one of the world's most popular languages (and if so, should programming language research look different?)

  12. I think strongly typed languages are going to be "the future". They pave the way for "automatic code completion": being able to find any function you need in a massive database. Hackage sort of allows this already, but we need more people writing Haskell code for it to really flourish.

    Mathematics is a base of rules on which we build everything else. Why can't we build programming languages the same way? I think using mathematics as a model for "abstract language" is great because it's proven to work well.

    Great article Dan. This is the kind of thinking we need in our field. I'm going to go read some of your other stuff now...!

  13. This article generated more comments and discussions, both here and on Reddit and Hacker News, than I can reply to, but I have read all the comments and I have learned a lot of interesting stuff.

  14. 1. The fact that a good, controlled experiment is hard or even impossible to do does not imply that nothing crucial is missing; what is missing is the link between language features and their actual benefits, aka a value function. That an incontrovertible empirical result is impossible does not mean that the value function can be willed into existence. Admitting that such a gaping hole exists would hopefully encourage people to get some data. If not a controlled experiment, then at least a few well-reported anecdotes. If nothing else, such an admission would teach humility.

    2. While controlled experiments (on sophomores) and uncontrolled real-world data gathering are both severely flawed in their own ways, I think that when it comes to software, the latter is to be preferred. In the age of open-source software, there's plenty of raw data available. There is evidence that this data can be extremely valuable, like this beautiful empirical result that ties catastrophic failure in distributed systems to local programming patterns: https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-yuan.pdf

    3. A good way to get started is to collect data that could be used for many studies, and to build a standard taxonomy. Nothing would be perfect, but it would be much better than what we have now. There are two obvious taxonomies to create: a taxonomy of programs — classified by domain, size, team size, cost, kind (transformational/interactive/reactive) etc. — and a taxonomy of bugs — classified by program type, cause, cost etc. Data could be collected from open-source projects and from companies that would volunteer to contribute.

