Sunday, July 14, 2013

Multithreading is Hard

Over the last few years, I have repeatedly found myself working with complex, multithreaded Java applications that produce all sorts of bugs that are extremely difficult to identify, reproduce, and fix. And in doing so, I have learned a great deal about the Java memory model, the various synchronization primitives (e.g. locks, semaphores, atomic references), and useful concurrent abstractions (e.g. blocking queues, executor services). With enough knowledge and experience with multithreaded programming in Java, it is possible to design and implement complex systems that, after some time, you have modest confidence in. But not only does it take a very long time to get to that point (typically on the order of years for large projects), I would also guess that most programmers have less confidence in their programs than they wish they could. It is also not uncommon for mature projects to still be littered with minor race conditions and rare deadlocks that pop up over time when some poor engineer has to debug why their production system failed. To some degree we have just accepted that the systems we build and the libraries we write will inevitably suffer from bugs that we cannot preemptively identify and fix. I recently came across this technical report by Professor Lee from Berkeley that I feel very clearly identifies the problem, which is the concept of threads and how they are materialized in today's languages.

The main argument, which I have seen formulated in a number of ways, is that because the execution of a multithreaded program is nondeterministic, we cannot efficiently reason about correctness. In a single-threaded program, you only need to guarantee that the single, deterministic execution of the program produces the correct result; even then it can be difficult to convince yourself of correctness (since the space of possible inputs is often exponential in the size of the input), but people have become reasonably proficient at doing so. Once you introduce multithreading, the number of potential executions from interleaving operations is exponential in the size of the program, and correctness depends on all of those executions producing the correct result. So, without any structure to reduce the problem, it is exponentially harder to guarantee the correctness of a multithreaded program than a single-threaded program. The tools in Java that I mentioned are designed to "prune away [the] nondeterminism" introduced by multithreading, namely by reducing the space of potential executions and, specifically, preventing those that are known to be incorrect. But this is very difficult to do given that there are so many possible executions to reason about, not to mention the fact that only one such execution manifests itself each time the code is run. To quote Professor Lee, "A folk definition of insanity is to do the same thing over and over again and to expect the results to be different. By this definition, we in fact require that programmers of multithreaded systems be insane. Were they sane, they could not understand their programs."

At a high level, there are a number of different ways to get around the fundamental problem of nondeterminism in multithreaded programs. One is the use of immutable objects, which amounts to just never having mutable state. If there is no mutable state, then different interleavings will always read the same values and produce the same results. Functional programming emphasizes immutability and, in many cases, not having state altogether, which can make it a very natural paradigm for multithreaded programs (though it comes with its own set of concerns). Clojure is a good example of how the functional paradigm can present a distinct way of leveraging multiple processors without directly manipulating threads. Besides immutable objects, another option for managing multithreaded interactions is to assume all threads touch only local state and use a specific system for passing messages between threads. The actor model is an abstraction based on these principles which has been implemented in languages such as Erlang and Scala with some success. Unfortunately, these paradigms and abstractions have not seen the same popularity as threads, quite possibly because threads have the closest resemblance to sequential programming in popular languages. In the end, everything comes down to getting rid of shared, mutable state; if the state is not shared or is immutable, then the nondeterminism of executions becomes a non-issue, but as long as you are using shared, mutable state, multithreaded programming will be difficult.

Regarding the situation in Java, Professor Lee argues that "the mechanisms (such as semaphores) still require considerable sophistication to use, and very likely will still result in incomprehensible programs with subtle lurking bugs" and that "software engineering process improvements alone will not do the job." We can do the best we can to improve the design, review, implementation, and testing processes around building these applications, but it seems impossible to program in this way and confidently believe that there are no synchronization bugs. Given what we know about threads and the nondeterminism they introduce, it is worth considering whether it is the right programming model to build on and if there is an alternative to be found in functional programming, actors, or something completely different.

No comments:

Post a Comment