Sunday, August 18, 2013

Software Transactional Memory in Clojure

Having briefly explored Erlang and the actor model as the basis for a programming with concurrency, I spent some time learning about Clojure, a dialect of Lisp that can run in the JVM and is designed for concurrency. Clojure is also functional in nature and thus emphasizes immutability as the solution to the pitfalls of multithreaded programming. But while Erlang uses actors to coordinate state, Clojure has a minimal interface for interacting with shared, mutable state in a more traditional manner. Instead of controlling access to the state via synchronization primitives like locks, however, it employs software transactional memory (STM) to ensure that different threads can access the same state safely. For reference, this article is an excellent resource on STM in Clojure and additionally touches on the drawbacks of traditional lock-based synchronization as well as the actor model.

STM can be thought of as very similar to database transactions in terms of the guarantees it provides to clients. The standard ACID properties are respected by software transactional memory. They are
  • atomicity: either all changes are committed or all changes are rolled back;
  • consistency: the state at the beginning and end of a transaction, as visible to the transaction, will be consistent;
  • isolation: modifications within a transaction are not visible to other transactions until they have been committed; and
  • durability: once a transaction has been committed, its modifications will not be lost.
These properties greatly simplify the manipulation of shared state, as operations done within a transaction can essentially assume that none of the state will be modified at the same time. Here is an example of Clojure code appending numbers to a list with and without transactions.

To briefly explain the code, the first two lines define two global variables (using the def function), numbers-unsafe and numbers-safe. The former is simply a pointer to an immutable list, while the latter is a Clojure Ref, which is a reference that can be protected by STM. In parallel (pmap = parallel map), the integers from 0 to 99,999 are appended (using conj) to each of the two lists and we print the resulting list's length. The output will look something like this:

unsafe:  88067
safe:  100000

The important line of code is line 8, (dosync (alter numbers-safe conj n))). The dosync keyword indicates the start of a transaction, and alter applies an update function to a Ref within a transaction. Since all appends are done in transactions, they are isolated from each other by Clojure's STM, and thus the result is correct. What is nice about programming like this is that you do not need to be especially conscious of how different threads may be interacting with shared, mutable state. As long as you always remember to access such state through transactions (which should be clear by the fact that they are declared as a Ref), you are protected against unsynchronized accesses.

It is pretty cool that software transactional memory has been implemented in the core of a language, as it is conceptually appealing. Because it seems so easy to use, however, the price of synchronization must be paid somewhere other than by the programmer, and it is in the overhead and performance implications. Next time I'll go into some more detail about how STM might be implemented and the hidden costs of using it.

No comments:

Post a Comment