Saturday, August 24, 2013

STM Implementation

As described last time, software transactional memory (STM) is a concurrent programming model that hides much of the complexity of synchronization from programmers by exposing a concept similar to that of database transactions. By providing strong guarantees about the behavior of memory accesses in transactions, STM allows programmers to frame operations as single-threaded computations and not worry about locking. This necessarily pushes the synchronization details into the system which implements STM, however, and it's not without a cost. I'll briefly cover two important aspects of implementing an STM which can have serious implications for its performance: data versioning and conflict detection.

Why is data versioning necessary? The answer is straightforward: transactions need to know when the memory addresses they are reading and writing are modified in order to ensure isolation. There are generally two ways to manage the versioning within a transaction, given the requirement that you must be able to atomically commit or roll back all of the transaction's operations. The first is eager versioning, which consists of updating the memory in place (and the corresponding version) and maintaining an undo log in case of rollback. The second is lazy versioning, which consists of buffering all writes done by the transaction and applying them at the time of commit. There is a key performance trade-off between the two: eager versioning performs better when most transactions successfully commit, while lazy versioning wins when many transactions need to roll back due to conflicts.
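To make the trade-off concrete, here is a minimal single-threaded sketch of the two strategies, using a plain dict as "memory". All class and method names here are illustrative, not taken from any real STM library, and real implementations would of course need locking and per-address versioning on top of this:

```python
_ABSENT = object()  # sentinel: address had no prior value


class EagerTx:
    """Eager versioning: write in place, keep an undo log for rollback."""

    def __init__(self, memory):
        self.memory = memory
        self.undo_log = []  # (addr, old_value) pairs, newest last

    def write(self, addr, value):
        self.undo_log.append((addr, self.memory.get(addr, _ABSENT)))
        self.memory[addr] = value  # the update is visible immediately

    def commit(self):
        self.undo_log.clear()  # cheap: writes are already in place

    def rollback(self):
        # Expensive path: replay the undo log newest-first.
        for addr, old in reversed(self.undo_log):
            if old is _ABSENT:
                del self.memory[addr]
            else:
                self.memory[addr] = old
        self.undo_log.clear()


class LazyTx:
    """Lazy versioning: buffer all writes, apply them at commit time."""

    def __init__(self, memory):
        self.memory = memory
        self.write_buffer = {}

    def write(self, addr, value):
        self.write_buffer[addr] = value  # memory is untouched until commit

    def commit(self):
        # Expensive path relative to eager: every write is applied here.
        self.memory.update(self.write_buffer)
        self.write_buffer.clear()

    def rollback(self):
        self.write_buffer.clear()  # cheap: just drop the buffer
```

The asymmetry in the commit/rollback costs of the two classes is exactly the trade-off described above: `EagerTx.commit` is a no-op while its rollback replays the log, and `LazyTx` is the mirror image.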

That leads to the second point: conflict detection. In order to determine whether a transaction needs to roll back, it is necessary to keep track of the read and write sets (memory addresses) of each transaction. Then there are two choices as to when to detect conflicts, namely at the time of the read/write (pessimistic) or when the transaction is being committed (optimistic). We see the same performance pattern as with versioning: optimistic is superior when few conflicts are expected, while pessimistic performs better when conflicts are common. It's also possible to mix the two, e.g. use optimistic reads and pessimistic writes, which makes sense because reads are generally less likely to produce conflicts. The choice of conflict detection mechanism greatly influences how often locks need to be acquired and the overall performance, as pessimistic detection results in additional work done on every memory access.
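A common way to implement the optimistic variant is to give each address a version number that is bumped on every committed write; a transaction records the version of each address it reads and validates those versions at commit. The sketch below (hypothetical names again, and single-threaded: a real STM must acquire locks so that validation and apply happen atomically) shows that scheme combined with lazy versioning:

```python
class VersionedMemory:
    """Each address carries a version number, bumped on every committed write."""

    def __init__(self):
        self.values = {}
        self.versions = {}  # addr -> int, missing means version 0

    def read(self, addr):
        return self.values.get(addr), self.versions.get(addr, 0)

    def apply(self, writes):
        for addr, value in writes.items():
            self.values[addr] = value
            self.versions[addr] = self.versions.get(addr, 0) + 1


class OptimisticTx:
    """Optimistic conflict detection: record read versions, validate at commit."""

    def __init__(self, memory):
        self.memory = memory
        self.read_set = {}   # addr -> version observed at first read
        self.write_set = {}  # addr -> buffered value (lazy versioning)

    def read(self, addr):
        if addr in self.write_set:  # read-your-own-writes
            return self.write_set[addr]
        value, version = self.memory.read(addr)
        self.read_set.setdefault(addr, version)
        return value

    def write(self, addr, value):
        self.write_set[addr] = value

    def commit(self):
        # Validation: every address we read must still be at the version
        # we observed, i.e. no other transaction committed a write to it.
        for addr, seen in self.read_set.items():
            if self.memory.versions.get(addr, 0) != seen:
                return False  # conflict detected: caller should retry
        self.memory.apply(self.write_set)
        return True
```

Note that no per-access bookkeeping beyond recording the version is done here; all the conflict-detection work is deferred to `commit`. A pessimistic implementation would instead acquire a lock (or check the version) on every `read` and `write`, which is the extra per-access cost mentioned above.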

Implementing STM efficiently is not an easy task, especially when you take into consideration the fairness and forward-progress guarantees that are ideal in such systems. Moreover, memory accesses will almost certainly have a significant amount of overhead associated with them, and it is this fact that leads languages like Clojure to make the distinction between normal variables and STM variables very clear. Programmers should restrict the use of STM to only the cases in which it is absolutely necessary and use immutable variables otherwise. While this may make STM more difficult to use and hinder its adoption, the benefits may outweigh the costs in the long run. It turns out that the Akka library for Scala also has an STM implementation, which could mean that the concept is catching on, but it still has a long way to go before becoming mainstream. The transactional memory model is very powerful from the programmer's perspective, but understanding the implementation details and performance penalties is important if you ever want to use it extensively.
