Thursday, June 13, 2013

Scala Closures and Vars

One of the interesting choices that the designers of Scala made is how to deal with access to mutable local variables from closures. For example, consider the following snippet:
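The snippet itself is not reproduced in this copy of the post. A minimal reconstruction that matches the description below might look like this, assuming the closure simply assigns 1 to i; the wrapping object ClosureDemo and the Int return value are additions made only so the example compiles and can be checked on its own:

```scala
object ClosureDemo {
  def foo(): Int = {
    var i = 0                 // mutable local variable
    val f = () => { i = 1 }   // closure that writes to the captured var
    f()                       // run the closure inline, on the current thread
    println(i)                // prints 1
    i                         // returned only to make the result easy to check
  }

  def main(args: Array[String]): Unit = {
    foo()
  }
}
```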

The code is straightforward, and it is easy to see that the value 1 will be printed. What is interesting is that the foo method has a mutable local variable (a var), i, which is referenced from the closure f. Those of you familiar with Java will know that the equivalent is prohibited there: if you create an anonymous inner class in a method (which is what closures are in Scala), you can only refer to final variables (vals in Scala) from the local scope. That is because an instance of such a class lives on the heap and has an unknown lifetime, so you cannot assume that the locally scoped variable will still be around for as long as the instance is. While that may be a sensible restriction, it is also a source of much confusion for beginning Java programmers who create classes local to the current scope and cannot understand the resulting error. Scala lifts this restriction in order to simplify things for developers.

The problem that Java avoids is still there, though. Scala has to do something under the hood to ensure that running the closure always works, even after the method that declared the variable has returned. The solution is to move the variable to the heap and put it in a mutable wrapper, which in the case of integers is the IntRef class. So instead of creating an integer on the stack, the line var i = 0 actually creates an IntRef on the heap, and closures capture a reference to it. All direct references to the variable (both reads and writes) are replaced with indirect references through the wrapper, which ensures that the enclosing method and the closures all see each other's modifications. Easy enough, right?
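To make the rewrite concrete, here is a hand-written approximation of what the compiler effectively does. It is only a sketch: the real transformation happens inside the compiler rather than in source code, and the object name DesugaredDemo and the Int return value are illustrative additions. scala.runtime.IntRef is the wrapper class mentioned above, and its mutable field is called elem.

```scala
import scala.runtime.IntRef

object DesugaredDemo {
  def foo(): Int = {
    val i = new IntRef(0)          // heap-allocated box stands in for `var i = 0`
    val f = () => { i.elem = 1 }   // the closure captures a reference to the box
    f()                            // the write goes through the box
    println(i.elem)                // the read goes through the same box, prints 1
    i.elem
  }

  def main(args: Array[String]): Unit = {
    foo()
  }
}
```

Note that the local i is now a val: the reference to the box never changes, only the box's contents do, which is the same pattern Java programmers write by hand with a one-element array or a holder object.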

Well, the designers of Java were probably smart enough to figure something like this out themselves, yet they chose not to do it. One good reason is that shared mutable state is a very tricky thing in concurrent environments. Consider the following variation on the above code:
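As with the first snippet, the original code is not reproduced here. A plausible reconstruction, assuming the only change is that f now runs on a freshly spawned thread, is shown below; the object name RacyDemo and the returned value are additions for the sake of a self-contained example:

```scala
object RacyDemo {
  def foo(): Int = {
    var i = 0
    val f = () => { i = 1 }

    // Run the closure on another thread instead of calling it inline.
    new Thread(new Runnable {
      def run(): Unit = f()
    }).start()

    val seen = i    // unsynchronized read that races with the write in f
    println(seen)   // may print 0 or 1
    seen
  }

  def main(args: Array[String]): Unit = {
    foo()
  }
}
```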

Now, instead of running f inline, we spawn a thread that runs it. All sorts of alarms and warning bells should be going off in your head when you see this code, as there are now two threads potentially accessing an unsynchronized variable. There is no way to determine whether 0 or 1 will be printed when you call foo, because the unsynchronized access to i is a race condition. You might think this example is contrived, but you could just as easily pass the closure to some other method which, deep in the internals of its implementation, runs the closure on another thread. Allowing references to local variables in closures throws away the invariant that you never need to synchronize access to local variables, and essentially makes them as "dangerous" as instance variables. So while Scala simplifies some natural use cases, it has also potentially opened up a can of worms: the kinds of subtle concurrent-access bugs that previously affected only shared fields can now affect local variables.
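The less contrived case can be sketched as follows. Here runCallback is a hypothetical stand-in for a library method, invented for this illustration; nothing in its signature tells the caller that the callback will run on a different thread.

```scala
object HiddenThreadDemo {
  // Hypothetical library method: the signature does not reveal that the
  // callback is executed on another thread.
  def runCallback(callback: () => Unit): Unit = {
    new Thread(new Runnable {
      def run(): Unit = callback()
    }).start()
  }

  def foo(): Int = {
    var i = 0
    runCallback(() => { i = 1 })   // looks like an ordinary call that uses a local
    val seen = i                   // but this read now races with the write above
    println(seen)                  // may print 0 or 1
    seen
  }

  def main(args: Array[String]): Unit = {
    foo()
  }
}
```

From the caller's side, foo touches nothing but a local variable, yet its result depends on thread scheduling.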
