While reading through the Akka documentation (the actor and remoting library for Scala), I once again came across a reference to the classic distributed systems paper from Sun discussing the difficulty of unifying local and remote programming models. I briefly touched on the content of the paper in a previous post, but it's worth revisiting in the context of concurrent programming and understanding how the actor model works in a distributed system. With Akka actors, like in Erlang, the programming model revolves entirely around message passing and asynchronous processing even for actors running within the same Java Virtual Machine (JVM). This blurs the line between whether you are sending messages to a local actor and a remote actor, but it is arguably done in the "correct" way. So it is interesting to understand what the difference is between the actor model and other attempts at such unification, e.g. remote procedure calls (RPC), which are generally considered to be less than ideal (although still widely used).
Firstly, we need to recap the four points brought up in the paper that supposedly doom a unified local and remote object-oriented model. They are latency, memory access, partial failure, and concurrency. Latency and memory access are fairly straightforward; because the object on which a method is called lives in a different address space on a remote machine, the method call is going to be much slower and memory accesses have to be managed in some meaningful way across multiple machines. Partial failure refers to the fact that there are multiple components that can fail independently of each other (e.g. the network and the remote entity), which can cause the method call to be in an unknown or perhaps even inconsistent state. Lastly, the issue of concurrency arises because there is no shared synchronization system which can mange the order of method calls on an object, so the programmer must be aware of the fact that an object can receive multiple method calls simultaneously. These points can typically be alleviated by a well-designed RPC library and being diligent in distinguishing local and remote invocations, but that is essentially the argument being made, that in practice they are not truly unified.
So then the question is how the actor model is different and why interacting with local and remote actors can be done in a mostly unified way. The problems of latency and concurrency are non-issues in the actor model because of the asynchronous design and the fact that each actor is processing a single message at a time. And assuming that actors have no shared state and communicate exclusively through the immutable messages passed between them, we can avoid confusing memory access semantics as well. Partial failure is, again, the most complex of the issues, and the actor model itself does not prescribe specific guarantees about message processing, such as idempotence, so it is still left to the programmer to figure out how to keep the system consistent. Regardless, we can see that actors, by design, are better suited for a unified local and remote model since they greatly reduce the space of potential problems to worry about. The corollary, however, is that local operations are more complex due to the way in which communicating between actors and sharing state is restricted, but I am not entirely convinced that this is a bad thing in light of the difficulty of multithreaded programming.
The more I learn about actors, the more optimistic I become about their potential to make concurrent programming less of a disaster. The Akka project is really interesting and gives Scala one more feature that is very much lacking in Java; the mere possibility that synchronized blocks and semaphores can be avoided altogether is enough to motivate me to pursue understanding Akka. Given that the future only holds more concurrent and distributed programming, I feel like programmers should be much better educated about the alternatives to traditional, error-prone multithreading.
No comments:
Post a Comment