Sunday, September 8, 2013

Actors and Error Handling

In addition to the basic model of actors, Erlang has another core language construct that is intended to help programmers build reliable systems. It has built-in support for various forms of error handling in the context of Erlang processes. For example, a single process will often rely on the assumption that various other processes that it depends on are available and will stop functioning properly if any of those processes fail. In many other languages, because the concept of disjoint processes is not fundamental, handling such failures is generally ad-hoc and often ignored by programmers until something unexpected happens. But in the actor model and especially when blurring the line between local and remote processes, it is necessary to treat external process failures as an expected outcome that can ideally be recovered from. As such, Erlang provides three primitives with which to build robust systems of processes: links, traps, and monitors.

Links are a way of specifying a hard dependency between two processes: if one of them dies, the other cannot function and thus should die as well. They can be used to set up groups of processes which correspond to some meaningful entity in the system that requires each of the individual processes to be running properly. When processes are linked and one dies, it sends a specific 'EXIT' message to the other that isn't received or caught by the normal messaging and exception handling constructs (instead killing the receiving process) unless some specific measures are taken. This leads us to the concept of traps, which are a method by which you can explicitly handle an exit message from a linked process. If two processes are linked without either of them trapping the exit messages, they will die together. But if, for example, you have a process which is dedicated to monitoring a group of other processes, this process can decide to receive exit messages and act accordingly when they are received, e.g. restarting the entire group of processes. Links and traps allow programmers to specify entities in a system that continue operate after failures.

The final piece of Erlang error handling is the ability to monitor other processes. This is similar to links except that monitors are unidirectional and stackable, meaning that you can compose them more easily and that they work better with libraries. Moreover, you do not need to trap the 'DOWN' messages that the monitored process sends; they are received as normal messages since there is no implicit process kill involved (unidirectional means there is no tight coupling between the processes). Monitors are a more flexible way for processes to communicate their errors to each other without linking their fates together. Between these three simple primitives for error handling, it is possible to construct systems that respond well to failure and can self-recover, leading to claims such as Joe Armstrong's (the creator of Erlang) nine 9's of reliability in the Ericsson AXD301.

As with the actor model, Scala's Akka library provides some similar functionality to Erlang's error handling, which is the concept of supervision. In some sense, Akka supervision is a stricter version of links and monitors, where every actor must be supervised by some other actor, its parent, so that the set of all actors forms a hierarchy (the root actor is provided by the library). The supervising actor can employ different strategies for monitoring the lifecycle of each of its child actors, which often involves restarting failed actors. While not quite as flexible as Erlang error handling, the way Akka enforces the actor hierarchy leads programmers to naturally structure their systems as a hierarchical tree of components that can either function independently of failures in other parts of the hierarchy or propagate/handle errors as necessary. Building concurrent and distributed systems requires serious consideration of errors and failures, but it's often the case that there are no good tools for doing so. Erlang's ideas and Akka's adaptation of them are examples of how this problem can be tackled at a first-class level.

No comments:

Post a Comment