Thursday, January 17, 2013

Concurrency with REST

Representational State Transfer (REST) is one of the main architectural models that web services use to communicate with each other. While it is not tied to HTTP, it is most often used on top of HTTP for various web APIs (e.g. Amazon S3, Stripe, Netflix, Zendesk, Greendizer). A very common pattern when writing REST APIs is the management of entities through the four operations of create, read, update, and delete (CRUD). For example, these entities could be buckets in Amazon S3, customers in Stripe, or tickets in Zendesk. The CRUD operations map very naturally to HTTP verbs (POST, GET, PUT, and DELETE, respectively), so you end up with an interface that is clean and easy to understand for developers integrating with your service.

One problem that inevitably arises because we are working in a distributed setting is how to deal with concurrency. What kind of semantics should be guaranteed for clients of the API who are concurrently operating on the same entities? Reads are innocuous here because they don't modify state; creates and deletes are not particularly worrisome either, since each can take effect only once for a given entity. This leaves us with just the update (i.e. conventional write) operation to think about. It's easy to see how two unsynchronized updates can clobber each other: both clients read the same state, each modifies its own copy, and whichever writes last silently overwrites the other. This is the lost update problem. The obvious fix would be to lock the resource, but remember that HTTP is stateless, so locking is quite dangerous: if the client that obtains the lock crashes or never issues the request to release it, the resource stays locked. If we can't lock, then how do we solve this problem? Enter optimistic concurrency control.
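To make the lost update problem concrete, here is a minimal sketch in Python. The in-memory `store` and the `get`/`put` helpers are hypothetical stand-ins for a server and its GET and unconditional PUT operations; they are not taken from any particular API.

```python
# Hypothetical in-memory "server" state holding a single entity.
store = {"ticket": {"status": "open", "assignee": None}}

def get(key):
    # Return a copy, as a client would receive over the wire.
    return dict(store[key])

def put(key, entity):
    # Unconditional update: replaces whatever is currently stored.
    store[key] = dict(entity)

# Both clients read the same initial state.
a = get("ticket")
b = get("ticket")

# Client A closes the ticket and writes it back.
a["status"] = "closed"
put("ticket", a)

# Client B, still holding the old copy, assigns the ticket and writes it back.
b["assignee"] = "bob"
put("ticket", b)

final = get("ticket")
print(final)
```

Client B's write is based on a stale copy, so the stored ticket ends up open again and client A's change is gone without either client being told.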

Optimistic concurrency control is based on the assumption that in most cases there is no contention for a resource (hence the "optimistic"). It relies on versioning the entities in the system such that every time an update is made to an entity its version changes, e.g. a counter that increments on each update. When a client gets an entity, it is also given the current version information for that entity; if it then wishes to update the entity, it must provide the same version information that it received. Before performing the update, the server checks that the version provided by the client matches the version on the server (i.e. nobody else has performed an update in the meantime) and only proceeds with the update if that is the case; otherwise the update is rejected. These simple steps ensure that no update is silently overwritten and that the system is robust to client failure, since a client that disappears holds nothing that needs to be released. Performance is also good provided the resource is not highly contended, because rejected updates (and the retries they cause) are then rare.
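The version check described above can be sketched as follows. This is only an illustration under stated assumptions: `VersionedStore` and `VersionMismatch` are hypothetical names, the version is a plain integer counter, and the internal `threading.Lock` is held only for the duration of a single operation so that the compare-and-write step is atomic. It is never held across requests, so it is not the kind of client-held lock ruled out earlier.

```python
import threading

class VersionMismatch(Exception):
    """Raised when the version supplied by the client is stale."""

class VersionedStore:
    def __init__(self):
        self._lock = threading.Lock()  # guards one operation at a time
        self._data = {}                # key -> (version, entity)

    def create(self, key, entity):
        with self._lock:
            self._data[key] = (1, dict(entity))
            return 1

    def read(self, key):
        with self._lock:
            version, entity = self._data[key]
            return version, dict(entity)

    def update(self, key, entity, expected_version):
        with self._lock:
            current_version, _ = self._data[key]
            if expected_version != current_version:
                raise VersionMismatch(
                    f"expected {expected_version}, found {current_version}"
                )
            new_version = current_version + 1
            self._data[key] = (new_version, dict(entity))
            return new_version

store = VersionedStore()
store.create("ticket", {"status": "open", "assignee": None})

# Both clients read version 1.
version_a, ticket_a = store.read("ticket")
version_b, ticket_b = store.read("ticket")

# Client A updates first; the stored version becomes 2.
ticket_a["status"] = "closed"
store.update("ticket", ticket_a, version_a)

# Client B still holds version 1, so its update is rejected.
ticket_b["assignee"] = "bob"
try:
    store.update("ticket", ticket_b, version_b)
    rejected = False
except VersionMismatch:
    rejected = True
    # Re-read to pick up A's change, reapply B's change, and retry.
    version_b, ticket_b = store.read("ticket")
    ticket_b["assignee"] = "bob"
    store.update("ticket", ticket_b, version_b)

final_version, final_ticket = store.read("ticket")
print(rejected, final_version, final_ticket)
```

After the retry both changes are present: the ticket is closed and assigned, and neither client had to hold a lock between requests.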

HTTP has a built-in feature for this version information called the ETag header, which is used for caching and conditional requests. When a client gets an entity from the server, the ETag header in the response will be populated with the version information. In practice, the version information is often the result of applying a cryptographic hash function to either the data itself or some unique version identifier. If the client subsequently decides to update that entity, it should provide the version information in an If-Match header in the update request (a conditional PUT). This allows the server to validate the client's update as required by optimistic concurrency control and to return 412 Precondition Failed if the version no longer matches. If the client gets a response that indicates a version mismatch, it can simply get the entity again and perform the update as desired; the important point is that the client has now seen the other update instead of blindly overwriting it.
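As a rough sketch of the ETag and If-Match exchange, the following Python models the server side with plain functions instead of a real web framework. The names `make_etag`, `handle_get`, and `handle_put` are hypothetical, the ETag is assumed to be a SHA-256 hash of the JSON-serialized entity, and only a single tag in If-Match is handled (not `*` or a comma-separated list). Rejecting a PUT with no If-Match header using 428 Precondition Required is a design choice made here, not something the protocol forces. A real server would also need to make the compare-and-write step atomic.

```python
import hashlib
import json

def make_etag(entity):
    # Assumption: derive the ETag from a hash of the entity's JSON form.
    payload = json.dumps(entity, sort_keys=True).encode("utf-8")
    return '"' + hashlib.sha256(payload).hexdigest() + '"'

def handle_get(store, key):
    # Returns (status, headers, body), with the version in the ETag header.
    entity = store[key]
    return 200, {"ETag": make_etag(entity)}, dict(entity)

def handle_put(store, key, headers, new_entity):
    if_match = headers.get("If-Match")
    if if_match is None:
        # Design choice in this sketch: require conditional updates.
        return 428, {}, None
    if if_match != make_etag(store[key]):
        # The entity changed since the client last read it.
        return 412, {}, None
    store[key] = dict(new_entity)
    return 200, {"ETag": make_etag(new_entity)}, dict(new_entity)

store = {"ticket-1": {"status": "open"}}

# A client reads the entity and remembers its ETag.
status, headers, body = handle_get(store, "ticket-1")
etag = headers["ETag"]

# The first conditional PUT with that ETag succeeds.
first = handle_put(store, "ticket-1", {"If-Match": etag}, {"status": "pending"})

# A second PUT reusing the now-stale ETag is rejected.
second = handle_put(store, "ticket-1", {"If-Match": etag}, {"status": "closed"})

print(first[0], second[0], store["ticket-1"])
```

On receiving the 412, the second client would issue a fresh GET, take the new ETag from the response, and retry its PUT with that value.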

The use of the ETag header for concurrency control is not necessary in all REST APIs, but it can be an easy way to relieve clients of worrying about complex synchronization themselves.
