Thursday, October 10, 2013

Google Go (Part 2, Concurrency)

Last time, I wrote about some of the interesting and unique design decisions regarding dependencies and imports in Google's Go programming language. This time, we're back to my favorite topic: concurrency. Because essentially all machines have multiple cores/processors now, every new language has to address the problem of concurrent programming in some way, and Go is no exception. Recall that the guiding principle behind the design of Go is its focus on optimizing the process of building large software systems, which includes things like developer productivity and tool support. As a result, Go adopts a communicating sequential processes (CSP) approach to concurrency, which "has the property that it is easy to add to a procedural programming model without profound changes to that model." This is a little disappointing when you consider the options out there, but given Google's preference for familiarity and the fact that they want to be building real systems quickly using Go, the choice is not too surprising. That said, it is still worthwhile to explore how CSP integrates into the language and what its strengths and weaknesses are.

Concurrency in Go is built around two concepts: goroutines and channels (note that the code examples below are roughly taken from these links). A goroutine is a fairly simple concept: it allows you to spawn a function that runs concurrently. For example, the statement go list.Sort() spawns a goroutine that sorts the list. The nice thing about this is that Go abstracts away the details of creating and managing threads while still utilizing the CPU efficiently. It can identify when a goroutine is blocking on I/O, for example, and free up the OS thread executing it to run a different goroutine. A world in which functions run concurrently and independently is great, but the interesting part comes when they need to communicate, which leads us to channels. Channels are not very complex either: a channel is more or less just a blocking queue that multiple goroutines can take from and put into. It allows them to synchronize at certain points and send messages to each other. For example, if you wanted to wait for a goroutine to finish (e.g. the equivalent of joining a thread in Java), you would do something like this:

In the above code snippet, the c <- 1 line sends the value 1 (arbitrary) to the channel, while the <-c line blocks until it receives a value from the channel (which is then ignored). By default, channels have no buffer (imagine a blocking queue of capacity 0), so sends and receives wait for each other, but they can be given a capacity, as we will see next.

Let's look at a slightly more involved example with a channel being used as a semaphore. In the code below we simulate a system that is processing requests from a queue; we would like to process requests in parallel, but limit the maximum number of simultaneous requests being processed (i.e. a typical semaphore use case).

The number of elements in the channel indicates how many permits the semaphore has available, so getting a permit is receiving from the channel and releasing a permit is sending to the channel. We spawn a goroutine for each request in the queue (which is itself a channel) that processes the request and releases a permit.

The more interesting thing about this example, however, is that it contains an insidious bug. You might expect the closure that the goroutine executes to capture the current request being dispatched, but unfortunately it doesn't. The closure captures the variable req itself rather than its value at the time of dispatch, so each goroutine processes whatever request req points to when the goroutine runs, not when it was created, and which requests actually get processed becomes a race. It is unfortunate that such a common pattern has a pitfall like this, and the pain is evidenced by numerous mentions of the problem (e.g. here and here) that suggest rather ugly workarounds like defining another variable in the loop body or passing the value explicitly as a parameter to the goroutine.

As far as simple and familiar models of concurrent programming go, CSP based on goroutines and channels is not bad. The real danger lies in the fact that Go has a memory model similar to that of C++ (i.e. shared memory with pointers and references), which can lead to bad code even with the use of channels. The article explains it as, "Go enables simple, safe concurrent programming but does not forbid bad programming." This feels accurate to me and may be a practical choice for the language, but given our understanding of how easy it is to "program badly," one might wish that Go took a more progressive approach to the problems around concurrency.
