Sunday, December 23, 2012

Sieve of Eratosthenes in Scala

The Sieve of Eratosthenes is one of the most efficient ways to find all of the primes numbers below some limit (e.g. 100 million) and an essential tool for any programmer interested in various algorithms contests. Recently, I decided to start doing some Project Euler problems in Scala (my new favorite language) and before long I had to write the sieve.

One tricky thing about working with Scala for those of us coming from Java/C++ backgrounds is that some of the nice constructs can lead to pretty bad performance. More details on Scala performance will be saved for another day, but in this case I was able to come up with a pretty concise and "idiomatic" (quotes because I'm still a Scala newbie) way of implementing the sieve which has quite good performance.

Here, we take advantage of Scala's Stream and Range classes for our implementation of the sieve. The outer Stream foreach does the job of picking every prime that we should check divisibility against, basically combining a while construct (takeWhile) with a conditional (filter). The inner Range foreach is a simple replacement for a for loop, marking the multiples as composite. Note that Streams in Scala are lazily evaluated, which means that the filter actually takes into account the current state of the prime array at each step rather than filtering uselessly upfront.

The foreach construct is one example of a language feature that can sometimes give you unexpectedly bad performance, but the developers of the language/compiler have done a good job to make sure that foreach on a Range is very fast (the overhead on Stream is nontrivial, though, as far as I can tell) which means that our inner loop still performs adequately. All in all, this example runs about 50% slower than a Java implementation using for loops, clocking in at about 1.5 seconds on my laptop, which is not bad at all.

This is probably the first of many posts about Scala since it's a really interesting and expressive language that I also happen to use for my job, so look forward to more interesting discussions on the language!


  1. Wouldn't one want to do Stream.from(2).toIterator as the Stream class is itself computing a list of all the primes? It'd get GCed since you don't capture the output, but using the iterator means that only the primes array is hanging on to your output which seem to be what you intend. Although having a list of the primes themselves is useful.

  2. i'm doing some project euler problems in scala which is why i came back to revisit this post :)

  3. Sorry for the late response, but yeah, that's right. As long as you hold onto the head of the stream, none of it can get GC'ed. And using an iterator gets around that (I actually ran into this issue in some real code a while back).

  4. Is there any benefit here in using Stream with takeWhile instead of a for loop that I think is easier to read:

    for(i <- 2 to sqrt(N.toDouble).toInt; if prime(i); j <- i to N/i) {
    prime(i*j) = false