Saturday, December 29, 2012

Static Initializers and HotSpot Optimization

HotSpot optimization is one of the core features of the Java Virtual Machine (JVM) that allows JVM-based languages like Java and Scala to have great performance even in comparison to languages that compile directly to machine code. It enables dynamic, run-time optimization of bytecode in various ways, including "true" compilation into machine code, called just-in-time (JIT) compilation. As a result, writing code that is more conducive to HotSpot optimization can lead to huge performance gains; conversely, not doing so can produce unexpectedly slow code. Consider the following two simple Java programs:

They are almost identical, except that the former has the body in the main method while the latter has it in a static initialization block. Running the two programs produces the same result. The key difference, however, is performance: executing the first program takes about 5 ms, while executing the second takes over 40 seconds. With a performance gap of over 1000 times for two pieces of nearly identical code, this is surely worth investigating a bit.

Thanks to this blog post, I discovered two interesting flags you can pass to the JVM called -XX:+PrintCompilation and -XX:+LogCompilation which will cause it to print out all sorts of information regarding the JIT compiler, such as when methods are JIT compiled and when they are recompiled. Here is a snippet of the output for the second program (from log compilation):

I verified that in both cases the JVM tried to JIT compile the main method and the static initializer in the two examples, respectively. Furthermore, in both cases, the compiler performed the key optimization of inlining the call to run(). The discrepancy lies in the snippet, which is repeated in the log for the second program's execution many times. In short, it says that it successfully compiled the static initializer and then hit a trap because the class had not yet been initialized, which forced it to reinterpret the method rather than using the compiled version. This means that the call to run() is not inlined, leading to significantly worse performance.

Since the static initializer is executed at classloading time, it is indeed true that a class has not yet been initialized before static initialization happens. Why this means you have to reinterpret the method is unclear to me, but I'm sure the JVM writers have a good reason for it. So where does this leave us? It seems that in general static initializers cannot be JIT compiled since by definition the enclosing class has not been initialized yet. This, among other reasons, is why you should never perform expensive operations in static initializers.

Finally, I'll provide a Scala example of why this is not a completely trivial case to worry about.

Scala makes it look very natural to put code into the body of an object. Unfortunately, these examples compile into bytecode that closely resembles that of the Java examples above. The for-comprehension is not implemented by a simple for loop but rather by a call to foreach, which takes a closure and calls a method on it (like the Runnable). Most importantly, the body of the second example is run in a static initializer, which causes the same problem that we discussed and the same abysmal performance.

Tuesday, December 25, 2012

Almost Surely

The concept of probability when dealing with infinite spaces can be a little confusing and counter-intuitive. One "paradox" that came up while I was watching lectures for a computational finance course on Coursera was the idea that an event which has probability zero of occurring can actually occur. For example, if you choose a random real number between 0 and 1, the event of choosing 0.5 exactly can certainly occur, but any reasonable calculation of the probability will lead to a value of zero. Naturally, I assumed people had formalized this notion, so I did a bit of searching online.

This led me to the concept of events that happen almost surely, which is to say they occur with probability one; alternatively, events that happen almost never have zero probability of occurring. There's nothing too crazy here; an event that happens almost surely simply has an infinitesimally small chance of not happening. So for mathematical purposes it only makes sense to assign the probability as one. To quote the Wikipedia article, "the difference between an event being almost sure and sure is the the same as the subtle difference between something happening with probability one and happening always."

Examples of events that happen almost surely are: flipping a coin an infinite number of times and getting tails at least once, picking a random real number between 0 and 1 and getting an irrational number. Examples of events that happen that happen almost never are: a random variable drawn from a continuous probability distribution taking a particular value, picking a random real number between 0 and 1 and getting 0.5. Moreover, the complement of an almost sure event happens almost never and vice versa. So, to sum things up, an event that happens with probability zero might not never happen, just almost never happen.

Monday, December 24, 2012

Mixin Ordering

Mixins are a feature of Scala that allow for more complex class composition than what you see in Java (essentially multiple inheritance with better rules to avoid ambiguity). In Scala, "classes" that you mix in are called traits. They let you do some nice things such as mix in utilities that a lot of classes use, e.g. a Logging trait which has a method that lets you print to the console with the class name.

Creating an instance of the Foo class prints the following:

While in this case we could have just made Logging a class and had Foo extend it, when you have a real class hierarchy and Foo has a real superclass, it's not easy to have this logging functionality in a single place. Allowing us to mix in the Logging trait into any class gives us a much better way to reuse code.

As with many features of Scala, mixins are both powerful and potentially dangerous. In particular, the ordering of mixins has an effect on the behavior of the program beyond the standard "later traits override methods of earlier traits" contract.  Consider the following example which is intended to print out a fact about a meter.

It turns out that the output of this program is not what you would expect, but rather

So why is this? Because this example is relatively simple, it's pretty easy to see that the getFact method is being called when the PrintFact trait is initialized, which you might guess happens before the ConversionConstants trait is initialized (and you'd be right). So the value read is 0 instead of the expected 100. It should also be clear that reversing the order of the traits will actually fix the problem since ConversionConstants doesn't have a (indirect) dependency on PrintFact. Moreover, omitting the PrintFact trait and putting the print statement in the body of PrintFactAboutAMeter also fixes the problem because the class constructor is called after all of the traits are initialized.

While the particular example described here might seem contrived, I ran into this problem recently while working on a real project, and it led to a very strange debugging session. So it is important to be careful when using mixins, specifically when doing anything in the initialization of a trait (as well as in a superclass constructor), because you might inadvertently add a dependency going in the wrong direction along the mixin ordering. This is a bit reminiscent of circular dependencies in static initializers in Java.

Sunday, December 23, 2012

Sieve of Eratosthenes in Scala

The Sieve of Eratosthenes is one of the most efficient ways to find all of the primes numbers below some limit (e.g. 100 million) and an essential tool for any programmer interested in various algorithms contests. Recently, I decided to start doing some Project Euler problems in Scala (my new favorite language) and before long I had to write the sieve.

One tricky thing about working with Scala for those of us coming from Java/C++ backgrounds is that some of the nice constructs can lead to pretty bad performance. More details on Scala performance will be saved for another day, but in this case I was able to come up with a pretty concise and "idiomatic" (quotes because I'm still a Scala newbie) way of implementing the sieve which has quite good performance.

Here, we take advantage of Scala's Stream and Range classes for our implementation of the sieve. The outer Stream foreach does the job of picking every prime that we should check divisibility against, basically combining a while construct (takeWhile) with a conditional (filter). The inner Range foreach is a simple replacement for a for loop, marking the multiples as composite. Note that Streams in Scala are lazily evaluated, which means that the filter actually takes into account the current state of the prime array at each step rather than filtering uselessly upfront.

The foreach construct is one example of a language feature that can sometimes give you unexpectedly bad performance, but the developers of the language/compiler have done a good job to make sure that foreach on a Range is very fast (the overhead on Stream is nontrivial, though, as far as I can tell) which means that our inner loop still performs adequately. All in all, this example runs about 50% slower than a Java implementation using for loops, clocking in at about 1.5 seconds on my laptop, which is not bad at all.

This is probably the first of many posts about Scala since it's a really interesting and expressive language that I also happen to use for my job, so look forward to more interesting discussions on the language!

New Blog

As 2012 closes out, I'd like to start a new blog in order to motivate myself to pursue personal projects and edification more aggressively. My intention is to primarily discuss programming and computer science, but hopefully I am sufficiently engaged in other topics to warrant posting about them as well. Here's to 2013 as a great blogging year!