Saturday, December 29, 2012

Static Initializers and HotSpot Optimization

HotSpot optimization is one of the core features of the Java Virtual Machine (JVM) that allows JVM-based languages like Java and Scala to have great performance even in comparison to languages that compile directly to machine code. It enables dynamic, run-time optimization of bytecode in various ways, including "true" compilation into machine code, called just-in-time (JIT) compilation. As a result, writing code that is more conducive to HotSpot optimization can lead to huge performance gains; conversely, not doing so can produce unexpectedly slow code. Consider the following two simple Java programs:

They are almost identical, except that the former has the body in the main method while the latter has it in a static initialization block. Running the two programs produces the same result. The key difference, however, is performance: executing the first program takes about 5 ms, while executing the second takes over 40 seconds. With a performance gap of over 1000 times for two pieces of nearly identical code, this is surely worth investigating a bit.

Thanks to this blog post, I discovered two interesting flags you can pass to the JVM called -XX:+PrintCompilation and -XX:+LogCompilation which will cause it to print out all sorts of information regarding the JIT compiler, such as when methods are JIT compiled and when they are recompiled. Here is a snippet of the output for the second program (from log compilation):

I verified that in both cases the JVM tried to JIT compile the main method and the static initializer in the two examples, respectively. Furthermore, in both cases, the compiler performed the key optimization of inlining the call to run(). The discrepancy lies in the snippet, which is repeated in the log for the second program's execution many times. In short, it says that it successfully compiled the static initializer and then hit a trap because the class had not yet been initialized, which forced it to reinterpret the method rather than using the compiled version. This means that the call to run() is not inlined, leading to significantly worse performance.

Since the static initializer is executed at classloading time, it is indeed true that a class has not yet been initialized before static initialization happens. Why this means you have to reinterpret the method is unclear to me, but I'm sure the JVM writers have a good reason for it. So where does this leave us? It seems that in general static initializers cannot be JIT compiled since by definition the enclosing class has not been initialized yet. This, among other reasons, is why you should never perform expensive operations in static initializers.

Finally, I'll provide a Scala example of why this is not a completely trivial case to worry about.

Scala makes it look very natural to put code into the body of an object. Unfortunately, these examples compile into bytecode that closely resembles that of the Java examples above. The for-comprehension is not implemented by a simple for loop but rather by a call to foreach, which takes a closure and calls a method on it (like the Runnable). Most importantly, the body of the second example is run in a static initializer, which causes the same problem that we discussed and the same abysmal performance.

No comments:

Post a Comment