Thursday, March 7, 2013

Dependency Management

Managing dependencies properly is a problem that I've run into time and time again, regardless of what project I'm working on. By dependency management, I mean specifying which internal and third-party dependencies your code needs. There are many systems for doing this, such as Ivy and Maven in the Java world and SBT in the Scala world (I'm sure other languages have equivalent facilities). These work pretty well in general; for example, if I'm using Maven as my build tool and I need to depend on Google's Guava library, I can find it in Maven Central, add it to my configuration file, and be done with it. The existence of such repositories is one of the greatest developments in software engineering and has probably reduced the amount of code that you need to write by a few orders of magnitude. Sooner or later, however, I encounter situations in which dependency management causes major headaches, in particular due to inconsistent version requirements.

For example, suppose you are working on a project that requires third-party libraries A and B. Both of them are on Maven Central, so you happily put them into your build configuration and are on your way. Well, not quite. Maven is smart in that it recursively pulls dependencies so that you actually have all of the JARs that you need to run your application. But it turns out that library B was last updated five years ago while library A is brand new, and they both depend on a third library C which has had several releases over the last five years. So B specifies version 1.1 of C as a dependency while A uses the latest-and-greatest version 2.0-alpha. To make things even better, the authors of library C decided to break backwards-compatibility with the major version change from 1.x to 2.x, so now you're stuck with two incompatible versions of the same library competing for one spot on your classpath. Maven will not necessarily complain about this: by default it quietly picks one version of C (the one declared nearest to your project in the dependency tree), and your application will most likely compile. But upon running the code you will see scary things like NoSuchMethodErrors and IllegalAccessErrors, because either A or B is now running against a version of C that it was never built for.
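To make the A/B/C scenario concrete, here is a minimal Java sketch of how you might spot this kind of conflict by walking a dependency graph and collecting every version requested for each library. Everything in it is made up for illustration: the "name:version" strings, the versions of A and B, and the class and method names. It is not how Maven itself resolves dependencies, just a way to see the problem in a few lines.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class VersionConflicts {
    // Hypothetical graph for the scenario in the text: each "name:version"
    // key maps to its direct dependencies. The versions of A and B are invented.
    static Map<String, List<String>> exampleGraph() {
        return Map.of(
            "A:3.0", List.of("C:2.0-alpha"),
            "B:1.4", List.of("C:1.1"));
    }

    // Walks the graph from the root dependencies and collects every version
    // requested for each library name, whether direct or transitive.
    static Map<String, Set<String>> requestedVersions(
            Map<String, List<String>> graph, List<String> roots) {
        Map<String, Set<String>> versions = new TreeMap<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        Set<String> seen = new HashSet<>();
        while (!pending.isEmpty()) {
            String dep = pending.pop();
            if (!seen.add(dep)) {
                continue; // already visited this exact name:version pair
            }
            String[] parts = dep.split(":", 2);
            versions.computeIfAbsent(parts[0], k -> new TreeSet<>()).add(parts[1]);
            pending.addAll(graph.getOrDefault(dep, List.of()));
        }
        return versions;
    }

    // Keeps only the libraries that are requested at more than one version.
    static Map<String, Set<String>> conflicts(
            Map<String, List<String>> graph, List<String> roots) {
        Map<String, Set<String>> result = requestedVersions(graph, roots);
        result.values().removeIf(v -> v.size() < 2);
        return result;
    }

    public static void main(String[] args) {
        // Prints {C=[1.1, 2.0-alpha]}
        System.out.println(conflicts(exampleGraph(), List.of("A:3.0", "B:1.4")));
    }
}
```

Neither A nor B shows up in the result, since each is requested at a single version; only C is flagged, which is exactly the library you never asked for directly.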

So if you find yourself in such a situation, what can you do? One course of action is to decide on either version 1.1 or 2.0-alpha of library C, find the source of A or B, and build your own version of it after changing the dependency on C. This is quite error-prone because you are not familiar with the third-party code and consequently don't have a full understanding of the scope of the dependency on C. It is also time-consuming to dive into the details of implementations that are supposed to be abstracted away from you and mess with bits and pieces of the internals. I have gone through this process a couple of times (when dealing with ubiquitous libraries like Apache HttpClient and Apache Commons), and it's never been fun.
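Before committing to either version of C, it can help to confirm which copy the JVM actually loaded. The sketch below uses the standard `Class.getProtectionDomain()` and `CodeSource.getLocation()` calls to print where a class came from. The class name `WhichJar` and the helper `locationOf` are my own invention for this example, and you would pass in the fully qualified name of some class from library C.

```java
import java.security.CodeSource;

class WhichJar {
    // Returns the JAR or directory a class was loaded from, or a note when
    // the JVM reports no code source (as it does for core JDK classes).
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(no code source; likely a JDK class)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name from library C as the first
        // argument; with no arguments this falls back to java.lang.String.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " -> " + locationOf(Class.forName(name)));
    }
}
```

Run with your application's classpath, this tells you whether the 1.1 or the 2.0-alpha JAR won, which at least narrows down whether it is A or B that is about to break.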

The problem is that nobody is at fault here; the authors of the libraries are all handling their releases in reasonable ways, and you may very well be the first person who has wanted both libraries A and B in the same application. When nobody does anything wrong and you still end up in such a nasty situation, something seems off. I recently came across a practice that somewhat mitigates the pain: the authors of the Apache Commons Math library used unique package names when they went from version 2 to 3 (org.apache.commons.math vs. org.apache.commons.math3), which means that you can safely have them both on your classpath and think of them as completely separate libraries. But it's not clear whether the generic problem of dependency management is actually "solvable" without having all developers adhere to some rigorous backwards-compatibility standard, which is certainly an impossible task.

P.S. There are a few other things that I may as well complain about while I'm on this topic. If you run your own continuous integration system and your code is modular to any degree, you'll most likely run into incompatible versions of your own JARs when certain builds break. This is very annoying and a big killer of developer productivity. Additionally, Scala takes dependency management to a whole new level by having libraries be associated with a specific version of the language, so Maven will complain if you have some dependencies built against Scala 2.10.0 while others are built against Scala 2.9.2. Since people don't always update their libraries in a timely fashion, the process of upgrading your own version of Scala can be painful.
