Monday, November 24, 2008

GC is for Goodstuff Collector

Over the past few months I have noticed a fairly common fear of creating objects in Java. When I ask the authors about it, it always seems to boil down to an attempt to write faster code by avoiding garbage collection.

So why would one want to create more objects, anyway? Well, one good reason would be to get more readable code, e.g. through encapsulating an algorithm or using appropriate helper objects to express an algorithm more clearly.

Even when the code does get put into a helper object, there seems to be a tendency to keep that instance around and reuse it, to avoid garbage collection so that the code performs better. My first comment is always "You should measure that". If I wanted to put it more sharply I would say "If you can't prove it's a performance benefit, then it probably isn't". I have done enough optimization work to know that what's true for one language may not be true for another. Even within the same language, something that gives a performance benefit on one machine may be a detriment on another kind of machine (or even on the same kind of machine with different "tweaks" like page sizes and such).
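To make the two styles concrete, here is a minimal sketch. The names RunningAverage and AverageStyles are made up for illustration and do not come from any real code base; the point is only to show what "keep the instance around and reuse it" looks like next to "create a temporary and throw it away".

```java
// Hypothetical helper that accumulates state while computing an average.
class RunningAverage {
    private long sum;
    private int count;

    void add(int value) {
        sum += value;
        count++;
    }

    double result() {
        return count == 0 ? 0.0 : (double) sum / count;
    }

    // Only the "reuse" style needs this; forget to call it and the
    // next result silently includes the previous caller's values.
    void reset() {
        sum = 0;
        count = 0;
    }
}

class AverageStyles {
    // Style 1: one long-lived helper shared by every call.
    // Not thread safe, and correct only as long as reset() is never forgotten.
    private static final RunningAverage SHARED = new RunningAverage();

    static double averageReusing(int[] values) {
        SHARED.reset();
        for (int v : values) {
            SHARED.add(v);
        }
        return SHARED.result();
    }

    // Style 2: a fresh helper per call. It starts in a pristine state,
    // is visible to this call only, and becomes garbage on return.
    static double averageFresh(int[] values) {
        RunningAverage avg = new RunningAverage();
        for (int v : values) {
            avg.add(v);
        }
        return avg.result();
    }

    public static void main(String[] args) {
        int[] data = {2, 4, 6};
        System.out.println(averageReusing(data)); // prints 4.0
        System.out.println(averageFresh(data));   // prints 4.0
    }
}
```

Both methods return the same answer here. The difference is in what can go wrong: the shared version depends on reset() and on never being called from two threads at once, while the fresh version has neither concern.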

If you create a temporary object that lives only as long as you need it you gain the following benefits:
  1. Your object is always in a pristine state when you want to use it.
  2. Your code is a big step closer to being thread safe.
  3. Your code is more readable and easier to analyze.
  4. Your garbage collector gets to do less work.
"What?", you say, "The garbage collector does less work by collecting more garbage?".

Indeed it does. When we hear "garbage collector" we tend to think of the work involved in clearing out an overfilled store room, all that work to haul the garbage out. But the garbage collector doesn't even know the garbage exists: the very definition of garbage is that it can no longer be reached from anywhere. What the garbage collector really does is create a whole new store room and move the stuff you want to keep over to it. Then it lets the old store room and all the garbage disappear in a puff of smoke. So all the work done by the garbage collector is really done to keep objects alive, i.e. the GC is really the "goodstuff" collector.
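The store room picture can be sketched as a toy model in Java. This is not how any real JVM collector is implemented; HeapObject and ToyCopyingCollector are invented names, and the "new store room" is simply a set of freshly created copies. What it does show is that the work counter depends only on what is still reachable, not on how much garbage is lying around.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Toy heap object: a name plus outgoing references.
class HeapObject {
    final String name;
    final List<HeapObject> refs = new ArrayList<>();

    HeapObject(String name) {
        this.name = name;
    }
}

class ToyCopyingCollector {
    int copied; // how many objects the last collection had to touch

    // Copies everything reachable from the roots into the "new store room"
    // and returns the copies of the roots. Unreachable objects are never visited.
    List<HeapObject> collect(List<HeapObject> roots) {
        copied = 0;
        Map<HeapObject, HeapObject> forwarded = new IdentityHashMap<>();
        Deque<HeapObject> pending = new ArrayDeque<>();
        List<HeapObject> newRoots = new ArrayList<>();
        for (HeapObject root : roots) {
            newRoots.add(copy(root, forwarded, pending));
        }
        // Follow references from each copied object until nothing new turns up.
        while (!pending.isEmpty()) {
            HeapObject old = pending.poll();
            HeapObject fresh = forwarded.get(old);
            for (HeapObject ref : old.refs) {
                fresh.refs.add(copy(ref, forwarded, pending));
            }
        }
        return newRoots;
    }

    // Copies an object once; later requests for the same object reuse the copy.
    private HeapObject copy(HeapObject old,
                            Map<HeapObject, HeapObject> forwarded,
                            Deque<HeapObject> pending) {
        HeapObject fresh = forwarded.get(old);
        if (fresh == null) {
            fresh = new HeapObject(old.name);
            forwarded.put(old, fresh);
            pending.add(old);
            copied++;
        }
        return fresh;
    }

    public static void main(String[] args) {
        HeapObject keep = new HeapObject("keep");
        keep.refs.add(new HeapObject("child"));
        // Plenty of temporaries that nothing refers to any more.
        for (int i = 0; i < 100_000; i++) {
            new HeapObject("garbage-" + i);
        }
        ToyCopyingCollector gc = new ToyCopyingCollector();
        gc.collect(Collections.singletonList(keep));
        System.out.println(gc.copied); // prints 2
    }
}
```

Changing the loop to create ten times as many temporaries leaves the counter at 2, which is the "goodstuff collector" idea in miniature.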

This is obviously a somewhat simplified view and I don't think it holds completely for objects with finalizers (which is probably why finalizers are really bad for performance). Every single measurement and microbenchmark I've done confirms that creating and destroying objects is never worse and often much better than trying to keep objects around. I've done a few, first because I didn't trust it myself, then because others were so convinced of the opposite that I had to research it. That isn't absolute proof that it will be true for all scenarios, but I think it's a pretty good indication of where you should be pointing when shooting from the hip.
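For anyone who wants to try a measurement of their own, a rough timing harness might look like the sketch below. Tally and AllocationTiming are hypothetical names, and this is deliberately naive: the JIT compiler may optimize either loop in ways that distort the numbers (it may even remove the allocation altogether), so treat the output as a hint and use a proper benchmarking tool such as JMH before drawing conclusions.

```java
// Hypothetical helper: a small accumulator with resettable state.
class Tally {
    private long total;

    void add(long v) { total += v; }
    long total() { return total; }
    void reset() { total = 0; }
}

class AllocationTiming {
    // Creates a new short-lived Tally on every round.
    static long withFreshObjects(int rounds) {
        long checksum = 0;
        for (int i = 0; i < rounds; i++) {
            Tally t = new Tally();
            t.add(i);
            t.add(1);
            checksum += t.total();
        }
        return checksum;
    }

    // Keeps a single Tally alive and resets it on every round.
    static long withReusedObject(int rounds) {
        long checksum = 0;
        Tally t = new Tally();
        for (int i = 0; i < rounds; i++) {
            t.reset();
            t.add(i);
            t.add(1);
            checksum += t.total();
        }
        return checksum;
    }

    public static void main(String[] args) {
        int rounds = 50_000_000;

        // Warm-up runs so both paths get compiled before timing starts.
        withFreshObjects(rounds);
        withReusedObject(rounds);

        long t0 = System.nanoTime();
        long fresh = withFreshObjects(rounds);
        long t1 = System.nanoTime();
        long reused = withReusedObject(rounds);
        long t2 = System.nanoTime();

        // The checksums are printed so the work cannot be discarded as unused.
        System.out.println("fresh:  " + (t1 - t0) / 1_000_000 + " ms, checksum " + fresh);
        System.out.println("reused: " + (t2 - t1) / 1_000_000 + " ms, checksum " + reused);
    }
}
```

The two checksums should always match; the timings will vary by machine, JVM version and settings, which is exactly why measuring on your own setup matters.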

According to an IBM developerWorks article, allocating an object in Java takes about 10 machine instructions (an order of magnitude better than the best C/C++ allocation), and throwing an object away costs, you guessed it, absolutely zero. So just remember that GC stands for "Goodstuff Collector" and you'll be on the right track.