Monday, March 31, 2008

Java Thread-safe State Design Pattern

Thread-safe State


Apart from the things mentioned here, this pattern also conveys the intent and benefits of the GoF State pattern, being an extension thereof. The term "behaviour" is interpreted more strictly so that an object is always considered to change its behaviour when its state changes.

Intent


Ensure that an object is observable only in a valid, consistent state, that transitions between valid states are atomic, and that side-effects occur if and only if the associated transition succeeds.

Motivation


Consider a class whose behaviour depends on more than one field, and/or where side-effects occur as a result of certain state changes. Such classes can be very difficult to analyze for thread-safety, and the safe classic approach is to synchronize all methods. They can also be difficult to analyze for logical correctness, which brings the motivation for the State pattern into play.

The key idea is to leverage the State pattern and store the current state in an AtomicReference. The current state is represented by an immutable object. State transitions are performed by a compareAndSet (CAS) operation with a retry on failure. For transitions with side-effects, the CAS is made to a boilerplate blocking state, the side-effects are performed, the resultant state is set and the blocking state is released; each thread released from the blocking state then retries the attempted operation on the resulting state. The CAS always compares against the "this" reference of the state instance being transitioned from, so it can succeed for only one thread; all other threads attempting a simultaneous transition must retry on the new state.

Applicability


Any object with a method whose interaction with the object's state is more complex than a simple get or a simple set of a single field (for those, just marking the field volatile is simpler and more performant).
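
For contrast, a minimal sketch of the simple case where volatile suffices (the class and field names are illustrative):

class Dimmer {
  // One independently read and written value: volatile is enough,
  // because every method is a single atomic read or a single atomic write.
  private volatile int level;

  public int getLevel() { return level; }
  public void setLevel(int level) { this.level = level; }
}

As soon as a method reads the field and then writes a value derived from it (a read-modify-write or check-then-act), volatile alone is no longer sufficient and this pattern applies.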

Structure (given as implementation examples)


Delegate all calls to an immutable state instance:
class Implementation implements Interface {
  private final AtomicReference<Interface> state;

  public ... doSomething(...) {
      return state.get().doSomething(...);
  }

  private class StateA implements Interface {
    private final ... valueX;
    private final ... valueY;

    public ... doSomething(...) {
       .... valueX ... valueY ...
    }
  }
}

Simple state changes performed by CAS or retry:
class Implementation implements Interface {
  private final AtomicReference<Interface> state;

  public void setX(y) {
      state.get().setX(y);
  }

  private class StateA implements Interface {
    private final ... x;

    public void setX(y) {
       if (!state.compareAndSet(this, new StateA(y))) {
         state.get().setX(y);
       }
    }
  }
}

For state changes with side-effects, CAS to a blocking state first:
class Implementation implements Interface {
  private final AtomicReference<Interface> state;

  public void printAndSetX(y) {
      state.get().printAndSetX(y);
  }

  private class StateA implements Interface {
    private final ... x;

    public void printAndSetX(y) {
       BlockingState guard = new BlockingState();
       if (state.compareAndSet(this, guard)) {
         try {
           print(x);
           state.set(new StateA(y));
         } finally {
           guard.release();
         }
       } else {
         state.get().printAndSetX(y);
       }
    }
  }

  private class BlockingState implements Interface {
    private final CountDownLatch latch = new CountDownLatch(1);

    public void printAndSetX(y) {
      try {
        latch.await();
      } catch (InterruptedException e) {
        // Don't care.
      }
      state.get().printAndSetX(y);
    }

    private void release() {
      latch.countDown();
    }
  }
}
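
To make the skeletons concrete, here is a minimal compilable sketch combining all three fragments (the Counter interface and its operations are my invention, for illustration only):

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

interface Counter {
  int get();
  void increment();      // simple transition
  void resetAndLog();    // transition with a side-effect
}

class CounterImpl implements Counter {
  private final AtomicReference<Counter> state =
      new AtomicReference<Counter>(new State(0));

  public int get() { return state.get().get(); }
  public void increment() { state.get().increment(); }
  public void resetAndLog() { state.get().resetAndLog(); }

  private class State implements Counter {
    private final int value;
    State(int value) { this.value = value; }

    public int get() { return value; }

    public void increment() {
      // CAS from this exact state; on failure another thread has
      // already moved on, so retry on the new current state.
      if (!state.compareAndSet(this, new State(value + 1))) {
        state.get().increment();
      }
    }

    public void resetAndLog() {
      BlockingState guard = new BlockingState();
      if (state.compareAndSet(this, guard)) {
        try {
          System.out.println("resetting from " + value);  // the side-effect
          state.set(new State(0));
        } finally {
          guard.release();
        }
      } else {
        state.get().resetAndLog();
      }
    }
  }

  private class BlockingState implements Counter {
    private final CountDownLatch latch = new CountDownLatch(1);

    private void await() {
      try { latch.await(); } catch (InterruptedException e) { /* don't care */ }
    }
    void release() { latch.countDown(); }

    public int get() { await(); return state.get().get(); }
    public void increment() { await(); state.get().increment(); }
    public void resetAndLog() { await(); state.get().resetAndLog(); }
  }
}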

Participants



  • Interface - defines the public interface.

  • Implementation - maintains an AtomicReference to current StateImplementation, implements the public interface.

  • StateImplementation - the implementation of the public interface for a particular state.

  • BlockingState - specialized boilerplate StateImplementation that blocks all calls until released and then delegates to the current state.

Collaborations



  • Implementation delegates relevant (usually all) calls to the current StateImplementation.

  • StateImplementation performs state transitions with state.compareAndSet(this, nextState), or retries the transition on the current state (which must differ from "this", since the CAS failed), so that each state object can be transitioned from only once.

  • BlockingState blocks all calls until released, then retries on the new current StateImplementation.

Consequences



  1. Interactions with the object always occur with a consistent state of the object; the immutability of the state delegate guarantees a consistent result of the interaction, and each separate interaction is therefore in effect atomic.

  2. State transitions are explicit and atomic.

  3. Performance seems to be as good as or better than a straightforward non-thread-safe implementation with mutable fields. Performance is better than that of synchronization (compare "optimistic locking" versus "read/write locking") for applicable objects if some interactions do not entail state changes, especially when the object is reachable from more than one thread (even if access is never contended).

Sunday, March 23, 2008

Performance and thread-safe state handling

In my previous post I introduced using the state pattern and an AtomicReference to help maintain thread safety. My reason for starting to play with the pattern was really to make it easier to analyze the code and prove correctness, but obviously the question of performance inevitably comes up and it is interesting to know the trade-offs.

I set up a benchmark where ten threads would simultaneously try to execute the same code, each thread hammering away for an obscenely large number of iterations. First up for the test was the simple lamp:

public interface Lamp {
  boolean isOn();
  void setOn(boolean on);
}

The code to execute was:

boolean b = lamp.isOn();
lamp.setOn(!b);
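
The harness itself wasn't published; roughly, it would be something like this sketch (the thread count is from the text, the iteration count is my guess at "obscenely large"):

static long benchmark(final Lamp lamp) throws InterruptedException {
  final int THREADS = 10;
  final long ITERATIONS = 100000000L;  // illustrative
  Thread[] threads = new Thread[THREADS];
  for (int i = 0; i < THREADS; i++) {
    threads[i] = new Thread(new Runnable() {
      public void run() {
        for (long n = 0; n < ITERATIONS; n++) {
          boolean b = lamp.isOn();   // all threads hammer the same lamp
          lamp.setOn(!b);
        }
      }
    });
  }
  long start = System.nanoTime();
  for (Thread t : threads) t.start();
  for (Thread t : threads) t.join();
  return (System.nanoTime() - start) / (THREADS * ITERATIONS);  // ns per pass
}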

I had an implementation with just a volatile field, an implementation with both methods synchronized, a hybrid implementation where the field was volatile and only the setter synchronized and an implementation using the state pattern. I also ran the bad non-thread-safe implementation. The results with the average time for one run through the code: volatile (669 ns), bad (697 ns), state (985 ns), hybrid (1570 ns), synchronized (2535 ns). Interestingly enough, the state pattern kicks butt. Actually the state pattern was even faster than volatile in the first tests I ran, but that is probably because I ran too few iterations to pay the full cost of garbage collection.
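
The competing implementations weren't shown; plausible sketches, with names of my choosing:

class VolatileLamp implements Lamp {
  private volatile boolean on;
  public boolean isOn() { return on; }
  public void setOn(boolean on) { this.on = on; }
}

class SynchronizedLamp implements Lamp {
  private boolean on;
  public synchronized boolean isOn() { return on; }
  public synchronized void setOn(boolean on) { this.on = on; }
}

class HybridLamp implements Lamp {
  private volatile boolean on;
  public boolean isOn() { return on; }  // volatile read, no lock
  public synchronized void setOn(boolean on) { this.on = on; }
}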

So how about a predicted worst case, the CoffeeMachine where the state changes on every method call?

public interface CoffeeMachine {
  void fillBeans(int beans);
  void fillWater(int water);
  Coffee makeCoffee(int water, int beans) throws OutOfWaterException, OutOfBeansException;
}

And the test code, same rules as above, ten threads whacking away:

Random r = new Random();
....
int w = r.nextInt(5);
cm.fillWater(w);
int b = r.nextInt(3);
cm.fillBeans(b);
try {
  cm.makeCoffee(w, b);
} catch (OutOfWaterException e) {
  oow.getAndIncrement();
} catch (OutOfBeansException e) {
  oob.getAndIncrement();
}

This was a straight shoot-out between a synchronized implementation and a state pattern implementation. Result: synchronized (5021 ns), state (5333 ns). A slight victory for synchronization, yeah!

So how about adding some getters into the CoffeeMachine, how does that change things? The test code was changed to:

Random r = new Random();
....
int w = r.nextInt(5);
if (cm.checkWater() < w*10) cm.fillWater(w*100);
int b = r.nextInt(3);
if (cm.checkBeans() < b*10) cm.fillBeans(b*100);
try {
  cm.makeCoffee(w, b);
} catch (OutOfWaterException e) {
  oow.getAndIncrement();
} catch (OutOfBeansException e) {
  oob.getAndIncrement();
}

With the getters present it makes sense to try a hybrid approach again. Result: state (2547 ns), hybrid (3675 ns), synchronized (4930 ns).

Conclusion: not only does the state pattern with an AtomicReference to an immutable internal state representation make it easier to manage code complexity, it is also beneficial from a performance point of view, being just slightly slower than synchronization in the worst case and twice as fast in normal cases.

Addendum: My keen colleague, the informed sceptic, pointed out to me that perhaps some more testing is needed before it can be generally said that the state pattern is better performing. He claimed that normal usage is that threads do something else and then hit the shared resource, which results in the locks being mostly uncontended. I suppose he has a point. But how to test it?

First attempt, just run my Lamp benchmark single-threaded: volatile (10 ns), synchronized (10 ns), bad (19 ns), hybrid (22 ns), state (42 ns). Oops! State pattern is 4 times slower than synchronized. Interesting also that plain old non-thread-safe code is slower than volatile and synchronized. The hybrid approach is twice as slow as synchronized. But what is going on here? We've run a long warm-up and then we're just whacking away at our Lamp from a single thread. Obviously HotSpot is able to do some nice things with the synchronized methods in this case (good to know).

Next attempt, run a minimal warmup and a short time: volatile (10 ns), bad (18 ns), state (44 ns), hybrid (63 ns), synchronized (87 ns). So now we're not hitting it hard enough for HotSpot to do all its magic, and synchronized is dog slow again.

Third attempt, hit the class hard from just two threads: bad (120 ns), volatile (121 ns), state (187 ns), hybrid (338 ns), synchronized (752 ns). But we're still hitting it hard. We need to figure out how to simulate the non-contention scenario with many threads.

Fourth attempt, put in a sleep in each loop. No good, there's too much noise coming in from the calls to sleep(). For what it's worth, the numbers are very similar, differences are less than 1% (duh!). We need another angle of attack.

Just for a change of pace I decided to run the CoffeeMachine benchmark in one thread: state (114 ns), hybrid (160 ns) and synchronized (212 ns). Hmm, I realize I didn't really run single-threaded because there were two threads that had access to the CoffeeMachine object. One thread was just waiting for the other to finish, though, so all locks were uncontended. Trying that with Lamp I get: volatile (13 ns), bad (20 ns), state (42 ns), hybrid (65 ns) and synchronized (89 ns). It seems that uncontended locks are just as expensive as contended locks and HotSpot was able to do some powerful magic when the object was only accessible from one thread. Sanity check on CoffeeMachine running truly single-threaded: state (110 ns), hybrid (113 ns), synchronized (208 ns). Surprise! HotSpot couldn't do so much this time! I just have to check the performance against a standard non-thread-safe implementation: standard (123 ns) versus state (110 ns)!!!!

Conclusion 2: The thread-safe state pattern is excellent to use as a standard pattern. As my eager colleague points out so often, the important thing is to be able to analyze the correctness of the code, and performance is usually not so important. Well, the state pattern was originally evolved to make it easier to handle varying behaviour depending on state and transitions between these behavioural states, so that requirement is solidly fulfilled. For anything but the simplest of classes, the thread-safe state pattern also seems to be the most performant way of achieving thread safety, and it actually performed better than an ordinary non-thread-safe implementation in at least the one case tested here.

Monday, March 17, 2008

Thread-safe state handling

In my previous post I noted that handling mutable state in multi-threaded programs is a problem that should preferably be avoided. Sometimes it may be too complicated to avoid mutable state, and in this post I will look into some ways of handling it, including how the classic State design pattern can help, with a little twist. A good, if old, introduction to threading issues in Java can be found here.

Consider the following code:

public class Lamp {
  private boolean on;
  public boolean isOn() { return on; }
  public void setOn(boolean on) { this.on = on; }
}

No threading problems, right?

Wrong! I had read about it but learnt it again the hard way when one of my unit tests started failing spuriously. I could tell from other measuring points that there was no way that the assert could fail, yet it did, sometimes. The issue is that, especially on a multi-core machine, there is no guarantee that other threads will ever see the changes. To guarantee visibility, either the same synchronization monitor must be used for both reads and writes, or the variable must be volatile.
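
Applied to the Lamp above, the fix is a one-word change:

public class Lamp {
  // volatile guarantees that a write by one thread is visible to
  // subsequent reads from all other threads.
  private volatile boolean on;
  public boolean isOn() { return on; }
  public void setOn(boolean on) { this.on = on; }
}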

With that lesson in mind, let's look at another example:

public class WaterDispenser {
  private volatile int water;
  public void fill(int amount) { water += amount; }
  public void get(int amount) throws InsufficientWaterException {
    if (water >= amount) water -= amount;
    else throw new InsufficientWaterException();
  }
}

Unfortunately, += and -= are not atomic operations, so two threads filling up at the same time may result in only one of the fills taking effect. Also, there is a gap between the check and the decrement, so the class may end up in an invalid state if two threads get water at the same time.

It is always an option just to synchronize the methods, and that may well be the best option in this case. It is when frequently used methods merely query the state that you want to avoid synchronization, because it imposes an overhead on every query.
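
For comparison, the fully synchronized version would be a sketch like this:

public class WaterDispenser {
  private int water;
  public synchronized void fill(int amount) { water += amount; }
  public synchronized void get(int amount) throws InsufficientWaterException {
    if (water >= amount) water -= amount;
    else throw new InsufficientWaterException();
  }
}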

Just for illustration, let's use an AtomicInteger here, which has the same memory-consistency guarantees as volatile plus a bit of extra functionality. The addition in fill is now atomic, but in get we also check that we have enough water, so it is not good enough to just do an atomic subtraction after the check: the precondition may no longer hold by the time we subtract. We have to atomically verify that the value didn't change, using compareAndSet, and redo the whole check on failure.

public class WaterDispenser {
  private final AtomicInteger water = new AtomicInteger();
  public void fill(int amount) { water.addAndGet(amount); }
  public void get(int amount) throws InsufficientWaterException {
    int currentWater;
    int nextWater;
    do {
      currentWater = water.get();
      nextWater = currentWater - amount;
      if (nextWater < 0) throw new InsufficientWaterException();
    } while (!water.compareAndSet(currentWater, nextWater));
  }
}

It becomes more and more complex to analyze and avoid race conditions the more you need to do; consider, for example, extending the WaterDispenser into a CoffeeMachine, where you have to track coffee beans as well as water.

The State pattern is used to make it easier to analyze and express state-dependent behaviour. Not surprisingly, I suppose, it can also be used to simplify concurrent state-handling, although perhaps with a stricter interpretation of "state-dependent behaviour".

public class CoffeeMachine {
  private final AtomicReference<CoffeeMachineState> state =
      new AtomicReference<CoffeeMachineState>(new CoffeeMachineState(0, 0));

  public void fillBeans(int beans) {
    state.get().fillBeans(beans);
  }
  public void fillWater(int water) {
    state.get().fillWater(water);
  }
  public Coffee makeCoffee(int water, int beans) throws OutOfWaterException, OutOfBeansException {
    return state.get().makeCoffee(water, beans);
  }

  private class CoffeeMachineState {
    private final int water;
    private final int beans;
    CoffeeMachineState(int w, int b) {
      water = w;
      beans = b;
    }
    void fillBeans(int moreBeans) {
      if (!state.compareAndSet(this, new CoffeeMachineState(water, beans + moreBeans))) {
        state.get().fillBeans(moreBeans);
      }
    }
    void fillWater(int moreWater) {
      if (!state.compareAndSet(this, new CoffeeMachineState(water + moreWater, beans))) {
        state.get().fillWater(moreWater);
      }
    }
    Coffee makeCoffee(int w, int b) throws OutOfWaterException, OutOfBeansException {
      if (water < w) throw new OutOfWaterException();
      if (beans < b) throw new OutOfBeansException();
      if (!state.compareAndSet(this, new CoffeeMachineState(water - w, beans - b))) {
        return state.get().makeCoffee(w, b);
      }
      return new Coffee(w, b);
    }
  }
}
Again it would probably be better to just synchronize the methods in this case, but it serves to illustrate the pattern: delegate calls to the current state, which either manages to change the state or delegates to the new state if another thread got in first.
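
As a quick usage illustration (the thread setup here is mine, not from the original post), two threads filling at the same time can no longer lose an update the way the volatile += version could:

public static void main(String[] args) throws InterruptedException {
  final CoffeeMachine cm = new CoffeeMachine();
  Runnable filler = new Runnable() {
    public void run() { cm.fillWater(10); }  // concurrent fills
  };
  Thread t1 = new Thread(filler);
  Thread t2 = new Thread(filler);
  t1.start(); t2.start();
  t1.join(); t2.join();
  // Whichever thread loses the CAS retries against the new state,
  // so both fills are always applied.
}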

Back to the lamp example, which is a classic for the state pattern, but we'll add some functionality. The lamp is placed in a room and when we turn it on we should illuminate the room and when we turn it off we should darken the room. Now we have a gap between the state change and the room operation, a gap where another thread can come in and wreak havoc. The solution is to introduce a transitional state that holds other threads at bay while you complete the transition.

class Lamp {
  private final AtomicReference<LampState> state =
      new AtomicReference<LampState>(new Off());
  private final Room room;
  public Lamp(Room room) { this.room = room; }
  public boolean isOn() { return state.get().isOn(); }
  public void setOn(boolean on) { state.get().setOn(on); }

  private interface LampState {
    boolean isOn();
    void setOn(boolean on);
  }

  private class On implements LampState {
    public boolean isOn() { return true; }
    public void setOn(boolean on) {
      if (on) return;
      synchronized (Lamp.this) {
        TransitionGuard guard = new TransitionGuard();
        if (state.compareAndSet(this, guard)) {
          room.darken();
          state.set(new Off());
        } else {
          state.get().setOn(on);
        }
      }
    }
  }

  private class Off implements LampState {
    public boolean isOn() { return false; }
    public void setOn(boolean on) {
      if (!on) return;
      synchronized (Lamp.this) {
        TransitionGuard guard = new TransitionGuard();
        if (state.compareAndSet(this, guard)) {
          room.illuminate();
          state.set(new On());
        } else {
          state.get().setOn(on);
        }
      }
    }
  }

  private class TransitionGuard implements LampState {
    public boolean isOn() {
      synchronized (Lamp.this) {
        return state.get().isOn();
      }
    }
    public void setOn(boolean on) {
      synchronized (Lamp.this) {
        state.get().setOn(on);
      }
    }
  }
}

Synchronizing everything would be easier, but with this pattern we only pay the synchronization cost during complex state transitions. We do pay a certain cost for object creation, but that is generally very low.

One final word of caution: if using an object involves querying more than one of its stateful values, that is also a potential race, and you should consider returning an immutable object representing a snapshot of all the queryable state instead.
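
For example, a CoffeeMachine client that checks water and beans with two separate getters can see values from two different states. A sketch of the snapshot approach (the Levels class is my invention):

// Hypothetical snapshot class: both values come from one immutable state.
public class Levels {
  public final int water;
  public final int beans;
  Levels(int water, int beans) { this.water = water; this.beans = beans; }
}

// In CoffeeMachine:
public Levels checkLevels() {
  CoffeeMachineState s = state.get();   // one consistent state instance
  return new Levels(s.water, s.beans);  // the two values are guaranteed consistent
}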

Thursday, March 6, 2008

Is synchronisation expensive? Is that the right question?

Sometimes I find that I come across a lot of pieces of information which together end up pointing in a certain direction. If you've ever read "The celestine prophecy" you might term this "synchronicity", which, by coincidence (or synchronicity), fits very well with the current topic :-).

The trigger point for this post was that a colleague of mine got into a discussion with a speaker at jFokus about what happens when you "synchronize" (the speaker even blogged a bit about it). Now I know that my colleague usually knows what he is talking about, so what is it? Is synchronization expensive or not?

When I asked my colleague he said that grabbing an uncontended lock is on the order of 10^-8 seconds. As I had been benchmarking a hash set implementation, I had everything set up to test (and confirm) this. On my machine the overhead of grabbing an uncontended lock (putting the synchronized keyword on a method) was 70 ns. Now that sounds pretty cheap, doesn't it? (I have subsequently found out that there are strange issues when running benchmarks. Running again in a supposedly more correct way, the cost of synchronization would seem to be more like 7 ns, or rather, with other noise involved, between 1 and 30 ns, mostly under 10. So my colleague was dead on.)
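
The measurement setup wasn't published; a rough sketch of the kind of micro-benchmark involved (all names and counts are illustrative, and such loops are notoriously noisy):

interface Source { int get(); }
class Plain implements Source  { private int x; public int get() { return x; } }
class Locked implements Source { private int x; public synchronized int get() { return x; } }

static long nsPerCall(Source s) {
  final long N = 100000000L;
  long sink = 0;                          // keep the JIT from eliding the calls
  long start = System.nanoTime();
  for (long i = 0; i < N; i++) sink += s.get();
  long elapsed = System.nanoTime() - start;
  if (sink == 42) System.out.print("");   // make the sink observable
  return elapsed / N;                     // Locked minus Plain ~ lock overhead
}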

Well, considering that I'd just been very happy about reducing the execution time of the method from 30ns to 20ns, the prospect of adding a 70ns overhead was not appealing. So synchronization is expensive, then?

Turning it around again, how much does it cost you to get stale data in some threads (the data may actually never be updated for some threads)? How much does it cost you if your object gets into an inconsistent state? In that perspective synchronization is usually very cheap.

Swing tries to do without synchronization by mandating that all access to display state be done from the EDT. I sometimes see this expressed as all modifications needing to be done on the EDT, but because of how the Java memory model publishes modifications, I am fairly sure that even querying the state of components needs to be done on the EDT. I'm beginning to think that the Swing threading model causes a viral spread of things to be done on the EDT which can only be broken by having synchronization at some point. This model is perhaps so subtly difficult to handle that it should be considered broken.

The conclusion must be that you have to do what you have to do to protect yourself against the bad effects of unprotected access to shared mutable state, and asking whether synchronization is expensive does not necessarily lead you in the right direction. A better question to ask would be "Can I do this in some other way than by accessing shared mutable state?"

Accessing shared mutable state is really the issue to look at (and where my synchronicity is leading me) and it is fraught with more dangers than just avoiding race conditions. Spot the bug (and I don't mean the obvious one that the access to lineNumber isn't protected):

final int[] lineNumber = new int[1];
while (in.hasNextLine()) {
  final String line = in.nextLine();
  lineNumber[0]++;
  EventQueue.invokeLater(new Runnable() {
    public void run() {
      statusLine.setText("" + lineNumber[0]);
      textArea.append(line);
      textArea.append("\n");
    }
  });
}
(Background to this example). An example of avoiding access to shared mutable state can be found in my first blog post.