Then a colleague and I spent two days debugging. We hacked around issue after issue, just to prove basic workability before solving the issues properly.
The final obstacle was that our socket channel was not being selected for reading on the selector. I hacked around that by creating a new thread that looped infinitely, reading the channel. That made things work, right up to the point where actual data came in and the server blew up with a segmentation fault.
Reflecting on the bus home, it occurred to me that our JNI code might need a direct byte buffer, with data starting at position 0 and the whole backing array available for use. Inconvenient in this case, but another little hack and everything worked like a charm.
Back to the previous question: why no reads? Could my server socket in non-blocking mode actually be returning sockets configured for blocking mode? It could, and explicitly setting the accepted socket to non-blocking fixed it.
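This is documented NIO behaviour worth spelling out: the channel returned by ServerSocketChannel.accept() is always in blocking mode, regardless of the server channel's own mode. A minimal sketch (the class name is mine):

```java
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class AcceptBlockingDemo {
    public static void main(String[] args) throws Exception {
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        server.configureBlocking(false);

        // Connect a throwaway client so accept() has something to return.
        SocketChannel connecting = SocketChannel.open(server.getLocalAddress());

        SocketChannel accepted = null;
        while (accepted == null) {
            accepted = server.accept(); // non-blocking: may return null at first
        }

        // The accepted channel is in blocking mode even though the server
        // channel is non-blocking -- the mode is not inherited.
        System.out.println("accepted blocking? " + accepted.isBlocking());
        accepted.configureBlocking(false); // must be set explicitly
        System.out.println("after configure: " + accepted.isBlocking());

        accepted.close();
        connecting.close();
        server.close();
    }
}
```

Without that explicit configureBlocking(false), registering the accepted channel with a selector is exactly the kind of silent-looking failure described above.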
As I backtracked through the issues, it struck me that every single problem we'd run into had to do with assumptions.
I could of course have checked my assumption about the blocking mode. But I was also relying on another, more reasonable assumption: incorrect usage of an API should not fail silently. That assumption actually holds here, because a SelectableChannel throws an IllegalBlockingModeException when you register it while it is in blocking mode.
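That behaviour is easy to demonstrate: registering a channel that is still in blocking mode with a Selector fails loudly (the class name is mine):

```java
import java.nio.channels.IllegalBlockingModeException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

public class RegisterBlockingDemo {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel channel = ServerSocketChannel.open(); // blocking by default
        try {
            channel.register(selector, SelectionKey.OP_ACCEPT);
            System.out.println("registered");
        } catch (IllegalBlockingModeException e) {
            // register() refuses blocking channels outright
            System.out.println("caught IllegalBlockingModeException");
        }
        channel.close();
        selector.close();
    }
}
```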
Unfortunately, the "helper" framework that we have in place inadvertently masked that exception by running the register call in a FutureTask with a boolean "success" return value that nobody had found the need to check, because "true" was the only possibility. Assuming, that is, that no exceptions are thrown.
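This masking effect is inherent to FutureTask: an exception thrown inside the task is captured when the task runs, and only resurfaces if someone calls get(). A minimal sketch, with the task body standing in for the real register call:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

public class SwallowedExceptionDemo {
    public static void main(String[] args) throws Exception {
        FutureTask<Boolean> task = new FutureTask<>(() -> {
            // Stand-in for the register call that actually failed.
            throw new IllegalBlockingModeExceptionStandIn("register failed");
        });

        task.run(); // the exception is captured, not propagated -- no crash here
        System.out.println("task ran without any visible error");

        try {
            task.get(); // only here does the failure surface
        } catch (ExecutionException e) {
            System.out.println("cause: " + e.getCause().getMessage());
        }
    }

    // Stand-in exception type, so this sketch compiles on its own.
    static class IllegalBlockingModeExceptionStandIn extends RuntimeException {
        IllegalBlockingModeExceptionStandIn(String msg) { super(msg); }
    }
}
```

If nobody ever calls get() -- because "true" is the only result worth asking for -- the exception simply vanishes.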
Perhaps there is also a flaw in the assumption that a helper framework that obscures the standard API is actually helpful.
Certainly, the direct byte buffer assumption mentioned above should have been asserted somewhere; it is easy enough to throw an IllegalArgumentException if buffer.isDirect() returns false. Obviously, the programmer who created the JNI call was not assuming we had a direct byte buffer; he knew we had one. But that is the trick of maintainable and re-usable object-oriented code: you cannot rely on any knowledge outside the class you are currently in. From the point of view of the class, such knowledge is an assumption.
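A sketch of what such an assertion could look like, assuming a hypothetical readInto wrapper around the JNI call (the class and method names are invented for illustration):

```java
import java.nio.ByteBuffer;

public class NativeReader {
    // Hypothetical wrapper around a JNI call that requires a direct
    // buffer with data starting at position 0.
    static void readInto(ByteBuffer buffer) {
        if (!buffer.isDirect()) {
            throw new IllegalArgumentException("JNI read requires a direct buffer");
        }
        if (buffer.position() != 0) {
            throw new IllegalArgumentException("buffer position must be 0");
        }
        // nativeRead(buffer); // the actual JNI call would go here
    }

    public static void main(String[] args) {
        try {
            readInto(ByteBuffer.allocate(64)); // heap buffer: rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        readInto(ByteBuffer.allocateDirect(64)); // passes the checks
        System.out.println("direct buffer accepted");
    }
}
```

Two cheap checks at the boundary turn a segmentation fault into an immediate, readable failure.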
Another issue I had hit on the way concerned the UserIdentifier class. It is really just a wrapper around a string, but because it has a specific semantic meaning it was correctly exposed as a separate value class. To limit the new functionality by groups of users, I found it convenient to construct the user identifier slightly differently. The code did not work as expected.
At another point in the code, a programmer had used his knowledge of how the user identifier was constructed, which introduced a hidden assumption about the structure of the user identifier. The correct approach would have been to obtain the user identifier from a single authoritative place.
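A sketch of what that authoritative place might look like, assuming the identifier is built from a group and a user name (the structure and the factory method name here are invented for illustration):

```java
// Value class wrapping a string with specific semantics, as in the
// UserIdentifier described above.
public final class UserIdentifier {
    private final String value;

    private UserIdentifier(String value) {
        this.value = value;
    }

    // The single authoritative place that knows the identifier's structure.
    // Callers compare identifiers with equals(); nobody else parses the string.
    public static UserIdentifier forUser(String group, String name) {
        return new UserIdentifier(group + "/" + name);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof UserIdentifier
                && value.equals(((UserIdentifier) o).value);
    }

    @Override
    public int hashCode() { return value.hashCode(); }

    @Override
    public String toString() { return value; }

    public static void main(String[] args) {
        UserIdentifier a = UserIdentifier.forUser("admins", "alice");
        UserIdentifier b = UserIdentifier.forUser("admins", "alice");
        System.out.println("equal? " + a.equals(b));
    }
}
```

With the private constructor, a change to the identifier's internal structure stays local to this one class instead of rippling through every caller that "knows" the format.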
The root cause of the user identifier hack was a design assumption that two things were independent. When they were not, a co-dependency web was created. In a sense, this is also a case of classes having too much knowledge: they knew about too many different other classes that they needed to collaborate with. To keep your code clean and honest, such things need to be refactored (re-designed).
Consider the small amount of care and time it would have taken to avoid these assumptions in the first place, and compare it to the four man-days of lost productivity they caused. We are always under time pressure, but that will be just as true next week, next month and next year. If you don't pay the price now, you pay it with interest later.
The more things your classes know about your system, the harder it is to change or re-use the code. Make sure that the knowledge you put into a class is appropriate and doesn't create a dangerous web of assumptions.