Tobe expressed

Saturday, July 28, 2018

Java enums are not constants

Every now and then I come across a person who points out that my enum values should be written as all upper case with underscores because in their minds an enum is a constant. I find myself disagreeing but haven't previously managed to explain why.

Historically, in java we would use constants where other languages would have used an enum, so it doesn't seem unreasonable to consider enums to be constants. And yet they are not.

Consider the following code where we have tacos on a Friday as we like to do in Sweden:

class DinnerPlanner {
  Menu createMenu() {
    if (today.equals(DayOfWeek.FRIDAY)) {
      ...
    }
  }
  void buyIngredients() {
    if (today.equals(DayOfWeek.FRIDAY)) {
      buy("taco shells", "tomatoes", ...);
    }
  }
  void cook() {
    if (today.equals(DayOfWeek.FRIDAY)) {
      oven.put("taco shells");
      fryingPan.put("mince").put("taco spices");
    }
  }
}

After some time we get more and more influences from the US and we want to make tacos on Tuesday instead.

And now it should be evident: DayOfWeek.FRIDAY is not a constant, it is a hard-coded value. We could introduce a constant:

  static final DayOfWeek TACO_DAY = DayOfWeek.FRIDAY;

Now we can just change the constant TACO_DAY to the value DayOfWeek.TUESDAY, in a single place.

In many other languages, it is possible to say that an enum has a value: DayOfWeek.FRIDAY might correspond to the integer 5, and we can use FRIDAY or 5 as we see fit. Not so in java, where FRIDAY itself is the value in the APIs and we are actively discouraged from thinking about its ordinal position in the list.

Consider also my previous post where instead of switching on an enum value to determine the code, we can actually implement the code in the enum.
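To recap the idea from that post, the behavior can live in the enum itself instead of in a switch statement scattered through the code. A minimal sketch (the enum and its values are made up for illustration):

```java
enum Meal {
    TACOS {
        @Override
        String cook() { return "fry mince, add taco spices"; }
    },
    PASTA {
        @Override
        String cook() { return "boil pasta, heat sauce"; }
    };

    abstract String cook();
}

public class EnumBehavior {
    public static void main(String[] args) {
        // Each value carries its own implementation; no switch needed.
        System.out.println(Meal.TACOS.cook());
    }
}
```

An enum value that implements behavior is clearly more than a constant.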

Still think a java enum is a constant?

Sunday, March 18, 2018

WTF-debugging: the case of the unfortunate design choices fooling perception

A few of my former colleagues at Google really love golang so I decided to start playing with it. I highly recommend the interactive tour to get a quick sense of it. It's a fairly nice language, simple but with the parts you really need, feels nice and "javascripty" in object creation but still structured and typed strictly enough.

However, there is at least one case where the desire to avoid too prescriptive syntax results in an unfortunate combination of design choices leading to WTF-debugging.

Consider the following program (run it, it prints 1,2,3,4). Now change the method declaration on line 12 (which could have been defined much further away, even in another file) by adding an asterisk in position 8, before Payload, to read:

func (p *Payload) UploadToS3() error {

Run it again and observe 4,4,4,4! Note that the loop where this happens is unchanged, but the loop variable "payload" has magically been changed from a value to a pointer. Spooky action at a distance now causes a different part of your program to be wrong. Keep staring at the loop and you will never figure it out.

    for _, payload := range payloads {
        go payload.UploadToS3()
    }

In java we would always know what is a value and what is a reference, and we are of course also saved by the fact that variables used in closures have to be final (or effectively final). In a functional language the variables would be immutable, so this could never happen there either. In javascript, though, we deal with this all the time, so a javascript programmer might be more confused that the first version actually worked. One of the problems in go is that we can have either values or pointers, but we don't have to be explicit about which, because the compiler is too helpful. Another problem is that it is unclear what code is executed now and what code is executed later. The word "go" is not (at least not yet) a strong signal to my mind that the code after it actually executes later; I have to go into deep analysis mode before I have a chance to perceive that. Consider the difference if we had been forced to write a little more. Would it have been clearer?

    for _, payload := range payloads {
        go func() {
            payload.UploadToS3()
        }()
    }

I think that is slightly clearer, but I think it still indicates the problem with inline closures for asynchronous computing. In java I would tend to recommend to never use anonymous inner classes, but take the time to make it a named inner class, defined elsewhere, so you have a better chance of realizing that the code will execute at another time. I have seen otherwise excellent coders stumble on this temporal misperception.

Interestingly, the perceptual difficulty of when code executes can also go the other way. When using Mockito, code like "when(mock.bar()).thenReturn(42)" doesn't immediately signal your brain that the bar() method actually gets called.


Friday, February 23, 2018

WTF-debugging: the case of the obscure configuration

Ever stared at a piece of code without understanding why in the world it does not work? Or why it actually works at all? I'd like to call this phenomenon WTF-debugging and I've been remarkably free of it since I've been doing framework-free backend java. But now I have a new job and we use all the popular frameworks, for better and for worse. Well, really only for worse, IMO, but I will write more on that when I understand my aversion better. So far, I have discovered that the tendency to want to write frameworks is very strong because it is the ultimate intellectual masturbation. The tendency to want to use frameworks is more puzzling, but I think we are all attracted to magic to some degree and there is a powerful illusion that frameworks provide a lot of value, automagically.

We are working with JSON in Java and using Jackson, and I had a little problem where one of the fields of the main object could be a different type depending on what the object represented, as indicated by a type name in another field. So I had to work out how to configure Jackson to handle it, which turned out to be a little challenging. After an hour or so I hit upon a fruitful phrasing of the search terms and found a solution.

import java.util.ArrayList;
import java.util.List;

import com.fasterxml.jackson.annotation.JsonSubTypes;
import com.fasterxml.jackson.annotation.JsonTypeInfo;
import com.fasterxml.jackson.annotation.JsonTypeInfo.Id;
import com.fasterxml.jackson.databind.annotation.JsonDeserialize;

@JsonDeserialize(builder = Attachment.AttachmentBuilder.class)
public class Attachment {

    private final long id;
    private final String attachmentType;
    private final long reportId;
    private final String filename;

    public interface ExcelSheets extends List<ExcelSheet> {}

    private static class ExcelSheetsImpl extends ArrayList<ExcelSheet> implements ExcelSheets {}

    @JsonTypeInfo(use = Id.NAME, property = "attachmentType")
    @JsonSubTypes(value = {
            @JsonSubTypes.Type(value = ExcelSheetsImpl.class, name = "excel")
    })
    private final Object typeSpecificData;

    Attachment(long id, String attachmentType, long reportId, String filename, Object typeSpecificData) {
        this.id = id;
        this.attachmentType = attachmentType;
        this.reportId = reportId;
        this.filename = filename;
        this.typeSpecificData = typeSpecificData;
    }

    public long getId() { return id; }

    public String getAttachmentType() { return attachmentType; }

    public long getReportId() { return reportId; }

    public String getFilename() { return filename; }

    public Object getTypeSpecificData() { return typeSpecificData; }

    public AttachmentBuilder builder() { return new AttachmentBuilder(this); }

    public static class AttachmentBuilder {
        private long id;
        private String attachmentType;
        private long reportId;
        private String filename;
        private Object typeSpecificData;

        public AttachmentBuilder() {}

        AttachmentBuilder(Attachment attachment) {
            this.id = attachment.id;
            this.attachmentType = attachment.attachmentType;
            this.reportId = attachment.reportId;
            this.filename = attachment.filename;
            this.typeSpecificData = attachment.typeSpecificData;
        }

        public AttachmentBuilder withId(long id) {
            this.id = id;
            return this;
        }

        public AttachmentBuilder withAttachmentType(String attachmentType) {
            this.attachmentType = attachmentType;
            return this;
        }

        public AttachmentBuilder withReportId(long reportId) {
            this.reportId = reportId;
            return this;
        }

        public AttachmentBuilder withFilename(String filename) {
            this.filename = filename;
            return this;
        }

        public AttachmentBuilder withTypeSpecificData(Object typeSpecificData) {
            this.typeSpecificData = typeSpecificData;
            return this;
        }

        public Attachment build() {
            return new Attachment(id, attachmentType, reportId, filename, typeSpecificData);
        }
    }
}
Now I could be happy with that and sing the praises of Jackson and "look how elegantly it got configured". But should I?

Even when I have this solution before me, I still can't quite figure it out from the documentation (WTF?), so what will happen in six months time when I have to modify this code?

And here comes an even bigger WTF: change the type "Object" for typeSpecificData to "ExcelSheets" and deserialization no longer works! (What I really wanted to do was to introduce a marker interface, TypeSpecificData, but, as you can surmise, that didn't work either.)

Even though Jackson is (sadly) perhaps the easiest way to handle JSON in Java, I think there may be good reasons besides the above to avoid using it. I won't go into that in detail in this post, but I will leave with this thought: Jackson or any other automagic data-binding framework entices you to create Java objects that match the serialized JSON, but your serialization format is not your internal model, at least not forever, because the two have different reasons to change. So even after you get a deserialized object from Jackson, you should probably write lots of code to transfer the data into your internal representation. Then what did you gain?

Wednesday, January 5, 2011

Prove your assumptions, but remember Murphy was an optimist

Continuing on my unremarkable coding task, having gotten it to work, it was now time to clean up the code.

Given the large load expected, I couldn't allocate a new direct ByteBuffer for every piece of data I wanted to handle. But code that accepts a ByteBuffer is equivalent to code that accepts a byte array, a starting offset and a length (or a limit), right? So I just set the position and the limit on the boundaries of the data and I'm good to go.
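A sketch of that reuse pattern, assuming a single pre-allocated direct buffer (the helper name is mine): position and limit mark the data boundaries, playing the role of (array, offset, length).

```java
import java.nio.ByteBuffer;

public class BufferSlice {

    // Restrict the buffer to a window of `length` bytes starting at `offset`,
    // without allocating anything new.
    static ByteBuffer window(ByteBuffer buffer, int offset, int length) {
        buffer.limit(offset + length);
        buffer.position(offset);
        return buffer;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(64);
        for (int i = 0; i < 64; i++) buffer.put((byte) i);

        ByteBuffer w = window(buffer, 10, 4);
        System.out.println(w.remaining()); // 4 bytes visible between position and limit
        System.out.println(w.get());       // the byte at offset 10
    }
}
```

Any code that reads between position and limit sees only the intended data.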

Now I ran into some of my own previous assumptions. Luckily, the failure resulted in a log message that made it clear where something was going wrong. It was one of those places where a deviation from the happy path either cannot happen, should not happen or we didn't really care too much. A place where I used to code an empty block "{}", or when I was feeling diligent, with a comment "{/*cannot happen*/}". Now I'll code it as "{throw new AssertionError("Cannot happen.");}" or at least "{LOG.fine("Don't care.");}".

Anyway, a quick analysis showed a likely cause. I say likely, because it's an assumption until proven. I wrote a quick unit test that failed, fixed the code and the test passed. But the program still failed, with the same log message as before. Imagine what I might have been doing if I hadn't proven the failure and the fix with the unit test, I'd still be staring at the same place and possibly ripping my code to shreds in desperation. Now I could immediately move on and fix the second error.

OK, on a roll now, all assumptions proven by assertions, I'm just about done. Except that the code still doesn't work and no sign of why. Murphy's law in full action.

Finally I find it. There's a trap in ExecutorService. What's the difference between calling "execute(Runnable)" versus "submit(Runnable)"? Nothing much, when the code works. But "submit(Runnable)" should have a big red warning sticker: it returns a Future with no result, so you never bother to call "get()" on it. The devastating side-effect is that all exceptions are preserved until "get()" is called, making this a hidden equivalent of "catch(Exception e){}". Next task: change this everywhere and add a rule to FindBugs.
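A small demo of the trap (the class and the exception message are mine): the exception thrown inside the submitted task surfaces only when somebody finally calls get() on the Future.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SubmitTrap {

    // submit() stores the exception in the returned Future; nothing surfaces
    // until somebody calls get(). This helper makes the hidden failure visible.
    static String hiddenFailure() throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Runnable failing = () -> { throw new RuntimeException("hidden"); };
        Future<?> future = executor.submit(failing);
        try {
            future.get(); // only here does the exception finally appear
            return "no exception";
        } catch (ExecutionException e) {
            return e.getCause().getMessage();
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("submit() swallowed: " + hiddenFailure());
    }
}
```

With "execute(Runnable)" instead, the same exception would have reached the thread's uncaught exception handler immediately.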

Saturday, January 1, 2011

Your assumptions are dangerous, you know too much.

I have just completed a rather unremarkable piece of code. The system was designed to allow this type of addition, so it just took a couple of hours or three to write the code and touch up the parts to selectively enable the functionality by user.

Then a colleague and I spent two days debugging. We hacked around issue after issue, just to prove basic workability before trying to solve the issues.

The final obstacle was that our socket channel was not being selected for reading on the selector. So I hacked around that by creating a new thread to loop infinitely around reading the channel. Which made things work fine up to the point where actual data was coming in and the server blew up with a segmentation fault.

After reflecting while travelling on the bus home, I remembered that perhaps our JNI code needed a direct byte buffer, with data starting at position 0 and the whole backing array available for use. Inconvenient in this case, but another little hack and everything worked like a charm.

Back to the previous question: why no reads? Perhaps my server socket in non-blocking mode actually returned a socket configured for blocking mode instead of for non-blocking mode? It did, and explicitly setting it to non-blocking fixed it.

As I backtracked through the issues, it struck me that every single problem we'd run into had to do with assumptions.

I could of course have checked my assumption about the blocking mode. But I also have another assumption, which is more valid: incorrect usage of an API should not fail silently. This turns out to be correct, because a SelectableChannel throws an IllegalBlockingModeException.
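The behavior is easy to demonstrate: a ServerSocketChannel in non-blocking mode hands back accepted channels in blocking mode, and they must be configured explicitly before registering with a selector (the class name is mine).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class BlockingModeCheck {

    // Accept one connection and report whether the accepted channel
    // came back in blocking mode.
    static boolean acceptedIsBlocking() throws IOException {
        ServerSocketChannel server = ServerSocketChannel.open();
        try {
            server.bind(new InetSocketAddress("localhost", 0));
            server.configureBlocking(false); // the server channel is non-blocking

            // Connect a plain client socket so accept() has something to return.
            try (Socket client = new Socket("localhost", server.socket().getLocalPort())) {
                SocketChannel accepted = null;
                while (accepted == null) {
                    accepted = server.accept(); // non-blocking: may return null at first
                }
                boolean blocking = accepted.isBlocking();
                // Without this line, register() on a selector would throw
                // IllegalBlockingModeException.
                accepted.configureBlocking(false);
                accepted.close();
                return blocking;
            }
        } finally {
            server.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("accepted channel blocking: " + acceptedIsBlocking());
    }
}
```

It prints true: the accepted channel is in blocking mode despite the server channel being non-blocking, exactly the assumption that bit me.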

Unfortunately, the "helper" framework that we have in place inadvertently masked that by running the register call in a FutureTask that had a boolean "success" return value that nobody had found the need to check, because "true" was the only possibility. Well, that is, assuming no exceptions are thrown.

Perhaps there is also a flaw in the assumption that a helper framework that obscures the standard API is actually helpful.

Certainly, the direct byte buffer assumption mentioned above should probably have been asserted somewhere, it's easy enough to throw an IllegalArgumentException if buffer.isDirect() returns false. Obviously, the programmer who created the JNI call was not assuming we had a direct byte buffer, he knew we had one. But that's the trick of maintainable and re-usable object oriented code: you cannot rely on any knowledge outside the class you are currently in. From the point of view of the class, such knowledge is an assumption.
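That assertion is a one-liner. A sketch (the method is hypothetical, standing in for the actual JNI entry point):

```java
import java.nio.ByteBuffer;

public class NativeCall {

    // Guard the assumption explicitly instead of relying on the caller's
    // knowledge. sendToNative is a made-up stand-in for the JNI call.
    static void sendToNative(ByteBuffer buffer) {
        if (!buffer.isDirect()) {
            throw new IllegalArgumentException("JNI call requires a direct ByteBuffer");
        }
        // ... hand the buffer to the native code here ...
    }

    public static void main(String[] args) {
        sendToNative(ByteBuffer.allocateDirect(16)); // fine
        try {
            sendToNative(ByteBuffer.allocate(16)); // heap buffer: assumption violated
        } catch (IllegalArgumentException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

The check turns a segmentation fault into a clear exception at the exact point where the assumption breaks.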

Another issue I had hit on the way concerned the UserIdentifier class. It is really just a wrapper around a string, but because it has a specific semantic meaning it was correctly exposed as a separate value class. To limit the new functionality by groups of users, I found it convenient to construct the user identifier slightly differently. The code did not work as expected.

At another point in the code, a programmer had used his knowledge of how the user identifier was constructed, which introduced a hidden assumption about the structure of the user identifier. The correct code would have been to obtain the user identifier from a single authoritative place.

The root cause of the user identifier hack was a design assumption that two things were independent. When they were not, a co-dependency web was created. In a sense, this is also a case of classes having too much knowledge: they knew about too many different other classes that they needed to collaborate with. To keep your code clean and honest, such things need to be refactored (re-designed).

Consider the small amount of care and time it would have taken to avoid the assumptions in the first place and compare it to the four man-days of lost productivity they caused. We are always under time pressure, but that will be the case next week, month or year as well. If you don't pay the price now, you pay it with interest later.

The more things your classes know about your system, the harder it is to change or re-use the code. Make sure that the knowledge you put into a class is appropriate and doesn't create a dangerous web of assumptions.

Friday, August 28, 2009

Clarity of code is clarity of thought

I remember at my first job when we introduced the concept of code reviews. I don't think anybody really looked at anybody else's code before pressing the "approve" button, I know I didn't. Reading code is boring and it can be hard and it felt like a waste of time. I had my own code to write and why shouldn't the author of the code have written it right in the first place?

When I moved to another job, they talked about how, in theory, code reviews were the number one most effective way to increase code quality. But they had given up on it, because they felt it came down to a discussion of where to put the dots and commas (or, rather, semi-colons and parentheses).

Quite aside from the issue of code reviews, I had come to realize that I spent much more time reading my code than I spent writing it. Every debugging session is spent reading code over and over. Every time you have to add a feature or change some functionality you have to read the code, and re-read it to avoid breaking stuff. Don't tell me tests will do it for you. Now don't get me wrong, tests are great and I strongly advocate test-first coding, it's a great way to achieve focus and clarity of thought. But when a test fails, you're thrown into debugging mode, which means reading code.

So I concluded it was worth spending a little extra time typing longer variable names, and taking the time to find descriptive names. It was worth spending a little more time breaking down those long methods and simplifying those complex structures. Whenever I was reading code that made me stop and think, I would usually refactor it to be clearer (although the term refactoring hadn't been invented yet). I would also change existing code to make a new feature fit in better, in a more readable and more logical way.

In "The Pragmatic Programmer" the distinction is made between "programming by coincidence" and "programming by intention". We all have to do it occasionally, a little trial-and-error programming, because we're not quite sure how things work. That's "programming by coincidence". Before you check your code in, you want to make sure you understand what each statement does and that all unnecessary statements are removed. Then you've transformed the code from coincidental to intentional.

But that's not enough. Your very functional and intentional code is going to lose you valuable time unless you also transform it to readable code, which clearly displays your intent.

A much-touted wisdom is that you should document and comment your code. Fair enough, that works, but it has many weaknesses. Your energy is far better spent making the code explain itself. Comments often lie, but the code is always pure truth, so prove it in code. Only comment on "why" you are doing something, and that only if you cannot make it evident in the code.

I like the following sound-bite from "Refactoring": "Any fool can write code that a machine can understand, it takes a good programmer to write code that a human can understand."

Test-first "anything" is efficient and focused because it sets up the criteria for success and the means to measure it up front. So what's the best way to test if your code is readable? Get another person to read it, i.e. a code review.

I'm very grateful to those who review my code carefully and pick on every detail, it makes the code better and it helps assert that my thinking was clear. That gratitude gives me the energy to return the favour by reviewing their code equally mercilessly.

You will sometimes, but rarely, find bugs by just reading code (only because everybody has a brain-fart now and then). But the real value of the reviews is in the "dot and comma" discussions and especially in picking good names. In addition to making sure that the code is easy to read, it will sometimes bring a real little nasty bug to the surface.

An example: An index into an array of values is stored into a variable called "value". When the reviewer makes you change the name to "valueIndex" instead, some parts of your code may start to look weird (the bug was exposed).

Clarity of code really is clarity of thought.

Thursday, December 25, 2008

Using Java concurrency utilities

The inspiration for this post comes from Jacob Hookom's blog and I can only second the recommendations he gives. Although, as always, I would caution you to test any such implementation properly, to verify that it works well and actually provides a benefit. There are lots of pitfalls and concurrency is tricky even with the excellent utilities provided in Java.

To summarize the interesting problem: parallelize the execution of lengthy tasks in a web request, without creating many threads for each request, but also ensuring that the thread pool is not starved by one request. The idea is to have a reasonably sized thread pool and to limit the number of tasks executing in parallel to a number small enough to allow the expected amount of concurrent requests to share the pooled threads.

Essentially, limiting the number of tasks executing in parallel can be done in two ways: limit the number of tasks submitted at one time or limit the number of workers that execute a set of tasks. Jacob takes the first approach, I will take the second approach, which seems to make it simpler to manage time-out issues.

Here's some code:

<V> Queue<Future<V>> submit(int numberOfWorkers, Queue<Callable<V>> tasks,
        long timeout, TimeUnit unit)
        throws InterruptedException, TimeoutException {
    Queue<Future<V>> result = new ConcurrentLinkedQueue<Future<V>>();
    List<WorkerTask<V>> workers = new ArrayList<WorkerTask<V>>(numberOfWorkers);
    for (int i = 0; i < numberOfWorkers; i++) {
        workers.add(new WorkerTask<V>(result, tasks));
    }
    List<Future<Object>> deadWorkers
            = executor.invokeAll(workers, timeout, unit);
    for (Future<Object> obituary : deadWorkers) {
        if (obituary.isCancelled()) {
            throw new TimeoutException();
        }
    }
    return result;
}

And the code for a WorkerTask:

private static class WorkerTask<V> implements Callable<Object> {

    private final Queue<Callable<V>> tasks;
    private final Queue<Future<V>> result;

    public WorkerTask(Queue<Future<V>> result, Queue<Callable<V>> tasks) {
        this.result = result;
        this.tasks = tasks;
    }

    public Object call() {
        for (Callable<V> task = tasks.poll(); task != null; task = tasks.poll()) {
            FutureTask<V> future = new FutureTask<V>(task);
            result.offer(future);
            future.run();
            if (Thread.interrupted()) {
                Thread.currentThread().interrupt(); // Restore interrupt.
                return null;
            }
        }
        return null;
    }
}

Note that it is important to use thread-safe collections for tasks and result; strictly speaking, the method should verify that the tasks it is given are in a thread-safe collection, but I'll ignore that for now. Note also the check whether the thread has been interrupted in the call() method of WorkerTask. That is vital to be able to cancel the task when you don't want to wait for it any longer (i.e. on time-out). If possible, the submitted tasks should also handle interrupts. Note the careful restoration of the interrupt status so that the caller of the method may also be notified.
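The same pattern can be condensed into a self-contained demo (the pool size, worker count and the squaring tasks are all made up for illustration): two workers drain a shared queue of eight tasks, so at most two tasks run in parallel no matter how large the pool is.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.TimeUnit;

public class WorkerLimitDemo {

    static int sumOfSquares(int numberOfWorkers, int taskCount)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newFixedThreadPool(10);
        Queue<Callable<Integer>> tasks = new ConcurrentLinkedQueue<>();
        for (int i = 1; i <= taskCount; i++) {
            final int n = i;
            tasks.add(() -> n * n);
        }
        Queue<Future<Integer>> results = new ConcurrentLinkedQueue<>();

        // Each worker drains the shared task queue; parallelism is capped
        // by the number of workers, not by the pool size.
        List<Callable<Object>> workers = new ArrayList<>();
        for (int i = 0; i < numberOfWorkers; i++) {
            workers.add(() -> {
                for (Callable<Integer> task = tasks.poll(); task != null; task = tasks.poll()) {
                    FutureTask<Integer> future = new FutureTask<>(task);
                    results.offer(future);
                    future.run();
                }
                return null;
            });
        }
        executor.invokeAll(workers, 10, TimeUnit.SECONDS);

        int sum = 0;
        for (Future<Integer> f : results) {
            sum += f.get(); // all futures have completed once the workers return
        }
        executor.shutdown();
        return sum;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("sum of squares 1..8: " + sumOfSquares(2, 8)); // 204
    }
}
```

All eight results come back even though only two worker slots were occupied at any one time.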