Tobe expressed

Exploring the error handling concepts for programming languages

2025-08-24T07:34:00.000-07:00

This is a follow-up to my previous post about concepts in programming languages, expanding the analysis to explore concepts around errors: raising, detecting and handling them.

It turns out that error handling isn't necessarily a separate concept. Only one concept is necessary for error handling:

Selection - Purpose: Allows a choice of which units to process and which to ignore. Operational Principle: If you specify the selection criteria, the matching units will be processed.

The failing code selects whether to error or not and the calling code selects on the result how to proceed.

When execution of some code can result in an error instead of a result of the expected kind, it is extremely useful for programmers to know about it, so another important, though not strictly necessary, principle is:

Documentation - Purpose: Allows communicating aspects of the design and purpose to a future reader of the program. Operational Principle: If you specify the relevant information, a future reader will receive it when needed.

There are many different reasons for reaching an error condition, as will be explored below. It is probably desirable to distinguish between them. In particular, it is probably desirable to single out the types of errors that indicate the program itself is incorrect, so perhaps we can use a verification mechanism.

Verification - Purpose: Allows checking correctness criteria of the program. Operational Principle: If you specify a criterion and run the verification procedure (which may be built in), a warning/failure is issued if the program does not fulfil the criterion.

The implementations of verification that I looked at then were tests, contracts, types and proofs. But I think verification is more general than that and also obviously extends to assertions. Following that train of thought, raising or throwing an error is also a verification, by always issuing a warning/failure when that code path is followed.

Obviously, verification can be, and is, also used to ensure that errors are handled correctly.

While the concepts for raising and handling errors are not new, the implementations of them for errors may be different than the implementations previously explored.

In this article, a new highly useful concept will be introduced:

Checkpoint - Purpose: Saves a known program state. Operational Principle: When a line of computation starting from the checkpoint is abandoned and returned to the checkpoint, all changes made in that line of computation will be forgotten and not impact any continued computation.

With complete or partial rollback to a known checkpoint, reasoning about state becomes a lot simpler.

Existing error implementations

Error codes, result values and exceptions are all just different implementations of ways to enable selection on error versus normal return.

Error codes returned on the side can easily be forgotten. Sum type result values are better at documentation and must usually be handled or explicitly ignored (by type verification).
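As a rough illustration of a sum type result value, here is a minimal TypeScript sketch. The names Result, safeDivide and describeQuotient are made up for this example and do not come from any library. The failing code selects which variant to return, and the type checker makes the caller select on the variant before it can touch the value.

```typescript
// Hypothetical names throughout; an illustration, not a library API.
type Result<T, E> =
  | { kind: 'ok'; value: T }
  | { kind: 'error'; error: E };

// The failing code selects whether to produce an error or a value.
function safeDivide(dividend: number, divisor: number): Result<number, string> {
  if (divisor === 0) {
    return { kind: 'error', error: 'division by zero' };
  }
  return { kind: 'ok', value: dividend / divisor };
}

// The calling code selects on the result. The type checker rejects
// reading `value` before `kind` has been narrowed to 'ok'.
function describeQuotient(dividend: number, divisor: number): string {
  const result = safeDivide(dividend, divisor);
  switch (result.kind) {
    case 'ok':
      return `quotient is ${result.value}`;
    case 'error':
      return `could not divide: ${result.error}`;
  }
}
```

An error code returned on the side would let the caller use the quotient without ever looking at the code, which is exactly the forgetting that the sum type prevents.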

Exceptions allow selection to happen further up the calling scope; checked exceptions force explicit handling by verification.

While the erroring computation is in effect abandoned with no result, it is usually left to the programmer to restore the state, completely or partially, to a known checkpoint.

Checkpoint

The first thing that comes to mind is that checkpoints are a part of transactions, but on further examination, it turns out that checkpoints are everywhere.

You make backups to allow rollback to a previous checkpoint. The code versioning system allows rollback to a previous checkpoint.

Immutable persistent data structures are checkpoints. No matter what computations and derivations are done, you will always rollback to that known state when you get back up the execution stack.
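To make the immutable-data point concrete, here is a small TypeScript sketch with hypothetical Account, withdraw and withdrawAll names. Plain readonly objects stand in for a real persistent data structure. Because nothing mutates the starting value, abandoning the computation needs no explicit undo step.

```typescript
// Hypothetical example type; readonly fields keep the checkpoint untouched.
type Account = { readonly owner: string; readonly balance: number };

// Returns a new Account instead of mutating the one passed in.
function withdraw(account: Account, amount: number): Account {
  if (amount > account.balance) {
    throw new Error('insufficient funds');
  }
  return { ...account, balance: account.balance - amount };
}

// Applies the withdrawals in order. If any of them fails, the whole line of
// computation is abandoned and the caller continues from the checkpoint.
function withdrawAll(checkpoint: Account, amounts: readonly number[]): Account {
  try {
    let current = checkpoint;
    for (const amount of amounts) {
      current = withdraw(current, amount);
    }
    return current;
  } catch {
    return checkpoint;
  }
}
```

The intermediate accounts created before the failure are simply dropped, so the "rollback" is nothing more than continuing to use the value that was there all along.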

I want to argue that idempotency is essentially a checkpoint mechanism. Not that you actually rollback, but you can reason as if you had, as long as you retry the idempotent operation until it is confirmed to be successful.

The error handling philosophy of Erlang to let the process crash and restart is a form of rollback to a checkpoint. This is necessary, because when an error happens, nothing can be assumed to be known.

An interesting vision of checkpoints is the "worlds" of Alan Kay.

The Midori error model

Joe Duffy has written about the experience from the Midori project, including the error model they arrived at and why. The goal was to have an error model that was usable, reliable, performant, concurrent, diagnosable and composable. The feeling was that they largely succeeded and that programmers were very happy with it.

They chose exceptions as the basic implementation because they do not cause overhead on the normal "hot" path.

Abandonment

One key observation was that bugs aren't recoverable errors, so any condition that arose from a possible bug should lead to abandonment of the process, without any ability to catch the error or attempt any form of recovery.

When a bug occurs, the program is in an unknown state that the code has not been equipped to handle, so it would actually be impossible to be sure about any state after attempted recovery. This corresponds with the Erlang philosophy and Google coding practices.

Abandonment is annoying enough that the code will eventually get fixed, while error logs can be ignored for months.

This practice led to very stable code and a sense of safety for the programmers, whose trust in the code increased.

Note that this complete abandonment is often considered too disruptive to the system as a whole, because most computations could still function properly. The mitigation is to have microprocesses that are automatically and swiftly restarted when abandoned, which then effectively creates a rollback mechanism to the starting state checkpoint.

Recoverable errors

Once the things that are probably bugs are out of the way, only a small number of recoverable errors are left. (Abandonments outnumbered recoverable errors by 10 to 1.)

The decision was to implement them as checked exceptions, but to simplify to basically only one exception type (with an option to create others if deemed absolutely necessary). This is interesting because it reduces the detail level of documentation. The success of this idea implies that there is such a thing as too much documentation.

It was also very helpful to force the use of a try keyword on every call to a function that could fail. This made the code much easier to reason about (increased documentation).

While result values that are either a result or an error could have worked equally well, exceptions only have an execution cost on the exceptional path, while result values have an execution cost on the successful path as well.

There was a way to convert between exceptions and result values when a more dataflow type of syntax was desired.
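The conversion between the two styles can be sketched in a few lines of TypeScript. This is only an analogy, not Midori's actual mechanism, and the names Outcome, attempt and unwrap are invented for the example.

```typescript
// Hypothetical result type: either a value or whatever was thrown.
type Outcome<T> =
  | { ok: true; value: T }
  | { ok: false; error: unknown };

// Exception -> result value: run the function and capture a throw as data.
function attempt<T>(compute: () => T): Outcome<T> {
  try {
    return { ok: true, value: compute() };
  } catch (error) {
    return { ok: false, error };
  }
}

// Result value -> exception: hand back the value or rethrow the captured error.
function unwrap<T>(outcome: Outcome<T>): T {
  if (outcome.ok) {
    return outcome.value;
  }
  throw outcome.error;
}
```

With something like attempt, a failure can be passed along as ordinary data in a pipeline and only turned back into an exception at the point where unwinding is actually wanted.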

Other interesting patterns

Having undeniable exceptions that could only be suppressed by a catch block holding the right token turned out to be a useful pattern when you needed cancellation of a computation (or aborts as they called it). The token prevented any intervening code in the callstack from catching the exception. Note that a checkpoint would probably be desirable here.

Opt-in try APIs: Some cases where abandonment was used could, in rare cases, be ones where you want to attempt the calculation and fall back to something else if it fails. For those they identified, a separate opt-in API for trying the call was provided. (This is the non-error validation case discussed below, under "Validation or test and fallback".)

"Keepers" were a pattern where an object was set up to be able to "fix" certain errors on the spot, for example providing a fallback file if the desired file could not be found. Instead of unwinding the whole stack to catch the exception, they were made available to be called at the point of throw.
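A loose TypeScript sketch of the keeper idea follows. It is my own reading of the pattern, not Midori code: the in-memory files table, the MissingFileKeeper type and readFile are all hypothetical. The point it tries to show is that the keeper is consulted where the failure is detected, before any unwinding happens.

```typescript
// Hypothetical in-memory stand-in for a file system.
const files: Record<string, string | undefined> = {
  'default.cfg': 'mode=safe',
};

// A keeper is consulted at the point of failure and may name a replacement file.
type MissingFileKeeper = (missingPath: string) => string | undefined;

function readFile(path: string, keeper?: MissingFileKeeper): string {
  const content = files[path];
  if (content !== undefined) {
    return content;
  }
  // Ask the keeper before throwing; the call stack is still intact here.
  const fallbackPath = keeper?.(path);
  const fallback = fallbackPath === undefined ? undefined : files[fallbackPath];
  if (fallback !== undefined) {
    return fallback;
  }
  throw new Error(`file not found: ${path}`);
}
```

A caller could then write readFile('user.cfg', () => 'default.cfg') and continue with the default content, while a call without a keeper still fails as before.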

Different types of conditional failures

It might be worth exploring this from the angle of what different types of conditional failures exist and what they mean, in order to determine suitable implementations. The best analysis of conditional failure types that I have seen so far is in the Google Guava documentation.

"The code I'm testing messed up"

This type of failure would normally not be raised at runtime, but rather in testing or analysis stages before the code itself runs. Typical implementations here would be tests and type checks. Also, typically, the failures will not be handled in code, just reported to the programmer/user.

In cases where tests are integrated in the code, they could be automatically run before starting the program, inhibiting running a failed program. Pyret does this.

"You messed up (caller)"

This can sometimes be determined through static analysis of types, but for more powerful checks this would generally need to be checked at runtime. It corresponds to the precondition part of a contract specification.

If a failure is issued for this reason, it would tend to indicate a programming error, so it might not be useful to be able to detect and handle these errors in code.

On the other hand, if you want to call a procedure only when the precondition is satisfied, otherwise do something else, should you, the caller, be forced to duplicate the precondition check or could you just "catch" the failure as a selection signal?

"I messed up"

Quite obviously programming errors, so abandonment would probably be the only strategy.

Postcondition checks of a contract are the first thing that comes to mind. Also invariant checks and other assertions about what the programmer believes is true at that point in the code.

Dereferencing a null pointer is another typical example.

"Someone I depend on messed up"

Very similar to "I messed up", but instead of verifying that your code worked, you check that someone else's code, a dependency, (still) works in the way you expect and require.

Abandonment is probably the only reasonable action, but it may be useful for debugging purposes to distinguish the two.

"What the? the world is messed up!"

Very similar to "I messed up" and "Someone I depend on messed up", but this distinguishes impossible things that should not be able to happen at all, according to our model of how things work.

Abandonment is probably the only reasonable action because you cannot really reason about anything at this point. Again it might be useful to distinguish for debugging purposes.

Background: at Google, second-rate hardware is used successfully with the understanding that you may need to check for this kind of thing and just bail out and try again (usually somewhere else) if it happens.

"No one messed up, exactly (at least in this VM)"

Finally we reach the only type of error that you may need to be aware of and handle with a backup strategy. This is the case where things did not work out the way you expected, but the conditions are too complex to control for, or completely out of your control. Examples:

A file did not exist where it was supposed to.
A web service did not respond as expected.
The input data was incorrectly formatted and could not be interpreted.

My learnings from this

Performance

You don't want error handling to affect the performance of normal processing.

However, that does not necessarily affect syntax and semantics, as long as the compiler can distinguish the cases. Also, dynamic branch prediction will reduce a lot of the overhead.

Errors

By error, I mean a condition has been detected that indicates that the code itself is flawed (or the setup/infrastructure in which it runs, such as memory allocation).

Whenever an error happens such that the logic of the code is no longer certain because the state of things is unexpected, the program or process needs to return to a known checkpoint.

Preferably enough information is gathered to understand how to fix the problem.

From experience, there is nothing more damaging to a codebase than ignoring an error signal/log, because it demoralizes the engineering team. Likewise, don't keep a backlog of things you probably will never do. If something is serious enough to fix, the problem will be rediscovered. So make sure the error log is clean: either fix an error or stop reporting it as an error.

Since it is too easy to ignore error logs, especially in development, I prefer that the program just crash, which makes a loud enough noise that the error will be fixed. The Midori experience seems to confirm that, as do Erlang programmers.

Note that crashing the whole program is often considered too severe for huge monolithic programs, but we probably shouldn't build those anyway. There is after all a reason why people keep coming up with independently deployable subsystems.

Non-error failures

There is a greyzone where a result is not quite a success, but not really an error either, or at least it is not unexpected that a computation will fail.

There is often a need to define fallback strategies in the face of failure. I'm still unsure of how valuable it might be to configure this in an outer context instead of just having a local fallback strategy.

In existing programming languages, these cases often end up at least partially attached to the error system, which they probably shouldn't be.

Fruitless search or unknown value

This is where null and friends come into play. Even though Tony Hoare calls allowing null everywhere by default his billion-dollar mistake, we can't get away from having to handle the absence of a value, and null is easy to reach for.

It was very easy for programmers to miss handling the null case, which is a program flaw, so modern type checking will force its declaration as an optional value and force handling of it.

Most of the time we don't want to do anything with a missing value and I have previously written about the usefulness of simply not emitting any value at all in those cases.

Validation or test and fallback

There are times when the programmer cannot know beforehand whether data is valid or not and needs to check it first.

For optional values, there have arisen a number of convenient ways to reduce the boilerplate of checking, such as the elvis operator and the null-coalescing operator.

When parsing a string to a number, for example, it is reasonable to expect that some strings may not be numeric. Should the programmer be forced to code up a test when the desired test already exists in the parser?

Previously it was common to just piggyback on the error system and the programmer would add error handling for it.

What about the cases when the programmer knows the string is numeric? Should testing or error handling be forced anyway, with an "impossible case" declaration? (I've done this a fair amount.) Or should the call just proceed and abort the program on failure to signify a bug?

There should be a more lightweight way to utilize existing precondition checks when the uncertainty is known.
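One way to picture this in TypeScript is a parser that exposes its own check as an "absent value" result, which the caller then either falls back from or treats as a bug. The function names and the 8080 default below are made up for the example, and a thrown Error is only a stand-in for true abandonment, since TypeScript cannot make it uncatchable.

```typescript
// Returns undefined instead of throwing when the text is not an integer.
function tryParseInteger(text: string): number | undefined {
  const trimmed = text.trim();
  return /^[+-]?\d+$/.test(trimmed) ? Number(trimmed) : undefined;
}

// Uncertain input: reuse the parser's own check and select a fallback.
function parsePortOrDefault(text: string): number {
  return tryParseInteger(text) ?? 8080;
}

// Input the programmer believes is numeric: a failure signals a bug.
function parseKnownInteger(text: string): number {
  const parsed = tryParseInteger(text);
  if (parsed === undefined) {
    throw new Error(`bug: expected an integer but got "${text}"`);
  }
  return parsed;
}
```

The caller never duplicates the "is this numeric?" test; it only declares which of the two situations it is in.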

Aborts and cancellations

An infamous case is Java's InterruptedException, which looks and smells like an error, but definitely isn't and should be handled completely differently. It is just a way to propagate the fact that an interrupt flag has been set on the current thread, that is, that a cancellation has been requested.

In the case of cancellation from the outside, there is a need for a way to pass in a signal that the executing code can check.

Whether by cancellation or detection of another condition, the thread of execution should be abandoned and returned to a safe checkpoint.
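A minimal TypeScript sketch of cancellation from the outside could look like the following. The CancellationToken class is hand-rolled for the example and is not a platform API; the optional onStep callback exists only so that the example can trigger a cancellation midway.

```typescript
// Hypothetical minimal cancellation token; not a platform API.
class CancellationToken {
  private cancelled = false;

  cancel(): void {
    this.cancelled = true;
  }

  isCancelled(): boolean {
    return this.cancelled;
  }
}

type SumOutcome =
  | { status: 'completed'; total: number }
  | { status: 'cancelled' };

// Sums the values, checking the token before each step. The running total is
// local, so abandoning the loop discards all partial work.
function sumUnlessCancelled(
  values: readonly number[],
  token: CancellationToken,
  onStep?: (index: number) => void,
): SumOutcome {
  let total = 0;
  let index = 0;
  for (const value of values) {
    if (token.isCancelled()) {
      return { status: 'cancelled' };
    }
    total += value;
    onStep?.(index);
    index += 1;
  }
  return { status: 'completed', total };
}
```

The cancelled outcome is an ordinary variant of the return type rather than an error, so the caller has to select on it like any other result.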

Usability

There seems to be a tension between too much and too little of a property, which is reminiscent of the cognitive dimensions of notations.

Optional handling

Allowing error codes to be optionally handled or ignored leads to low viscosity, but also generally low visibility. Handling is often forgotten and errors can be very hard to debug.

The experience with C does not encourage this.

Explicit handling

Explicit local handling, such as sum result types or checked exceptions (or effects), increases visibility, making the logic easier to analyze, but also increases viscosity, making the code harder to change, and can even act as an abstraction barrier.

Java's checked exceptions are disliked for this reason.

Option types are a little less tedious because the None type can be monadically bound to None results. They don't contain any information about the error, though, so not quite error handling, just allowing a "no result" result. In Tailspin that can be done by simply not returning a result at all.

Maybe Midori's use of one exception type strikes the right balance. The same could be done with a standard result type.

Local handling

Having to handle errors locally after each call gives good visibility but also high viscosity, with lots of repeated boilerplate up the call stack.

Go's error handling is disliked for this.

Remote handling

A catch statement (or effect handler) is essentially a COMEFROM, but with even lower visibility, because you can't even tell where the execution thread came from. To mitigate this, an exception will usually contain a stack trace to show where it came from.

Locally marking all possible sources, such as the obligatory try keyword on calls that could fail in Midori, helps increase visibility.

The ability to ignore an error at lower levels gives low viscosity for changing the code at those levels.

Summing it up

While I'm not entirely sure how to get all this into Tailspin, I have a pretty clear picture of how I want to handle errors and failures (which aligns well with how it already works):

Anything that smacks of a programming or configuration error should be a hard, uncatchable abandonment of the process. Probably an assert or abandon statement is desired to augment built-in checks.
There need to be ways to signal intent to broaden the built-in checks, such as the elvis operator for missing values, or the "type bounds" that Tailspin already has on comparisons. I think I want a try operator for proceeding with a function call only if the precondition checks are successful, with a fallback otherwise. Note the restriction to precondition checks, or errors of the "You messed up (caller)" kind. I don't want a catch-all for any error. I could probably introduce a reject statement for this case.
I think function entry points would be good checkpoints, and I want a way to rollback to them and at the point of call define an on rollback strategy. There is probably a need to be able to commit parts of an operation, like calling a web service. I'll have to think further on how the resulting partial rollbacks should be visualized.

When code doesn't communicate enough

2025-05-23T21:48:00.000-07:00

You have been assigned a code review, in Typescript, using Slonik to create semi-typed injection-safe SQL fragments. The purpose is to allow selecting rows where the foo column has a NULL value.

The following code:

function getFooCondition(values: string[]) : FragmentSqlToken {
  return sql.fragment`foo = ANY(${sql.array(values, 'text')})`;
}

is being changed into this:

function getFooCondition(values: (string | null)[]) : FragmentSqlToken {
  const parts: FragmentSqlToken[] = [];
  if (values.includes(null)) {
    parts.push(sql.fragment`foo IS NULL`);
    values = values.filter((v) => v !== null);
  }
  if (values.length) {
    parts.push(sql.fragment`foo = ANY(${sql.array(values, 'text')})`);
  }
  return sql.join(parts, ' OR ');
}

You check the tests, they look good, they add data to a testContainer database and verify that the condition works as expected. Test coverage is 100%.

Stop and consider what criticism, if any, you might have of the above change.

Looks pretty good to me!

Except that this code was a near-miss production incident as it risked disastrous leakage of data under some conditions (which luckily did not end up actually happening).

The situation

Here is a sketch of the code that is using the above function:

connection.query(sql.type(resultSchema)`SELECT ... FROM ...
WHERE ... AND ${getFilterConditions()}`);
...
function getFilterConditions() : FragmentSqlToken {
  const conditions = [
  ...
  getFooCondition(fooValues),
  ...
  ];
  return sql.join(conditions, ' AND ');
}

The fundamental issue is a mismatch between the assumptions of the calling code that conditions are "simple" (or at least joinable by AND), and the assumption of the modified code that it was fine to return a complex condition with an OR in it.

An oversimplification of the problem would be to dismiss it as just "bad assumptions". In hindsight, maybe the assumptions could have been avoided in this case. After all it wouldn't really matter if both the caller and the callee wrapped conditions in parentheses.

But some kinds of assumptions must always be made, so a lot may be learned by considering "how can those assumptions be communicated?".
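To see the precedence problem and the parenthesizing remedy in isolation, here is a TypeScript sketch in which plain strings stand in for Slonik fragments. The helpers joinWith and parenthesize, and the tenant_id condition, are hypothetical and only serve the illustration; real code should keep using the library's injection-safe builders.

```typescript
// Plain strings stand in for Slonik fragments in this sketch.
function joinWith(operator: 'AND' | 'OR', conditions: readonly string[]): string {
  return conditions.join(` ${operator} `);
}

function parenthesize(condition: string): string {
  return `(${condition})`;
}

// What the changed function produces when a null is among the values.
const fooCondition = joinWith('OR', ['foo IS NULL', "foo = ANY('{a,b}')"]);

// As in the incident: AND binds tighter than OR in SQL, so this parses as
// (tenant_id = 42 AND foo IS NULL) OR foo = ANY('{a,b}')
const leaky = joinWith('AND', ['tenant_id = 42', fooCondition]);

// Defensive version: every condition is wrapped before it is joined.
const safe = joinWith('AND', ['tenant_id = 42', fooCondition].map(parenthesize));
```

Here leaky is the string tenant_id = 42 AND foo IS NULL OR foo = ANY('{a,b}'), which matches rows from any tenant, while safe is (tenant_id = 42) AND (foo IS NULL OR foo = ANY('{a,b}')).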

The three levels of correctness

My friend Jimmy Koppel, founder of Mirdin, teaches that correctness of code can be evaluated on three levels.

The first level is that the code works on a particular run with a particular input. This level can be verified by a test case.

The second level is that the code works for all valid input. This is the level where the code could be shipped and there may be valid trade-offs for not going further, for example if the code will be thrown away. The computer scientist's dream scenario for verifying this level of correctness is through a formal proof.

It is often too much work to complete a full proof, but we can get some benefits from a type system. The stronger the type system and the types you define, the better the "proof", but there is probably a sweet spot beyond which stronger types become too restrictive to use daily.

Naming can help offset some deficits in the type system.

Usually, a probabilistic argument is made from the tests that are run, to make up for the deficits in the type definitions.

Yet another possibility, which sadly isn't integrated in most languages, is to define contract checks on preconditions required on inputs and postconditions required of outputs. They can be used to define more precise conditions than is practical or even possible with a type system. And they run on every call to the function, testing all real inputs, not just the fictional inputs in tests.
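In a language without built-in contracts, they can be approximated with runtime checks. The following TypeScript sketch uses a hypothetical check helper and a made-up percentOf function; it illustrates the idea and is not a contract library.

```typescript
// Hypothetical helper: throws when a contract condition does not hold.
function check(condition: boolean, message: string): asserts condition {
  if (!condition) {
    throw new Error(`contract violated: ${message}`);
  }
}

// Computes what percentage `part` is of `whole`.
function percentOf(part: number, whole: number): number {
  // Preconditions: requirements on the caller that a number type cannot express.
  check(whole > 0, 'whole must be positive');
  check(part >= 0 && part <= whole, 'part must be between 0 and whole');

  const result = (part / whole) * 100;

  // Postcondition: a promise to the caller, checked on every real call.
  check(result >= 0 && result <= 100, 'result must be a percentage');
  return result;
}
```

A failed precondition points at the caller and a failed postcondition points at the function itself, which is a distinction a plain test failure does not make.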

For completeness, contracts also usually include invariants on loops and classes, but IMO they start to border on too much effort for the value. We do have to consider these things, but formally specifying them feels at least an order of magnitude more difficult than informally reasoning about them. Maybe it gets easier with practice, and I have to admit I have not tried using them in real code.

In practice, type inference combined with some clever test cases is usually good enough for this level of correctness. Stricter typing and more tests do not seem to reduce bugs very much, but formulating them may help the programmer and reviewer think more clearly to better understand where bugs may be lurking.

The third level of correctness is that the code works for all future modifications of the code. An interesting observation here is that a formal proof doesn't necessarily say all that much about the next modification of the code. In order to achieve this level, we can say that the code must successfully communicate its design to the next programmer (which, as always, may be yourself in the future).

That isn't entirely satisfactory, because what constitutes design? And how does one communicate, really?

Peter Naur wrote an essay beginning with examples of the next set of maintainers failing to utilise the design despite ample and clear documentation. His suggestion is that it isn't the design itself, but the theory of the design, the reason the design is the way it is, that needs to be communicated. That is difficult, because it is lodged deep inside the original programmers' brains and cannot be easily described.

Improving communication in the example

What could have communicated the problem in our example? How could a signal have been given to the programmer or the reviewer?

Tests are one way of giving signals. In this case, it would be unlikely to be able to create a focused automated test for this problem because it would require setting up extraneous data that would be returned when the expectation was violated in precisely this way. If the programmer knew to set up a test to pre-empt this problem, surely they would have just fixed their expectation instead? Exploratory testing on a large enough dataset could possibly have found the error, but that is another story.

There is a small clue in the name, getFooCondition, perhaps, if you would interpret the word "condition" as meaning just a simple comparison, but I think an OR-expression also usually qualifies as a "condition". In the postgres manual, a "search_condition" is any value expression that returns a boolean value. I'm not sure the name could be improved much, maybe it could be called createFooComparison instead? Would that have been a stronger clue?

Considering the use of types, while the FragmentSqlToken return type of getFooCondition has some advantages over a plain string, it doesn't reflect the expected algebraic properties of the returned value, namely that it should be a simple condition that can be AND:ed with other conditions.

A good pattern when serializing data is to create values that keep all the required properties right up until the single point where all the values are encoded at the same time.

Actually, I misrepresented the original code slightly: getFooCondition was part of that final step and the input was actually a tagged union of FilterCondition. I suppose that was a better clue to the expected output of the function, but it doesn't provide a signal when violated, so it is easy to miss.

type FilterCondition =
{ operation: 'eq', value: string }
| { operation: 'in', values: string[] };

The reason for having separate functions for different filterable entities was that they were represented differently in the database, some in columns, some in json blobs. Maybe it still would have been better to just leave the code inline so that the final sql.join was easy to see in the flow. Especially since the name getFooCondition is not very descriptive, nor is the FragmentSqlToken return type.

If there is a strong desire to break out the functions creating the individual conditions, the following might work for the return type to explain the desired properties:

type ConditionFragment =
{ type: 'comparison' , fragment: FragmentSqlToken }
| { type: 'or', fragments: ConditionFragment[] }
| { type: 'and', fragments: ConditionFragment[]};
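A structured type like this only pays off if there is a single place that turns it into SQL. Here is a sketch of such a renderer in TypeScript, under the loud assumption that a plain string stands in for Slonik's FragmentSqlToken so that the example runs without the library. The render function is hypothetical, and empty fragment lists are not handled.

```typescript
// Assumption: `Fragment` is a plain string standing in for FragmentSqlToken.
type Fragment = string;

type ConditionFragment =
  | { type: 'comparison'; fragment: Fragment }
  | { type: 'or'; fragments: ConditionFragment[] }
  | { type: 'and'; fragments: ConditionFragment[] };

// Hypothetical renderer: the single place where structure becomes text.
// Every compound condition is parenthesized, so nesting cannot change meaning.
// Empty fragment lists are not handled in this sketch.
function render(condition: ConditionFragment): Fragment {
  switch (condition.type) {
    case 'comparison':
      return condition.fragment;
    case 'or':
      return `(${condition.fragments.map((part) => render(part)).join(' OR ')})`;
    case 'and':
      return `(${condition.fragments.map((part) => render(part)).join(' AND ')})`;
  }
}

// Usage: an OR nested inside an AND keeps its grouping.
const filter: ConditionFragment = {
  type: 'and',
  fragments: [
    { type: 'comparison', fragment: 'bar = 1' },
    {
      type: 'or',
      fragments: [
        { type: 'comparison', fragment: 'foo IS NULL' },
        { type: 'comparison', fragment: "foo = ANY('{a,b}')" },
      ],
    },
  ],
};
const rendered = render(filter);
```

With this shape, a function that wants to return an OR has to say so in its return value, and the caller no longer needs to guess whether the result is safe to join.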

One further note on this example is that the input type was changed from string[] to (string | null)[]. While expedient, it doesn't accurately represent that only one null value is relevant, which may lead to confusion later. It certainly complicated the code a tiny bit, while a better type would have pushed the complication upstream instead, along with the increased clarity.
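One possible shape for such a type, sketched in TypeScript with hypothetical names (FooFilter, includeNull, toFooFilter), separates the list of values from the single yes-or-no question of whether NULL rows should match. The conversion from the looser representation then happens once, upstream.

```typescript
// Hypothetical type: "also match rows where foo is NULL" is an explicit flag.
type FooFilter = {
  readonly values: readonly string[];
  readonly includeNull: boolean;
};

// Upstream conversion from the looser representation, done in one place.
function toFooFilter(raw: readonly (string | null)[]): FooFilter {
  return {
    values: raw.filter((value): value is string => value !== null),
    includeNull: raw.some((value) => value === null),
  };
}
```

A function receiving a FooFilter can no longer be handed several nulls, and it does not need to reassign its own parameter to get rid of them.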

Communicating consciously through code

naming

Naming things in code is the most important way of communicating things about the code that are not immediately discernible from the operations of the code itself.

Names can be used not only to say what a value or function is, but why it is.

Spend a little extra time on deciding which things need a name and what the best name is. Even small nuances can matter. If it is very difficult to find a good name, it may be a clue that the design of the code should be tweaked.

A compiler or similar pre-processor will usually check that the names align. This is a very handy way to get signals that can be used as reminders to self while coding.

automated tests

Consider that the main purpose of an automated test is to communicate the requirements and the theory of the design to the next maintainer. When the test fails, a signal is sent. In the best case, the signal is very clear and has a relevant meaning.

All too often, I see tests that have a big input blob of data and a big output blob expectation, with no clear guidance as to how they relate. Or the test is full of mocks and verifications that a certain mock was called in a certain way. These kinds of tests only signal "the code changed" and it is often not even possible to get anything more meaningful out of them.

Make sure each test explains exactly why it needs to pass, otherwise it is probably not worth the cost to keep it.

Make sure to clearly show how the relevant parts of the input relate to the expected output. One useful technique is to extract variables for the parts that should be the same, or show by a transformation how the output is derived from the input.
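As a small illustration of extracting a variable for the part that should be the same, here is a TypeScript sketch. The User type, the toGreeting function under test and the test itself are all made up for the example, and a thrown Error stands in for a test framework's assertion.

```typescript
type User = { id: number; name: string; email: string };

// Hypothetical function under test.
function toGreeting(user: User): string {
  return `Hello, ${user.name}!`;
}

function testGreetingUsesTheUserName(): void {
  // The only input that matters for this requirement is named once...
  const name = 'Ada';
  const user: User = { id: 1, name, email: 'irrelevant@example.com' };

  // ...and the expectation shows how the output is derived from it.
  const expected = `Hello, ${name}!`;

  const actual = toGreeting(user);
  if (actual !== expected) {
    throw new Error(`expected "${expected}" but got "${actual}"`);
  }
}
```

A reader can tell at a glance that the id and email play no part in the requirement, which a pair of unrelated input and output blobs would not reveal.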

Tests that aren't run don't give any signal at all. Run them as often as you can. Make sure they are fast so that you are willing to do so.

types, preconditions and postconditions

Specifying the requirements on the expected input and the expected output of a function is very helpful as documentation both as to how it should be implemented and how it should be used.

Types and contracts only give a signal if something is checking them, so languages with type checking communicate better than languages without.

Types, contracts, tests and names all interplay. A variable named the same as its type is an opportunity to improve communication by renaming the variable, for example to show its role in the algorithm.

code structure

The order and layout of statements in your code can also be used to communicate things and make the theory of the design more apparent.

Putting variable declarations close to their use minimizes the amount of data needed to be kept in memory while reading the code. In general, put things close together that need to be understood together.

The choice between recursion, loops, streams and even what kind of loops conveys different things about how the algorithm is designed.

Making an if-else statement conveys equal importance to the branches, perhaps with a slight preference for the first. Making a guard-clause conveys an exceptional case.
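The difference can be seen side by side in a small TypeScript sketch. The shippingCost function and its prices are invented for the example.

```typescript
function shippingCost(weightKg: number, express: boolean): number {
  // Guard clause: an exceptional case is dealt with up front and left behind.
  if (weightKg <= 0) {
    throw new Error('weight must be positive');
  }

  // if-else: two equally normal ways to compute the result.
  if (express) {
    return 20 + 2 * weightKg;
  } else {
    return 5 + weightKg;
  }
}
```

Writing the weight check as a third branch of the if-else would compute the same thing, but it would present an invalid weight as just another kind of shipment.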

Grouping of functionality in a file with clearly marked sections can be helpful and more lightweight than breaking out a separate file for each.

comments

Thoughtless comments are a waste of time for everybody. Unfortunately their prevalence has made many developers blind to comments in general.

Brief, thoughtful comments, carefully written to explain why the code is as it is or other aspects that weren't possible to communicate any other way can be a life-saver.

To sum up

To fully utilize the design of the code and to make sure code remains correct under future modifications, the future programmer needs to understand the theory of the design. This is difficult, perhaps inexpressible, as it is lodged deep in the original programmers' brains.

The best we can do is to leave clues in the code, guide the reader of the code, and, even better, make sure that signals are fired when the original theory of design is violated.

I have written previously about this subject and about meaningful tests. Perhaps there is more to be learned from reading them. In a similar vein, I have even longer ago written about clarity of code, assumptions and assumptions again.

Try jj vcs without risk in your git repo

2025-03-23T12:53:00.000-07:00

I have been happily using jj vcs instead of git for 6 months now and I will never go back.

The reason I started was because jj was supposed to be better at handling stacked PRs. Spoiler alert: it is.

With jj I don't have to create branches until I am ready, nor do I have to create commits or commit messages until I am ready. I just work with sets of changes and I move them around as I please. When I have to modify something at the bottom of the stack, maybe from a review comment, the change propagates automatically up the stack. Any conflicts are just saved in the files for me to deal with when I am ready.

The good news is that you can try jj in your existing git repo without damaging anything and you can use git commands whenever you feel you need to and just go back to git if that is how you feel, no harm done.

The best thing is that my team-mates are all still using git and it just works.

OK, let's give it a go!

Follow the install instructions from the jj docs
Go into your cloned git repo and run the command jj git init --colocate
You will probably get prompted to run jj bookmark track main@origin. You can also do the same for any other remote branches you want to track.
Then do jj new main to create an empty change on top of main and start writing code!

To view what files have been changed, run jj st. To see where in the tree you are working, run jj log

Feel free to also see how things look from git's perspective. Note that jj mostly works in a detached head state. This might feel a bit strange, but all your git commands work as they should and you can stop using jj whenever you want.

Interacting with remote

Get updates from remote with jj git fetch. If you need to, jj rebase -d main.

When you are ready, create a description (a commit message) with jj describe -m "feat: my changes"

To commit to main, jj bookmark move main --to @. To create a new branch for a PR, jj bookmark create my-bookmark

Then just jj git push. You need to add --allow-new if you created a new bookmark.

Is that it?

Pretty much. Obviously there are more commands for use when needed, but jj is both much simpler than git, by only working with changes and bookmarks, and much more powerful than git by allowing you to move and partition your changes as you like without the painful details of how git works.

When you do jj log, you will see an @ for your current working change, a change ID consisting of letters with the unique prefix highlighted, a commit hash and possibly a bookmark (branch name). A change can be referred to by the change ID prefix (or longer), the commit hash, or the bookmark. There is a whole revset language for specifying sets of changes and changes relative to specific changes.

You don't have to work at the top of a branch, you can edit any change in the branch, or insert a new change anywhere in the tree.

Give it a whirl, it takes less than a week to get used to. I think you will like it, but if you don't, just go back to using git commands.

If Agile isn't working, it's your fault

2024-07-28T11:27:00.000-07:00

Everybody is doing Agile development these days, or so they claim. Especially Scrum seems to be ubiquitous. Despite that, software development in general is not much more successful than it was 20 years ago. Surely that proves that Agile, and particularly Scrum, isn't working? Not so fast.

Let's take a look at what the Agile manifesto says about the values Agile is based on:

Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan

Let that sink in while we take a look at what research says about teamwork in general. In particular, research has tended to focus on the difference between real teams, pseudo-teams and groups.

Real teams are distinguished by doing all of the following things:

The members claim to belong to a team.
The members of the team have a common goal.
The members are dependent on each other and have to communicate and coordinate their work.
The team regularly reflects on the way they've worked and changes things to improve their way of working.

Pseudo-teams claim to be a team, but do not do all the other three things, while groups do not claim to be a team (even if they may do the other things). The interesting thing to note is that real teams perform much better than pseudo-teams. Another interesting thing to note is that groups perform almost as well as real teams. I suspect that is because groups don't assume things will magically work out, they take the steps necessary to coordinate the work and achieve success.

The single most effective tool for organizational learning is the After Action Review. It's no coincidence that it fulfils point 4 above. There are a few similar things that also come in under point 4, like Post-mortems, or the Scrum "Inspect & Adapt" principle, a.k.a. "the retro". Taking it all the way back to the Agile manifesto, it favours "Individuals and interactions over processes and tools".

Clearly there is no point in reflecting about the work unless action is taken to make things work better going forward. If you spend your retros complaining, celebrating and blame-throwing, you've missed the point.

If you can't change the process, you're not doing Agile. If you don't change the process, you're the only one to blame. You're the software development professional, and as Uncle Bob puts it, you're supposed to profess how software can and should be developed. You don't tell the manager how to run the business or the product owner how to decide which features are most important, neither should they tell you what you can develop, how fast, or how you should do it.

What process should you have?

The only agile answer is that you should have the process that helps you do the work. And that process should change as circumstances change. Does the daily stand-up help you have a more pleasant and productive day? If it does, keep it, if it doesn't, change the way you do it or abolish it.

The first priority is "Working software over comprehensive documentation". One thing I like to do is to write down or draw an outline of a technical design, in just enough detail to understand it and discuss it. I feel that helps me develop working software. I will not maintain the technical design document as the software changes. Unless the team decides the document does help to continue developing working software, in which case we all commit to keeping it up to date.

A more controversial topic is perhaps the sprint plan. Does the sprint plan help you create working software? If not, it's just useless paperwork. Does estimating your stories help you deliver working software? (I'd say that's a no, but YMMV)

Hang on, you say, the manager needs the plan to do their job. This is where we pull out "Customer collaboration over contract negotiation". The manager is in a sense a customer receiving the plan, so what can the team provide instead? It turns out that the manager really needs a forecast, not a plan, and it has been shown that just counting tickets in Jira (or lines on the spreadsheet or whatever you're using) is at least as accurate for forecasting as any estimating and planning.

If you add more tickets as you learn more, that's still fine, good managers also know the value of "Responding to change over following a plan". What they hate is being told the day before they thought you would be done that it's going to be a few more weeks.

When it comes to collaborating with the manager and product owner on what can be delivered and when, there are four factors in play: resources, quality, time and scope.

Increasing resources, like adding people or using better tools, usually helps in the long term, but it will be detrimental in the short term. You may have heard of teams going through the stages of "forming, storming, norming and performing" and as soon as the team changes, they go back to square one. It takes time to get back to efficiency. Likewise new tools can have a long ramp-up before they can be used better than the old ones. Often it is probably not even worth the trouble.

Decreasing resources can't usually be helped, but the negative effect on team productivity needs to be recognized.

Skimping on quality is a dangerous route, but can be done to some extent. Deliberately not handling edge cases or error cases in the short term to make a deadline is often doable. It may also be acceptable to do something the quick way instead of the right way. The trick is managing the technical debt later. The "wall of technical debt" may be a way to do it. Make sure that it is a deliberate decision to take on the debt.

That leaves time and scope that can be played with more freely. In the spirit of collaboration, the tech team needs to be honest about what can be done and when, but should also be able to come up with alternative solutions. Often the customers' biggest needs can be met in simpler ways, cutting down scope. At other times the manager just needs to accept that it will take longer.

Risks also need honest communication and evaluation. When tasks are no longer estimated for time, there are still two types of tasks: those the team understands how to do and those the team doesn't understand how to do in enough detail. The tasks that are not understood are a risk and risks need to be defused, so the team will need to spend time to dig into them more and plan them out better.

There are lots more potential topics to dig into, like the technical development practices, or the myriad of ways management can sabotage teamwork by performance reviews or issuing individual rewards, but I think this is a good place to stop, having outlined a decent basis for an agile development process.

Don't just listen to me, though, also take a look at what Ron Jeffries, one of the original signatories of the Agile Manifesto, has to say: Scrum is not Agile, but it's not a bad place to start.

'Entity' is the wrong idea

2024-05-14T12:16:00.000-07:00

Joe Armstrong, creator of the Erlang programming language, once said

You wanted a banana but what you got was a gorilla holding the banana and the entire jungle

He meant it as criticism of Object-oriented (OO) programming languages but ironically Erlang is one of the most OO languages that exist, having independently rediscovered the actor model which is the essence of what Alan Kay meant by Object-oriented (computers all the way down).

The criticism is still justified even if it is not really about OO. Rather it is about the idea of "entities", which I suspect is just as easy to fall prey to in a Hindley-Milner product type favoured by some functional languages as it is when representing data as objects. (Although I will concede that perhaps the object idea is easier to confuse with the entity idea).

Now I am not talking about the abstract idea of an Entity as being something that has an identity, which may or may not be associated with various attributes. The problem is rather when an attempt is made to model an entity as a data type with all possible attributes and relations it can have to other entities.

Consider a Car entity, it is of a certain make, model and year. So far so good, but then you might need to reflect that it has an owner. Maybe an owner is just represented as a name at first. Later it might become a Person entity but it is still logically represented as a part of or belonging to the Car entity. And then it actually has to be a list of owners, and it just goes on and on, creating logical entanglement.

Whenever you get a Car entity, you get a representation of everything that could be known about a car.

What is actually needed to know depends on the situation and use case. What is really needed is a way to represent the relation between different pieces of data so that they can be put together in arbitrary combinations as needed. Now the Lisp programmers will slowly nod their heads because they do that all the time, it's just a list of things. But that's not necessarily what I'm advocating, there is still a value in defining what each relevant combination of data is and having it type-checked.

When there is a need to reflect the ownership relation of the car, the entity trap manifests both if there is an owner entity attached to the car and if there is a car entity attached to a person (owner). Either way creates a dependency from any user of one of the entities to the other entity. The correct way is to always keep those entities separate from each other and, when needed, represent the relationship as a data value object containing both the car and the owner (maybe an OwnedCar or a CarOwnership if you have to name it).
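As a rough illustration of keeping the entities apart, here is a minimal Python sketch. The field names and the describe helper are assumptions made up for the example; only Car, Person and CarOwnership come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Car:
    make: str
    model: str
    year: int


@dataclass(frozen=True)
class Person:
    name: str


@dataclass(frozen=True)
class CarOwnership:
    # The relationship is its own value; neither Car nor Person refers to the other.
    car: Car
    owner: Person


def describe(car: Car) -> str:
    # Hypothetical use case that only needs the car and never sees owners.
    return f"{car.year} {car.make} {car.model}"


volvo = Car(make="Volvo", model="240", year=1988)
ownership = CarOwnership(car=volvo, owner=Person(name="Alice"))
```

Code that only needs a Car depends on nothing else, and a list of owners later becomes a list of CarOwnership values rather than a change to Car.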

So if you've ever wondered why it's such a horror to maintain and extend a system built with an ORM, it's mostly because of "entities". Of course it doesn't help that all the annotations and whatever else that is needed to describe the entity and its relationships by necessity will form a language that is a buggy and incomplete implementation of SQL but in a proprietary syntax that nobody knows.

In some NoSQL databases it's called a document instead of an entity, but it tends to get bogged down in the same way.

Query builder libraries at least mostly avoid the entity trap, but they still suffer from being a buggy and incomplete implementation of SQL in a proprietary syntax that nobody knows.

So the best you can do is to embrace the relational idea of data and get the tools that will amplify your SQL usage. In typescript or javascript, Slonik is a good choice. In Java, have a look at Wrapd. For Go there is SQLC.

Perhaps you might want to take the relational idea even one step further and go full Datomic, leaving SQL for Datalog. That uses the term "entity", but in the abstract sense as just an ID that is related to an arbitrary set of attributes, so all is still good.

Usability in programming language concept implementations

2024-01-20T12:34:00.000-08:00

UPDATE 2025-08-24: Introducing the checkpoint concept. Also check out the error handling concepts.

Why does it feel easier and more joyful to write in one programming language than another? Most of the work is coming up with the algorithm, without considering language. Then translating that to a programming language is not usually the biggest problem (unless doing it in J 😉). Puzzling out how to make it work in PostScript (or J) can even be a reward in itself. But then there are those times when you just want to get an answer.

Prequel

As usual I have been doing adventofcode both because it is fun and it is an opportunity to use my Tailspin programming language. I also take the opportunity to try out a new language (or several) and as the mood takes me I might use an old favourite.

In 2017 I wanted to improve my javascript and also ended up learning SequenceL, in 2018 I wanted to see how SQL could work for programming and also started designing Tailspin as "how would I want to write this code", in 2019 I had just developed Tailspin so that was it, in 2020 I wanted to learn Julia, in 2021 F#, in 2022 Smalltalk (although admittedly I should probably do more with that. I really want to give Newspeak a try). Now and then I give Dart or Java a whirl. I've tried more languages, but you really need to do more than a couple of programs to get a good feel for it.

This year I wanted to see how Pyret worked, as it is designed to be a good beginner language. Some things I particularly liked were the way currying is done and the example tests for functions. Also the table data structure was very nice, even if I only used it once. Pyret is quite pleasant to work with, so this is no criticism of Pyret, but as I went along it just felt a little easier and more pleasant to reach for Tailspin than Pyret, particularly on the days I felt I was a little stressed or short of time.

I realized that the same thing had happened in previous years. One thing that struck me is that being able to emit an arbitrary amount of result values from a transform/function is really a superpower. I have previously written about the power of emitting nothing at all, but emitting many values is just as liberating. While that is a nice insight in itself, is there a more general way to analyze this?

What is "better"?

Obviously what is "better" can depend on your purpose, so I should start by defining what I think is better for this analysis:

Clearly (and, secondarily, concisely) showing what is being done (rather than how)
Allowing easy restructuring and recombination to fit a slightly altered purpose
Helping to avoid errors (or at least making sources of errors easy to find)
Making it easier to follow best practice and harder to be "clever"

These are the principles I applied when designing Tailspin, so it should come out looking good in this analysis.

Looking at underlying concepts

Last year, I evaluated Tailspin on the Cognitive dimensions of notation and one way of trying to find usability differences would be to evaluate some other languages the same way. I think that would be useful, but I don't really want to go into so much depth on each language. So I had the idea of looking at concepts, Daniel Jackson style.

I think all programming languages mostly have to implement the same concepts, but the way they are implemented and exposed to the programmer may result in big differences in usability (and certainly I could keep the cognitive dimensions in mind here). Also, muddling and overloading of concepts, forced synchronizations between concepts, or a proneness to erroneous expression, could explain arising resistances.

I propose the following concepts:

Repetition - Purpose: Allows repeating similar operations with slight variation. Operational Principle: If you provide the parts that vary, the algorithm is repeated with those variations.
Aggregation - Purpose: Collects separate related units into one unit. Operational Principle: If you provide the parts and specify how they relate to each other, an aggregate unit is created.
Projection - Purpose: Creates a view of parts of an aggregate. Operational Principle: If you specify the location(s) within the aggregate you want to access, those parts are extracted into a smaller aggregate (or single value)
Selection - Purpose: Allows a choice of which units to process and which to ignore. Operational Principle: If you specify the selection criteria, the matching units will be processed.
Verification - Purpose: Allows checking correctness criteria of the program. Operational Principle: If you specify a criterion and run the verification procedure (which may be built in), a warning/failure is issued if the program does not fulfil the criterion.
Documentation - Purpose: Allows communicating aspects of the design and purpose to a future reader of the program. Operational Principle: If you specify the relevant information, a future reader will receive it when needed.
Checkpoint - Purpose: Saves a known program state. Operational Principle: When a line of computation starting from the checkpoint is abandoned and returned to the checkpoint, all changes made in that line of computation will be forgotten and not impact any continued computation.

Repetition

I suppose the most primitive expression of this concept would be the GOTO. The biggest problem with the GOTO is that it becomes very easy to create complicated execution trees, as we know Dijkstra pointed out. Enforcing and clearly showing more structured repetition like loops and procedure calls is advantageous. It is possible that there exists some algorithm that is most clearly represented by GOTO, but I don't remember any convincing example.

Loops

The classic here is the integer count from a start to an end FOR i=1 TO 10. This is often used to do something a specific number of times, but despite the beautiful clarity of something like 10 times: [...], I think that it perhaps doesn't improve enough over the FOR to be worth it and there is still a need to start at an arbitrary point and perhaps use a different increment. On the other hand, the C-style loop with arbitrary initialization, test and "increment" clauses is far too flexible to be easily understood at a glance, at least in the more exotic cases.

Mostly the for loop is used to get an index into an array to process each element. The for each construct, e.g. for value in values, is a clear improvement both readability-wise and for avoiding indexing errors. It also allows processing each element of an unordered collection. When we combine the for each with a way to construct a collection of index values, e.g. for i in range(1,10), we can do away with the old counting loop. With good constructors of index iterators we can even get back some of the flexibility of the C version in a more readable way, e.g. for i in powersOfTwo(). A possible downside of the for each is that it often relies on mutation (see below) to construct the result.
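To make the index-iterator idea concrete, here is a small Python sketch. The powers_of_two generator is a hypothetical helper written for this example, and note that Python's range excludes its upper bound, so range(1, 11) counts 1 to 10.

```python
from itertools import takewhile
from typing import Iterator


def powers_of_two() -> Iterator[int]:
    # Hypothetical helper: an endless iterator of 1, 2, 4, 8, ...
    value = 1
    while True:
        yield value
        value *= 2


# The counting loop expressed as a for each over a range of index values.
counted = [i for i in range(1, 11)]

# Roughly the C-style `for (i = 1; i < 100; i *= 2)` as a for each over an iterator.
doubled = []
for i in takewhile(lambda n: n < 100, powers_of_two()):
    doubled.append(i)  # the result is built by mutation, as noted above
```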

Functional languages popularized the higher-order map, filter and flatMap functions to abstract away the iteration itself, applying the repetition concept again to re-use the iteration with a modified operation to be done at each step. Unfortunately this means that the simple iteration code is redefined with innumerable variations, each with a different name and parameter list in order to handle different types of operations. Consider the following tortuous code to map to an optional value: values |> map maybeInt |> filter somes |> map unwrapSome. Since it is fairly common, F# defines yet another variation to handle it, values |> choose maybeInt. The small differences in exactly which type of processing is being done hardly seems relevant for the cognitive load investment needed to keep track of all the variations and the for each covers all of them much more clearly in a single construct.
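The same comparison can be sketched in Python. Here maybe_int is a hypothetical stand-in for maybeInt that returns None instead of an option type, so this is only an approximation of the F# pipeline.

```python
from typing import Optional


def maybe_int(text: str) -> Optional[int]:
    # Hypothetical stand-in for maybeInt: None when the text is not an integer.
    try:
        return int(text)
    except ValueError:
        return None


values = ["1", "x", "22", "", "3"]

# Higher-order style: map first, then filter out the misses.
mapped = list(filter(lambda n: n is not None, map(maybe_int, values)))

# For each style: the same construct would also cover plain map or flatMap.
looped = []
for text in values:
    number = maybe_int(text)
    if number is not None:
        looped.append(number)
```

Both produce the same list; the difference is how many named variations the reader has to keep in mind.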

Tailspin simplifies the for each further by simply streaming values into a pipeline, values..., or a counted range 1..10. This isn't really recognizable as a loop any longer even though it works exactly the same. The operations or transforms to be performed are placed along the pipeline without higher-order function wrappers. To enable filtering, transforms can withhold emitting a value and to enable flatMap, transforms can emit more than one value. This ends up being extremely composable and refactorable.

It might be worth considering a special construct for getting a value and its index, like Go's for index, element := range someSlice, even though the "range" keyword there doesn't sound quite right to me. Tailspin has such a construct as well, adding an "array deconstructor" to an inline transform, someSlice -> \[index](...\). This is effective but a bit of a syntax oddity. Perhaps a better approach would be to just have an iterator constructor that combines an index with the value and use the standard "for each".
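Python's built-in enumerate is one existing example of such an iterator constructor. A minimal sketch, with made-up sample data:

```python
some_slice = ["apple", "pear", "orange"]

# enumerate pairs each value with its index, so the ordinary for each
# is all that is needed; no special loop syntax is involved.
labelled = []
for index, element in enumerate(some_slice):
    labelled.append(f"{index}: {element}")
```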

The while loop is a really wild beast and I'll get back to that later.

Recursion

Any function or procedure call is also an application of the repetition concept. Recursion calls the same function repeatedly with slight modification, which ends up being equivalent to an iterative loop. The advantage of recursion is that it forces the mutation of the iteration index, the current value and the mutation of the result to happen at the same logical point in the code, which simplifies the logical structure.

A typical recursion implementation will have an entry function with a recursive helper function that must contain a test for whether to recurse more or return a result. The helper function is most clearly implemented as a match (or switch) statement listing the different possible conditions and what should be done for each. It may be worth noting that in the Guarded Command Language (GCL), which is intended to be easier to prove correctness for, a loop contains exactly a switch/match which is essentially randomly ordered.

In Tailspin, each templates (function) is by default expected to be an array of matchers (guards), no match keyword, it just is. A value can be sent back to be matched by a #. An optional prelude can set things up before recursing internally on the matchers. Example of how to flatten a list. While this doesn't enforce good practice, it guides it and makes it easier to follow.

Mutation

Mutating a variable is equivalent to calling a function at that point with a parameter holding the new value. Erlang does mutation of state that way. So mutation is also an implementation of repetition. Like the GOTO, rampant mutation can make it very difficult to unravel the execution tree.

It is in the light of mutation that we can examine the while loop because it is completely dependent on mutation to change the condition needed to break the loop. With that in mind, it should be clear that recursion is much preferable to while loops.
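The difference can be sketched in Python with the Collatz step count, picked here only as a small example. Python does not eliminate tail calls, so the recursive form is for illustration of the structure, not a recommendation for deep recursion in that language.

```python
def collatz_steps_while(n: int) -> int:
    steps = 0
    while n != 1:  # the exit condition depends on mutation inside the body
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


def collatz_steps_recursive(n: int, steps: int = 0) -> int:
    # The next n and the next step count are supplied at the same point.
    if n == 1:
        return steps
    next_n = n // 2 if n % 2 == 0 else 3 * n + 1
    return collatz_steps_recursive(next_n, steps + 1)
```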

Despite the possibility of complications, mutation makes the expression of some algorithms, like folds or aggregations, much simpler. Mutation is also quite easy to understand in a local context. Even mutation affecting code in another part of the program can be extremely useful to relay information, but needs to be packaged and shared in a way to make the intent clear, channel in Go being a good example. But that starts digressing into other concepts that I haven't analyzed.

Tailspin allows one mutable variable inside a function, but it can be an aggregate (see the Aggregation concept below). Since functions can be defined inside functions, this can get complex but it can't escape its confines, unless you work hard at it by storing a closure in an object.

Tailspin also provides object constructs (called processors just to try and free the mind). These allow holding of mutable state between accesses.

Vector operations and other repetitions

Fortran90 introduced vector operations, where whole arrays could be used in arithmetic for elementwise evaluation. Hugely convenient, but also performant if you had a machine with pipelined circuits.

In Julia, a . can be used to turn any function into a vector function.

The Normalize-transpose mechanism of SequenceL automatically vectorizes any function call.

I haven't felt comfortable adding something like this to Tailspin yet, but there is one Projection that has a somewhat magical repetition (perhaps too magical). A call like $({z: §.x + §.y}) will repeat through any level of array nesting and transform each record at the bottom as specified.

Tailspin also has a built-in construct for producing cartesian products: [by 1..3, by 1..3] will produce all 9 pairs in a stream.
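For comparison, a rough Python counterpart using the standard library's itertools.product. This is not a claim about how Tailspin orders or represents its pairs.

```python
from itertools import product

# All combinations of 1..3 with 1..3, as tuples.
pairs = list(product(range(1, 4), range(1, 4)))
```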

Inline-defined functions (lambdas)

Theoretically there is no difference between a function defined inline or a function defined separately, but there is a curious effect I have observed. This is not really about the repetition concept but rather asynchronous programming, which I am not currently analyzing.

Using small inline-defined functions to modify behaviour in immediately-executing code like map and filter is handy and doesn't cause any problems. But when an inline-defined function is passed as a callback, it can cause temporal confusion.

The observation came from writing Java code with callbacks defined as separate task objects, which was a great way to write asynchronous code. The object provided a clear bound and signalled that the execution context was different. When Java got lambdas, some callbacks got defined as lambdas and a few curious and difficult-to-analyze bugs started appearing. Those bugs all were because the brain too easily assumed that the code it was reading inline also got executed in the same time and context.

I'm a little on the fence here, but Tailspin currently does not allow inline-defined functions in a parameter, only in pipeline stages. There is less need because Tailspin doesn't need map, filter and friends. A function being passed as a parameter is obviously a crucial component of your algorithm, maybe it is worth giving it a name? In Tailspin, parameters are also named, so this could result in a function call like sort&{by: nameAscending}

Aggregation

The simplest form of aggregation is probably addition. If I were splitting hairs, I would then also argue that subtraction is a form of projection, which is the inverse of aggregation. Anyway, I think we can agree that arithmetic is most clearly represented in the normal infix way as 5 * (3 + 1) rather than something like 5 3 1 add mul. This, I think, sets the tone for this section.

Lists/arrays

The most fundamental and flexible way of providing aggregation is undoubtedly cons. This might be a little too flexible, though, and definitely a bit onerous when constructing longer list values. On the other side of the room, old languages closer to the metal require that you allocate arrays to a fixed length before you start filling them with data. If you ever worked in a language where you had to allocate space for strings you know the pain.

On a point of order, I suppose there is a distinction between lists, being ordered, iterable and variable length, and arrays, being indexable and usually fixed in size. I want all those properties at once and tend to be a little loose in which term I use. In Java, I don't know if there exist any cases where you would use a LinkedList instead of an ArrayList (although I could find uses for a Node class with pointers to next).

Modern languages provide a more direct and declarative array literal syntax, like ['apple', 'pear', 'orange']. For calculated values, list comprehensions are the way to go, e.g. [[i + j for j in 1:3] for i in 1:3], re-using the for each construct. The only thing to object to is the reversal of the repeated clause and the for.

Tailspin provides the literal list constructor, but since the square brackets just capture a stream of values, the list comprehensions read just like normal code, [1..3 -> $def i: $; [1..3 -> $i + $]! $], where an inline-defined function is used to capture the i for use in the nested pipeline. All types of expressions can just be combined inside the list literal, e.g. [5, 9, 1..3, 17, 4..6 -> 3 * $, 79]. This composability means there is no need for a concatenation operator, just do [a1..., a2...]
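For readers more at home in Python, the following sketch builds what I read those expressions as producing. The Python results are certain; the correspondence to the Tailspin and Julia snippets is my reading of them.

```python
# Nested comprehension: the repeated clause comes before the for.
table = [[i + j for j in range(1, 4)] for i in range(1, 4)]

# Unpacking inside a list literal mixes single values and streams of values.
mixed = [5, 9, *range(1, 4), 17, *(3 * n for n in range(4, 7)), 79]

# The same unpacking covers concatenation without a dedicated operator.
a1, a2 = [1, 2], [3, 4]
joined = [*a1, *a2]
```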

Text (strings)

In this multi-cultural world which is also full of emojis, text can no longer be considered to be an array of anything, except perhaps glyphs. Since glyphs can be made up of several concatenated unicode codepoints, a glyph array is probably not suitable as a basic representation.
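A short Python sketch of why neither codepoints nor bytes line up with what a reader sees as one glyph. The sample strings are my own choice.

```python
import unicodedata

# "é" written as a base letter plus a combining accent: one glyph, two codepoints.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)

# A family emoji: three emoji joined by zero-width joiners, shown as one glyph.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

codepoint_counts = (len(decomposed), len(composed), len(family))
utf8_byte_counts = (len(decomposed.encode("utf-8")), len(family.encode("utf-8")))
```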

Computer programs don't need to be interpreted by lines like some remnant of punch cards, so text should be multiline.

String templates are a good idea, but positional parameters to them are not, because word order often needs to change when translated to another language. String interpolation is a good idea, and perhaps an interpolated template should be easy to capture as a function. With string interpolation, no concatenation operator is needed. It should probably be possible to enter unicode codepoints numerically and be able to transform to and from byte or integer encodings.
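A minimal Python sketch of named template parameters surviving a change of word order. The templates and the render helper are invented for the example, and the second template merely stands in for a translation.

```python
english = "{name} has {count} new messages"
# Hypothetical translation where the word order differs.
reordered = "{count} new messages are waiting for {name}"


def render(template: str, **fields) -> str:
    # Named fields let each template place the values wherever it needs them.
    return template.format(**fields)


first = render(english, name="Ada", count=3)
second = render(reordered, name="Ada", count=3)
```

With positional placeholders, the calling code would have to change whenever a translation needed the values in a different order.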

Tailspin treats text strings as immutable entities produced by string literals with interpolated expressions. As a result of this analysis, Tailspin will be getting first-class string templates, perhaps :'Hello §.name;' would fit syntax-wise.

Structures/records (and function parameters)

While lists can be used generally to associate different pieces of data, it is often not a great idea, even if you call it a tuple. Having a type name constructor and typed values is a little better, but not much. What is a Rectangle(1, 3, 5, 9)? Is it a Rectangle(Point(1, 3), Point(5, 9)) or a Rectangle(Point(1, 3), Dimension(5, 9))? Even then, the structure of Point isn't entirely clear, nor how the Point relates to the Rectangle.

Again, a more literal declarative approach is superior. A literal {left: 1, top: 3, width:5, height:9} or named function parameters, Rectangle(left=1, top=3, width=5, height=9) may feel like too much to type sometimes, but it wins in the long run. This gets even better when considering the Documentation concept below, but even taking the cognitive load off constructing the aggregation should be beneficial. Anyone who has switched between Java and C# has run into the following: What does "abcdefg".[sS]ubstring(2, 4) give? In Java, it is "cd", in C# it is "cdef".

Needless to say, Tailspin supports the literal structure syntax and requires named parameters (in fact, a set of parameters is represented as a literal structure).

Other datastructures

Whether as built-ins or libraries, it is useful to have Maps (Dictionaries) and Sets available. Some languages (I know of Pyret and Ballerina) also have a Table construct that is a bag or set of records, with record fields treated as columns. Tables allow for interesting projections and relational operators like join.

Tailspin does not have Maps or Sets, but it has Relations, which are sets of records with relational algebra operators, like Tables. They can be used as Maps or Sets, but are more flexible because every column could be considered a key (although Tailspin's relations are currently not implemented very efficiently, and replacing a value entails removing and adding).

Projection

Most fundamentally, a Projection is the inverse of Aggregation, so for cons that means car and cdr (or head and tail), but projections can be so much more. Array slices were introduced as early as Fortran 90 and are now happily present in more and more languages. One might argue that slices are an Aggregation of a Repetition of a simple Projection, but I feel that a Projection could be anything showing a smaller or transformed view of the aggregate object. Keeping the substructure in the projection rather than just streaming out the values is probably a good idea because it can be difficult to reconstruct, while it is still easy to add a separate streaming step when needed.

Fields of a record are usually accessed by name, often with the dot-syntax record.field. Record deconstruction by patterns whereby record fields can be projected onto local variables is very handy and now even available in Java.

LINQ is a great way to create projections, by selecting subsets of records, filtering, aggregating, grouping and ordering. Languages with Tables provide similar operations on those.

On a side-note, projection specifications can be first-class values: Haskell has lenses, which essentially are values that specify which part of an aggregation to return (i.e. which projection to apply).

Tailspin has many ways to project data:

Arrays can be sliced by ranges but also be selected by index arrays, so ['e', 'h', 'l', 'o'] -> $([2,1,3,3,4]) gives ['h','e','l','l','o'].
Individual fields can be selected by dot-syntax, $.field, or by key-projection, $(field:).
Records can be transformed, e.g. $.from({x:,y:,z: §.z - $drop}), which copies the x and y fields of the from record and subtracts drop from the z field.
Array slicing and key-projection can be performed in several "dimensions", digging ever deeper, e.g. $@.space($.from.z; $.from.y..$.to.y; $.from.x..$.to.x) (dimensions separated by semi-colon).
Tailspin projections can be captured as lenses, e.g. :(to:;x:), which selects the x field of the record in the to field when applied.

Many of these things, as well as the use of Relations and the matching (semi-join) operator, occur in my solution to day 22 of Advent of Code 2023.

Tailspin also has a group-aggregation projection, e.g. $(collect {total: Sum&{of: :(score:)}} by {player:})
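To make a few of these projections concrete for readers who don't know Tailspin, here is a loose Python rendering of three of them: selection by index array, a record transformation, and a group-aggregation. The sample records, the drop value and all variable names are made up for illustration; this is not how Tailspin implements any of it.

```python
from collections import defaultdict

letters = ["e", "h", "l", "o"]

# Selection by an index array. Tailspin indexes from 1, Python from 0,
# so each index is shifted down by one.
indices = [2, 1, 3, 3, 4]
word = [letters[i - 1] for i in indices]

# Record transformation: copy x and y unchanged, compute a new z.
source = {"x": 4, "y": 7, "z": 10}
drop = 3
lowered = {"x": source["x"], "y": source["y"], "z": source["z"] - drop}

# Group-aggregation: total score per player.
results = [
    {"player": "ann", "score": 3},
    {"player": "bob", "score": 5},
    {"player": "ann", "score": 4},
]
totals = defaultdict(int)
for row in results:
    totals[row["player"]] += row["score"]
```

Note how much of the Python is mechanics (loops, index arithmetic, repeating field names) compared to the declarative Tailspin forms.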

Text

When the idea of text being an array of characters is discarded, the question arises of how to project parts of text strings. Transforming to UTF-8 bytes or an integer codepoint array should be supported, but it doesn't help in proper unicode text manipulation, as seen in the rosettacode exercises for reversing a string or redacting words. Splitting into an array of glyphs that are themselves text objects works for these, but glyphs vary in size so there may be issues with that approach as a general solution.

Regular expressions are pretty standard and ubiquitous now, so I think we should just embrace them for extracting and replacing parts of text. It might be beneficial to have some form of parsing syntax to project text data into structured data, with bonus points if they can work reversibly as output formatters.

Tailspin does allow splitting into glyphs by streaming a string, e.g. ['abcde'...] gives ['a','b','c','d','e']. It also allows getting the text as UTF-8 bytes or a codepoint array. Most useful is the parser syntax that allows declarative specification of a data structure to parse the string into, e.g. { hand: <'\w+'>, (<WS>) bid: <INT"d"> } to parse '32T3K 765' into { hand: '32T3K', bid: 765"d" } ("d" is the unit). I haven't tried to make them reversible yet, but suspect that will only be possible with some restrictions to the current functionality.
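As a point of comparison, the same line can be projected onto a record with a regular expression in Python. This is only a sketch: the names LINE, parse_line and format_line are hypothetical, the unit "d" is dropped because plain Python integers carry no units, and the reverse direction has to be written by hand as a separate function.

```python
import re

# Hypothetical pattern mirroring the Tailspin parser above:
# a run of word characters, whitespace, then an integer.
LINE = re.compile(r"(?P<hand>\w+)\s+(?P<bid>\d+)")


def parse_line(text: str) -> dict:
    """Project a line of text onto a record with named fields."""
    match = LINE.fullmatch(text)
    if match is None:
        raise ValueError(f"cannot parse {text!r}")
    return {"hand": match["hand"], "bid": int(match["bid"])}


def format_line(record: dict) -> str:
    """The reverse direction, written separately from the parser."""
    return f"{record['hand']} {record['bid']}"
```

A reversible parser syntax would let both functions be derived from one specification instead of keeping two descriptions in sync.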

Selection

One of the first selection mechanisms in higher-level languages was the arithmetic IF which redirected execution to one of three different locations depending on whether the argument was negative, zero or positive. Nowadays, the boolean IF has taken over.

Most of the time, the real state space contains more than just two cases. While it makes perfect logical sense to create a decision tree, and sets of boolean flags are easily represented in binary computers, nested if statements/expressions are not so easy for the human brain to grapple with, because each level down means more to keep in mind.

Flattening the cases to if...else if...else if...else..., more nicely represented as match (or switch or CASE), helps. Even then, since evaluation of cases is top to bottom, a human reader must keep in mind the implicit "and not any of the above". Perhaps Dijkstra's Guarded Command Language (GCL) is correct in (essentially) evaluating the cases in random order, enabling a human to consider each case on its own, and being more robust to code reordering.

On some level there really is no distinction between a Selection and a Projection, both being choices made according to specifications. A map operation is a projection of each value of a list, while a filter is a selection of values in a list. But filter could equally be considered a projection of the list itself.

Perceptually, perhaps a Projection selects data while a Selection selects code paths, but this gets muddied because a code path can simply result in data. Are null-conditional access operators (?. and ?[]) or the elvis operator (?:) Projections or Selections? What about if-expressions?

In Tailspin, templates (functions) by default just consist of a list of matchers (guard conditions) with a corresponding block to execute for each. Matchers are evaluated top to bottom and the block of the first true condition is executed. Matchers do not nest, although inline-defined templates can contain their own matchers.

Subtype polymorphism and typestates

In OO programming, code can become much simpler by utilizing the dynamic dispatch of subtype polymorphism instead of explicitly performing a Selection based on the type or state of the object. An object itself can be organized according to the State pattern to delegate to an internal state object, with different state subtypes, rather than checking state variables in each method. This also organizes the different behaviours of a particular state together, which is how humans tend to think about the world.
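A minimal sketch of the State pattern in Python, using a coin-operated turnstile as a made-up example. All class and method names here are hypothetical; the point is only that Turnstile never checks a state variable, it just delegates.

```python
class Locked:
    """Behaviour of the turnstile while it is locked."""

    def coin(self):
        return Unlocked()

    def push(self):
        return self  # pushing a locked turnstile changes nothing

    def describe(self):
        return "locked"


class Unlocked:
    """Behaviour of the turnstile while it is unlocked."""

    def coin(self):
        return self  # an extra coin changes nothing

    def push(self):
        return Locked()

    def describe(self):
        return "unlocked"


class Turnstile:
    """Delegates every request to the current state object."""

    def __init__(self):
        self._state = Locked()

    def coin(self):
        self._state = self._state.coin()

    def push(self):
        self._state = self._state.push()

    def describe(self):
        return self._state.describe()
```

Everything the turnstile does while locked sits together in one class, rather than being spread over an if-branch in every method.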

Typestates is a similar idea, but the interface is allowed to change with the state so as to only expose methods valid in that state.

Function overloading similarly uses the dispatch mechanism to choose the implementation based on the argument type(s). The difference lies in where the functionality is defined. Overloaded functions are usually all defined together for all different types, which mostly doesn't make sense because the implementations are unrelated except for having the same interface description (relation between input and output).

Some languages allow overloads to be defined anywhere, presumably where it makes most sense to the programmer. This ends up being even worse because it becomes almost impossible to find the implementations or even to know which of them have actually been defined and included in the program.

Tailspin allows organizing objects (called processors) into internal states. There is currently no way to find out which state is active, unless explicit query methods are added by the programmer.

Verification

How can you feel confident that the program works as expected (at least most of the time)?

Tests

The simplest way to try to gain some confidence in the program's correctness is to run it on a known case, or several. Unfortunately, tests only guarantee that a correct program will pass, but many incorrect ones may pass as well. More tests only help if they eliminate a hypothesis.

In days gone by, testing was only done manually, even if test-first methodologies were already being applied here and there. Then testing became more automated, with special test runner programs running tests defined using a test harness library.

More modern languages realize that tests are important enough to be built into the language and that they should be written next to the code they test. I really appreciated Pyret's where clause at the end of a function to specify examples (tests, really). Pyret also has the ability to specify tests not directly related to a specific function.

In Pyret, the tests are run every time the file is interpreted. I think this is probably the right thing to do, because tests are useless if they aren't run.
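Python has no where clause, but its standard doctest module gives a loosely similar effect: examples live in the function's docstring and can be executed as tests. The sketch below is an analogy, not a description of how Pyret works; median and run_examples are names I made up.

```python
import doctest


def median(values):
    """Return the middle value of a non-empty list of numbers.

    The examples below double as tests.

    >>> median([3, 1, 2])
    2
    >>> median([4, 1, 3, 2])
    2.5
    """
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def run_examples():
    """Run the docstring examples of median and return the failure count."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        median.__doc__, {"median": median}, "median", None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test).failed
```

Unlike in Pyret, nothing runs these examples automatically; something has to call run_examples (or doctest.testmod) explicitly.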

In Tailspin, tests are defined with the code, and it is easy to replace (mock, stub, fake) resource-demanding parts of the program. No automatic running, but probably something to consider.

Contracts

A contract means that if you do your part, I will do mine. In programming contracts, the preconditions required by a function or method can be specified and they are checked on every call. Preconditions check that functions are called correctly. On the other side of the bargain, the postconditions that are guaranteed by the function can be specified and are also checked on every call. Postcondition checks are like tests that are run on the actual input used, to verify that the function is implemented correctly.
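In a language without built-in contracts, the idea can be approximated with a wrapper. The following Python sketch assumes a hypothetical contract decorator of my own invention (it is not a standard library API) that checks a precondition on the arguments and a postcondition on the result for every call.

```python
import functools
import math


def contract(pre, post):
    """Wrap a function so that pre and post are checked on every call.

    pre receives the arguments; post receives the result, then the arguments.
    """
    def decorate(func):
        @functools.wraps(func)
        def checked(*args, **kwargs):
            if not pre(*args, **kwargs):
                raise ValueError(f"precondition of {func.__name__} violated")
            result = func(*args, **kwargs)
            if not post(result, *args, **kwargs):
                raise AssertionError(
                    f"postcondition of {func.__name__} violated"
                )
            return result
        return checked
    return decorate


@contract(
    pre=lambda n: n >= 0,
    post=lambda r, n: r * r <= n < (r + 1) * (r + 1),
)
def integer_sqrt(n):
    """Largest integer r such that r*r <= n."""
    return math.isqrt(n)
```

The postcondition here reads like a property-based test assertion, except that it is run on the actual inputs the program uses.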

Related to contracts is also the ability to specify class invariants and loop variants and invariants. It seems a little too much work, but maybe those are things you have to think about anyway, in which case having to think clearly enough to specify them and having them checked at runtime could be beneficial.

Contracts are central to the Eiffel programming language and rumour has it that it produces stable and reliable software, but evidently this has mostly not resulted in purchases of Eiffel developer seats. I haven't found any hard research proof, but contracts may be the sweet spot between types and tests. The tl;dr is that preconditions allow more precise definitions than types do and postconditions are fantastic as the built-in asserts for property-based testing.

Kotlin contracts seem to be something entirely different, not contributing to verification, but are rather ways of working around the compiler and type-checker.

Tailspin does not have contracts yet, but it is on the roadmap.

Types

There is actually no consensus on what types are or what they are for. On a practical level, it is useful to keep track of fixed-point and floating-point numbers so the correct machine operation is applied. On a theoretical level there is a similarly-named but different thing in logic and mathematics, and then there is the Curry-Howard correspondence whereby programs are proofs of their type signatures.

One purpose of types is to serve as really anemic contracts to verify that your program was put together sensibly. By limiting what can be expressed, it becomes easier to check the types than to run the program, so it can run as a sanity check during development. Also, type inference can be used to make this mostly automatic, although explicitly writing the types serves as an even better sanity check.

A proven benefit of types is to serve as lightweight documentation, but clearly contracts could do that better. Only explicit typing will do for this, type inference doesn't help at all. As discussed under Aggregation, names probably beat types here, but a combination might be even better. This was a digression into the Documentation concept, though.

Types can be given interesting properties to help verify and enforce correctness, like Rust's ownership rules. Linear types that must be used exactly once are another example, as are modes in OCaml. Instead of putting all these things in the same type system, they could be separate orthogonal mini type systems.

In Tailspin, types are currently checked at runtime, and they can be defined to have any property expressible in the language. Tailspin enforces that record fields with the same name have the same type, as per domain-driven design principles. Tailspin will also automatically create tagged types from strings or numbers, so that country´'Georgia' is different from first_name´'Georgia' or state´'Georgia', and part_id´9 is different from shoe_size´9.
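The effect of tagged types can be imitated in Python with small wrapper classes, at the cost of declaring each one by hand. This is only a sketch with class names I chose to mirror the tags above; Tailspin creates its tagged types automatically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Country:
    name: str


@dataclass(frozen=True)
class FirstName:
    name: str


@dataclass(frozen=True)
class PartId:
    value: int


@dataclass(frozen=True)
class ShoeSize:
    value: int


# Untagged values are indistinguishable from each other.
plain_equal = "Georgia" == "Georgia"

# Tagged values with the same content but different tags are not equal,
# because dataclass equality also requires the same class.
tagged_equal = Country("Georgia") == FirstName("Georgia")
ids_equal = PartId(9) == ShoeSize(9)
same_tag_equal = Country("Georgia") == Country("Georgia")
```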

Proofs

Formal proofs of correctness are a bit of a Holy Grail in programming. From my limited understanding, it is possible to some extent but rapidly becomes harder as the program increases in complexity. Anecdotally I have heard that working with a prover can suck up aeons of time.

Having integrated specifications in the language, like Dafny, is probably helpful. The question is if it feels worth the effort.

Documentation

Since documentation (or, really, communication) was also my main "goodness" criterion, I have already touched on it under the other sections, so I won't repeat those details here. I have also previously written some thoughts about how programming structures can be used in communication.

It may seem plausible that having many different ways of expressing code would be helpful to communicate nuances, but more ways of writing means that readers need more knowledge. Scala is infamous for teams not being able to understand each other's preferred subsets of the language. In contrast, Go is designed with the goal of having only one way of expressing a particular algorithm.

In the Repetition section, I questioned whether the map, filter et al. functions conveyed any meaningful difference or were simply "mechanics". An example of this is perhaps the Advent of Code day 9 solutions I wrote in Pyret and Tailspin. Despite being the identical algorithm, the Pyret solution just seems to be more obscured by "mechanics".

On the other hand, being able to choose whether to express the algorithm in terms of the input (e.g. by matchers in Tailspin) or the output (by literal constructor) can make a huge difference in understandability. See the roman numerals example in section 8 of Jeremy Gibbons's "How to design co-programs". In Tailspin, encoding roman numerals is naturally expressed in terms of the output, while decoding them naturally becomes a repeated projection of the input.

Checkpoint

Checkpoints bring transactions to mind, but essentially any mechanism that allows you to reclaim a previous known state would be a checkpoint.

Immutable data structures are a great example of checkpoints which guarantee that when your code gets back up to the same point to try another alternative, the starting state is known to be the same.

Some languages like Go and JavaScript allow creating closures over mutable variables, which causes some difficult bugs. Essentially I think that is a violation of the checkpoint principle (or the programmer's expectation of a checkpoint).
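Python closures behave the same way, which makes for a compact illustration (in Python rather than Go or JavaScript, and with variable names of my own). The programmer expects each closure to have captured a snapshot, but it has captured the variable.

```python
# Each lambda closes over the variable i itself, not its value at that moment.
late = [lambda: i for i in range(3)]
late_results = [f() for f in late]  # every call sees the final value of i

# Binding the current value as a default argument takes a snapshot instead,
# which restores the checkpoint the programmer expected.
snap = [lambda i=i: i for i in range(3)]
snap_results = [f() for f in snap]
```

Here late_results is [2, 2, 2] while snap_results is [0, 1, 2].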

Checkpoints have good uses also in error handling.

Final thoughts

It turned out to be easier than I first thought to come up with concepts to analyze. I think it was definitely worth doing and I'll have to try to do this for error handling and concurrency as well. For my choice of "good", I think my design choices for Tailspin are supported by this concept analysis.

Using programming structures to communicate

2023-09-03T12:15:00.003-07:00

When we first learn to code, we struggle with the basic use of the programming language and just getting the program to work. As we gain more practice, we can start to explore "better" ways to write the code while it still does the same thing.

"Better" code is really all about communication with other humans, yourself in a little while, yourself later, or some other human who needs to work with the code.

When the communication is clear and efficient we get maintainable code that is a joy to work with. When the communication is obscured or missing, the code turns into an unintelligible mess.

So what means of communication can we use?

Comments

The first communication technique we tend to learn is the use of comments, to communicate hints to ourselves how the code works so we can understand it later. Since we are initially still struggling with basic syntax, the comments might tend to be something like the below (I have actually seen this in production code!):

// Add 1 to i
i = i + 1;

A few years down the line, the code might look something like this:

// Add 1 to i
i = i + 2;

An incorrect comment is worse than no comment. In this case it is easy to ignore, but when a comment usefully explains the algorithm, it becomes downright devastating if it no longer applies, and may send you on an hours-long wild goose chase.

So we learn the hard way that comments are untrustworthy and we need to find other ways to explain the code.

Comments can still be very useful to describe things that are NOT in the code. A todo-note, for example, explains what code has not been written even though it might need to be written in the future.

Code disposition

The order the code is written in can make a huge difference to ease of understanding. If the steps to achieve each important part of the algorithm are grouped together, the code can be understood piecemeal. Even better when a group of connected parts can be extracted into its own named procedure (method, function) or assigned as a named value explaining what it represents.

The optimal way to order the code is the one that minimizes the amount of things the reader needs to keep in mind at each point.

Layers of abstraction

Essentially, we create a new language when we create named procedures and values. If we start with the programming language itself as language L0, then we form language L1 from the values and procedures defined in L0. After that, we can form language L2 by combining things from L1 into more complex procedures. And so on. Finally, the program itself is expressed in the top level language we created, which serves as an explanation of what the program does without getting bogged down in too many details.

More loosely speaking, it is often said that you should break out a named entity when the "level of abstraction" changes, which happens when "the code goes from explaining what it is doing to how to do it". That's not a very clear explanation, but rephrased in the layered languages above, it means that when you dip down into concepts from a lower layer of language, you should probably create a new concept on the language layer you are using.

Of course things are never that ideal. A more practical way might be to consider what things need to be kept in mind at each point in the program. If the reader needs to remember more than about 4 things at any one time, there is a risk of cognitive overload, which makes it substantially more difficult to understand. Grouping some concepts together into new richer concepts may help. Another way to look at it might be that code should take up space in proportion to its importance.

Data structures

Layering concepts obviously also apply to data so that at one level it might make sense to talk about an x-coordinate and a y-coordinate, while at a higher level we would talk about a point and at higher levels still we might work with lines and rectangles. This is related to types, but types have larger implications and will be discussed separately.

Beyond pure representation of data, there is another aspect of structuring data in a way to both simplify and explain the program. In fact, the easiest way to understand any code is to start looking at the data structures. This was expressed in 1975 as:

Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowchart; it'll be obvious. - Fred Brooks, The Mythical Man-Month

And repeated in more modern form in 1997:

Show me your code and conceal your data structures, and I shall continue to be mystified. Show me your data structures, and I won't usually need your code; it'll be obvious. - Eric S. Raymond, The Cathedral and the Bazaar

When designing an algorithm, you need to make decisions about how you will represent relationships between the data and how you will traverse the data elements. If there is an ordering between the elements and there can be duplicates, use a list. If there should not be duplicates, use a set. Duplicates without order, a bag. A queue implies elements on the same level, a stack implies a hierarchy. A tree can be arranged in ways to enable efficient searching. If elements can be associated with a consecutive numeric index, an array affords immediate access, for other associations a map is your friend, while for checking existence, we're back to a set.
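A small Python sketch of how the choice of structure communicates intent, using the standard library collections. The sample data and variable names are invented for illustration.

```python
from collections import Counter, deque

events = ["b", "a", "b", "c"]

ordered_with_duplicates = list(events)  # list: order and duplicates kept
unique = set(events)                    # set: no duplicates, fast existence check
bag = Counter(events)                   # bag: duplicates counted, order irrelevant

queue = deque(events)
first_in = queue.popleft()              # queue: the oldest element comes out first

stack = list(events)
last_in = stack.pop()                   # stack: the newest element comes out first

# map: association by something other than a consecutive numeric index
position_of = {name: i for i, name in enumerate(sorted(unique))}
```

A reader who sees a Counter immediately knows that order does not matter and duplicates do, before reading a single line of the algorithm.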

Whether to recurse or iterate is related to data structures and can give similar distinctions as queue versus stack, although they are equivalent and iteration can be mechanically transformed to recursion and vice versa.

One important aspect of data structures is whether they are immutable or not. An immutable data structure is just one thing to keep in mind. Otherwise all the separate things that can be changed in the data structure are things that need to be kept in mind. Mutability effects can be mitigated by clear ownership of the data.

Data ownership

It is important to communicate which part of the code is responsible for the data, in order to keep track of when the data might be changed (if mutable), when the data is no longer needed (so that memory can be reclaimed) and, once the memory has been reclaimed, that the data is no longer accessible.

Some languages side-step the memory reclamation issue by automatically reclaiming it when it is no longer used, either through garbage-collectors or reference-counters.

Rust has mechanisms to specifically track and express ownership in the code. In other languages you should probably document it.

Naming

Picking accurate names is critical to correct communication. It can also be quite difficult.

In one review of code that was supposed to compare images by comparing pixel values, I saw on closer inspection that the values named "pixel" actually turned out to be indexes into the image and the program was just checking that the indexes were equal, which would always be true.

Names are known by the computer so references to names will in most languages be checked and provide you with a clear error when you reference an undefined name. Often this check will even happen before the code actually starts running. This checkability can be used when you need to revise every usage of a particular value: just change the name and the computer will remind you of all the references you haven't revised yet. This property also applies to types, to be discussed later.

Language differences

There are of course differences in exactly how things are expressed in different languages, especially between different programming paradigms, but by and large the same kind of things will be communicated in similar ways.

In a procedural language, you have both nouns (data) and verbs (procedures) and you could say that to calculate the variance of a collection of numbers you would first subtract the mean from each element, then add together the square of every element, count the number of elements and then divide the sum by the count of the elements.

In an object-oriented language, an object has state and can respond to requests, so you would ask the numbers to subtract the mean and to square themselves, then ask the collection to add up all the elements, ask the collection how many elements there are and then ask the resulting sum to divide itself by the count.

In functional languages, even functions are things, so there are no verbs. You would have to say that the variance is the quotient between the sum of the squares of the difference between the elements and the mean, and the count of the elements. That can be a bit of a mouthful so a more understandable grouping and ordering is often achieved by let-statements, to for example define the squares, their sum and the count separately before obtaining the quotient. Another option is by a pipeline (or a threading macro) of transforms to turn the list into a list of differences, then a list of squares of differences, then into a sum and a count and then divide them.
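The three descriptions can be written out in a single language to compare how they read. This Python sketch uses sample data and names of my own choosing and computes the population variance (dividing by the count), as in the descriptions above.

```python
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


def variance_procedural(numbers):
    """Verbs acting on data: subtract, square, add, count, divide."""
    mean = sum(numbers) / len(numbers)
    total = 0.0
    count = 0
    for x in numbers:
        difference = x - mean
        total += difference * difference
        count += 1
    return total / count


class Sample:
    """An object that holds the numbers and answers requests about them."""

    def __init__(self, numbers):
        self._numbers = list(numbers)

    def count(self):
        return len(self._numbers)

    def total(self):
        return sum(self._numbers)

    def mean(self):
        return self.total() / self.count()

    def deviations(self):
        mean = self.mean()
        return Sample(x - mean for x in self._numbers)

    def squared(self):
        return Sample(x * x for x in self._numbers)

    def variance(self):
        return self.deviations().squared().total() / self.count()


def variance_functional(numbers):
    """Nouns only: the variance IS the sum of squares over the count."""
    mean = sum(numbers) / len(numbers)
    squares = [(x - mean) ** 2 for x in numbers]
    return sum(squares) / len(squares)
```

All three give 4.0 for the sample data; what differs is whether the text reads as a sequence of actions, a conversation with objects, or a definition.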

A note on "declarative"

A program is said to be "declarative" when it specifies what is to be done, rather than how to do it. That's exactly the idea of the "levels of abstraction" mentioned above! Every level of the program should be as declarative as possible for most efficient communication.

Languages such as SQL and HTML are considered to be declarative. Some people want to claim that functional languages are generally more declarative and it is indeed difficult to explain how to do something when you only have nouns at your disposal. But with more than a few "of the" in a row, it really looks a lot more like a list of instructions (or, I suppose, how to define things). On the other hand, no "declarative" claims are made about OO, even though objects were created to be the most declarative way of modelling things in the real world.

Choosing a paradigm

In most modern languages you can choose between procedural, functional and object-oriented modes of expression. The trick is to determine which way the separate aspects of your program are best described.

As a guide, I like to think of a triangle of code styles as follows:

At the top of the triangle you are working procedurally with general (possibly abstract) datatypes imperatively and finally arrive at a result that you can interpret satisfactorily.
When you focus on what your data IS, and create datatypes that are more and more specific to your program, you move down the left side into functional programming, with a clearly-defined specific input and a function call that produces a clearly-defined specific output.
If you instead focus on what your objects DO, and create objects that do things more and more specific to your program, you move down the right side into OOP. Messages/calls produce actions and reactions until you get your answer (part of which may be encoded in the state of the system).

Which of these styles correspond best to the way the desired functionality is most naturally described?

Automated tests

Fundamentally, a test, when run, communicates whether the specified conditions still hold or not. A breaking test is, or rather should be, an alarm bell that the code no longer does what it is supposed to.

Unfortunately, many tests "cry wolf" and merely signal that the code changed, which you probably already knew because you changed it.

Requirements can be communicated through tests, even when they are not run, if the tests are written as describing the requirements.

If you write the test before you write the code, the test is more likely to reflect requirements and can communicate to you when your code is done. Just formulating the test can help clearly communicate to yourself what needs to be done. You already should have an idea of that, so why not try to make it clearer?

In the layered language view, tests are one layer above your code and can therefore be used for communication about the code. Thinking about the layers might also be helpful to avoid digging too deep when making assertions.

One thing to consider is that writing and maintaining tests is an extra cost and it may not be worth it for trivially analyzable code.

For further thoughts about tests, I wrote an article on writing meaningful unit tests.

Contracts

Essentially, a contract is about the guarantees made by the code.

If you hold up your end of the deal when calling a procedure (method, function) by obeying the preconditions stipulated, the procedure will guarantee to hold up its end of the deal and fulfil all the postconditions. If, on the other hand, the preconditions are not obeyed, anything can happen.

Other types of contracts concern "invariants", things that are guaranteed to always be true both before a call is made (or a loop iteration entered) and after the call (or loop iteration) is completed.

Thinking about contracts can be very beneficial to making your code correct. According to the Design by Contract method you should do so before you start writing the actual code. As with tests, you probably have an idea about it already and formalizing the contracts can clarify thinking, as well as communicating that to a future maintainer.

The documentation (that you, like me, probably didn't write) for every piece of the program should contain information about the contracts; that is really the whole point of documentation. But documentation is like comments, something other than the code that must be kept in mind, so it would be preferable to enforce contracts in code for automatic reminders. A very successful example is runtime bounds checking on array access, which helps catch large numbers of bugs and prevent malicious attacks.

Some languages have built-in support for specifying contracts, but it is quite possible to write contract-checking code yourself. Most important is to check preconditions because otherwise you have no clue what might happen. Postconditions are more easily spotted as bugs and they tend to be partly covered by tests. Postconditions also tend to be more expensive to compute. A common strategy is to turn off contract checking in production with the rationalisation that the most common problems will have been discovered in staging and development environments.
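Hand-written contract checks might look like the following Python sketch. Everything in it is hypothetical: the ContractViolation exception, the CHECK_POSTCONDITIONS switch and the top_three function are made up to show a cheap precondition that is always checked and a more expensive postcondition that can be turned off.

```python
import heapq

CHECK_POSTCONDITIONS = True  # hypothetical switch, e.g. turned off in production


class ContractViolation(Exception):
    """Raised when the caller or the implementation breaks the contract."""


def top_three(scores):
    """Return the three highest scores, highest first."""
    # Precondition: cheap, and always checked, because otherwise
    # there is no telling what the rest of the function would do.
    if len(scores) < 3:
        raise ContractViolation("top_three needs at least three scores")

    result = heapq.nlargest(3, scores)

    # Postcondition: re-derives the answer by a full sort, which costs more
    # than the computation itself, so it is guarded by the switch.
    if CHECK_POSTCONDITIONS and result != sorted(scores, reverse=True)[:3]:
        raise ContractViolation("top_three returned a wrong result")
    return result
```

Python's own assert statement works similarly to the switch, since assertions are skipped when the interpreter runs with the -O option.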

A failing contract is a serious condition signifying that you don't really know what's going on. The safest way to handle it is to just exit the program as quickly as possible. If you need to keep other things working, having a monitor process automatically restart your program is useful. Don't forget to make sure some human becomes painfully aware of what happens if you want it to ever be fixed.

Types

Types in programming languages can generate almost religious discussions. Type systems also have deep connections to mathematical and logic theory. In practice, the use of types in programming languages is all about communication.

A piece of data can be classified as belonging to a type. Different pieces of data of the same type are interchangeable because they have the same characteristics, whatever that means. Data of different types should not be confused with each other. Naming and typing can be used in somewhat overlapping ways for this information, depending on what's available in the language.

The use of types can help to show which pieces of a program are related to each other and how the data flows. This information can then be reported back to you by the computer when you make changes that don't match up. Here's an article about using types to help refactor a large program.

Specifying types on input parameters and return values has been shown to be beneficial as a form of documentation. These specified types also capture aspects of the pre- and postconditions for the procedure/function/method.

TypeScript contains interesting transforms that can be performed on types to show how a related type corresponds to it.

Type inference is useful for some of these things, but not as effective for documentation and contract specification.

A case study comparing an algorithm in Ada vs C shows the effectiveness both of being able to specify many different types and having a language that applies strong typing. But as the article points out, you have to actually do some work to gain the benefits:

"During the design of the software system, the developer must select types that best model the data in the system and that are appropriate for the algorithms to be used. If a program is written with every variable declared as Integer, Float or Character, it will not be taking advantage of the capabilities of Ada."

While types written in comments, or in a language that mostly ignores or automatically converts types, do communicate some things, the biggest value comes in having them strongly enforced by the computer.

Since it is often easier to determine the resulting type from an operation than actually performing the operation, types can to some extent be statically checked before the program runs, in less time than it takes to run the program, which can be helpful.

Unfortunately, the definition and manipulation of types forms a separate language from your programming language and it can become very complex, which can impede communication. Once the type-specification language becomes Turing complete, which happens all too easily, it is no longer possible to generally guarantee or predict that the type-check ever finishes.

Some things, like array bounds checking, are generally impossible to do fully at compile time, since the index values are often not known until the program runs.

Like with contracts, a failing type check is a serious condition where the program is not doing what you think it is. Some languages gratuitously try to convert types for you, which might make sense in some contexts or to some degree, but can make bugs very difficult to find. Note that the idea of truthiness is an automatic type conversion, as is allowing equality comparisons of different types of values.
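To make the point about truthiness and mixed-type equality concrete, here is a small sketch in Python, used purely as an example of a language that has both. The helper `describe` is hypothetical and exists only to show how a truthiness check can hide the difference between "zero" and "missing".

```python
# Truthiness: several unrelated "empty" values all behave like False in a condition.
falsy_values = [0, 0.0, "", [], {}, None]
all_treated_as_false = all(not value for value in falsy_values)

# Equality across types: a bool is treated as an int, and an int compares with a float.
bool_equals_int = (True == 1)
int_equals_float = (1 == 1.0)

# A string is not converted to a number, so this comparison is quietly False
# instead of being reported as a probable mistake.
str_equals_int = ("1" == 1)


def describe(count):
    """Hypothetical helper: 'if not count' cannot tell 0 apart from None."""
    if not count:
        return "nothing"
    return f"{count} items"
```

A language that treats comparison of different types as an error would reject the last comparison outright, which is the behaviour the paragraph above argues for.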

Like with tests, it is possible that having a type system that is too complex or prescriptive may not be worth the overhead in comparison to the benefit received.

An example of a very successful use of types to ease usability of an API is IMO AssertJ. The assertThat function returns an object type with only the assertion methods relevant to the input type.

Modularity and information hiding

What you don't say is sometimes more important than what you say. Modularity is the ability to hide complex "machinery" of how things are done and only expose the ability to use what it does. As long as the exposed interface and contract remains the same, you can change whatever you like inside or even replace the whole thing with an equivalent that also obeys the same interface and contract. For tests, it doesn't even need to obey the contract fully, just emulate it well enough for the particular test.

Information hiding comes in many forms. The one that's most commonly mentioned is OO encapsulation in objects/classes. Exposing a property via getters and setters hides the fact that it's a field. Not even having getters and setters and just performing services for the caller hides even more information. Another way to hide information is to define helper functions inside the function that uses them (in Java you can even define a class inside a method).
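As a minimal sketch of the last form, here is a helper defined inside the function that uses it, written in Python. The names `summarize_orders` and `line_total`, and the shape of the order tuples, are assumptions made up for the illustration.

```python
def summarize_orders(orders):
    """Return the total price per customer.

    `orders` is a list of (customer, quantity, unit_price) tuples.
    """

    # Helper visible only inside summarize_orders, so no other code can
    # come to depend on it.
    def line_total(quantity, unit_price):
        return quantity * unit_price

    totals = {}
    for customer, quantity, unit_price in orders:
        totals[customer] = totals.get(customer, 0) + line_total(quantity, unit_price)
    return totals
```

Because `line_total` cannot be reached from outside, a reader knows immediately that changing it can only affect `summarize_orders`.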

Miscommunication

The clearer the communication, the easier the code will be to maintain. There is a tension between code that is easy to read and code that is easy to write and unfortunately there is often too much focus on the writing.

Overspecification

Tests often suffer from specifying how the code works rather than what it is supposed to do. Faced with a test breakage it is difficult to know whether to fix the test or the code.

Passing in a large compound data structure (or "entity") into a function that only uses one or two fields definitely makes the function harder to reuse. Code reuse is generally somewhat overrated, however, so the question is what communicates the intent best? Should that function know about the structure of the entity or is it the surrounding code's job to deconstruct it first?
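The two options can be put side by side in a short Python sketch. The `Customer` entity and both greeting functions are hypothetical; the only point is what each signature tells the reader.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    """Hypothetical entity with more fields than most functions need."""
    name: str
    email: str
    street: str
    postal_code: str
    loyalty_points: int


def greeting_for(customer: Customer) -> str:
    """Takes the whole entity; the signature hides which fields matter."""
    return f"Dear {customer.name},"


def greeting(name: str) -> str:
    """Takes only what it uses; the caller deconstructs the entity."""
    return f"Dear {name},"
```

With the second version the call site reads `greeting(customer.name)`, so the dependency on a single field is visible exactly where the entity is taken apart.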

ORMs really make a mess by working with huge entities. Most of the time you're just interested in getting a few pieces of information, like a list of friends' names; you don't need the whole social graph. Treating the database as representing relations between facts rather than representing entities will make things much more manageable. Embrace SQL and why not 6NF along with it?

Opaque languages

Decorators and annotations may certainly be declarative ways of adding functionality, but there is no practical way to find out what they do without learning their language. Unfortunately, that language is often missing functionality as well as being underdocumented. Frameworks fall in a similar category, making easy things trivial and difficult things impossible, because you have to fight with the framework.

Domain specific languages are sometimes created with the idea that the domain experts should be able to use them. Again they tend to be underpowered and underdocumented and, what's worse, too difficult for the domain experts anyway, so the programmers now end up coding in the DSL as well as maintaining it.

Note the difference with the layered languages formed by your code, where you can easily go down a level to understand exactly how each layer works and you can easily refactor and reshape it as needed.

A related difficulty can arise when a section of code has too many dependencies, perhaps only using a small part of each. The "lower-level" language formed is just too large and unwieldy. A solution could be to organize a module that exports only what you need and encapsulates the other dependencies.

Hidden dependencies and unknown assumptions

Hidden dependencies arise when two different pieces of code depend on the same knowledge, for example an assumption that all files are in a specific directory. The remedy is to make sure there is a single source of truth for every fact needed by the program.
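For the directory example, a single source of truth could look something like the following Python sketch. The directory and file names are invented for the illustration.

```python
from pathlib import PurePosixPath

# The single place that knows where data files live (hypothetical directory).
DATA_DIR = PurePosixPath("/var/app/data")


def data_path(filename):
    """Every reader and writer asks here instead of repeating the directory."""
    return DATA_DIR / filename


def report_path():
    return data_path("report.csv")


def log_path():
    return data_path("events.log")
```

If the files move, only `DATA_DIR` changes, and no second piece of code is left behind still holding the old assumption.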

The assumptions made in a module may not always be explicitly stated (and the programmer may even be unaware that an assumption was made), which causes problems when the user of the module has a different set of assumptions (or facts) to deal with.

An example of insufficient communication of assumptions occurred in the Julia language, where functionality can often be magically shared between packages unknown to each other. Unfortunately, this can also cause obscure bugs and problems when there are unknown assumptions to deal with. In particular, arrays used to be indexed from 1 to the length of the array, until someone figured out how to make offset arrays that could start anywhere, which broke the assumptions of most other packages. (I still think Julia is worth trying, BTW)

Implementation reuse and pre-emptive generalization

Sometimes two pieces of code can be identical and you feel the urge to merge them together. Unless they are meant to do exactly the same thing in all future versions of the program, resist the urge. A useful guide: if it is difficult to find a name that sounds right for all usages, they are logically different things and will evolve in separate directions, so resist the urge. Communicate that logical difference, don't make a future maintainer try to guess which usage was which when trying to tear them apart.

Related to that is when you think a particular functionality you are writing could be much more generally applicable. Resist the urge, because the generalized expression is likely to be more complex and less obviously applicable to the current use. Also, it could be that the code in future needs to be generalized in a slightly different direction. Don't make a future maintainer have to try to understand that your generalization is not generally used and that they can undo it (and must undo it).

Inconsistent usage

A concrete example is the use of the word final on Java classes which prevents creation of subclasses. The standard String class is marked final so that it can be used for secure credentials without risk of sneaky subclasses stealing the data. In that context final means "do not override, bad things will happen if you do".

Some people feel that you should defensively mark a class final if you haven't given any thought to how it can be subclassed. This is a much weaker and less useful communication and you are also removing the user's right to replace your class with a different implementation.

The worst case is when you have programmers of both schools in a codebase, so the word final (or the absence of the word final) carries no discernible meaning at all.

Consistency is key to maximizing communication.

Prescriptive practices

Any practice that is mandated obviously has no communicative value.

Common mandates are to create interfaces corresponding to every class, or to always organize the code in specific layers or groups of files.

It needs to be carefully considered if the practice itself has enough value to outweigh the loss of communication.

Conclusion

There are many ways to use the code to communicate things about the code beyond it merely working. Things you would want to communicate are the requirements and assumptions, preferably set for automatic reminders. I hope it's an interesting and useful perspective and I think that code will be better if it is written deliberately for the purpose of communicating.

Evaluating the Tailspin language after Advent of Code

2022-12-31T07:46:00.002-08:00

My very short (and very biased) opinion is that Tailspin is an excellent language, at least for Advent of Code.

Consider day 5 which consisted of parsing a diagram of a stack of crates and then some instructions for moving them. Generally this problem was considered a parsing nightmare, but I had virtually no trouble in the Tailspin solution. Parsing is one of Tailspin's strong points, just look at lines 22-24, which is how you convert an instruction text line such as "move 5 from 1 to 3" into structured data:

composer parseInstruction
  {move: (<='move '>) <INT"crates">, from: (<=' from '>) <stack´INT>, to: (<=' to '>) <stack´INT>}
end parseInstruction
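For readers who don't know Tailspin, a rough Python counterpart of the composer above might look like this. It is only a sketch for comparison: the function name and the use of a regular expression are choices made here, and it returns plain integers, so it loses the "crates" unit and the "stack" tag that the Tailspin version attaches.

```python
import re

# Pattern mirroring the composer: literal words are skipped, numbers are captured.
INSTRUCTION = re.compile(r"move (\d+) from (\d+) to (\d+)")


def parse_instruction(line):
    """Turn 'move 5 from 1 to 3' into a dict with integer fields."""
    match = INSTRUCTION.fullmatch(line)
    if match is None:
        raise ValueError(f"not an instruction: {line!r}")
    move, source, target = (int(group) for group in match.groups())
    return {"move": move, "from": source, "to": target}
```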

Parsing the crate diagram is only slightly more complex, I explain it in more detail in the video of me solving the problem.

Then, on line 30, the "by" operator is used to spawn several single crate moves from each instruction, which are then easily executed by appending to the "to" stack the deleted last value of the "from" stack. Finally, make a string containing the last value from each stack. The solution for part 2 is even easier, just moving the whole chunk at once:

source solutionPart1
  @: $stackState;
  $instructions... -> {from: $.from, to: $.to, by 1"crates"..$.move -> (move:1"crates")}
    -> ..|@($.to):^@($.from;last);
  '$@(first..last;last)...;'!
end solutionPart1

source solutionPart2
  @: $stackState;
  $instructions...
    -> ..|@($.to):^@($.from;last-$.move::raw+1..last)...;
  '$@(first..last;last)...;'!
end solutionPart2
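The same two strategies can be sketched in Python for comparison. This is not a translation of the Tailspin code: the dict-of-lists representation of the stacks, the instruction dicts and the function names are all assumptions made for the sketch, with the top of each stack at the end of its list.

```python
def move_one_at_a_time(stacks, instructions):
    """Part 1: each instruction becomes `move` single-crate moves."""
    stacks = {key: list(crates) for key, crates in stacks.items()}  # work on a copy
    for instruction in instructions:
        for _ in range(instruction["move"]):
            crate = stacks[instruction["from"]].pop()
            stacks[instruction["to"]].append(crate)
    return "".join(stacks[key][-1] for key in sorted(stacks))


def move_as_chunk(stacks, instructions):
    """Part 2: the top `move` crates are moved together, keeping their order."""
    stacks = {key: list(crates) for key, crates in stacks.items()}  # work on a copy
    for instruction in instructions:
        source = stacks[instruction["from"]]
        split = len(source) - instruction["move"]
        stacks[instruction["to"]].extend(source[split:])
        del source[split:]
    return "".join(stacks[key][-1] for key in sorted(stacks))
```

Both functions assume that no stack is empty at the end, since they read the last crate of every stack to build the answer.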

But anecdotes like this are a little random, can the evaluation be more systematic?

Evaluating usability of a language

The goal of any programming language is to be usable. Since "usable" might mean slightly different things for different languages and for different contexts, it can be hard to come up with objective measures and comparisons. Instead, it can be worth looking at different aspects of usability and evaluating those aspects in the context of doing the activity the language is meant for. One framework for doing this is the "Cognitive Dimensions of Notations", which you can learn more about in this article on the cognitive dimensions applied to user interfaces.

Caveats

Trying to be a little scientific, I have to consider threats to the validity of this "study":

I am the only one evaluating and on top of that, I am surely very biased because I created Tailspin. On the other hand, the dimensions aren't necessarily always about "good" or "bad", it depends on the context.
Advent of Code is very different from the usual "data shovelling" that constitutes most professional programming, which is probably why it is so appreciated as a distraction. The suitability of Tailspin for Advent of Code might not imply much about suitability for more serious use. In particular, the Advent of Code solutions comprise very little code, all in one file, as opposed to hundreds of files with dozens of callers of a function.

Quick description of Tailspin

The first thing to note about Tailspin is that it looks very different from any other programming language. Surprisingly there is no cognitive dimension relating to applicability of prior knowledge. With usability of user interfaces, there certainly is a benefit to having your interface work similarly to everyone else's interface. For programming languages there is the notion of a "strangeness budget" where the idea is that the more your language looks like C, the easier it will be accepted. Tailspin really blows the strangeness budget, but that is intentional, to not get stuck in the same old rut. There is a proposed addition to the cognitive dimensions of "Useful awkwardness", where you are forced to step back and think about the task before proceeding. So let's say that's what Tailspin has. There are also features that I've designed to be "helpfully annoying".

Another thing to note about Tailspin is that it is slow, about 1000 times slower than Java, so you can't brute-force your way out of things, you need to come up with a decent algorithm. The interpreter could be made much faster if anyone had the time and the inclination. It could perhaps be adapted to the Truffle framework on Graal VM for another performance boost. Even better might be to make a dedicated VM.

The fundamental idea in Tailspin is that program structure follows data structure. Mostly you describe the different cases for the input data and choose your actions based on that, but sometimes you want to define the output structure and create actions for determining the contents. And yet other times you need to do a bit of both and meet in the middle.

The basic construct in Tailspin is an "assembly line" where each value in a stream of values is transformed at each stage until the final result is achieved. Along the way, a value can expand into several separate values, or it can just disappear so that no further stages are applied on its account. Stages are separated by arrows "->".
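The assembly line idea can be approximated in Python with generators, where a stage may yield several values for one input, exactly one, or none at all. The stage names below are invented for the sketch. Python generators also pull values through lazily, one at a time, so the order in which stages run here is a property of the sketch and not a statement about Tailspin.

```python
def expand(values):
    """Stage that turns each value n into the values 1..n (possibly none)."""
    for n in values:
        yield from range(1, n + 1)


def keep_odd(values):
    """Stage that lets even values disappear from the line."""
    for value in values:
        if value % 2 == 1:
            yield value


def square(values):
    """Stage that transforms each value into exactly one new value."""
    for value in values:
        yield value * value


def assembly_line(source):
    """Chain the stages, roughly like: source -> expand -> keep_odd -> square."""
    return list(square(keep_odd(expand(source))))
```

For the input `[3, 0, 2]`, the 3 expands into three values, the 0 vanishes, and the even values are dropped before squaring.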

Creating values is mostly familiar: "[" and "]" create a list of values, while "{" and "}" create a structure with values named by keys, e.g. "my_key:". Numbers are mostly just numbers, although they could be identifiers, e.g. "my_id´5", or have units/dimensions, e.g. 3"x" (a number followed by the quoted unit). Ranges (streams of numbers) are created by a start and end value (possibly excluded) with an optional stride, e.g. "1..5:2" to give "1 3 5".
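To relate the range notation to something more familiar, here is a small Python sketch. The function name and the `exclude_end` flag are made up for the illustration, and only positive strides are handled. The main difference from Python's own `range` is that the end value is included unless stated otherwise.

```python
def tailspin_like_range(start, end, stride=1, exclude_end=False):
    """Numbers from start to end with a positive stride, end included by default."""
    stop = end if exclude_end else end + 1
    return list(range(start, stop, stride))
```

So the "1..5:2" example corresponds to `tailspin_like_range(1, 5, 2)`.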

Accessing data is also mostly familiar, with dots for accessing keys in structures and array/list indices after the variable accessor (but with parentheses, not square brackets), although more interesting projections are possible. The use of the "$" prefix to get the value of a variable is a little more strange, although not entirely uncommon. A little more strange might be the use of a bare "$", which simply means the current value to be transformed (in Kotlin it would be called "it"). The "@" sigil is even more unusual and refers to the one mutable variable you can have in a block. When you see an "!", that is where the assembly line ends and the value either gets emitted into the calling context or gets swallowed by the sink specified.

"templates" is the name for a complex transform, essentially a function with one input. The basic structure of templates is a series of matchers, each within angle brackets, "<" and ">". The first matching block is executed. The symbol "#" means that a value gets sent to be matched. A "composer" is a type of templates for parsing strings into structured data and a "processor" is essentially an object containing mutable state. For more details and tutorials, look at the main Tailspin site.

Evaluating the dimensions

Viscosity

Viscosity is about how difficult it is to make changes, particularly if a change requires a number of subsequent actions that distract you from what you were trying to achieve.

The Advent of Code challenges are perfect showcases for this dimension with part 2 usually introducing a small modification to the part 1 requirements. This has gone very smoothly and has generally been very easy to do. That may mostly be due to lucky choices informed by experience, but when it keeps happening it seems likely the language has some part in it.

Consider the day 14 solution where there is sand dripping down and coming to rest. In part 1, sand disappears at the bottom, while in part 2, sand collects on an infinitely wide floor at the bottom. The part 2 solution is basically a copy-paste of the part 1 solution, with the addition of a line of space and a line of rock (on lines 35 and 36), added matching rules that extend the grid horizontally when needed and then retry (lines 39-44), removal of the rule dealing with falling into the void (line 26), and finally an added rule to plug up the hole at the top when the time comes (line 48).

Even simpler was the day 11 solution where monkeys were playing with the items from your pack. To support part 2, I just had to add a "then:" parameter to the operations (lines 38-52) and inject the appropriate function during monkey object creation in each part.

Even when you change your mind about the fundamental solution style to use, the code still needs to follow the data (in and out). I implemented day 7 in three coding styles by just copy-pasting and adapting. The basic structure is the same.

Perhaps the most viscosity experienced concerns accessing the mutable state associated with each templates or processor. In the day 20 solution, I took the original "solutionPart1" templates and renamed it to "mix", with a "file:" parameter (the "input" on line 5 was originally named "file"). Since Tailspin mandates that you need to specify the name of the templates owning the mutable state when accessing it from a nested templates, the "@mix" references on lines 24 and 25 had to simultaneously be renamed from "@solutionPart1". This viscosity could be reduced by introducing a more relaxed naming of mutable state, but that would decrease visibility, increase hidden dependencies and perhaps increase error proneness for multi-threading (when that is enabled).

Another viscosity frequently encountered concerns when a number or string becomes autotyped. Tailspin allows you to use plain (untyped, raw) strings or numbers to a certain extent. But as soon as a string or number is assigned to a structure field, it is required to become typed, either as a tagged identifier, or, for numbers, a measure with a unit or dimension. Since Tailspin conservatively considers it an error to compare values of different types, changes have to be made to most places where the value is used. This viscosity is somewhat deliberately chosen to allow quick-and-dirty usage in the small, while generally discouraging use of plain strings and numbers, as well as forcing type conversions or union types to be explicit.
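The effect of treating mixed-type comparisons as errors can be imitated in Python with a small wrapper class. This is only an analogy: the `Measure` class is hypothetical, and how Tailspin performs the check internally is not covered here.

```python
class Measure:
    """Hypothetical stand-in for a number with a unit, such as 5"crates"."""

    def __init__(self, value, unit):
        self.value = value
        self.unit = unit

    def _same_unit(self, other):
        # Comparing against a raw number or another unit is treated as an error.
        if not isinstance(other, Measure) or other.unit != self.unit:
            raise TypeError(f"cannot compare a {self.unit!r} measure with {other!r}")

    def __eq__(self, other):
        self._same_unit(other)
        return self.value == other.value

    def __lt__(self, other):
        self._same_unit(other)
        return self.value < other.value
```

Once a value becomes a `Measure`, every place that compared it with a plain number has to be updated, which is the viscosity described above, but a comparison between crates and some other unit can no longer slip through quietly.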

Hidden dependencies

This concerns the cost of finding things that affect each other, particularly if coordinated changes are needed. It particularly identifies one-way dependencies, e.g. you don't know the callers of a function, and local dependencies, i.e. you know the next step but may need to follow a long dependency chain.

Tailspin tries to make dependencies visible by:

The above-mentioned requirement to name the context of the mutable state accessed from nested contexts.
Requiring that symbols from another module must be prefixed with a module identifier when accessed.
Requiring that the main program file specifies all modules used, and all modules that it allows other modules to use.

Otherwise Tailspin is probably just as prone to hidden dependencies as other languages, although note that the Cognitive Dimensions tutorial mentions that the remedies for hidden dependencies can sometimes be worse than the dependencies themselves.

Since Tailspin is a fairly dynamic language, I suppose the input and output types of a function should be considered a hidden dependency. In the future, I intend to add contracts to the language. It might be possible to create type inference mechanisms as well.

Dynamic types, or, rather, undeclared types, in general could perhaps be considered to be hidden dependencies. If you have to declare types, it increases the viscosity when types are changed, encourages premature commitment and counteracts provisionality, to an extent depending on how complex type declarations are. Tailspin attempts to find a balance by allowing dynamic types but requiring that field names consistently apply to the same type. This can have a viscosity impact as discussed above, but allows more provisionality.

Premature Commitment and Enforced Lookahead

This concerns constraints on the order of doing things that force the user to make a decision before the proper information is available.

I can't at the moment think of anything that forces a premature commitment or undue lookahead.

Abstractions, abstraction hunger and the abstraction barrier

In the cognitive dimensions framework, the definition of an abstraction is a class of entities, or a grouping of elements to be treated as one entity, either to lower the viscosity or to make the notation more like the user’s conceptual structure. (Side note: The cognitive dimensions framework clearly defines "abstraction" in this context to mean simply a grouping of components. Otherwise, this article tries to clarify the confusion surrounding abstraction.)

The abstraction barrier is determined by the minimum number of new abstractions that must be mastered before using the system; if the system allows one user to add new abstractions that must then be understood by subsequent users, that will further raise the abstraction barrier.

Abstraction-hungry systems can only be used by deploying user-defined abstractions. Abstraction-tolerant systems permit but do not require user-defined abstractions. Abstraction-hating systems do not allow users to define new abstractions (and typically contain few built-in abstractions).

Tailspin is somewhat abstraction-hungry. You have to create a templates (function) instance for any of the following reasons:

Conditional evaluation. A series of matchers can only exist as part of a templates instance.
Mutable state can only exist within a templates instance during an invocation, or, more persistently, as part of a processor (object) instance.
Recursion (obviously) or looping, which is an inner recursion on the matchers. Simple iteration can be accomplished by streaming values from a list or a range, although arguably these are other abstractions.
Performing more than one statement/pipeline from the same value. This can alternatively be achieved by lists or structures, with a pipeline for each element or field.
Combine a sequence of transforms into one unit. This could make a difference if effects (mutable state or input/output) are involved, since Tailspin guarantees every value passes through the current transform before any value passes through the next.

Note that templates can be anonymous and defined inline in the pipeline.
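The first reason in the list, conditional evaluation through a series of matchers where only the first matching block is executed, can be imitated in Python with an ordered list of condition and action pairs. The function and the particular conditions are invented for the sketch.

```python
def classify(value):
    """Try a series of conditions in order; only the first match runs."""
    matchers = [
        (lambda v: isinstance(v, int) and v < 0, lambda v: "negative"),
        (lambda v: isinstance(v, int), lambda v: "non-negative integer"),
        (lambda v: isinstance(v, list), lambda v: f"list of {len(v)}"),
        (lambda v: True, lambda v: "something else"),  # catch-all, always last
    ]
    for condition, action in matchers:
        if condition(value):
            return action(value)
```

Order matters here: a negative integer also satisfies the second condition, but never reaches it because the first match wins.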

The abstractions that need to be learned before efficiently using the language are statements/pipelines, ranges, lists, structures and templates. Useful additional abstractions are processors (objects) and relations (data tables for relational algebra). It is unclear whether a composer counts as an abstraction, but it does contain a number of matchers to be treated as a unit and it is the only reasonable way to deconstruct text data.

Secondary notation

Secondary notation is about providing additional information by other means than the official syntax needed to make things work.

Tailspin allows addition of comments by a "//" that initiates a comment until the end of the line.

Tailspin does not depend upon whitespace so that can be used to convey additional information by line breaks and indentation.

Tests could possibly also be considered a secondary notation that conveys information about the program. Tests can in Tailspin be written within test blocks in the same file as the program.

On the drawing board for Tailspin is the ability to annotate data with metadata, although it's still being evaluated. Beyond code being allowed to ignore metadata, what is really the difference between data and metadata? If it does get ignored and lost in processing, what is its use? How should it carry to derived or transformed data?

Visibility & juxtaposability

Visibility concerns the ability to view and find components easily, while juxtaposability concerns the ability to place components side-by-side to compare them.

Juxtaposability is not provided within the language and has to be achieved by, for example, split editor windows.

In Tailspin the syntax is designed to show the start and end of components clearly, intended to help visibility:

Templates definitions start with "templates", "source" or "sink" followed by a name, and end with "end" also followed by the name. Repeating the name increases visibility at the cost of diffuseness. Similarly, processor and composer definitions end with "end" and the name repeated.
Test blocks start with "test" followed by a descriptive string and end with "end" followed by the same description.
Inline templates start with "\(" and end with "\)". They can be anonymous or have a name after the "\", repeated both at start and end.
Value definitions, state assignments and string interpolations end with ";". Definitions start with "def", state assignments start with "@" and string interpolations start with "$".
A pipeline usually ends with a "!", either alone as "emit into calling context", or before an identifier that "swallows" the current value. A pipeline could end with a "#", which is not an end but initiates conditional processing over the matchers. Pipelines used to create values in lists, structures and assignments will end in the way corresponding to that component.
Lists start with "[" and end with "]", list items end with "," except at the end of the list. Structures start with "{" and end with "}", fields start with an identifier followed by a ":" and a value production ending in a ",", except at the end of the structure. Relations start with "{|" and end with "|}", with comma-separated structures or value productions.
Matcher expressions start with "<" and end with ">". Additional conditions start with "?(" and end with ")".

As previously mentioned, accessing mutable state outside the current templates needs to be done with the name of the outer templates where the state is declared.

Using a symbol from a module is required to be prefixed with an identifier associated with the module.

Closeness of mapping

This dimension concerns how closely the notation represents the original problem.

No matter how declarative a language is, there always seem to crop up situations where the mechanics of executing the program overshadow the description of the problem. In some or most languages it is possible to create abstractions that fit more closely to the problem description, even to the point of creating a domain specific language, but abstractions can come with their own problems, sometimes creating barriers to learning and maintainability.

To examine this dimension, I will present a few examples of Tailspin code, describe it in prose and see how that corresponds to the problem description.

We have already seen at the beginning of this article that composer parsing of strings is very declarative and that the day 5 code corresponded well to the problem description.

For another example, consider the following code from lines 11-26 of the day 13 solution:

operator (left inOrder right)
  0 -> #
  when <?($left <..>)?($right <´´ ..~$left>)> do 1!
  when <?($left <..>)?($right <´´ $left~..>)> do -1!
  when <?($left <..>)?($right <´´ =$left>)> do 0!
  when <?($left <..>)?($right <[]>)> do $ -> ([$left] inOrder $right)!
  when <?($left <[]>)?($right <..>)> do $ -> ($left inOrder [$right])!
  when <?($left <[](1..)>)?($right <[](0)>)> do 1!
  when <?($left <[](0)>)?($right <[](1..)>)> do -1!
  when <?($left <[](0)>)?($right <[](0)>)> do 0!
  when <?($left <[]>)?($right <[]>)> do
    ($left(first) inOrder $right(first)) -> \(
      when <´´ =-1|=1> do $!
      otherwise ($left(first~..last) inOrder $right(first~..last))!
    \)!
end inOrder

The operator "inOrder" with a left and a right argument is evaluated as follows (of course, you need to learn the vocabulary of the language):

When the left value is a number and the right value, which can be anything, is a number less than the left, emit 1
Otherwise when the left value is a number and the right value, which can be anything, is a number greater than the left, emit -1
Otherwise when the left value is a number and the right value, which can be anything, is a number equal to the left, emit 0
If the left value is a number and the right value is a list, wrap the left value in a list and apply the operator to the new left and the right
If instead it is the right value that is the number and the left value the list, wrap the right value in a list and apply the operator again.
If both values are lists, and only the right is empty, emit a 1
If instead only the left list is empty, emit -1
If both lists are empty, emit 0
If both values are (implicitly non-empty) lists, apply the operator to the first elements of both lists, and emit a 1 or -1 result, else apply the operator to the remainder of both lists.

Compare with the day 13 problem description:

When comparing two values, the first value is called left and the second value is called right. Then:

If both values are integers, the lower integer should come first. If the left integer is lower than the right integer, the inputs are in the right order. If the left integer is higher than the right integer, the inputs are not in the right order. Otherwise, the inputs are the same integer; continue checking the next part of the input.
If both values are lists, compare the first value of each list, then the second value, and so on. If the left list runs out of items first, the inputs are in the right order. If the right list runs out of items first, the inputs are not in the right order. If the lists are the same length and no comparison makes a decision about the order, continue checking the next part of the input.
If exactly one value is an integer, convert the integer to a list which contains that integer as its only value, then retry the comparison. For example, if comparing [0,0,0] and 2, convert the right value to [2] (a list containing 2); the result is then found by instead comparing [0,0,0] and [2].

This is very similar, although you still need to infer the encoding of "-1" meaning "the right order", "1" meaning "not the right order" and "0" meaning "no decision is made".
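As a point of comparison, the same rules can be written in Python as a chain of conditionals. This sketch keeps the encoding used above (-1 for the right order, 1 for not the right order, 0 for no decision) and assumes that packets are nested Python lists of integers.

```python
def in_order(left, right):
    """Return -1 for the right order, 1 for the wrong order, 0 for no decision."""
    if isinstance(left, int) and isinstance(right, int):
        return (left > right) - (left < right)
    if isinstance(left, int):
        return in_order([left], right)
    if isinstance(right, int):
        return in_order(left, [right])
    # Both values are lists from here on.
    if not left and not right:
        return 0
    if not right:
        return 1
    if not left:
        return -1
    first = in_order(left[0], right[0])
    if first != 0:
        return first
    return in_order(left[1:], right[1:])
```

The cases are the same, but the conditions are spread over nested type checks, whereas the matcher version states each combination of left and right in a single line.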

Matcher arrays generally tend to correspond quite closely to the requirements, so they are a good feature to have. Let's look at yet another example from the Tailspin solution to day 3:

source solutionPart1
  ($table({elf:,item:,compartment:})
      divide&{over: $table({elf:, item:})}
      $table({compartment:, elf:}))...
    -> $.item -> toPrio -> ..=Sum&{of::()}!
end solutionPart1

source solutionPart2
  ($table({elf:,item:,group:})
      divide&{over: $table({group:, item:})}
      $table({group:, elf:}))...
    -> $.item -> toPrio -> ..=Sum&{of::()}!
end solutionPart2

Explanation (again, knowing the language):

For part 1, consider the relation between elves, items and compartments, and return the elf-item combinations that exist for all compartments related to the elf. Take the item of each value of the result, transform by the "toPrio" templates and sum those values.

For part 2, consider the relation between elves, items and groups, and return the group-item combinations that exist for all elves in the group. Take the item of each value of the result, transform by the "toPrio" templates and sum those values.

Here's how the problem was stated for part 1:

Find the item type that appears in both compartments of each rucksack. What is the sum of the priorities of those item types?

And the description for part 2:

...within each group of three Elves, the badge is the only item type carried by all three Elves... Find the item type that corresponds to the badges of each three-Elf group. What is the sum of the priorities of those item types?
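To make the "exists for all" reading of part 1 concrete, here is a rough JavaScript sketch of the same idea. It is not a translation of Tailspin's divide operator, only an illustration: the function name `commonItemsPerElf` and the row shape `{elf, item, compartment}` are assumptions.

```javascript
// Keeps the elf-item combinations that occur in every compartment
// belonging to that elf.
function commonItemsPerElf(rows) {
  const compartmentsByElf = new Map();
  const compartmentsByPair = new Map();
  for (const { elf, item, compartment } of rows) {
    if (!compartmentsByElf.has(elf)) compartmentsByElf.set(elf, new Set());
    compartmentsByElf.get(elf).add(compartment);
    const pair = JSON.stringify([elf, item]);
    if (!compartmentsByPair.has(pair)) compartmentsByPair.set(pair, new Set());
    compartmentsByPair.get(pair).add(compartment);
  }
  const result = [];
  for (const [pair, compartments] of compartmentsByPair) {
    const [elf, item] = JSON.parse(pair);
    // The pair survives only if it was seen in all of the elf's compartments.
    if (compartments.size === compartmentsByElf.get(elf).size) {
      result.push({ elf, item });
    }
  }
  return result;
}
```

The bookkeeping with maps and sets is what the relational formulation lets you leave out.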

Relational algebra is also a handy feature which often corresponds well to the requirements. Of course, I am cherry picking a bit. It might not be entirely clear that the following line from the Tailspin solution to day 17:

$(1)::length + 3 -> \(<$@Funnel.top::raw..> $ - $@Funnel.top::raw + 1 !\)
  -> ..|@Funnel: { pipe: [x (1..$ -> [x 80 x]) ($@Funnel.pipe) x],
        top: $ + $@Funnel.top::raw};

which reads: Take the length of the first element of the input array, add 3. If it is greater than the "top" field of the "Funnel" state, take the excess plus one, then append to the "Funnel" state a "pipe" field that contains a byte array which prepends the calculated value amount of "80" bytes to the previous "pipe" value and increments the "top" field by the calculated value.

corresponds to the description:

Each rock appears so that ... its bottom edge is three units above the highest rock in the room (or the floor, if there isn't one).

Consistency

In a consistent notation, similar semantics are expressed in similar syntactic forms.

Tailspin is designed with consistency in mind:

All kinds of abstractions (as defined for cognitive notations framework) are defined starting with the abstraction type followed by the name and ended with the word "end".
Angle brackets "<>" signify a matcher, checking if a value matches the expression, even if it is the whole value in a regular matcher and the beginning of the remaining string in a composer.
Square brackets "[]" signify creation of a list/array. If it has an x, "[x" and "x]", it is a byte array. A possible inconsistency is the use of "\[i]( \)" which is an element-wise transform of an array where the square brackets surround a holder variable for the index of the element. It would perhaps be more consistent as "\[i ( )\]"?
Curly braces signify a set, "{" and "}" for a set of keys with attached values (a.k.a. a structure), "{|" and "|}" for a set of structures with the same keys (a.k.a. a relation).
".." means numeric values above the left bound and below the right bound, or just any numeric value if no bounds are given. This applies both when generating ranges and when testing conditions.

Parentheses "()" are used for a number of purposes. Mainly they are used for grouping: in arithmetic, in part sequences of a byte array, when creating a "free" key-value pair and in the form "\(" and "\)" as an inline templates (grouping statements and matchers). The other use of parentheses is to enclose an indexing operation (or more generally, a projection) into a compound data structure. A third use of parentheses is to enclose an additional condition in a matcher, as "?( )". It is not clear if this inconsistency is confusing or not.

Another inconsistency is perhaps that both "$struct.field" and "$struct(field:)" yield the same value. On the other hand, "$struct({field:})" gives a structure containing the field, which is consistent with "$arr([5])" giving an array consisting of the fifth element while "$arr(5)" gives just the fifth element. Could it be confusing that "$arr(5..5)" gives an empty array if there is no fifth element while the other forms would error? And there is currently no form for optional field access.

Diffuseness

The verbosity of the language.

Although the effect of over-verbosity is often slight, it can take up more slots in the brain's working memory. On the other hand, terseness can be a problem because it can increase error-proneness and reduce visibility.

Comparing with Advent of Code solutions written in other languages, the Tailspin solutions tend to be fairly concise even when written to be clear (close mapping).

Tailspin can sometimes perhaps be too terse. Matchers can be written just as they are, but there is an option to surround a matcher with the words "when" and "do", which seems to increase readability.

Another example of the syntax being too terse is possibly "<5..>" vs "<..5>" meaning greater than or equal to 5 and less than or equal to 5 respectively. A mistake choosing the wrong one can be hard to spot.

Error-proneness

This concerns whether the notation itself invites certain types of systematic mistakes.

I can't think of any particular errors that the language would invite one to make. Tailspin has a philosophy of erroring on anything that can be discovered that might be a programmer error.

Hard mental operations

Does the language sometimes put a high demand on cognitive resources?

I don't think Tailspin requires more hard mental operations than other languages. Perhaps rather the opposite, since pipelines focus on one step at a time and matchers encourage specifying one case at a time.

Progressive evaluation

This dimension explores whether it is possible to evaluate incomplete work.

I have written an article that demonstrates progressive evaluation in Tailspin. It is very easy and a favourite technique of mine.

Another little trick is to insert "\($ -> !OUT::write $! \)" as an extra stage in a pipeline to see what the current value is at that point.

Tests can also be used to evaluate parts of programs.

Provisionality

How committed are you to things already written?

In Tailspin it is quite easy to put in provisional values along the way and perfect the details later. For example, if you know that the current value is to be transformed to a different structure before being processed in the next stage, but you don't quite know yet how or what to do in the first transformation, you can just inject a constant, e.g. "$ -> {foo: 1, bar: 'qux'} -> my_current_focus" and change it later.

Role-expressiveness

Can the purpose of a component (or an action or a symbol) be readily inferred?

Most of the things mentioned under "visibility" perhaps belong here instead. Also the things mentioned under "consistency".

I'd like to highlight the use of sigils in Tailspin. A "$" always means a value is being brought into play, while a "!" means a value is being sent out of scope and lack of a sigil means a transform is applied. Also, an "@" always means mutable state, while the lack of it means immutable values, except in the case of processors. Communication with processors is done through messages which are "sent" by "::".

Final thoughts

It would obviously be more useful if other people did this evaluation, but even doing it on my own forced me to look at Tailspin from a few different angles, which I think is ultimately worthwhile. It will now also be easier to reflect back along these dimensions as I use or develop Tailspin going forward.

Write meaningful unit tests

2021-08-21T06:13:00.002-07:00

This article has moved

The power of nothing

2021-05-28T12:44:00.000-07:00

I came across an article comparing F# and Clojure on a toy problem, a json treasure hunt to find a certain value in a large json document and print the path to it. Both are very nice in their own way. Personally I have a slight preference for the F# syntax and think I might want to try some serious coding with it.

Both basically pattern match over json types, and recurse for lists and objects. In F# you declare a union type and match over that, while in Clojure you declare a protocol and implement the protocol for the built-in types. Take a look at the article, F Sharp vs Clojure Toy Problem Shootout.

Now think about the return values in each case. In F#, the return value is declared as a union type with either a list of crumbs or a value named HuntResult.Null, while in Clojure, being dynamic, either the list of crumbs or the boolean value false is returned. This seems pretty standard, similar to what you'd do in pretty much any language; you're probably even wondering why I'm pointing it out. But look at how many lines are concerned with handling HuntResult.Null and false, respectively.

The Tailspin programming language handles things a little differently. Since Tailspin doesn't require that you return a value at all, you can just ignore all the failure cases and not emit a value at all. Not a value called nothing, literally nothing. This turns out to be a pretty powerful concept, drastically reducing the need for conditional statements and the returning of empty values. Here's the Tailspin code:


templates findAllTreasurePaths
  when <='dailyprogrammer'> do [] !
  when <[]> do $ -> \[i]($ -> findAllTreasurePaths -> [$i - 1, $...] ! \)... !
  when <{}> do $... -> \(def key: $::key; $::value -> findAllTreasurePaths -> [$key, $...] ! \)!
end findAllTreasurePaths

It's built pretty much the same way as the other solutions: pattern match on the json values, recurse on lists and objects and start building a result when the value is found. But all the code to handle dead ends is gone; if a dead end is hit, nothing further happens.

There is another difference here, though. The Tailspin code returns all treasures if there is more than one, while the F# and Clojure solutions only return the first one found. Actually, "returns" is not an accurate description of what happens in Tailspin; rather, zero or more values may be emitted.
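If you know Javascript, generator functions are perhaps the closest analogy to emitting zero or more values. The following is only a rough sketch of the same idea, not a faithful translation of the Tailspin semantics; it assumes the json document has already been parsed into plain Javascript values.

```javascript
// Yields one path (an array of keys and zero-based indexes) for every
// occurrence of the treasure. Dead ends simply yield nothing.
function* findAllTreasurePaths(json) {
  if (json === 'dailyprogrammer') {
    yield [];
  } else if (Array.isArray(json)) {
    for (let i = 0; i < json.length; i++) {
      for (const path of findAllTreasurePaths(json[i])) yield [i, ...path];
    }
  } else if (json !== null && typeof json === 'object') {
    for (const [key, value] of Object.entries(json)) {
      for (const path of findAllTreasurePaths(value)) yield [key, ...path];
    }
  }
}

const doc = { a: [1, { b: 'dailyprogrammer' }], c: 'dailyprogrammer' };
console.log([...findAllTreasurePaths(doc)]); // [ [ 'a', 1, 'b' ], [ 'c' ] ]
```

As in the Tailspin version, there is no branch for values that are neither the treasure nor a container, and no empty result is ever passed around.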

If we really did need to return only the first value found, you will have to do a little more work in Tailspin to halt your search when a value is found. You also need to detect the failure case where nothing at all is returned and you need to check the next element of a list or object. To do that you need to wrap the recursive call in a list constructor so that you get an empty list for failure and match on the alternatives. You also need to track the current index into the candidate list:


templates findFirstTreasurePath
  when <='dailyprogrammer'> do [] !
  when <[]> do
    def list: $;
    [0] -> \(
      when <[](2)> do [$(1) - 1, $(2)...] !
      when <[<..~$list::length>](1)> do def next: $(1) + 1; [$next, $list($next) -> findFirstTreasurePath] -> #
    \) !
  when <{}> do
    def attrs: [$...];
    [0] -> \(
      when <[](2)> do [$attrs($(1))::key, $(2)...] !
      when <[<..~$attrs::length>](1)> do def next: $(1) + 1; [$next, $attrs($next)::value -> findFirstTreasurePath] -> #
    \) !
end findFirstTreasurePath

It takes a little mind-shift to get used to, but I'm still amazed at how well it works to program a flow where only things you care about proceed to the next processing step.

Learning Tailspin by comparing to Javascript

2021-05-02T02:33:00.016-07:00

Since the Tailspin programming language is a little different syntaxwise from most programming languages, I recently got a suggestion to put Tailspin code examples next to some well-known language. So with some help from Rosettacode, here goes, Javascript on the left vs Tailspin on the right, with logic as similar as is reasonable. Note that the syntax-highlighting algorithm doesn't really know Tailspin so sometimes it may mislead.

To begin with, you should probably try to forget everything you think you know about programming language syntax, Tailspin is very different.

Hello World

Tailspin uses an arrow -> to denote that the value created on the left is input to the transform on the right. This is similar to the "pipe" operation in shell-script programming. You can also think that the arrow corresponds to something like ".map" or ".forEach" in javascript. Note also that the bang ('!') in Tailspin shows where a value disappears, so the chain stops after the value has been sent with the write-message to OUT.

Javascript


console.log("Hello world!")

Tailspin


'Hello world!' -> !OUT::write

Simple math

Tailspin uses a dollar-sign to denote when you source a value, e.g. from a defined symbol. Note also the round parentheses used for array indexing and that indexes start at 1.

Javascript


const a = 2;
const b = 3;
const c = [1, 6];
console.log(a + b + c[0] - c[1])

Tailspin


def a: 2;
def b: 3;
def c: [1, 6];
$a + $b + $c(1) - $c(2) -> !OUT::write

A+B

Add two numbers given on an input line. Note that the javascript version would handle more than two numbers on the line.

Tailspin comes with a built-in PEG-like parser syntax, used inside a "composer". Things within angle brackets, '<' and '>', are matchers, here the built-in WS for whitespace and INT that produces an integer. So here we match a string with two integers, discarding whitespace around them, and output as an array (of two integers). After parsing, we add the first and second elements of the array together. The $ without a name refers to the current value being handled (here first as the array produced by nums, and in the next step it is the produced sum that is interpolated into the string to append a line break). Note also the $ in front of IN, the $ denotes a source, a place where a value (or several values) appears, here as a result of sending the readline-message to IN.

Javascript


process.stdin.on("data", buffer => {
  console.log(
    (buffer + "").trim().split(" ").map(Number)
        .reduce((a, v) => a + v, 0)
  );
});

Tailspin


composer nums
  [ (<WS>?) <INT> (<WS>) <INT> (<WS>?) ]
end nums
 
$IN::readline -> nums -> $(1) + $(2) -> '$;
' -> !OUT::write

FizzBuzz

In Tailspin, "templates" corresponds fairly well to "function", except that templates only take one input value (and can produce zero or more output values). The "when .. do" checks if the current value matches the expression inside the angle brackets and if so, executes the following code up to the next when case (remember, angle brackets are always around matchers, although these matchers are slightly different from matchers in a composer in that they can match other things than strings). The "otherwise" statement is executed if no "when" matches. To emit a value (similar to a "return", but processing can continue afterwards), you use a lone bang ('!') which also ends that value stream by "disappearing" the value into the calling context. The "\(" to "\)" section is an anonymous inline templates (a lambda, essentially, the backslash looks a bit like a lambda-sign if you squint). There is no for-loop in Tailspin, we simply create a stream of the integers 1 to 100, inclusive. Note how the string interpolation of a value starts with a $ and ends with a semi-colon (';') and remember that a lone $ refers to the current value.

Javascript


var fizzBuzz = function (i) {
  function fizz(i) {
    return !(i % 3) ? 'Fizz' : '';
  };
  function buzz(i) {
    return !(i % 5) ? 'Buzz' : '';
  };
  return `${fizz(i)}${buzz(i)}` || i;
};
for (var i = 1; i < 101; i += 1) {
  console.log(fizzBuzz(i));
}

Tailspin


templates fizzBuzz
  templates fizz
    when <?($ mod 3 <=0>)> do 'Fizz'!
  end fizz
  templates buzz
    when <?($ mod 5 <=0>)> do 'Buzz'!
  end buzz
  def i: $;
  '$->fizz;$->buzz;'
    -> \(when <=''> do $i! otherwise $! \) !
end fizzBuzz
1..100 -> '$->fizzBuzz;
' -> !OUT::write

Fibonacci

Return the nth fibonacci number.

In Tailspin, the only values that can be modified are the @-values that live inside a templates. The pound sign ('#') denotes that the value should be matched against the matchers (the when-statements). To repeat, we send a new value back to be matched. Note that the "<0~..>" matcher matches a value strictly greater than zero.

Javascript


function fib(n) {
  var a = 0, b = 1, t;
  while (n-- > 0) {
    t = a;
    a = b;
    b += t;
  }
  return a;
}

Tailspin


templates nthFibonacci
  @: {a: 0, b: 1};
  $ -> #
  when <0~..> do
    @: {a: $@.b, b: $@.a + $@.b};
    $ - 1 -> #
  otherwise $@.a!
end nthFibonacci

Matrix multiplication

The Javascript version has been written to mirror the Tailspin version, though it would probably naturally be written slightly differently. The A-matrix is used as a template for the rows of the output, while the first row of the B-matrix is used as a template for the columns of the output.

We can define binary (two-argument) operators in Tailspin. Note also the "\[i](" construct where backslash is the start of an inline function definition (a lambda), which ends at "\)". The i inside the square brackets says that the lambda should apply to each element of an array and that the index should be provided as the defined symbol 'i'. The result is still an array, but with each element replaced with the result of the lambda.

Javascript


matmul = function(A, B) {
  return A.map((_r, i) => {
    return B[0].map((_c, j) => {
      var cell = 0;
      for (var k = 0; k < B.length; k++) {
        cell += A[i][k] * B[k][j];
      }
      return cell;
    });
  });
}
 
const a = [[1,2],[3,4]];
const b = [[-3,-8,3],[-2,1,4]];
console.log(matmul(a,b));

Tailspin


operator (A matmul B)
  $A -> \[i](
    $B(1) -> \[j](
      @: 0;
      1..$B::length -> @: $@ + $A($i;$) * $B($;$j);
      $@ !
    \) !
  \) !
end matmul

def a: [[1,2],[3,4]];
def b: [[-3,-8,3],[-2,1,4]];
($a matmul $b) -> !OUT::write

Reverse words in a string

In the Tailspin version we keep the whitespace between the words while the Javascript removes it and replaces it. The Tailspin input is also already divided into lines.

The ellipsis ('...') streams out the individual elements of the array. The tilde ('~') denotes inverse, or "not", in Tailspin. The composer produces an array of word-productions, where the word rule in turn produces a sequence of non-whitespace characters followed by an optional sequence of whitespace characters. The two sequences (strings) produced are just separate strings in the resulting array on the same level as the other strings. Note also how we can select a sequence of elements from an array, with an optional stride, in this case we take all elements in reverse order, by "$(last..first:-1)".

Javascript


const input =
"---------- Ice and Fire ------------\n\
\n\
fire, in end will world the say Some\n\
ice. in say Some\n\
desire of tasted I've what From\n\
fire. favor who those with hold I\n\
\n\
... elided paragraph last ...\n\
\n\
Frost Robert -----------------------";
 
function reverseString(s) {
  return s.split('\n').map(
    function (line) {
      return line.split(/\s/).reverse().join(' ');
    }
  ).join('\n');
}
 
console.log(
  reverseString(input)
);

Tailspin


def input: ['---------- Ice and Fire ------------',
            '',
            'fire, in end will world the say Some',
            'ice. in say Some',
            'desire of tasted I''ve what From',
            'fire. favor who those with hold I',
            '',
            '... elided paragraph last ...',
            '',
            'Frost Robert -----------------------']
;
 
composer words
  [ <word>* ]
  rule word: <~WS> <WS>?
end words
 
$input... -> '$ -> words -> $(last..first:-1)...;
' -> !OUT::write
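For a closer comparison, here is a rough Javascript sketch that keeps the original whitespace in the way the composer does, by splitting each line into alternating runs of non-whitespace and whitespace. The function name is made up for illustration. Note one difference: the regular expression also accepts a line that starts with whitespace, which the "word" rule above would not match.

```javascript
// Splits a line into runs of non-whitespace and whitespace, then reverses
// the order of the runs, keeping each whitespace run exactly as it was.
function reverseWordsKeepingWhitespace(line) {
  const tokens = line.match(/\S+|\s+/g) || []; // an empty line gives no tokens
  return tokens.reverse().join('');
}

console.log(reverseWordsKeepingWhitespace('ice. in  say Some')); // Some say  in ice.
```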

Water collected between towers

Fill a "skyline" with water, so the input [1, 5, 3, 7, 2] will result in a total of two units of water held above the 3. For a better description and interesting links, see the rosetta code page for this problem.

The chosen algorithm goes first from left to right to find the height of the left containing wall, then from right to left to see how high the water level can be at that point. The Javascript is written to match the Tailspin as closely as possible.

Here we mostly put stuff together into a more complex algorithm. The matchers with the dots in are range matchers, so "<$val..>" matches a value greater than or equal to "val", while a tilde acts to exclude the value, so "<$val~..>" matches a value strictly greater than "val". The array is reversed as we did in the previous example and then we stream the elements out individually (by '...') and send them to the matchers (by '#').

Javascript


function histogramWater(a) {
  var leftMax = 0;
  return a.map((h) => {
    if (h > leftMax) {
      leftMax = h;
    }
    return { leftMax: leftMax, value: h };
  }).reduceRight((acc, point) => {
    if (point.value >= acc.rightMax) {
      acc.rightMax = point.value;
    } else if (point.value >= point.leftMax) {
      // do nothing
    } else if (point.leftMax <= acc.rightMax) {
      acc.sum += point.leftMax - point.value;
    } else {
      acc.sum += acc.rightMax - point.value;
    }
    return acc;
  }, {rightMax: 0, sum: 0}).sum;
}

console.log(histogramWater([1, 5, 3, 7, 2]));

Tailspin


templates histogramWater
  $ -> \( @: 0;
    [$... -> { leftMax: $ -> #, value: $ } ] !
    
    when <$@~..> do @: $; $ !
    otherwise $@ !
  \) -> \( @: { rightMax: 0, sum: 0 };
    $(last..1:-1)... -> #
    $@.sum !
    
    when <{ value: <$@.rightMax..> }> do @.rightMax: $.value;
    when <{ value: <$.leftMax..> }> do !VOID
    when <{ leftMax: <..$@.rightMax>}> do
      @.sum: $@.sum + $.leftMax - $.value;
    otherwise
      @.sum: $@.sum + $@.rightMax - $.value;
  \) !
end histogramWater

[1, 5, 3, 7, 2] -> histogramWater -> !OUT::write

Range expansion

A string with compressed ranges is to be expanded into a list of integers, e.g. "-6,-3-1,3-5,7-11,14,15,17-20" will expand to [-6, -3, -2, -1, 0, 1, 3, 4, 5, 7, 8, 9, 10, 11, 14, 15, 17, 18, 19, 20]

Tailspin has a built-in PEG-like parser syntax which is THE way to do string manipulation. You could use a PEG library in Javascript, but normally you soldier on with primitive string handling.

Here we produce an array of zero or more elements. The element rule will either be a range or an integer, optionally followed by a comma that is ignored. If it is a range, it will be an integer that is captured into the definition of the "start" symbol, followed by a dash that is ignored, then another integer which is sent on to produce a stream of integers from start to the current value, inclusive.

Javascript


function rangeExpand(rangeExpr) {
 
    function getFactors(term) {
        var matches = term.match(/(-?[0-9]+)-(-?[0-9]+)/);
        if (!matches) return {first:Number(term)};
        return {first:Number(matches[1]), last:Number(matches[2])};
    }
 
    function expandTerm(term) {
        var factors = getFactors(term);
        if (factors.last === undefined) return [factors.first];
        var range = [];
        for (var n = factors.first; n <= factors.last;  n++) {
            range.push(n);
        }
        return range;
    }
 
    var result = [];
    var terms = rangeExpr.split(/,/);
    for (var t in terms) {
        result = result.concat(expandTerm(terms[t]));
    }
 
    return result;
}

console.log(rangeExpand('-6,-3--1,3-5,7-11,14,15,17-20'));

Tailspin


composer expand
  [<element>*]
  rule element: <range|INT> (<=','>?)
  rule range: (def start: <INT>; <='-'>) <INT> -> $start..$
end expand
 
'-6,-3--1,3-5,7-11,14,15,17-20' -> expand -> !OUT::write
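To show what the composer rules do step by step, here is a rough hand-written Javascript parser with one function per rule. It is only a sketch to illustrate the structure, not how Tailspin implements composers; the helper names and the decision to throw on leftover input are assumptions of this example.

```javascript
function expand(input) {
  let pos = 0;

  // <INT>: an optional minus sign followed by digits, or undefined if no match.
  function int() {
    const m = /^-?[0-9]+/.exec(input.slice(pos));
    if (!m) return undefined;
    pos += m[0].length;
    return Number(m[0]);
  }

  // rule range: (def start: <INT>; <='-'>) <INT> -> $start..$
  function range() {
    const saved = pos;
    const start = int();
    if (start !== undefined && input[pos] === '-') {
      pos++; // the dash is matched but ignored
      const end = int();
      if (end !== undefined) {
        const values = [];
        for (let n = start; n <= end; n++) values.push(n);
        return values;
      }
    }
    pos = saved; // backtrack so that the INT alternative can be tried
    return undefined;
  }

  // rule element: <range|INT> (<=','>?)
  function element() {
    let values = range();
    if (values === undefined) {
      const single = int();
      if (single === undefined) return undefined;
      values = [single];
    }
    if (input[pos] === ',') pos++; // optional comma, ignored
    return values;
  }

  // [<element>*]
  const result = [];
  for (let e = element(); e !== undefined; e = element()) result.push(...e);
  if (pos !== input.length) throw new Error(`Unparsed input at position ${pos}`);
  return result;
}

console.log(expand('-6,-3--1,3-5,7-11,14,15,17-20'));
```

The explicit position tracking and backtracking is the part that the composer syntax takes care of for you.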

Hopefully you now have a sense for how the basic syntax of Tailspin works so that you can better understand more complex code examples.

Creating an algorithm (Dart code)

2020-05-17T11:03:00.009-07:00

NOTE: This article is exactly the same as my previous article except the code example here is in Dart instead of Tailspin and therefore I also made some different implementation choices and the TDD flow ended up being a bit different. If you just want to see the code or TDD-process in this article, skip to it.

"How do you create an algorithm?" is a question I saw on Quora recently and it started me thinking. What are the steps I follow and could there be something that someone could learn from that? Since I've been wanting to write a sudoku solver just for fun, I decided to write up the process.

Interestingly, the task of writing a sudoku solver has been the subject of debate previously. That debate was about the limitations of TDD and might be well worth reading. I am a big fan of TDD and would add my vote to those who claim that it helps you develop faster and more fearlessly, when done right. But when done wrong, it probably can slow you down and prevent you from doing the right thing.

Now I don't want to get into a long discussion about TDD, but there is one thing I want to point out: tests are mainly about verifying your assumptions. A failing test verifies your assumption about what was wrong with the code. A failing test that starts passing verifies that the code you just wrote fixes the problem. An automated unit test that you run regularly verifies your assumption that "this change couldn't possibly affect that code". I've previously written about how assumptions slowed me down here and here. So for effective testing, make sure your tests verify assumptions about the code. If you're only verifying that the code was written in a specific way (easy to do with mocks), you should probably re-assess the usefulness of the test. Even with tests, don't forget to make the code readable, and do code reviews, pair programming or mob programming according to your preference. (If you do want to read a longer recent post on TDD, I suggest this one).

OK, when creating an algorithm we must first make sure we understand the problem and the requirements, so if you don't know sudoku, go read up on it.

Now we can start thinking about how we would go about solving it. The easiest solution would be to just type up the rules in some constraint solver software or programming language like Prolog. But that kind of programming is a very different mindset and it might be hard to switch to even if using a language like Shen or Curry that has it built-in, so I'm not doing that.

What else do we know or can decide? I'm pretty sure I want to input the problem as text something like this:

534|678|912
672|195|348
198|342|567
-----------
859|761|423
426|853|791
713|924|856
-----------
961|537|284
287|419|635
345|286|179

with '.' for unknowns, and output it the same way. Maybe I'll allow more versions for the input, adding whitespace, ignoring lines, zeroes instead of dots. I don't think I need to explore any variations here. I can think of at least wanting to test one known sudoku puzzle as an acceptance test that I am done. I might also want to test edge cases like inputting an already solved puzzle and inputting a puzzle with all unknowns. So should I go ahead and write these tests? The problem is that I cannot make them pass immediately, so running them all the time is going to introduce noise into the process with these expected failing tests. On the other hand, I am thinking about it now, so it would be good to capture that thinking. I'm pretty sure that my assumption is correct that my code won't work right now, I don't need a failing test, so I'll just write a TODO for now.

So what about writing the input and output code? No! One of the biggest problems in the software industry, even inside Google, is that programmers are too quick to start writing code, which means there's just more and more code solving the wrong problem or solving it the wrong way and then you leave it to someone else to clean up. We are not ready to make a quick decision on what the internal data structure is going to be as that is directly interacting with our algorithm. A premature bad choice is going to throw a big spanner in the works.

What ideas do we have for how we will find the solution?

We could try to do it like humans do, with different heuristics like "if two positions both need one of the same two values, then either of those values cannot be assigned to a third position". This seems like it might be complicated and we can't be sure we have all the needed heuristics. Also unclear what data structure would best support the algorithm.
We could just store the placed digits and open spaces in a mutable array, much like our input format, and try each value in each open position and check for validity. The good thing about this is that we can easily try the states in order because we can reverse each decision easily, just replace the last placed digit with a '.' and backtrack. The bad thing is that we probably won't finish until the end of the universe, with about 9^50 possibilities (with 31 givens). We might still be ok if we check for consistency after placing each digit, but it seems likely we won't be reducing the search space fast enough, with too many high multipliers at the beginning of the search.
What if we kept track of the remaining possibilities in each open position and always selected the one with the fewest alternatives to try? That should keep the multiplicative combinatorial explosion down. This sounds promising and there are some data structure options, but before exploring those, let's see if we have any more ideas about how to solve the problem.
Another idea could be to try and fill out all occurrences of a digit before moving on to the next, but that just seems like a variation of the previous with unclear benefits. It's a possible extension if needed. I can't think of any way to generate look-up tables or otherwise speed things up. No other approaches come to mind right now.

OK, so, keeping track of remaining possibilities we could store the state as an array with each cell containing either a placed digit or a list of possibilities. Since we want to search for the open position with fewest options, it might be more efficient to keep a list of just the open positions separate from a list or array of placed digits. For faster access to each open position we could even key them into a map/dict. But do we need it?

Thinking more about how we will move forward and backtrack (undoing choices that didn't work) through potential solutions, it seems easiest to just copy the new state for the next step. If we're copying all the state anyway, we may as well just go with the simpler array option.
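As a rough illustration of these two ideas (in Javascript rather than Dart, and not the code developed below), selecting the open position with the fewest options and copying the state for the next step could look something like this. The `{row, col, options}` shape of an open position and the function names are assumptions made for the sketch.

```javascript
// Returns the open position with the fewest remaining options,
// or null if there are no open positions left.
function pickFewestOptions(open) {
  return open.reduce(
    (best, pos) =>
      best === null || pos.options.length < best.options.length ? pos : best,
    null
  );
}

// Copies the open positions, including each options list, so that a choice
// can be undone simply by discarding the copy and going back to the original.
function copyState(open) {
  return open.map((pos) => ({ ...pos, options: [...pos.options] }));
}

const open = [
  { row: 0, col: 0, options: [1, 2, 3] },
  { row: 0, col: 1, options: [4, 5] },
];
console.log(pickFewestOptions(open)); // { row: 0, col: 1, options: [ 4, 5 ] }
```

Copying the options lists as well as the positions matters, because a shallow copy would still share the lists between the old and the new state.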

Almost ready to code! What shall we do about the initial input state? Shall we represent the given digits as already placed and work out what the remaining options for the open positions are? Or shall we represent the given digits as open positions with one remaining possibility, setting the open positions to have all possibilities remaining, and then just run it through the same algorithm all the way? The second option seems to be a slam-dunk so we don't get two pieces of code interpreting the same rules.
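A rough sketch of that second option, again in Javascript for illustration only: every cell of the text input becomes an open position, where a given digit has exactly one remaining possibility and a '.' has all nine. The function name and the `{row, col, options}` shape are assumptions of the sketch, and it only accepts '.' for unknowns.

```javascript
const ALL_DIGITS = [1, 2, 3, 4, 5, 6, 7, 8, 9];

// Turns the text format into a list of 81 open positions.
function initialOpenPositions(text) {
  const rows = text
    .split('\n')
    .map((line) => line.replace(/[^1-9.]/g, '')) // drop '|' and '-' separators
    .filter((line) => line.length === 9);        // separator lines become empty
  return rows.flatMap((line, row) =>
    [...line].map((ch, col) => ({
      row,
      col,
      options: ch === '.' ? [...ALL_DIGITS] : [Number(ch)],
    }))
  );
}
```

With this representation the givens need no special treatment: they are simply the positions with the fewest options, so they get placed first and their constraints are propagated by the same code as any other choice.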

If this code has to be part of a large code base, we would also have to study the structure of the code base so we can add the code in a good place. Don't just add code at the first place it could work, that will just start to accumulate a mess, usually at the edge of the system.

Now we need to choose how we want to drive the tests, either through the text input described above or with the internal representation. Since it isn't too hard to create an internal representation in my chosen language and I think it might be easier to verify properties of the internal output, I will drive it internally, then separately drive the text conversion and add an integration test at the end.

The code

Right, let's code! I will be using Dart for this example. If you know Java or Javascript or anything similar you should be able to follow along. If you feel you want an intro to Dart, try this.

I first want to add a test for when the puzzle is complete, or when there are no more choices to be made. Since I'm not entirely thrilled about working too much with two-dimensional arrays (or lists of lists) in Dart, I think I will use a list of open positions as my internal input format and return a two-dimensional array with the solution.

The minimal implementation passes and I add a test that the last remaining digit gets placed, with the minimal code to make it compile (failing to compile is a failing test, but that has a bit quicker turnaround):

Which fails, as expected:

00:03 +1 -1: internal solver last digit gets placed [E]
  Expected: satisfies function
    Actual: [
              ['.', '.', '.', '.', '.', '.', '.', '.', '.'],
              ...

The following code makes the test pass:

You may have noticed that my code is flawed. I should have made a deep copy of "remaining" before passing it on, as should become clear soon. Today I think of it, but other days I might not, that's how it goes when values are mutable. I deliberately choose not to fix this now because my tests should force me to do it (let's see if they do!).

Another thing you may notice is that I do more than the test requires by selecting the position with least options. On a day when I'm the best version of myself, I might play a game of trying to outsmart the tests to force myself to specify things better, e.g. by just selecting the first open position. Not today.

Still oblivious and proceeding in a good rhythm, I add tests for propagating constraints in rows, columns and blocks, one at a time with the corresponding code that makes them pass.

One thing you may have noticed is that my test data would never come up in a real run of the program because there would be more constraints working together. The test depends on the knowledge that my algorithm doesn't care about those other constraints. It could be a risk to depend on knowledge about the algorithm in the tests, but it's worth it for having tests that are much simpler and easier to understand. I've carefully chosen to make the middle digit be alone (first pick) in the test cases to reduce my knowledge of the inner workings, so that whether the code picks the first or the last option of many I should discover whether the constraints are propagated or not.

The next test depends even more on knowledge of the inner workings, but again I prefer that the test is simpler. I add a comment about it because my reasoning is perhaps not immediately obvious. At least I am still mostly testing assumptions about results of the code and not how the code is written. Just note that it's a slippery slope.

The test fails with an exception "Bad state: No element" and I realize that I haven't handled the simple case where there are no options for an open position, so I comment out the failing test and deal with that first.

Going back to the backtracking test, it now returns null and fails, as expected since we hit the contradiction and don't make another choice. I decide I need to restructure the code quite a bit, still just mutating objects.

Running the test gives a very strange result that I'm not sure I understand.

00:02 +6 -1: internal solver contradiction is backtracked [E]
  Expected: (satisfies function and satisfies function and satisfies function and satisfies function)
    Actual: [
              ['3', '6', '.', '.', '.', '.', '.', '.', '.'],
              ['.', '.', '.', '.', '.', '.', '.', '.', '.'],
              ['.', '.', '.', '.', '.', '.', '.', '.', '.'],
              ...

Thinking really hard, I realize it is because I remove the best open positions from the same "remaining" array, so when we backtrack and try to place '3' and then '6', the "remaining" array is empty and the algorithm returns a solution. I obviously need to copy the remaining array before the recursive call.

Well, that did something, we're back to failing with a null (no solution found). Of course, I need to copy the OpenPosition objects as well so that each backtrack has the right options available!
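
The two copying mistakes above are not specific to Dart. As a minimal sketch in Python (the names `narrow_first` and `fresh` and the dictionary layout are made up for this illustration, they are not the code from this post), compare what the caller is left with after a recursive step when it passes no copy, a shallow copy, or a deep copy:

```python
import copy

def narrow_first(positions):
    """Simulate one recursive step: prune an option, then consume a position."""
    positions[0]["options"].remove("3")   # constraint propagation on an object
    positions.pop()                       # a position gets filled and removed

def fresh():
    """Two hypothetical open positions with their remaining options."""
    return [{"x": 0, "y": 0, "options": ["3", "5"]},
            {"x": 1, "y": 0, "options": ["3", "6"]}]

shared = fresh()
narrow_first(shared)                      # no copy: the list itself shrinks

shallow_src = fresh()
narrow_first(list(shallow_src))           # new list, but the same position objects

deep_src = fresh()
narrow_first(copy.deepcopy(deep_src))     # new list and new position objects

print(len(shared), shallow_src[0]["options"], deep_src[0]["options"])
```

Only the deep copy leaves the caller with both the full list and the full set of options, which is what a backtracking step needs in order to try the next choice.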

I was sure that would work, but I only get 77 elements, I'll have to re-examine my assumptions. When I check the "digits" list, I see that it contains '0'-'8', not '1'-'9'. Doh! There are 4 nines in the input, so that should cover it.

I add a few more assertions to verify the parsed input, which all pass.

Excellent! Now we just need to put it all together and I write the test I deferred at the beginning of this article. The printing code takes a bit of thought but I decide that I can easily distinguish errors in printing from errors in solving so I don't need a separate test right now.

Surprisingly, I get 'No solution found'. What is going wrong here? I take a look at the code and have the idea that perhaps "choices.toList()" does not create a new list. So I add a check and run again.

That doesn't seem to be the problem though, I have to go deeper. I add a test to see what happens if I input an already solved sudoku. That also fails to find a solution. I double-check that it is a valid sudoku. WAT? I'm about to start extending the tests for rows, columns and blocks, when it jumps out at me from the constraint propagation code:

(where.x ~/ 3 == position.x ~/ 3 &&
                where.y ~/ 3 == position.x ~/ 3)

I remove that line and verify that the block-filling test fails. It does, but I see from the test output that I had placed the block in the top left corner, which has the same x and y coordinates, a bad choice for a test that depends on both x and y. So I change that test to be three rows further down, verify it still fails and that putting back the removed code line still fails until I correct it.
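
Why the top-left block was a bad choice can be shown with a small Python sketch. The function names and the `(x, y)` tuples are assumptions for illustration; the buggy variant only mirrors the typo quoted above.

```python
def same_block_buggy(where, position):
    # Mirrors the typo: position's x coordinate is used for both comparisons.
    return (where[0] // 3 == position[0] // 3 and
            where[1] // 3 == position[0] // 3)

def same_block(where, position):
    # Intended check: compare x with x and y with y.
    return (where[0] // 3 == position[0] // 3 and
            where[1] // 3 == position[1] // 3)

# Top-left block: x // 3 equals y // 3, so both versions agree.
hidden = same_block_buggy((1, 1), (0, 0)) == same_block((1, 1), (0, 0))

# Same block moved three rows down: x // 3 differs from y // 3.
exposed = same_block_buggy((1, 4), (0, 3)) != same_block((1, 4), (0, 3))

print(hidden, exposed)
```

Any block on the diagonal hides the typo, so test data for a check that depends on both coordinates is safer in an off-diagonal block.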

That does the trick, but now I get a type error on my full solution: "type 'MappedListIterable<List<String>, String>' is not a subtype of type 'List<String>'". I have to do a web search for that and it seems that the compile-time type-checking isn't quite catching this case. When I create the "rows" variable I just do a "map" operation, but that does not return a List, so I have to append ".toList()".

That was a bit sweaty, but now it works, apart from a newline missing from the end. I correct that, and I look over my code to see if I can clean it up a bit, e.g. some names need changing or some code can be phrased more clearly. You can see the final result here.

Summary of the process

Don't start coding until you understand the problem.
Don't start coding until you understand well enough how you are going to solve it.
- Explore various possibilities and evaluate
- Choose the simplest solution that is good enough
- If you cannot decide, ask yourself what information will help you decide and go find that out.
- Don't get stuck, flip a coin if you have to. If it turns out bad you will have learned something along the way.
Don't start coding until you know how the new code will fit into existing code; don't just add it in the first place you think it might work. Refactor existing code to create a good fit, if needed.
Use tests to validate (or disprove) your assumptions.
- A failing test validates your assumption about what your code does not yet do correctly.
- A failing test that starts to pass validates your assumption that the code you wrote actually does something useful.
- An automated unit test that is kept and run periodically tests your assumption that "this code couldn't possibly break that code".
When you don't fully understand how your tools work, you need to explore, observe and test your assumptions about them. Just don't confuse exploratory coding with production coding, you still need to validate that you can't simplify the experimental code.
When everything works, make sure your code is easy to read and understand and clean up whatever can be cleaned up. Another pair of eyes is good here, do code reviews or pair or mob programming.

Creating an algorithm

2020-05-15T01:41:00.006-07:00

UPDATE: This article has been re-published with a code example in Dart instead of Tailspin here. The choices made and the TDD process ended up somewhat different so it may be worth looking at both. If you just want to see the code or the TDD process in this post, you can skip to it.

If you're in a hurry, jump to the summary at the end.

OK, when creating an algorithm we must first make sure we understand the problem and the requirements, so if you don't know sudoku, go read up on it.

What else do we know or can decide? I'm pretty sure I want to input the problem as text something like this:

534|678|912
672|195|348
198|342|567
-----------
859|761|423
426|853|791
713|924|856
-----------
961|537|284
287|419|635
345|286|179

What ideas do we have for how we will find the solution?

We could try to do it like humans do, with different heuristics like "if two positions both need one of the same two values, then either of those values cannot be assigned to a third position". This seems like it might be complicated and we can't be sure we have all the needed heuristics. Also unclear what data structure would best support the algorithm.
We could just store the placed digits and open spaces in a mutable array, much like our input format, and try each value in each open position and check for validity. The good thing about this is that we can easily try the states in order because we can reverse each decision easily, just replace the last placed digit with a '.' and backtrack. The bad thing is that we probably won't finish until the end of the universe, with about 9⁵⁰ possibilities (with 31 givens). We might still be ok if we check for consistency after placing each digit, but it seems likely we won't be reducing the search space fast enough, with too many high multipliers at the beginning of the search.
What if we kept track of the remaining possibilities in each open position and always selected the one with the fewest alternatives to try? That should keep the multiplicative combinatorial explosion down. This sounds promising and there are some data structure options, but before exploring those, let's see if we have any more ideas about how to solve the problem.
Another idea could be to try and fill out all occurrences of a digit before moving on to the next, but that just seems like a variation of the previous with unclear benefits. It's a possible extension if needed. I can't think of any way to generate look-up tables or otherwise speed things up. No other approaches come to mind right now.
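
To put a number on "the end of the universe", here is a back-of-the-envelope calculation in Python. The checking rate of one billion grids per second is an assumption chosen to be generous, not a measurement.

```python
GIVENS = 31                       # digits already filled in the sample puzzle
OPEN = 81 - GIVENS                # open positions left to fill
naive_states = 9 ** OPEN          # any of nine digits in every open position

CHECKS_PER_SECOND = 10 ** 9       # assumed rate, for illustration only
SECONDS_PER_YEAR = 60 * 60 * 24 * 365
years = naive_states // (CHECKS_PER_SECOND * SECONDS_PER_YEAR)

print(f"{naive_states:.2e} grids, roughly {years:.2e} years to enumerate")
```

With the universe being on the order of 10^10 years old, exhaustive enumeration is clearly out, which is why the choice of which position to fill next matters so much.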

Almost ready to code! What shall we do about the initial input state? Shall we represent the given digits as already placed and work out what the remaining options for the open positions are? Or shall we represent the given digits as open positions with one remaining possibility, just setting the open positions to have all possibilities remaining, and then just run it through the same algorithm all the way? The second option seems to be a slam-dunk so we don't get two pieces of code interpreting the same rules.
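
As a rough sketch of that second option in Python (the name `to_internal` and the use of character lists are assumptions for illustration, not the representation used in the code further down), every cell becomes a list of remaining options, whether it was given or not:

```python
DIGITS = "123456789"

def to_internal(text):
    """Turn the text format into 9 rows of 9 option lists."""
    rows = []
    for line in text.splitlines():
        # Keep digits and dots; pipes, dashes and whitespace are ignored.
        cells = [ch for ch in line if ch in DIGITS or ch == "."]
        if not cells:             # separator line of dashes, or a blank line
            continue
        rows.append([list(DIGITS) if ch == "." else [ch] for ch in cells])
    return rows

grid = to_internal("""53.|.7.|...
6..|195|...
.98|...|.67
-----------
8..|.6.|..3
4..|8.3|..1
7..|.2.|..6
-----------
.6.|...|28.
...|419|..5
...|.8.|.79""")

print(grid[0][0], len(grid[0][2]))
```

A given digit is then just an open position whose only option has not been placed yet, so the solver needs no special case for the initial state.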

Now we just need to choose how we want to drive the tests, either through the text input described above or with the internal representation. Since it isn't too hard to create an internal representation in my chosen language and I think it might be easier to verify properties of the internal output, I will drive it internally, then separately drive the text conversion and add an integration test at the end.

The code

Right, let's code! I will be using my own programming language, Tailspin. If you want an intro to Tailspin, read this. First I set up a test that verifies that an already solved sudoku stays solved.

templates placeDigit
  $ !
end placeDigit

test 'internal solver'
  def sample: [
    [5,3,4,6,7,8,9,1,2],
    [6,7,2,1,9,5,3,4,8],
    [1,9,8,3,4,2,5,6,7],
    [8,5,9,7,6,1,4,2,3],
    [4,2,6,8,5,3,7,9,1],
    [7,1,3,9,2,4,8,5,6],
    [9,6,1,5,3,7,2,8,4],
    [2,8,7,4,1,9,6,3,5],
    [3,4,5,2,8,6,1,7,9]
  ];

  assert $sample -> placeDigit <=$sample> 'completed puzzle unchanged'
end 'internal solver'

That passes and I add a test that the last remaining digit gets placed:

test 'internal solver'
  ...
  assert [
    [[5],3,4,6,7,8,9,1,2],
    $sample(2..last)...] -> placeDigit <=$sample> 'final digit gets placed'
end 'internal solver'

Which fails, as expected:

internal solver failed:
assertion that final digit gets placed failed with value [[[5], 3, 4, 6, 7, 8, 9, 1, 2], ...

I add a fairly large amount of code to try to pass this test, which makes me wonder if that code is fully tested. It is usually fun to play a game of trying to outsmart the tests by writing simpler code than you need, e.g. I should look for the first open position instead of the best to begin with. But I'll leave that for a day when I'm a better person.

templates placeDigit
  templates nextDigit
    @:{options: 10};
    $ -> \[i;j](when <[](..~$@nextDigit.options)> do @nextDigit: {row: $i, col: $j, options: $::length}; \) -> !VOID
    $@ !
  end nextDigit
  templates set&{pos:}
    $ -> \[i;j](
      when <?($i <=$pos.row>)?($j <=$pos.col>)> do $(1) !
      otherwise $ !
    \) !
  end set
  $ -> set&{pos: $ -> nextDigit} !
end placeDigit

Surprisingly, I get an error (I need to file a bug report to get a better diagnostic here).

Exception in thread "main" java.lang.NullPointerException: No value defined for $pos.row

It turns out that I no longer handle the first case, so I have to add a check for that.

templates placeDigit
  templates nextDigit
    @:{options: 10};
    $ -> \[i;j](when <[](..~$@nextDigit.options)> do @nextDigit: {row: $i, col: $j, options: $::length}; \) -> !VOID
    $@ !
  end nextDigit
  templates set&{pos:}
    $ -> \[i;j](
      when <?($i <=$pos.row>)?($j <=$pos.col>)> do $(1) !
      otherwise $ !
    \) !
  end set
  def given: $;
  $ -> nextDigit -> #
  when <{options: <=10>}> do $given !
  otherwise def next: $; $given -> set&{pos: $next} !
end placeDigit

The tests pass and I add an assert for an unsolvable puzzle that fails and then code to make it pass:

templates placeDigit
  ...
  def given: $;
  $ -> nextDigit -> #
  when <{options: <=0>}> do [] !
  when <{options: <=10>}> do $given !
  otherwise def next: $; $given -> set&{pos: $next} !
end placeDigit

test 'internal solver'
  ...
  assert [
    [[],3,4,6,7,8,9,1,2],
    $sample(2..last)...] -> placeDigit <=[]> 'no remaining options returns empty'
end 'internal solver'

Proceeding in a good rhythm, I add tests for propagating constraints in rows, columns and blocks, one at a time with the corresponding code that makes them pass. I also have to make a recursive call at the end to keep solving all remaining digits.

templates placeDigit
  ...
  templates set&{pos:}
    def digit: $($pos.row;$pos.col) -> $(1);
    $ -> \[i;j](
      when <?($i <=$pos.row>)?($j <=$pos.col>)> do $(1) !
      when <[]?($i <=$pos.row>)> do [$... -> \(<~=$digit> $! \)] !
      when <[]?($j <=$pos.col>)> do [$... -> \(<~=$digit> $! \)] !
      when <[]?(($i-1)~/3 <=($pos.row-1)~/3>)?(($j-1)~/3 <=($pos.col-1)~/3>)> do [$... -> \(<~=$digit> $! \)] !
      otherwise $ !
    \) !
  end set
  ...
  otherwise def next: $; $given -> set&{pos: $next} -> placeDigit !
end placeDigit

test 'internal solver'
  ...
  assert [
    [[5],3,4,6,[2,5,7],8,9,1,[2,5]],
    $sample(2..last)...] -> placeDigit <=$sample> 'solves 3 digits on row'

  assert [
    [5,3,4,6,7,8,9,1,2],
    [[6,7,9],7,2,1,9,5,3,4,8],
    [1,9,8,3,4,2,5,6,7],
    [8,5,9,7,6,1,4,2,3],
    [4,2,6,8,5,3,7,9,1],
    [[7],1,3,9,2,4,8,5,6],
    [[7,9],6,1,5,3,7,2,8,4],
    [2,8,7,4,1,9,6,3,5],
    [3,4,5,2,8,6,1,7,9]
  ] -> placeDigit <=$sample> 'solves 3 digits on column'

  assert [
    [5,3,[4,6],6,7,8,9,1,2],
    [[6],7,2,1,9,5,3,4,8],
    [1,[4,6,9],8,3,4,2,5,6,7],
    $sample(4..last)...
  ] -> placeDigit <=$sample> 'solves 3 digits in block'
end 'internal solver'

One thing you may have noticed is that my test data would never come up in a real run of the program. The test depends on the knowledge that my algorithm doesn't care about the already placed digits. It could be a risk to depend on knowledge about the algorithm in the tests, but it's worth it for having tests that are much simpler and easier to understand. I've carefully chosen to make the middle digit be alone (first pick) in the test cases to reduce my knowledge of the inner workings, so that whether the code picks the first or the last option of many I should discover whether the constraints are propagated or not.

  // This gives a contradiction if 3 gets chosen out of [3,5]
  assert [
    [[3,5],[3,4,6],[3,4,6],[3,4,6],7,8,9,1,2],
    $sample(2..last)...] -> placeDigit <=$sample> 'contradiction is backtracked'

The test fails, as expected.

internal solver failed:
assertion that contradiction is backtracked failed with value []

Now we can't just output the result of the recursive call, we have to see if it resulted in a contradiction and, if so, remove the guess we made and make another.

templates placeDigit
  templates nextDigit
    ...
  end nextDigit
  templates set&{pos:}
    ...
  end set
  @: $;
  $ -> nextDigit -> #
  when <{options: <=0>}> do [] !
  when <{options: <=10>}> do $@ !
  otherwise def next: $;
     def result: $@ -> set&{pos: $next} -> placeDigit;
     $result -> \(<~=[]> $! \) !
     $result -> \(<=[]> $! \) -> ^@($next.row;$next.col;1) -> $@ -> nextDigit -> #
end placeDigit
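
For readers who don't read Tailspin, the same idea can be sketched in Python. This is not a translation of the code above: it keeps placed digits in a separate dictionary and loops over the options of the chosen position instead of removing a failed guess and re-matching. The names `peers` and `solve` are made up for this illustration.

```python
import copy

def peers(r, c):
    """All cells sharing a row, column or 3x3 block with (r, c)."""
    br, bc = 3 * (r // 3), 3 * (c // 3)
    cells = {(r, j) for j in range(9)} | {(i, c) for i in range(9)}
    cells |= {(br + i, bc + j) for i in range(3) for j in range(3)}
    cells.discard((r, c))
    return cells

def solve(grid, placed=None):
    """grid[r][c] lists the remaining options; returns 9x9 digits or None."""
    placed = dict(placed or {})
    open_cells = [(r, c) for r in range(9) for c in range(9)
                  if (r, c) not in placed]
    if not open_cells:
        return [[placed[(r, c)] for c in range(9)] for r in range(9)]
    # Choose the open position with the fewest remaining options.
    r, c = min(open_cells, key=lambda rc: len(grid[rc[0]][rc[1]]))
    for digit in grid[r][c]:              # an empty list means a contradiction
        trial = copy.deepcopy(grid)       # every guess works on its own state
        trial[r][c] = [digit]
        for pr, pc in peers(r, c):
            if (pr, pc) not in placed and digit in trial[pr][pc]:
                trial[pr][pc].remove(digit)
        result = solve(trial, {**placed, (r, c): digit})
        if result is not None:
            return result
    return None                           # every option led to a contradiction

PUZZLE = """53..7....
6..195...
.98....67
8...6...3
4..8.3..1
7...2...6
.6....28.
...419..5
....8..79"""
grid = [[list("123456789") if ch == "." else [ch] for ch in line]
        for line in PUZZLE.splitlines()]
solution = solve(grid)
print("\n".join("".join(row) for row in solution))
```

Returning `None` plays the role of the empty list in the Tailspin version: the caller sees that the guess failed and moves on to the next option.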

Great, that works! Now I am quite confident that this will solve a sudoku. We just have to transform the input into internal form, so I add a new function to parse the input and the corresponding test for it (test first, of course!). Tailspin has a special syntax called a composer for specifying the result you want with regex-matchers for snippets of the input string. So we want an array that has three sections, each consisting of three rows and optionally ended with a line of dashes that we ignore, where each row has groups of three digits optionally separated by an ignored pipe-character.

composer parseSudoku
  [<section>=3]
  rule section: <row>=3 (<'-+'>? <WS>?)
  rule row: [<triple>=3] (<WS>?)
  rule triple: <digit|dot>=3 (<'|'>?)
  rule digit: [<'\d'>]
  rule dot: <'\.'> -> [1..9]
end parseSudoku

test 'input sudoku'
  def parsed:
'53.|.7.|...
 6..|195|...
 .98|...|.67
 -----------
 8..|.6.|..3
 4..|8.3|..1
 7..|.2.|..6
 -----------
 .6.|...|28.
 ...|419|..5
 ...|.8.|.79' -> parseSudoku;

 assert $parsed <[<[<[]>=9](9)>=9](9)> 'parsed sudoku has 9 rows containing 9 columns of lists'
end 'input sudoku'

I was confident that would work, but it doesn't:

Exception in thread "main" java.lang.IllegalStateException: No composer match at '53.|.7.|...

This is turning out to be more educational than I intended! Obviously I don't fully understand how this works (or I just made a mistake), but now I have to back up and check some underlying assumptions. First I'll check that the digit and dot rules are correct.

composer parseSudoku
  <digit|dot>
  rule digit: [<'\d'>]
  rule dot: <'\.'> -> [1..9]
end parseSudoku

test 'input sudoku'
  assert '9' -> parseSudoku <=[9]> ''
  assert '.' -> parseSudoku <=[1..9]> ''
end 'input sudoku'

out:
input sudoku failed:
assertion that  failed with value [9]

That's a surprise, the output looks like what I expect. It turns out, though, that the string '9' gets displayed like the integer 9 (filing an issue for better test output). It doesn't actually matter whether I use character strings or integers, the internal solver handles either, as long as we use the same type throughout. I will use characters (chosen by coin-flip), but that wasn't really the problem because the parsing actually works. So I extend the experiment a bit:

composer parseSudoku
  <digit|dot>=3 (<'|'>?)
  rule digit: [<'\d'>]
  rule dot: <'\.'> -> [1..9 -> '$;']
end parseSudoku

test 'input sudoku'
  assert ['139' -> parseSudoku] <=[['1'],['3'],['9']]> 'plain'
  assert ['139|' -> parseSudoku] <=[['1'],['3'],['9']]> 'with |'
end 'input sudoku'

out:
Exception in thread "main" java.lang.IllegalStateException: Composer did not use entire string. Remaining:'|'

So there we have it, my rule to match <'|'> doesn't match a '|' because it's a special character in regex syntax. I have to escape it with a backslash. I correct the issues, reinstate the original test and the test passes. I add two more assertions which pass immediately.

 assert $parsed(1;1) <=['5']> 'a digit'
 assert $parsed(1;3) <=['1','2','3','4','5','6','7','8','9']> 'a dot'
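
The pipe pitfall is easy to reproduce in other regex dialects. The sketch below uses Python's `re` module purely as an illustration; it says nothing about how Tailspin's composer is implemented.

```python
import re

unescaped = re.compile("|")      # an alternation between two empty patterns
escaped = re.compile(r"\|")      # a literal pipe character

empty_match = unescaped.match("|")        # succeeds, but consumes nothing
whole_match = unescaped.fullmatch("|")    # cannot consume the pipe
literal_match = escaped.fullmatch("|")    # consumes the pipe

print(empty_match.end(), whole_match, literal_match.group())
print(re.escape("|"))                     # lets the library do the escaping
```

The unescaped pattern "matches" without consuming anything, which fits the composer's complaint that the trailing '|' was left unused.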

templates solveSudoku
  $ -> parseSudoku -> placeDigit -> #
  when <=[]> do 'No result found' !
  otherwise def result: $;
    [1..7:3 -> $result($..$+2) -> \section('$:1..11 -> '-';$#10;' ! $... ->
      \row( def r: $;
        [1..7:3 -> $r($..$+2) -> \triple('|' ! $... ! \triple)] -> '$(2..last)...;$#10;' !
      \row)
    !\section)] -> '$(2..last)...;' !
end solveSudoku

test 'sudoku solver'
  assert
'53.|.7.|...
 6..|195|...
 .98|...|.67
 -----------
 8..|.6.|..3
 4..|8.3|..1
 7..|.2.|..6
 -----------
 .6.|...|28.
 ...|419|..5
 ...|.8.|.79'
  -> solveSudoku <=
'534|678|912
 672|195|348
 198|342|567
 -----------
 859|761|423
 426|853|791
 713|924|856
 -----------
 961|537|284
 287|419|635
 345|286|179'> 'solves sudoku and outputs pretty solution'
end 'sudoku solver'

And there we have it! Apart from some whitespace errors it works like a charm. I correct those and look over the code. I don't like the names I have assigned to the templates. I can also consolidate the code for updating remaining possibilities and simplify the backtracking a bit because I will never find another position to continue from. The finished code is here.

Summary of the process

Don't start coding until you understand the problem.
Don't start coding until you understand well enough how you are going to solve it.
- Explore various possibilities and evaluate
- Choose the simplest solution that is good enough
- If you cannot decide, ask yourself what information will help you decide and go find that out.
- Don't get stuck, flip a coin if you have to. If it turns out bad you will have learned something along the way.
Don't start coding until you know how the new code will fit into existing code; don't just add it in the first place you think it might work. Refactor existing code to create a good fit, if needed.
Use tests to validate (or disprove) your assumptions.
- A failing test validates your assumption about what your code does not yet do correctly.
- A failing test that starts to pass validates your assumption that the code you wrote actually does something useful.
- An automated unit test that is kept and run periodically tests your assumption that "this code couldn't possibly break that code".

When you don't fully understand how your tools work, you need to explore, observe and test your assumptions about them. Just don't confuse exploratory coding with production coding, you still need to validate that you can't simplify the experimental code.
When everything works, make sure your code is easy to read and understand and clean up whatever can be cleaned up. Another pair of eyes is good here, do code reviews or pair or mob programming.

A little Tailspin

2020-05-04T14:33:00.002-07:00

I was delighted to come across Uncle Bob's blogpost "A little more Clojure" the other day, where he concludes:

Now I want you to think carefully about how we solved this problem. No if statements. No while loops. Instead we envisioned lists of data flowing through filters and mappers. The solution was almost more of a fluid dynamics problem than a software problem. (Ok, that’s a stretch, but you get my meaning.) Instead of imagining a procedural solution, we imagine a data-flow solution.
Think hard on this – it is one of the keys to functional programming.

The way a programming language feels to use, like "fluid dynamics", is the main thing I was trying to get at in my previous article. Do your programming languages feel like molding clay, or building sandcastles or even wrestling bears?

This idea of thinking of data-flow, where streams of values get transformed step by step is the very foundation of the Tailspin programming language I am developing. To give a sampler of Tailspin, I thought I would follow the steps from Uncle Bob's article so please refer back to that as needed. At the end of the day, we should still remember that Clojure is a kick-ass production-ready language while Tailspin is still only a runnable prototype.

We'll begin with a function, called "templates" in Tailspin, that does nothing and an invocation:


templates primes
  !VOID
end primes

10 -> primes -> '$;
' -> !OUT::write

Note that when run, this code will do absolutely nothing. The "10" is sent to the "primes" template where it is sent into the void and nothing goes on to the next step. Tailspin does not have any null or nil values and does not require that a function returns a value each time. If "primes" were to output a value (or several values), the (each) value would be sent on to the string construct. To create a string, simply write something inside apostrophes. This is another foundation of Tailspin, that creation of values should happen in a very literal way. Inside the string, anything that starts with "$" and ends with ";" will be replaced by the value referred to. In this case we refer to the value with no name which is the current value coming through the "assembly line". The created string is then sent on to be written to the standard output. So now let's get the numbers from one to the input:


templates primes
  [1..$] !
end primes

When we run the program we get "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]" output. Note again that to create a list, we simply write the elements inside square brackets. The ".." operator produces a range, so the elements in the list will be 1 to the current value, inclusive. The bang ("!") indicates that the produced value should be output from the templates. Now we have to filter out all the primes:


templates primes
  [1..$ -> ifPrime] !
end primes

In Tailspin, we can just filter directly on the stream of elements and we will make sure that "ifPrime" outputs a value only if it is a prime, otherwise nothing should be output. Of course we start with the do-nothing function:


templates ifPrime
  !VOID
end ifPrime

When we run our "primes" program, all the values get sent into the void, but the list is still created so we get "[]".
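
If you are more used to mainstream languages, a stage that may emit zero, one or several values is loosely comparable to a generator. The Python sketch below is only an analogy; `into_the_void`, `pass_through` and `collect` are made-up names, not Tailspin concepts.

```python
def into_the_void(value):
    """A stage that emits nothing, loosely like `!VOID`."""
    return
    yield  # unreachable, but it makes this function a generator

def pass_through(value):
    """A stage that emits its input unchanged."""
    yield value

def collect(n, stage):
    """Rough analogue of `[1..$ -> stage]`: gather whatever the stage emits."""
    return [out for i in range(1, n + 1) for out in stage(i)]

print(collect(10, into_the_void))
print(collect(10, pass_through))
```

The list is always created; what ends up in it depends only on what the stage chooses to emit.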

To work out if a number is prime we will divide it by all numbers up to the square root of the number, so let's start by calculating that and modifying our program to output the value of "ifPrime" instead:


templates ifPrime
  $ -> sqrt !
end ifPrime

100 -> ifPrime -> '$;
' -> !OUT::write

This gives "10" as a result, so far so good!

Next we want to get a list of the integers between 2 and the square root:


templates ifPrime
  def root: $ -> sqrt;
  [2..$root] !
end ifPrime

What's new here is that we define "root" to be the square root of the input value, and then we use that value as "$root". When we run this we get "[2, 3, 4, 5, 6, 7, 8, 9, 10]" as we should.

Now we want to work out which of those numbers divide the input evenly:


templates ifPrime
  def n: $;
  def root: $ -> sqrt;
  [2..$root -> $n mod $] !
end ifPrime

We have to save the input value by associating it with the name "n" so that we can use it later in a context where the current value has changed. We don't need an anonymous function as in the Clojure solution, or perhaps we can view each step in a chain as an anonymous function. Anyway, we transform each value by taking the remainder when n is divided by it. The output is "[0, 1, 0, 0, 4, 2, 4, 1, 0]".

If there is a zero in the produced list, n is not prime. So how can we check that?


templates ifPrime
  def n: $;
  def root: $ -> sqrt;
  [2..$root -> $n mod $] -> \(<~[<=0>]> $n ! \)!
end ifPrime

A third foundation of Tailspin is to be able to match values through a simple illustrative syntax. A matcher is defined inside angle brackets and the matcher here says that it matches if the current value is not (by "~") a list (by "[" and "]") that contains a zero (by the contained equality matcher "<=0>"). If it matches, the value of "n" will be output to the following step, which is to just output it from "ifPrime". Since there is no further alternative matcher, the output will be nothing if the list contains a zero. The "\(" and "\)" delimit an inline templates definition. This one is anonymous as well, but could have been given a name after "\".


100 -> ifPrime -> '$;
' -> !OUT::write

(No output)

17 -> ifPrime -> '$;
' -> !OUT::write

17

Finally, changing back to running the "primes" program:


100 -> primes -> '$;
' -> !OUT::write

[1, 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

OK, great, except for the "1", but I'm sure you can fix that. I'm not happy there, though, it still seems inefficient to keep dividing by all those numbers all the time. Couldn't we stop as soon as we get a zero?


templates ifPrime
  def n: $;
  def root: $ -> sqrt;
  2 -> \(
     when <?($n mod $ <=0>)> do !VOID
     when <..$root> do $ + 1 -> #
     otherwise $n !
  \)!
end ifPrime

So instead of doing the modulo operation for all potential divisors, we can start with 2 and here our inline templates has a series of matchers. The "?()" construct on the first allows us to compare a calculated value instead of the current value, and here we say that if the remainder is zero, we just stop by going into the void. If that doesn't happen, the next matcher is tried and if the current value is less than or equal to the square root of the input, we try the next number by sending it to be matched (by the "#" operator). If that isn't the case either, the input must be prime and we output it.

The result we get now is "[3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]". We lost the "2" (can you figure out why?), but I'm not concerned with that right now, I'm going to make more changes. So what if we use a similar trick to build the list of primes as we go along, couldn't we then just use that to test division only by the already found smaller primes?


templates primes
  def N: $;
  @: [];
  2 -> \(
    when <..$N> do $ -> ifPrime -> ..|@primes: $;
      $ + 1 -> #
  \) -> !VOID
  $@ !
end primes

Here we have some new constructs. When you "def" a value in Tailspin, it is immutable and can never change. But sometimes it is handy to have a mutable value, so each templates object has a mutable value called "@", which can be anything. Here we initialize it to an empty list. Then in the inline templates, as long as the current value is less than or equal to N we will increment and rematch after first checking if it is prime and, if so, append it to the list, which must be referred to as "@primes" because just "@" would have been the mutable value of the inline templates. Nothing is output from the inline templates, but at the end we output the accumulated list.

Running the program gives "[3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]", so all is still good. Now we just need to pass the list we are building to the ifPrime templates and use it there. The final program looks like this:


templates ifPrime&{primes:}
  def n: $;
  1 -> \(
     when <?($n mod $primes($) <=0>)> do !VOID
     when <..~$primes::length> do $ + 1 -> #
     otherwise $n !
  \)!
end ifPrime

templates primes
  def N: $;
  @: [2];
  3 -> \(
    when <..$N> do $ -> ifPrime&{primes: $@primes} -> ..|@primes: $;
      $ + 1 -> #
  \) -> !VOID
  $@ !
end primes

100 -> primes -> '$;
' -> !OUT::write

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

Note that we changed ifPrime to loop on indices into the primes list, that indices start from 1 and that the "~" in "<..~$primes::length>" turns it into a "strictly less than" comparison.

EDIT: Stupidly, I now lost the optimization to only divide by numbers less than or equal to the square root of the input number. I'll let you think about how to resurrect that and post the final version at the end of the article.

Since Tailspin is still in a prototype stage, it doesn't have much in the way of libraries, not even a square root function, so I had to write one, but with proper tests this time. You can probably figure out how it works now ("~/" is truncation division):


templates sqrt
  def n: $;
  @: $;
  $n ~/ $@ -> #
  when <$@..> do $@ !
  otherwise @: ($ + $@) ~/ 2;
     $n ~/ $@ -> #
end sqrt

test 'sqrt'
  assert 1 -> sqrt <=1> ''
  assert 2 -> sqrt <=1> ''
  assert 3 -> sqrt <=1> ''
  assert 4 -> sqrt <=2> ''
  assert 100 -> sqrt <=10> ''
  assert 53 -> sqrt <=7> ''
end 'sqrt'
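
In case the Tailspin is hard to follow, here is the same integer Newton iteration as a Python sketch. The function name `int_sqrt` is an assumption, and the sketch assumes n is at least 1.

```python
def int_sqrt(n):
    """Floor of the square root of n, for n >= 1."""
    guess = n                          # start at or above the true root
    quotient = n // guess
    while quotient < guess:            # guess * guess is still larger than n
        guess = (quotient + guess) // 2
        quotient = n // guess
    return guess

print([int_sqrt(k) for k in (1, 2, 3, 4, 100, 53)])
```

The loop stops as soon as the quotient is no longer smaller than the guess, which corresponds to the `<$@..>` matcher in the Tailspin version.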

To limit ourselves to only dividing by primes less or equal to the square root, we can rearrange the matchers a bit:


templates ifPrime&{primes:}
  def n: $;
  def root: $ -> sqrt;
  1 -> \(
     when <?($primes($) <$root~..>)> do $n !
     when <?($n mod $primes($) <=0>)> do !VOID
     otherwise $ + 1 -> #
  \)!
end ifPrime

Is your programming language made of multidimensional plasma?

2019-11-06T13:01:00.000-08:00

Published on Cygni website

The perfect programming language

2019-10-13T12:15:00.002-07:00

This post has moved to https://cygni.se/the-perfect-programming-language/

Java enums are not constants

2018-07-28T05:32:00.001-07:00

Every now and then I come across a person who points out that my enum values should be written as all upper case with underscores because in their minds an enum is a constant. I find myself disagreeing but haven't previously managed to explain why.

Historically in java we would use constants where we in other languages would have used an enum, so it doesn't seem unreasonable to consider enums to be constants. And yet they are not.

Consider the following code where we have tacos on a Friday as we like to do in Sweden:


class DinnerPlanner {
...
  Menu createMenu() {
...
    if (today.equals(DayOfWeek.FRIDAY)) {
      menu.add("tacos");
    }
...
  }
...
  void buyIngredients() {
...
    if (today.equals(DayOfWeek.FRIDAY)) {
      buy("taco shells", "tomatoes", ...);
    }
...
  }
...
  void cook() {
...
    if (today.equals(DayOfWeek.FRIDAY)) {
      oven.put("taco shells");
      fryingPan.put("mince").put("taco spices");
    }
...
  }
}

After some time we get more and more influences from the US and we want to make tacos on Tuesday instead.

And now it should be evident: DayOfWeek.FRIDAY is not a constant, it is a hard-coded value. We could introduce a constant:


  static final DayOfWeek TACO_DAY = DayOfWeek.FRIDAY;

Now we can just reassign the constant TACO_DAY to the value DayOfWeek.TUESDAY.

In many other languages, it is possible to say that an enum has a value, e.g. DayOfWeek.FRIDAY might correspond to the integer value 5 and we can use FRIDAY or 5 as we see fit, but not so in java where FRIDAY is the value in the APIs and we are actively discouraged from thinking about its ordinal value in the list.

Consider also my previous post where instead of switching on an enum value to determine the code, we can actually implement the code in the enum.
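As a rough sketch of what implementing the code in the enum can look like: the Dinner enum, its values and the ingredients() method below are invented for this illustration and are not taken from the earlier post.

```java
import java.util.List;

final class DinnerDemo {
  public static void main(String[] args) {
    // No switch needed: each value knows its own shopping list.
    for (Dinner dinner : Dinner.values()) {
      System.out.println(dinner + ": " + dinner.ingredients());
    }
  }
}

// Hypothetical enum where every value carries its own behaviour.
enum Dinner {
  Tacos {
    @Override List<String> ingredients() {
      return List.of("taco shells", "tomatoes", "mince", "taco spices");
    }
  },
  Pancakes {
    @Override List<String> ingredients() {
      return List.of("flour", "milk", "eggs");
    }
  };

  abstract List<String> ingredients();
}
```

An object with its own methods looks a lot less like a constant than an integer with a name does.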

Still think a java enum is a constant?

WTF-debugging: the case of the unfortunate design choices fooling perception

2018-03-18T12:28:00.001-07:00

A few of my former colleagues at Google really love golang so I decided to start playing with it. I highly recommend the interactive tour https://tour.golang.org to get a quick sense of it. It's a fairly nice language, simple but with the parts you really need, feels nice and "javascripty" in object creation but still structured and typed strictly enough.

However, there is at least one case where the desire to avoid too prescriptive syntax results in an unfortunate combination of design choices leading to WTF-debugging.

Consider https://play.golang.org/p/yUlFI3aqYSS (run it, it prints 1,2,3,4). Now change line 12 (which could have been defined much further away, even in another file) by adding an asterisk in position 8 before Payload, to read:


func (p *Payload) UploadToS3() error {

Run it again and observe 4,4,4,4! Note that the loop where this happens is unchanged, but the loop variable "payload" has magically been changed from a value to a pointer. Spooky action at a distance now causes a different part of your program to be wrong. Keep staring at the loop and you will never figure it out.


    for _,payload := range payloads {
        go payload.UploadToS3()
    }

In java we would always know what is a value and what is a reference and we are of course also saved by the fact that variables used in closures have to be final (or effectively final). And in a functional language the variables would be immutable so this would never happen there either. In javascript, though, we deal with this all the time, so a javascript programmer might be more confused that the first version actually worked. One of the problems in go is that we can have either values or pointers, but we don't have to be explicit about it because the compiler is too helpful. Another problem is that it is unclear what code is executed now and what code is executed later. The word "go" is not (at least not yet) a strong signal to my mind that the code after it is actually executed later. I have to go into deep analysis mode before I have a chance to perceive that. Consider the difference if we had been forced to write a little more: would it have been clearer?


    for _,payload := range payloads {
        go func() {
          payload.UploadToS3()
        }()
    }

I think that is slightly clearer, but I think it still indicates the problem with inline closures for asynchronous computing. In java I would tend to recommend to never use anonymous inner classes, but take the time to make it a named inner class, defined elsewhere, so you have a better chance of realizing that the code will execute at another time. I have seen otherwise excellent coders stumble on this temporal misperception.

Interestingly, the perceptual difficulty of when code executes can also go the other way. When using Mockito, the below code doesn't immediately signal your brain that the bar() method actually gets called.


    when(foo.bar()).thenReturn(5);

WTF-debugging: the case of the obscure configuration

2018-02-23T11:44:00.000-08:00

Ever stared at a piece of code without understanding why in the world it does not work? Or why it actually works at all? I'd like to call this phenomenon WTF-debugging and I've been remarkably free of it since I've been doing framework-free backend java. But now I have a new job and we use all the popular frameworks, for better and for worse. Well, really only for worse, IMO, but I will write more on that when I understand my aversion better. So far, I have discovered that the tendency to want to write frameworks is very strong because it is the ultimate intellectual masturbation. The tendency to want to use frameworks is more puzzling, but I think we are all attracted to magic to some degree and there is a powerful illusion that frameworks provide a lot of value, automagically.

We are working with JSON in Java and using Jackson, and I had a little problem where one of the fields of the main object could be a different type depending on what the object represented, as indicated by a type name in another field. So I had to work out how to configure Jackson to handle it, which turned out to be a little challenging. After an hour or so I hit upon a fruitful phrasing of the search terms and found a solution.


@JsonDeserialize(builder = Attachment.AttachmentBuilder.class)
public class Attachment {

    private final long id;
    private final String attachmentType;
    ...

    public interface ExcelSheets extends List<ExcelSheet> {}

    private static class ExcelSheetsImpl extends ArrayList<ExcelSheet> implements ExcelSheets {}

    @JsonTypeInfo(use = Id.NAME, property = "attachmentType")
    @JsonSubTypes(value =  {
            @JsonSubTypes.Type(value = ExcelSheetsImpl.class, name ="excel")
    })
    private final Object typeSpecificData;

    Attachment(long id, String attachmentType, long reportId, String filename, Object typeSpecificData) {
        this.id = id;
        this.attachmentType = attachmentType;
        ...
        this.typeSpecificData = typeSpecificData;
    }

    public long getId() { return id; }

    public String getAttachmentType() { return attachmentType; }

    ...

    public Object getTypeSpecificData() { return typeSpecificData; }


    @Override
    public AttachmentBuilder builder() { return new AttachmentBuilder(this); }

    public static class AttachmentBuilder implements Builder {
        private long id;
        private String attachmentType;
        private Object typeSpecificData;
        ...

        public AttachmentBuilder() {}

        AttachmentBuilder(Attachment attachment) {
            this.id = attachment.id;
            this.attachmentType = attachment.attachmentType;
            ...
        }

        @Override
        public AttachmentBuilder withId(long id) {
            this.id = id;
            return this;
        }

        public AttachmentBuilder withAttachmentType(String attachmentType) {
            this.attachmentType = attachmentType;
            return this;
        }

        public AttachmentBuilder withTypeSpecificData(Object typeSpecificData) {
            this.typeSpecificData = typeSpecificData;
            return this;
        }

        ...

        @Override
        public Attachment build() {
            return new Attachment(id, attachmentType, reportId, filename, typeSpecificData);
        }
    }
}

Now I could be happy with that and sing the praises of Jackson and "look how elegantly it got configured". But should I?

Even when I have this solution before me, I still can't quite figure it out from the documentation (WTF?), so what will happen in six months' time when I have to modify this code?

And here comes an even bigger WTF: change the type "Object" for typeSpecificData to "ExcelSheets" and deserialization no longer works! (What I really wanted to do was to introduce a marker interface, TypeSpecificData, but, as you can surmise, that didn't work either.)

Even though Jackson is (sadly) perhaps the easiest way to handle JSON in Java, I think there may be good reasons besides the above to avoid using it. I won't go into that in detail in this post, but I will leave with this thought: Jackson or any other automagic data-binding framework entices you to create Java objects that match the serialized JSON, but your serialization format is not your internal model, at least not forever, because the two have different reasons to change. So even after you get a deserialized object from Jackson, you should probably write lots of code to transfer the data into your internal representation. Then what did you gain?

Prove your assumptions, but remember Murphy was an optimist

2011-01-05T21:49:00.000-08:00

Continuing on my unremarkable coding task, having gotten it to work, it was now time to clean up the code.

Given the large load expected, I couldn't allocate a new direct ByteBuffer for every piece of data I wanted to handle. But code that accepts a ByteBuffer is equivalent to code that accepts a byte array, a starting offset and a length (or a limit), right? So I just set the position and the limit on the boundaries of the data and I'm good to go.

Now I ran into some of my own previous assumptions. Luckily, the failure resulted in a log message that made it clear where something was going wrong. It was one of those places where a deviation from the happy path either cannot happen, should not happen or we didn't really care too much. A place where I used to code an empty block "{}", or when I was feeling diligent, with a comment "{/*cannot happen*/}". Now I'll code it as "{throw new AssertionError("Cannot happen.");}" or at least "{LOG.fine("Don't care.");}".

Anyway, a quick analysis showed a likely cause. I say likely, because it's an assumption until proven. I wrote a quick unit test that failed, fixed the code and the test passed. But the program still failed, with the same log message as before. Imagine what I might have been doing if I hadn't proven the failure and the fix with the unit test, I'd still be staring at the same place and possibly ripping my code to shreds in desperation. Now I could immediately move on and fix the second error.

OK, on a roll now, all assumptions proven by assertions, I'm just about done. Except that the code still doesn't work and no sign of why. Murphy's law in full action.

Finally I find it. There's a trap in ExecutorService. What's the difference between calling "execute(Runnable)" versus "submit(Runnable)"? Nothing much, when the code works. But "submit(Runnable)" should have a big red warning sticker. It returns a Future, with no result. You don't bother to "get()" nothing. The devastating side-effect is that all exceptions get preserved until "get()" is called, so this is a hidden equivalent of "catch(Exception e){}". Next task: change this everywhere and add a rule to FindBugs.
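A minimal sketch of the trap, using only java.util.concurrent. The class and method names are made up for the illustration. With "execute(Runnable)" the same exception would instead escape the worker thread and reach its uncaught exception handler, which by default prints a stack trace.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SubmitSwallowsExceptions {

  /** Returns true if the failure of a submitted task only shows up when calling get(). */
  static boolean failureOnlyVisibleThroughGet() {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Runnable failing = () -> {
        throw new IllegalStateException("Cannot happen.");
      };
      // Nothing is logged or rethrown here: the exception is stored in the Future.
      Future<?> future = executor.submit(failing);
      future.get(); // the only place where the failure surfaces
      return false;
    } catch (ExecutionException e) {
      // The original exception is wrapped as the cause.
      return e.getCause() instanceof IllegalStateException;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      executor.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(failureOnlyVisibleThroughGet()); // prints true
  }
}
```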

Your assumptions are dangerous, you know too much.

2011-01-01T11:06:00.001-08:00

I have just completed a rather unremarkable piece of code. The system was designed to allow this type of addition, so it just took a couple of hours or three to write the code and touch up the parts to selectively enable the functionality by user.

Then a colleague and I spent two days debugging. We hacked around issue after issue, just to prove basic workability before trying to solve the issues.

The final obstacle was that our socket channel was not being selected for reading on the selector. So I hacked around that by creating a new thread to loop infinitely around reading the channel. Which made things work fine up to the point where actual data was coming in and the server blew up with a segmentation fault.

After reflecting while travelling on the bus home, I remembered that perhaps our JNI code needed a direct byte buffer, with data starting at position 0 and the whole backing array available for use. Inconvenient in this case, but another little hack and everything worked like a charm.

Back to the previous question: why no reads? Perhaps my server socket in non-blocking mode actually returned a socket configured for blocking mode instead of for non-blocking mode? It did, and explicitly setting it to non-blocking fixed it.

As I backtracked through the issues, it struck me that every single problem we'd run into had to do with assumptions.

I could of course have checked my assumption about the blocking mode. But I also have another assumption, which is more valid: incorrect usage of an API should not fail silently. This turns out to be correct, because a SelectableChannel throws an IllegalBlockingModeException.

Unfortunately, the "helper" framework that we have in place inadvertently masked that by running the register call in a FutureTask that had a boolean "success" return value that nobody had found the need to check, because "true" was the only possibility. Well, that is, assuming no exceptions are thrown.
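The masking can be reproduced in a few lines. This is a sketch, not our actual helper framework: a freshly opened SocketChannel stands in for the accepted channel (both start out in blocking mode), and the class and method names are invented.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

final class MaskedRegistration {

  /** Returns the exception hidden inside the FutureTask, or null if there was none. */
  static Throwable hiddenFailure() {
    try (Selector selector = Selector.open();
         SocketChannel channel = SocketChannel.open()) {
      // The channel is still in blocking mode, so register() will throw.
      FutureTask<Boolean> registration = new FutureTask<>(() -> {
        channel.register(selector, SelectionKey.OP_READ);
        return true; // "true" is the only possible value, so nobody checks it
      });
      registration.run(); // returns normally even though register() threw
      try {
        registration.get(); // only here does the failure become visible
        return null;
      } catch (ExecutionException e) {
        return e.getCause();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return e;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(hiddenFailure());
  }
}
```

Calling channel.configureBlocking(false) before registering makes the hidden failure go away, which is the fix described above.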

Perhaps there is also a flaw in the assumption that a helper framework that obscures the standard API is actually helpful.

Certainly, the direct byte buffer assumption mentioned above should probably have been asserted somewhere, it's easy enough to throw an IllegalArgumentException if buffer.isDirect() returns false. Obviously, the programmer who created the JNI call was not assuming we had a direct byte buffer, he knew we had one. But that's the trick of maintainable and re-usable object oriented code: you cannot rely on any knowledge outside the class you are currently in. From the point of view of the class, such knowledge is an assumption.
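Such an assertion could be as small as the sketch below. The guard method and its name are hypothetical, and the position check is my guess at what "data starting at position 0" would translate to. The real JNI wrapper is not shown here.

```java
import java.nio.ByteBuffer;

final class DirectBufferGuard {

  /**
   * Hypothetical guard to call before handing a buffer to native code that
   * needs a direct buffer with its data starting at position 0.
   */
  static ByteBuffer requireDirectFromStart(ByteBuffer buffer) {
    if (!buffer.isDirect()) {
      throw new IllegalArgumentException("Expected a direct ByteBuffer");
    }
    if (buffer.position() != 0) {
      throw new IllegalArgumentException(
          "Expected data to start at position 0, was " + buffer.position());
    }
    return buffer;
  }

  public static void main(String[] args) {
    requireDirectFromStart(ByteBuffer.allocateDirect(16)); // passes silently
    try {
      requireDirectFromStart(ByteBuffer.allocate(16)); // heap buffer
    } catch (IllegalArgumentException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```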

Another issue I had hit on the way concerned the UserIdentifier class. It is really just a wrapper around a string, but because it has a specific semantic meaning it was correctly exposed as a separate value class. To limit the new functionality by groups of users, I found it convenient to construct the user identifier slightly differently. The code did not work as expected.

At another point in the code, a programmer had used his knowledge of how the user identifier was constructed, which introduced a hidden assumption about the structure of the user identifier. The correct code would have been to obtain the user identifier from a single authoritative place.

The root cause of the user identifier hack was a design assumption that two things were independent. When they were not, a co-dependency web was created. In a sense, this is also a case of classes having too much knowledge: they knew about too many different other classes that they needed to collaborate with. To keep your code clean and honest, such things need to be refactored (re-designed).

Consider the small amount of care and time it would have taken to avoid the assumptions in the first place and compare it to the four man-days of lost productivity that was caused. We are always under time pressure, but that will be the case next week, month or year as well. If you don't pay the price now, you pay it with interest later.

The more things your classes know about your system, the harder it is to change or re-use the code. Make sure that the knowledge you put into a class is appropriate and doesn't create a dangerous web of assumptions.

Clarity of code is clarity of thought

2009-08-28T11:43:00.000-07:00

I remember at my first job when we introduced the concept of code reviews. I don't think anybody really looked at anybody else's code before pressing the "approve" button, I know I didn't. Reading code is boring and it can be hard and it felt like a waste of time. I had my own code to write and why shouldn't the author of the code have written it right in the first place?

When I moved to another job, they talked about how, in theory, code reviews were the number one most effective way to increase code quality. But they had given up on it, because they felt it came down to a discussion of where to put the dots and commas (or, rather, semi-colons and parentheses).

Quite aside from the issue of code reviews, I had come to realize that I spent much more time reading my code than I spent writing it. Every debugging session is spent reading code over and over. Every time you have to add a feature or change some functionality you have to read the code, and re-read it to avoid breaking stuff. Don't tell me tests will do it for you. Now don't get me wrong, tests are great and I strongly advocate test-first coding, it's a great way to achieve focus and clarity of thought. But when a test fails, you're thrown into debugging mode, which means reading code.

So I concluded it was worth spending a little extra time typing longer variable names, and taking the time to find descriptive names. It was worth spending a little more time breaking down those long methods and simplifying those complex structures. Whenever I was reading code that made me stop and think, I would usually refactor it to be clearer (although the term refactoring hadn't been invented yet). I would also change existing code to make a new feature fit in better, in a more readable and more logical way.

In "The Pragmatic Programmer" the distinction is made between "programming by coincidence" and "programming by intention". We all have to do it occasionally, a little trial-and-error programming, because we're not quite sure how things work. That's "programming by coincidence". Before you check your code in, you want to make sure you understand what each statement does and that all unnecessary statements are removed. Then you've transformed the code from coincidental to intentional.

But that's not enough. Your very functional and intentional code is going to lose you valuable time unless you also transform it to readable code, which clearly displays your intent.

A much-touted wisdom is that you should document and comment your code. Fair enough, that works, but it has many weaknesses. Your energy is far better spent making the code explain itself. Comments often lie, but the code is always pure truth, so prove it in code. Only comment on "why" you are doing something, and that only if you cannot make it evident in the code.

I like the following sound-bite from "Refactoring": "Any fool can write code that a machine can understand, it takes a good programmer to write code that a human can understand."

Test-first "anything" is efficient and focused because it sets up the criteria for success and the means to measure it up front. So what's the best way to test if your code is readable? Get another person to read it, i.e. a code review.

I'm very grateful to those who review my code carefully and pick on every detail, it makes the code better and it helps assert that my thinking was clear. That gratitude gives me the energy to return the favour by reviewing their code equally mercilessly.

You will sometimes, but rarely, find bugs by just reading code (only because everybody has a brain-fart now and then). But the real value of the reviews is in the "dot and comma" discussions and especially in picking good names. In addition to making sure that the code is easy to read, it will sometimes bring a real little nasty bug to the surface.

An example: An index into an array of values is stored into a variable called "value". When the reviewer makes you change the name to "valueIndex" instead, some parts of your code may start to look weird (the bug was exposed).

Clarity of code really is clarity of thought.

Using Java concurrency utilities

2008-12-25T14:22:00.000-08:00

The inspiration for this post comes from Jacob Hookom's blog and I can only second the recommendations he gives. Although, as always, I would caution to test any such implementation properly, that it works well and actually provides a benefit. There are lots of pitfalls and concurrency is tricky even with the excellent utilities provided in Java.

To summarize the interesting problem: parallelize the execution of lengthy tasks in a web request, without creating many threads for each request, but also ensuring that the thread pool is not starved by one request. The idea is to have a reasonably sized thread pool and to limit the number of tasks executing in parallel to a number small enough to allow the expected amount of concurrent requests to share the pooled threads.

Essentially, limiting the number of tasks executing in parallel can be done in two ways: limit the number of tasks submitted at one time or limit the number of workers that execute a set of tasks. Jacob takes the first approach, I will take the second approach, which seems to make it simpler to manage time-out issues.

Here's some code:


<V> Queue<Future<V>> submit(int numberOfWorkers, Queue<Callable<V>> tasks,
                            long timeout, TimeUnit unit)
    throws InterruptedException, TimeoutException {
  Queue<Future<V>> result = new ConcurrentLinkedQueue<Future<V>>();
  List<WorkerTask<V>> workers = new ArrayList<WorkerTask<V>>(numberOfWorkers);
  for (int i = 0; i < numberOfWorkers; i++) {
    workers.add(new WorkerTask<V>(result, tasks));
  }
  List<Future<Object>> deadWorkers
      = executor.invokeAll(workers, timeout, unit);
  for (Future<Object> obituary : deadWorkers) {
    if (obituary.isCancelled()) {
      throw new TimeoutException();
    }
  }
  return result;
}

And the code for a WorkerTask:


private static class WorkerTask<V> implements Callable<Object> {

  private Queue<Callable<V>> tasks;
  private Queue<Future<V>> result;

  public WorkerTask(Queue<Future<V>> result, Queue<Callable<V>> tasks) {
    this.result = result;
    this.tasks = tasks;
  }

  public Object call() {
    for (Callable<V> task = tasks.poll(); task != null; task = tasks.poll()) {
      FutureTask<V> future = new FutureTask<V>(task);
      future.run();
      if (Thread.interrupted()) {
        Thread.currentThread().interrupt(); // Restore interrupt.
        break;
      }
      result.add(future);
    }
    return null;
  }
}

Note that it is important to have thread-safe collections for both tasks and result. The method should really ensure that the tasks it receives are in a thread-safe collection, but I'll ignore that for now. Note also the check in the call() method of WorkerTask for whether the thread has been interrupted. That is vital to be able to cancel the task when you don't want to wait for it any longer (i.e. on time-out). If possible, the submitted tasks should also handle interrupts. Note the careful restoration of the interrupt status so that the caller of the method may also be notified.
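To try the idea out in isolation, here is a condensed, self-contained variant with the worker written as a lambda. The pool size of 8, the 5 second time-out and the squaring tasks are arbitrary assumptions for the demonstration, and the time-out check on the returned futures is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.TimeUnit;

final class LimitedWorkersDemo {

  /** Squares 1..count using at most numberOfWorkers pool threads and sums the results. */
  static int sumOfSquares(int count, int numberOfWorkers) {
    // Assumption: 8 threads stand in for the shared pool of the web application.
    ExecutorService executor = Executors.newFixedThreadPool(8);
    try {
      Queue<Callable<Integer>> tasks = new ConcurrentLinkedQueue<>();
      for (int i = 1; i <= count; i++) {
        final int n = i;
        tasks.add(() -> n * n);
      }
      Queue<Future<Integer>> result = new ConcurrentLinkedQueue<>();
      List<Callable<Object>> workers = new ArrayList<>(numberOfWorkers);
      for (int i = 0; i < numberOfWorkers; i++) {
        // Each worker drains the shared task queue, as WorkerTask.call() does.
        workers.add(() -> {
          for (Callable<Integer> task = tasks.poll(); task != null; task = tasks.poll()) {
            FutureTask<Integer> future = new FutureTask<>(task);
            future.run();
            if (Thread.interrupted()) {
              Thread.currentThread().interrupt(); // Restore interrupt.
              break;
            }
            result.add(future);
          }
          return null;
        });
      }
      executor.invokeAll(workers, 5, TimeUnit.SECONDS); // assumed time-out
      int sum = 0;
      for (Future<Integer> future : result) {
        sum += future.get(); // already completed, so this does not block
      }
      return sum;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      executor.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(sumOfSquares(10, 3)); // prints 385
  }
}
```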

GC is for Goodstuff Collector

2008-11-24T12:55:00.000-08:00

I have over the past few months noticed that there is a fairly common fear of creating objects in Java. When I query the authors, it always seems to boil down to an attempt to create more performant code through avoiding garbage collection.

So why would one want to create more objects, anyway? Well, one good reason would be to get more readable code, e.g. through encapsulating an algorithm or using appropriate helper objects to express an algorithm more clearly.

Even when the code does get put into a helper object, there seems to be a tendency to keep that instance around and reuse it, to avoid garbage collection so that the code performs better. My first comment is always "You should measure that". If I wanted to put it more sharply I should say "If you can't prove it's a performance benefit, then it probably isn't". I have worked with enough optimization to know that what's true for one language will not be true for another. Even using the same language, something that gives a performance benefit on one machine may be a detriment on another kind of machine (or even the same kind of machine with different "tweaks" like page sizes and such).

If you create a temporary object that lives only as long as you need it you gain the following benefits:

Your object is always in a pristine state when you want to use it.
Your code is a big step closer to being thread safe.
Your code is more readable and easier to analyze.
Your garbage collector gets to do less work.

"What?", you say, "The garbage collector does less work by collecting more garbage?".

Indeed it does. When we hear "garbage collector" we tend to think of the work involved in clearing out an overfilled store room, all that work to haul all the garbage out. But the garbage collector doesn't even know the garbage exists, the very definition of garbage is that it can no longer be reached from anywhere. What the garbage collector really does is create a whole new store room and move the stuff you want to keep over to it and then it lets the old store room and all the garbage disappear in a puff of smoke. So all the work done by the garbage collector is really done to keep objects alive, i.e. the GC is really the "goodstuff" collector.

This is obviously a somewhat simplified view and I don't think it holds completely for objects with finalizers (which is probably why finalizers are really bad for performance). Every single measurement and microbenchmark I've done confirms that creating and destroying objects is never worse and often much better than trying to keep objects around. I've done a few, first because I didn't trust it, then because others were so convinced of the opposite that I had to research it. That isn't an absolute proof that it will be true for all scenarios, but I think it's a pretty good indication of where you should be pointing when shooting from the hip.

From an IBM developerWorks article, we get the numbers that allocating an object in Java takes about 10 machine instructions (an order of magnitude better than the best C/C++ allocation) and that throwing an object away costs, you guessed it, absolutely zero. So just remember that GC stands for "Goodstuff Collector" and you'll be on the right track.

The legacy of inheritance

2008-08-16T11:39:00.000-07:00

Is inheritance really useful or is it a feature that causes more problems than it solves? Certainly I can't think of a case where I've been really grateful that I've been able to inherit from a superclass but I can think of several instances where it has caused friction and where the extension mechanism of inheritance tended to lead the programmer the wrong way.

Consider the following code (public fields for brevity):


public class Square {
  public int base;
  public int getArea() {
    return base * base;
  }
}

public class Rectangle extends Square {
  public int height;
  @Override public int getArea() {
    return base * height;
  }
}

public class Parallelogram extends Rectangle {
  public double angle;
}

Now how do we implement a Rhombus? Is it a Square extended with an angle (and overridden area calculation) or a Parallelogram with conditions on setting the properties so that the invariant is preserved (which is why we should have accessors, by the way)?

Well, the correct answer is neither, even though we have a nice sequence of extensions. The problem is that we have been led astray by the extension mechanism and violated the "is a" rule for subclassing and ended up with a corrupt type system. Clearly a Parallelogram is not a Rectangle which equally clearly is not a Square so a subclass instance may not safely be used in place of a superclass instance. Reversing the class hierarchy solves the issue; however, it creates subclasses that are restrictions of the superclass rather than extensions.
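To make the danger concrete, here is a small sketch in which code written against Square quietly breaks when handed a Rectangle. The nested classes repeat the first hierarchy in compact form, and the areaIsBaseSquared method is a made-up stand-in for any client code relying on what a Square is.

```java
final class SubstitutionDemo {

  static class Square {
    int base;
    int getArea() { return base * base; }
  }

  static class Rectangle extends Square {
    int height;
    @Override int getArea() { return base * height; }
  }

  /** Client code written against Square: relies on the area being base squared. */
  static boolean areaIsBaseSquared(Square square) {
    return square.getArea() == square.base * square.base;
  }

  public static void main(String[] args) {
    Square square = new Square();
    square.base = 3;
    Rectangle rectangle = new Rectangle();
    rectangle.base = 3;
    rectangle.height = 5;
    System.out.println(areaIsBaseSquared(square));    // prints true
    System.out.println(areaIsBaseSquared(rectangle)); // prints false
  }
}
```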

An acquaintance of mine who is highly experienced in creating standardized components for information interchange once claimed that it only works to inherit properties by restriction. This is well worth considering, although I'm not entirely convinced because it causes a different set of problems when restriction means "that particular property is not used", as it so often does in information interchange scenarios. In that case it is usually not interesting nor useful to know what the superclass is. In our case at hand it will work because the subclasses actually have a meaningful and useful value of the property but it is restricted by an invariant, so it is also useful to be able to view them as instances of the superclass.

Let's try coding again:

public class Parallelogram {
  protected int base;
  protected int height;
  protected double angle;

  public void setAngle(double angle) {
    this.angle = angle;
  }
}

public class Rectangle extends Parallelogram {
  @Override public void setAngle(double angle) {
      // What should we do here?
  }
}
...

Oops, this doesn't seem to work well, either. Obviously we can't just inherit the setAngle method or we could end up with a corrupt Rectangle, nor is there any reasonable action we can take. We can get out of our pickle, however, because constructors are not inherited! Simply make our instances immutable:

public class Parallelogram {
  public final int base;
  public final int height;
  public final double angle;

  public Parallelogram(int base, int height, double angle) {
    this.base = base;
    this.height = height;
    this.angle = angle;
  }

  public int getArea() {
    return base * height;
  }
}

public class Rectangle extends Parallelogram {
  public Rectangle(int base, int height) {
    super(base, height, Math.PI/2);
  }
}

public class Square extends Rectangle {
  public Square(int side) {
     super(side, side);
  }
}

That seems to work, and the implementation of getArea() is actually profitably inherited. But we still have a problem in Java with a Rhombus, since a Square is both a Rectangle and a Rhombus but a Rhombus is not a Rectangle. To solve that would require defining the types as interfaces and have separate implementation classes (now I understand why that is often recommended :-) ).
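A sketch of what the interface version might look like, with the types reduced to the area calculation. The implementation class names are invented, and each implementation deliberately provides its own getArea() rather than inheriting one.

```java
final class ShapeTypesDemo {

  interface Parallelogram { int getArea(); }
  interface Rectangle extends Parallelogram {}
  interface Rhombus extends Parallelogram {}
  // Interfaces allow multiple inheritance of type: a Square is both.
  interface Square extends Rectangle, Rhombus {}

  static final class SquareImpl implements Square {
    private final int side;
    SquareImpl(int side) { this.side = side; }
    @Override public int getArea() { return side * side; }
  }

  static final class RhombusImpl implements Rhombus {
    private final int base;
    private final int height;
    RhombusImpl(int base, int height) {
      this.base = base;
      this.height = height;
    }
    @Override public int getArea() { return base * height; }
  }

  public static void main(String[] args) {
    Square square = new SquareImpl(3);
    Rhombus rhombus = new RhombusImpl(4, 3);
    System.out.println(square instanceof Rectangle);  // prints true
    System.out.println(square instanceof Rhombus);    // prints true
    System.out.println(rhombus instanceof Rectangle); // prints false
  }
}
```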

If we had chosen to implement Parallelogram with a field for the length of the side instead of the height, the formula for the area would have been base*Math.sin(angle)*side. Even if this would have worked for our Rectangle, it would have been inefficient (although when the fields are immutable it could probably be optimized by the compiler and/or JVM).
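For comparison, the side-based formula as a tiny sketch (the class name is arbitrary and the angle is assumed to be in radians). A Rectangle inheriting it would pay for a sine calculation whose result is always 1.

```java
final class SideBasedArea {

  /** Area of a parallelogram from base, adjacent side and the angle between them in radians. */
  static double area(int base, int side, double angle) {
    return base * Math.sin(angle) * side;
  }

  public static void main(String[] args) {
    System.out.println(area(3, 4, Math.PI / 2)); // a rectangle, so about 12
    System.out.println(area(3, 4, Math.PI / 6)); // sin is about 0.5, so about 6
  }
}
```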

On the whole, I believe it is seldom profitable to inherit the implementation of a method. If you have a good example to the contrary, please share it.

I think that overriding all public methods would be preferable, even if you just decide to call super's version; at least it would show that you gave some thought to whether it was correct or not. Without having done an exhaustive study, I believe this is especially true for interfaces that indicate that a class "is" something rather than that it "is a" something, i.e. those interfaces whose names usually end in "able". Try Cloneable, Externalizable, Runnable and Printable.