Rising above the mechanics of computation
UPDATE 2026-04-23: Added piece on logic languages
UPDATE 2026-04-24: Added examples of COBOL and Ada
This essay started with a short talk I gave for the local developer community, where I demonstrated that most programming languages are based on the primitives of the mathematical models of computation, but that it might instead be worth basing them on the concepts that programmers need.
That got me thinking: does this have real consequences? If so, what can be done?
Studies have indicated that language ability is more important than mathematical ability for programming. Recruiters have long since realized that a computer science degree is neither necessary nor sufficient for a good software engineer.
In contrast, the models of computation that current programming languages are based on are mainly mathematical in nature.
This creates an intent-to-implementation gap where an intent has to be translated into a code structure that may look very different from the intent.
This essay explores how programming languages are stuck in the primitives of the models of computation, how that impacts the reading, writing and logical correctness of code and suggests some ways that things may be improved.
As a first observation, SQL, XSLT and spreadsheets are examples of programming that is very successfully performed by people without any programmer training. They may be doing something right.
Implications of the intent-to-implementation gap
I believe there are real economic consequences arising from this gap: code that is more difficult to write produces more bugs, and code that is more difficult to review increases the time spent.
Like it or not, practically every professional programmer is already using an LLM to assist in coding in some way. Since LLMs are based on human language, not mathematics, the intent-to-implementation gap likely inhibits successful use of LLMs as coding assistants.
Since LLMs are models of human language, and human language is used to specify the intent of the program, it stands to reason that programming languages that work better for humans will also work better for LLMs and vice-versa.
DISCLAIMER: Whatever I claim the LLM has "said" below, do take it with a grain of salt. I think it makes sense, but we would have to do research to be sure.
While there are implementation patterns that can be learned, they are not direct replacements for the intent and often get other intents mixed in. The friction of translating back and forth between implementation and intent is not ideal for humans and is, according to both Claude and Gemini, the biggest source of errors in LLM assisted coding.
A particular form of the gap is the lack of sequentiality. Budding programmers quickly learn to jump around when reading code instead of reading sequentially. When trying to understand code, it is not always easy to know where else one needs to look, either for humans or LLMs.
When writing code, the human programmer will jump back and forth and add details in different places as needed. For the LLM assistant, writing is always sequential. Any error early on has a huge impact on code quality. If it is even possible to correct, the following code will become unnecessarily complex.
Note that for LLMs, the amount of training data available in a language is a significant factor in reproducing syntax accurately, and a language with "loose" logic makes it easier to write code that actually runs, but it might not be logically correct and is often more complex than it needs to be. This works well enough for small one-offs and non-programmer projects, until it gets bogged down after a few iterations. Loose logic and simple syntax also help humans produce working code through trial and error and such languages are appreciated by learners and non-professionals. They start losing their appeal when maintaining logical correctness over time becomes a bigger priority.
What I am aiming for in this essay is logically correct code, not necessarily entirely syntactically correct code. It should be easy enough to iterate on the syntax later (and the situation will improve as more training data becomes available when a language grows more popular).
The primitive mechanics
When designing a programming language and considering what it needs to consist of, it is natural to first look at the models of computation to make sure that it will be possible to create all desired programs.
This is a quick and simplified recap of those models, avoiding lots of interesting rabbit-holes.
The Turing machine
Specified in 1936 as a machine designed to be able to perform any computation that a mathematician could do with pencil and paper, the machine has an infinite paper tape and a "head" that can read and write marks at any position on the tape. From its internal state and a set of rules, the head determines what to do and where to go next along the tape.
This translates almost directly to the von Neumann machines most of us use, which have a large enough addressable memory and a CPU that makes the decisions according to a program and the contents of memory.
In our programming languages, we can recognize named mutable variables and conditional statements that determine what instruction to process next.
Lambda calculus
Also in 1936, Alonzo Church published the lambda calculus as a system of notation that could express any mathematical computation. Alan Turing quickly proved that it had equal computational power to the Turing machine.
In our programming languages we can recognize functions with parameters that get replaced with values on application (calling).
Note that functions are named with nouns, such as "sum" and "square root"; there are no verbs in this paradigm.
Types for stability
The raw lambda calculus could express paradoxes, never-ending calculations and nonsensical expressions. This is the whole point when it comes to delivering a negative answer to the "Entscheidungsproblem", but it is inconvenient when using the notation for practical calculation.
The simply typed lambda calculus, obtained by applying a simple type system and forbidding self-reference, achieved complete logical consistency (it was a so-called total programming language). Unfortunately, it was no longer possible to express all calculations one might wish to express, i.e. it was no longer a Turing-complete language.
Nevertheless, the idea of types as a stabilizing influence is an important one. Additionally, types can often be resolved without performing the whole calculation, enabling an "early-warning system" for potential logic errors.
The Actor model
The actor model was formally specified in 1973 by Carl Hewitt. On receiving a message, an actor can send messages to other actors it knows about, create new actors and decide its state for handling the next message. This model is at least as powerful as the Turing machine but also contains notions of parallelism and asynchronicity.
In our programming languages, we see some likeness in the object-oriented expression of code.
In the purest expression of the actor model, the only important thing is the "verbs" or "messages" as requests for an actor to do something. What an actor IS is unimportant other than it can be helpful to have a name for the role the actor plays.
Objects are inevitable
The concept of objects has been discovered at least four times:
- 1962-1967: The Norwegians Ole-Johan Dahl and Kristen Nygaard were modeling real-world systems and created independent entities that could react to information from the outside. They created "classes" in their Simula language, which were a bit closer to an actor process than to a C++ class.
- 1972: Alan Kay defines the term Object-oriented. He was thinking of independent cells or organisms that reacted to electro-chemical "messages" from surrounding organisms. The language Smalltalk that he created didn't quite live up to the vision, but is still very interesting.
- 1973: Carl Hewitt defines the actor model.
- 1986: Joe Armstrong (and colleagues) create the Erlang language to run things like telephone exchanges that must keep serving all requests they can. Erlang processes happened to become a near-perfect implementation of the actor model, although they did not know it at the time (ironically, they were trying to steer as far away from what they thought of as object-orientation as possible).
Data types versus objects
When C++ added "objects" to the C language, they were just abstract data types with some functions attached. This is really just an alternate way to organize functional-style code. With uniform function call syntax, a.f(b) is just syntactic sugar for f(a, b). That said, organizing all functions that apply to a data type together has proven to be quite beneficial to program maintainability.
Alan Kay has expressly stated that when he coined the term "object-oriented", C++ was not what he had in mind. Even so, there has been much confusion, even in academia, between data types and "real" objects. It wasn't until 2009 that William Cook started to clear things up with the paper "On Understanding Data Abstraction, Revisited".
A triangle of primitive programs
These three models of computation can be plotted as a triangle. At the top is procedural programming corresponding to the Turing machine, where the programmer works with verbs and nouns and general data structures. Down the left side, the programmer becomes more and more specific about "what things ARE" in relation to the current program and ends up in the pure functional or "lambda calculus" corner. Moving down the right side instead, the programmer defines more and more accurately "what things DO" in the current context and ends up in the "actor model" corner on the bottom right.
All programs written in any mainstream-ish language can be plotted inside the triangle, depending on how that program is expressed in terms of the primitive expressions of the models of computation, but you can't escape the primitive mechanics like if-statements, map and filter functions and message sending.
We can also note that having a language that allows all three modes of expression is not an advantage. Two programs that do exactly the same thing can look radically different, and people struggle to understand each other's code.
The intent-to-implementation gap
I'm sure it isn't difficult to convince you that C-style procedural programming at the top of the triangle has a large gap, forcing you to transform your intent into somewhat obscure sequences of variable reads, writes, comparisons and jumps. C programmers actually prefer this for the control it gives over the machine itself.
When you get down to the base of the triangle, it's easier to squint at it and claim that it matches your intent. Here is Haskell, pretty much down in the left corner.
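The exact example doesn't matter much; something along these lines, taking the names longer than five characters and shouting them back in upper case, will do (the function name here is just illustrative):

```haskell
import Data.Char (toUpper)

-- take the names longer than five characters and upper-case them
shoutLongNames :: [String] -> [String]
shoutLongNames names = map (map toUpper) (filter ((> 5) . length) names)
```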
But this isn't as close to the intent as we want to believe. Previously I claimed that everything was a noun, but you can of course use a verb to name a function. That still doesn't quite make it a verb, because by referential transparency it can be replaced by its result. Another thing to note is that you don't really want to map and filter; those mechanisms are forced upon you, just as you are forced to supply a function as a parameter to them. The order in which things happen is the opposite of the order in which they are written. And what if you don't know what a certain function does? You have to jump out and figure that out before you come back to complete your analysis.
Gliding over to the right, somewhere in the middle of the baseline, we have the first-parameter-grouped organization of functional code, a.k.a. mainstream OO functional style, here in TypeScript.
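Under the same assumptions as the Haskell sketch above, the method-chained version might look like this:

```typescript
// take the names longer than five characters and upper-case them
function shoutLongNames(names: string[]): string[] {
  return names
    .filter((name) => name.length > 5)
    .map((name) => name.toUpperCase());
}
```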
This fixes the order problem, but the other problems remain: we are still stuck with the map and filter mechanisms. Also, if you want to do something that hasn't been added to the array object, you have to switch over and write the code very differently.
In a pure actor model we could send a message asynchronously to a length-counter, with a return address. When the result is sent back we could send a message to the by-length-selector and so on. Or however we wish to organize the actors and messages they handle. All that distribution and asynchronicity makes it very hard for the human brain to understand the system and sequentiality is basically non-existent. We may not want actors to be so fine-grained, which means defining chunks of functionality in terms of another model.
Sequentiality
Beyond the sequence problems mentioned above (reverse ordering, asynchronicity and having to jump to a different context to understand the definition of a function), there are other things that disrupt sequentiality.
The ideal to strive for here is really to reduce cognitive complexity, so just being able to read code left to right, top to bottom isn't exactly enough. If you at any point need to keep more than a couple of things in your head, there is a risk the working memory is blown and the reader needs to jump back and forth. Of course, LLMs can manage more for reading, but are more sensitive when writing, as mentioned previously.
Being able to finish off each step in logical sequence with all "fallout" managed locally is much less error-prone than having to fit things in a certain structure of nested blocks.
For-loops and nested ifs require holding accumulated context across the block. Try-catch has error handling physically separated from the point where the error occurs, which is the non-locality problem in another form.
In functional code, higher-order functions impose a structure that isn't sequential.
Type systems and sequentiality
In terms of the Cognitive Dimensions of Notations, a framework for analysing usability, forcing up-front definition of types causes a premature commitment. In terms of sequentiality, a small error made early in a type declaration cannot simply be fixed later as you go.
Any changes needed to a type cause a huge knock-on effect (another Cognitive Dimension), rippling through the code.
That said, type systems are excellent for detecting certain types of programming errors (and LLM hallucinations), so we would still like to have that. If we also can get the documentation effect that explicit type declarations provide, so much the better.
Attempts to rise above the models of computation
There are several different things that are being used to try to rise above the computation primitives and close the intent-to-implementation gap. They have been helpful, but none have been completely successful.
There will probably always be some need to be able to create abstractions and re-usable functionality.
Especially, there is a desire to define constructs more appropriate to the specific program's domain in order to move down in the triangle.
The key to maintaining sequentiality when verifying the implementation of abstractions is similar to referential transparency. An abstraction should be easily replaceable by its implementation without disrupting the sequentiality.
The abstractions form a new language, separate from the base language. If there are too many different abstractions or they compose differently from the base language, friction arises.
Libraries
A library shares functionality more widely, which helps recognition and trust. It reduces the need to jump out and verify the implementation. The problem here is in the plural form of the word: new libraries need to be learned, and more libraries increase the cognitive burden.
Standard libraries increase familiarity, but they are also by necessity more general, which keeps programmers up in the top corner of the triangle. When the standard library implementation is no longer preferred, it becomes a burden to be carried forever.
Macros
A macro can create a very precise expression of intent. This can be very powerful.
The downside is that the macro itself is hard to understand, with two separate transformations creating a very wide intent-to-implementation gap.
Hard to understand means difficult to use correctly, hard to debug and tricky to modify.
Frameworks
I like to say that frameworks make easy things trivial and difficult things impossible.
On the one hand, an annotation or other framework expression can perfectly express intent. But this only holds as far as the standard happy path behaviour is desired.
As soon as extra configuration is needed, things go sour quickly, neither expressing intent, nor matching the base language.
Even worse are frameworks like ORMs that try to impose an incompatible view on another model.
When things don't work as you expect or you need to do something the framework didn't expect, the experience becomes very jarring. The framework implementation is usually extremely complex, in order to do something that would be fairly simple with plain code and a few utility libraries.
Internal DSLs
It is often possible to compose base language constructs in a way to make the intent more directly expressible.
I think these are generally successful, and when well crafted the learnability is better than a regular library. I think AssertJ is a successful example. Less successful examples are query builders intended to replace SQL because these tend to just be incomplete, underdocumented and buggy.
The problems of libraries still apply and the experience can crack badly when there is a need to go beyond what is provided in the DSL.
External DSLs
SQL and regular expressions are two specialized languages that have proven to be useful and resilient.
Perhaps it isn't the worst to have to use a different syntax that more closely reflects the intent of a specific activity. Things like query builders don't help much because you still need to learn the same concepts.
The biggest downside is that these expressions often are given as dead text strings that aren't examined until runtime. Perhaps tagged string templates will be a good path forward and allow first-class treatment in the toolchain.
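As a sketch of the difference, a hypothetical sql tag in TypeScript receives the literal text fragments and the interpolated values separately, so the expression can be validated and parameterized instead of travelling as a dead string (the tag and its return shape are assumptions for illustration):

```typescript
// A tag function receives the literal text fragments and the interpolated
// values separately, so it can validate the query and bind parameters safely.
function sql(parts: TemplateStringsArray, ...values: unknown[]) {
  return {
    text: parts.join("?"),  // e.g. "SELECT * FROM users WHERE name = ?"
    parameters: values,     // e.g. ["Alice"], never spliced into the text
  };
}

const name = "Alice";
const query = sql`SELECT * FROM users WHERE name = ${name}`;
```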
FLWOR > SQL
A note here is that SQL got the order subtly wrong: the SELECT clause is often the part you want to fill in last. XQuery used a better ordering called FLWOR (FOR, LET, WHERE, ORDER BY, RETURN). This is also used in the Ballerina language.
LINQ also uses FLWOR without the LET. Unfortunately, LINQ comes with two syntaxes, where the intent-based syntax has mostly been scorned by developers in favour of the more familiar-looking but noisier and more "primitive" mechanism-function calls. Of course, the majority usage usually spreads because new programmers tend to copy the patterns they see, even when it is subtly inferior.
EDIT: I have been informed that the function call version of LINQ has a practical use because each function returns a partial query. The partial query can then be sent on to be executed later. Perhaps this is a workaround to compensate for the lack of projection paths (see below)?
Logic languages
Prolog, Datalog and their relatives step away from the three models of computation almost entirely. Instead of describing how to compute, the programmer describes relations that hold between inputs and outputs, and unification does the rest.
Logic languages could be seen as clear expressions of intent for problems that map cleanly onto relations, like search, constraint satisfaction, parsing, type inference, graph queries and theorem proving. An example is computing the transitive closure of paths: Path(A, B) :- Step(A, B). Path(A, B) :- Step(A, X), Path(X, B).
The problem is that very mundane things can become insanely difficult to express. Even basic arithmetic needs a special operator, and anything that involves reading a file, accumulating state or producing output in order becomes awkward.
A different tack could be to embed a logic sub-language within the language, as has been tried in some languages, like Oz, Curry, Flix, Shen and Clojure (through core.logic). The biggest problem with this is the huge mind-shift between the two very different modes of expression.
A deeper problem is the unification syntax itself. When the same variable appears in multiple relations to join them, understanding the full query requires holding every variable and every place it binds in working memory at once. Most of the time, that is too much.
Datomic is a successful integration of Datalog which elegantly solves some problems that SQL handles poorly. Still, experienced users generally report that queries are harder to read than SQL counterparts.
Switching the point of view
Instead of blindly accepting the primitives arising from the models of computation as being inevitable, can we try to start from another angle? That's what SQL, XSLT and spreadsheets do. And regular expressions, for that matter.
GraphBLAS
It has been considered very difficult to create libraries for graph algorithms because there are so many different representations and performance concerns.
But by finding the right mathematical basis, and framing algorithms according to those concepts, it is possible to create an intent-based "language" that also allows for efficient implementations under the hood.
The trick lies in viewing graph algorithms as matrix multiplications over a semi-ring (a structure with an addition and a multiplication operator), which enables leveraging existing optimizations for various matrix representations.
Instead of digging around in computation model primitives to implement Dijkstra's algorithm, you simply create an appropriate matrix representation and select + as the extension operator (multiplication in the semi-ring) and MIN as the aggregation operator (addition in the semi-ring). For reachability calculation, you extend with AND and aggregate with OR. For counting paths, you extend with * and aggregate with +.
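A rough sketch of the core idea follows; the Semiring shape and the step function are illustrative only, not the actual GraphBLAS API:

```typescript
// "extend" combines an edge with an existing path (the semiring multiplication),
// "aggregate" combines alternative paths (the semiring addition).
interface Semiring<T> {
  extend: (path: T, edge: T) => T;
  aggregate: (a: T, b: T) => T;
}

// One step over the whole graph: adj[to][from] holds the edge value from "from" to "to".
function step<T>(adj: T[][], paths: T[], ring: Semiring<T>): T[] {
  return adj.map((row, to) =>
    row.reduce(
      (best, edge, from) => ring.aggregate(best, ring.extend(paths[from], edge)),
      paths[to] // keep any path already found
    )
  );
}

// Shortest distances: extend with +, aggregate with MIN, "no edge" is Infinity.
const shortestPath: Semiring<number> = { extend: (p, e) => p + e, aggregate: Math.min };

// Reachability: extend with AND, aggregate with OR, "no edge" is false.
const reachability: Semiring<boolean> = { extend: (p, e) => p && e, aggregate: (a, b) => a || b };

// Counting paths: extend with *, aggregate with +, "no edge" is 0.
const pathCount: Semiring<number> = { extend: (p, e) => p * e, aggregate: (a, b) => a + b };
```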
These concepts could be worth bringing in to the base language, or perhaps one could do a language extension with a focused DSL.
Relational Algebra
While query builders end up being inferior to just embedding SQL, there is another option: add the basic concepts of relational algebra to the base language.
The relational algebra concepts form a solid base and implementations could do whatever is needed under the covers, even translating to SQL if that is deemed the best option. But the SQL stays under the hood because the concepts are complete.
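A sketch of what having those concepts available could feel like, treating relations as plain arrays of records (the operator names are illustrative, not a proposal for concrete syntax):

```typescript
// Relations as arrays of records; select, project and join as ordinary values.
type Relation<T> = T[];

const select = <T>(r: Relation<T>, pred: (row: T) => boolean): Relation<T> =>
  r.filter(pred);

const project = <T, K extends keyof T>(r: Relation<T>, keys: K[]): Relation<Pick<T, K>> =>
  r.map((row) => {
    const out = {} as Pick<T, K>;
    for (const k of keys) out[k] = row[k];
    return out;
  });

const join = <A, B>(left: Relation<A>, right: Relation<B>, on: (a: A, b: B) => boolean): Relation<A & B> =>
  left.flatMap((a) => right.filter((b) => on(a, b)).map((b) => ({ ...a, ...b })));

// An implementation is free to translate such expressions to SQL under the hood.
```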
Pipelines and streams
Pipelines and streams already exist in many languages, either directly as in F# and other languages, through a threading macro as in Clojure, or as a fluent object API in many languages.
Although this fixes the ordering problem from function calls, the "primitive" mechanism-functions like map and filter are still present.
If each pipeline step is implicitly a flatMap over individual elements, receiving one value and emitting zero or more, then filter is simply a step that sometimes emits nothing, and map is a step that always emits exactly one transformed value. Neither needs to exist as a named mechanism. The programmer expresses what they want to produce, not which higher-order function governs the production.
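A small sketch of that idea, where every step has the signature "one value in, zero or more values out" (the names are illustrative):

```typescript
// Every pipeline step is "one value in, zero or more values out".
type Step<A, B> = (value: A) => B[];

// "filter" is just a step that sometimes emits nothing...
const longEnough: Step<string, string> = (name) => (name.length > 5 ? [name] : []);

// ...and "map" is a step that always emits exactly one transformed value.
const shout: Step<string, string> = (name) => [name.toUpperCase()];

// Running a pipeline is just flatMap applied step by step.
const run = <A, B>(values: A[], step: Step<A, B>): B[] => values.flatMap(step);

run(run(["Ada", "Grace Hopper"], longEnough), shout); // ["GRACE HOPPER"]
```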
An aggregation, or a collector if you like, is fundamentally different from a transform step because it must see the whole stream before it can emit. This is worth making explicit in the syntax. An aggregation then becomes visibly a different kind of thing, not just another function passed to a mechanism or another per-value transform. The stream narrows here, and the reader should be able to see that at a glance.
One final consideration is whether each step of a pipeline emits a list (or other collection) containing the entire stream, or whether the elements are "flowing free" in the stream. Having them flow, by some syntax, e.g. ..., to remove the surrounding collection, and then recombining them when needed in a collector, seems more logical and composable.
Template literals
In XSLT, the result is output as a literal representation of the relevant XML fragment, possibly with expressions generating a value in relevant positions.
A regular expression is a template for the searched-for (sub-)string.
A spreadsheet can be considered to be a template of a calculation in a grid of cells where all you have to do is enter values and specify relations to other cells.
String templates for interpolation are increasingly popular and with pluggable processors they are a nice way to integrate an external DSL.
In the kind of pipeline envisioned above, one can simply discard the idea of a function wrapper in each processing step and just output a literal template for the desired result where the transformation can be expressed as the shape of the desired output.
With templates, the intent is expressed directly, no distracting mechanisms.
Compare 'Alice' |> (s) -> 'Hello ' + s to something like 'Alice' |> 'Hello ${it}' (using it as a default name for the current value, as in Kotlin).
Pattern matching
Often a transform cannot be directly expressed as an output shape and needs to vary behaviour according to properties of the input. In this case there needs to be a set of selection expressions, each coupled to a specification of how to create the appropriate output(s) back into the pipeline.
The fundamental idea is to present an image or pattern representing the input data, with conditions inserted at the relevant locations, to determine branch selection instead of supplying comparison expressions digging into the data one piece at a time.
The ideal would be to have something corresponding to a template literal, like when querying a Mongo collection, for example {name: "Alice"} and {name: {$ne: "Alice"}}.
For string values, a regular expression would be a reasonable choice for a template pattern, eliminating the need for the extra level. Assuming the expression is supposed to match the entire string, the above examples become {name: "Alice"} and {name: "(?!Alice)"}, although in my opinion the regex syntax could perhaps be improved. Familiarity does improve the situation so it is better to use regular expressions a lot than use them a little.
XSLT doesn't have a template pattern, but achieves a similar effect by using its projection path expression language (XPath) to provide hooks for condition insertion. The examples above would be ./name[. = 'Alice'] and ./name[not(. = 'Alice')]. Projection path expression languages are useful in their own right and are discussed next.
In some languages, the pattern match binds variables at the selected locations instead of executing conditions. If the shape matches and variables get bound, they can then be used in boolean expressions. In logic languages, the variable binding is essential to be able to show the desired relationship between input and output, as in the classic path relation Path(A, B) :- Step(A, X), Path(X, B).
It is probably desirable to keep applicable patterns close together. In XSLT, the processing templates are registered independently, which makes it difficult to get an overview of which of multiple matching templates would be chosen in each particular case.
Projections (and paths for them)
As mentioned above, XPath is the canonical example of a language to specify a path for reaching deeply embedded data in complex structures.
JSONPath copies that example, and tools like jq also allow editing the selected positions.
In Haskell, lenses provide a way to reach deep into data structures for both selection and modification.
A projection is simply a view of part(s) of a data structure, possibly transformed. XPath does not necessarily select just one node or value, but all nodes that correspond to the same path.
Thinking of projection expressions as a first-class entity instead of being primitive code expressions opens up interesting possibilities to get rid of wrapping functions for these purposes.
Consider v |> max(by: (v) -> v.engine.thrust) versus v |> max(by: .engine.thrust) or combined with some kind of streaming and aggregation syntax v... |> ..=max(by: .engine.thrust)
The interplay between projections and pattern matching is interesting. While a projection itself is normally considered to be an extraction of values, a pattern match is based on the existence of (any) value matching the conditions of the path or pattern. The same syntax could profitably be used for both, as well as for selecting modification points in a structure.
Types without type systems?
While the theory of type systems and their connection to logic is fascinating to study, the needs of a practical programmer only partly overlap, and the gap between them is often where the friction lives.
This section explores what practical value current type systems deliver and if those goals could be achieved with other mechanisms. When freed from formal theory, are there other useful properties that could be provided to the programmer?
What programmers really want
There are some desirable properties that are usually obtained from the type system:
- Mismatch detection: A wrong result silently arising when the wrong type of value is used in an operation can be disastrous. Type analysis helps detect the error before it becomes a production incident with a difficult debug session.
- Semantic consistency: Usually the programmer wants to go beyond merely physical representation types to detect semantic differences, such as being able to distinguish a CustomerId from an OrderId even though both are integers. Ideally, a field named orderId should always carry an OrderId as well.
- Documentation: Type annotations have been proven to have a positive documentation effect, making it easier to use libraries and APIs. Beyond this, there can also be a need to be able to specify restrictions on the physical representation type, for example that a standard die roll can only take on the values from 1 to 6. Also having that property be machine verifiable is golden.
- Exhaustiveness: Whenever all possible cases have to be handled, the programmer would like a notification if there are more cases to be handled. This applies especially if modifications are needed in distant regions of code when a new case is added to a type.
- Units and dimensions: Disasters have happened because a programmer supplied a value in pounds when Newtons were expected. Being able to specify units of measure, or dimensions such as x-value versus y-value, is cumbersome in most type systems but highly valued.
- Tooling enablement: autocomplete, go-to-definition, safe rename. These are downstream benefits that programmers value highly but rarely list as a reason for types.
There are other desirable properties that can be obtained from type systems, or in conjunction with them, but that is not commonly or exclusively the case:
- Memory safety: Rust is famous for encoding memory safety in the type system. OCaml has optional add-on type-system-like modes for it.
- Valid operation sequences: you must open before reading, close after opening. Session types encode this formally but the practical desire is just "tell me when I've called things in the wrong order." In Rust this kind of state machine is possible but arduous. In other languages there are other mechanisms, e.g. try-with-resources, to manage some aspects of this.
- Resource management: Can be enforced with affine types.
- Enforced error handling: Result types are a prime example, but also checked exceptions.
- Effect documentation: Haskell requires non-pure functions, e.g. those performing IO, to be associated with monadic effects. Some new languages have algebraic effects.
Re-imagining type mechanisms
Programming languages have explored the whole range from completely dynamic typing to gradual type systems to optional type systems to strictly enforced static type checking, from weak typing to strong typing, from structural types to nominal types.
The trade-off in each case is "more work equals more safety".
In this section I want to re-imagine mechanisms to provide some of what programmers want, exploring if strong guarantees can be achieved by default with little effort for the programmer.
Flexible constraints, flexible checks
There is a tension between having a type system that statically type checks in reasonable time and a type system that can express all manner of constraints and relationships between types. When the type language becomes too powerful, it becomes Turing-complete and thereby susceptible to the halting problem.
Expressing type constraints in a different language than the actual programming language creates a large cognitive burden on the programmer.
In Eiffel, precondition contracts can impose all sorts of conditions on input values that are checked on each call. Having these checks turned on during development and system testing catches almost all related bugs, allowing the checks to be switched off in production for better performance.
Allowing types to be defined by the same literal template notation as used for pattern matching could be a great win for reducing the effort and cognitive load of defining types that are as strict as needed. If the static type checker can check them, great, but otherwise just check at runtime, on variable assignment for example.
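A sketch of the runtime half of that idea, with a hypothetical conforms helper checking a value against a template of predicates at assignment time (all names here are illustrative):

```typescript
// A "type" written as a literal template of predicates, checked at runtime
// wherever the static checker cannot verify the constraint.
const dieRoll = {
  value: (n: unknown) => typeof n === "number" && Number.isInteger(n) && n >= 1 && n <= 6,
};

function conforms(
  template: Record<string, (v: unknown) => boolean>,
  value: Record<string, unknown>
): boolean {
  return Object.entries(template).every(([key, predicate]) => predicate(value[key]));
}

conforms(dieRoll, { value: 4 }); // true
conforms(dieRoll, { value: 7 }); // false: caught at assignment, not deep inside a calculation
```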
Units of measure
Units of measure are a particular type of semantic constraint that would particularly benefit from being lightweight and low-friction. Many bad bugs, such as the Mars Climate Orbiter crash, happen because of mix-ups concerning what values represent, but it is often just too much type-mechanism work to prevent them.
Not many languages have direct support for units of measure. F# allows you to append a unit to a number and then the type of that number is a measure of that unit. Julia allows the same with a macro from an extension package.
In other languages you may be able to define types like LengthInM or LengthInFeet, which is often too much work to be considered worth the trouble because they don't compose at all. With proper units of measure, the values should still work as numbers when the operation is compatible with the units.
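A minimal sketch of what lightweight unit tagging could mean, with units carried as plain labels and checked only where the operation requires compatibility (the names and the string representation of units are assumptions for illustration):

```typescript
// Units carried as labels on the value, checked where the operation demands it.
type Quantity = { value: number; unit: string };

const q = (value: number, unit: string): Quantity => ({ value, unit });

function add(a: Quantity, b: Quantity): Quantity {
  if (a.unit !== b.unit) throw new Error(`cannot add ${a.unit} to ${b.unit}`);
  return q(a.value + b.value, a.unit);
}

function mul(a: Quantity, b: Quantity): Quantity {
  // a real implementation would normalize and cancel units; this sketch just concatenates
  return q(a.value * b.value, `${a.unit}*${b.unit}`);
}

add(q(3, "N"), q(4, "N"));      // fine: 7 N
// add(q(3, "N"), q(4, "lbf")); // throws: the Mars Climate Orbiter class of mistake
```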
Units need not always represent physical quantities to be useful. Sometimes a programmer may wish to avoid accidentally confusing x-values and y-values.
Supporting automatic unit combinations and conversions, pre-populating with standard units like the SI system, and handling standard magnitude prefixes can all be handy, but the lion's share of the impact comes from just being able to use some form of unit and dimension tagging at all. Even the friction of having to pre-declare custom units may harm adoption. Letting units be declared automatically at first use would make the barrier to using them much lower.
Tagged identifiers
Most programmers feel that they should declare separate types for OrderId and CustomerId so that those two numeric identifiers can't be mixed up by mistake, or mistaken for being regular numbers, but traditionally it has been slightly too much work for it to generally happen in practice.
The introduction of lightweight record declarations in Java has made it simple enough, but it is possible to make it even simpler by just allowing the addition of a tag to a numeric or text identifier and letting it become declared at first use.
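For comparison, the closest mainstream TypeScript pattern is branding, which still requires some ceremony per identifier type (the helper functions here are illustrative):

```typescript
// Branded identifier types: same runtime representation, distinct to the checker.
type OrderId = number & { readonly __tag: "OrderId" };
type CustomerId = number & { readonly __tag: "CustomerId" };

const orderId = (n: number) => n as OrderId;
const customerId = (n: number) => n as CustomerId;

function shipOrder(id: OrderId): void {
  // ...
}

shipOrder(orderId(4893));       // ok
// shipOrder(customerId(17));   // compile error: a CustomerId is not an OrderId
// shipOrder(4893);             // compile error: a plain number is not an OrderId
```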
Autotyping
Within a bounded context, in the terminology of Domain Driven Design, it would be helpful if every field of a record held the same type of value as all other fields with the same name.
Instead of declaring a type and then declaring every field with the corresponding name to be of that type, the type checking could infer the type of the field at first use and then check that all fields of that name indeed contain values of that type.
The annoying repetition of orderId: OrderId also disappears if the field name and the type are one and the same. Obviously, something like previousOrderId should be possible to infer as holding an orderId.
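A sketch of how such a check could work, using the runtime type as a crude stand-in for the inferred field type (the prefix handling and helper names are just illustrations):

```typescript
// The first value seen for a field name fixes its type within the bounded context;
// later assignments of a different type are flagged.
const fieldTypes = new Map<string, string>();

// previousOrderId and the like should be checked as an orderId
const canonical = (field: string) =>
  field
    .replace(/^(previous|next|old|new)(?=[A-Z])/, "")
    .replace(/^[A-Z]/, (c) => c.toLowerCase());

function checkField(field: string, value: unknown): void {
  const name = canonical(field);
  const seen = fieldTypes.get(name);
  const actual = typeof value;
  if (seen === undefined) {
    fieldTypes.set(name, actual); // the type is declared at first use
  } else if (seen !== actual) {
    throw new Error(`${field} previously held a ${seen}, now a ${actual}`);
  }
}

checkField("orderId", 4893);          // declares orderId as holding a number
checkField("previousOrderId", "x12"); // flagged: orderId was a number
```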
Autotagging
Whether or not autotyping is applied in general, a similar mechanism can be employed to simplify and force usage of tagged identifiers.
Whenever a plain number or string is assigned to a field, tag it with the field name. This avoids the expected repetition of field name and tag, e.g. orderId: orderId´4893 becomes just orderId: 4893, but when the orderId field value is projected, it will be as a tagged value.
An added benefit of this enforced tagging is that in order to keep using numbers mathematically, the programmer at least has to associate the scalar unit 1 with it, which removes the remaining friction against using units. This leads to almost all numbers being semantically typed instead of merely being numbers, which creates better type safety than a type system often limited to representational typing of numbers.
Sightings in the wild
I have mentioned XSLT a lot, and spreadsheets (like Excel) are also a successful model, but I have also observed examples of some interesting re-thinkings in other languages.
COBOL
COBOL was designed in 1959 with the intention of making programming available to ordinary business people.
While it did not succeed in that, and adding long English words for the mechanics ended up being annoying for programmers, it did have some interesting features along the lines discussed here.
The record descriptions with hierarchical structure and PIC statements for describing plain data fields were templates used for both parsing input and formatting output, as well as determining the type. PIC 9(6) is six digits, PIC X(20) is twenty characters, PIC S9(7)V99 is a signed fixed-point number with two decimal places, and display-oriented pictures like PIC $$$,$$9.99CR specify zero-suppression, thousands separators, currency sign and credit indicator.
MOVE CORRESPONDING A TO B moves every field of A into the field of B that shares the same name, and leaves everything else alone. COBOL decided decades ago that fields with the same name represent the same thing (have the same type), as suggested in the autotyping section above.
Good intentions, but still a child of its time.
Ada
Ada was commissioned by the US Department of Defense in the 1970s to replace the over 450 different languages they were using in different systems.
The design was done as a competition between 4 starting teams selected from about 4 times as many proposals. Strict requirements were based on the need to prevent bugs that had been observed in other systems over the years.
The Ada design rationale reads like a list of "what programmers want", and the design of different features is very coherent, avoiding sharp mind-set shifts and making similar things work similarly.
Types in Ada are very versatile: you can declare OrderId and CustomerId as different types of integers that each get their own arithmetic operators, which may be overridden. Dimensions can be specified. Subtypes can be defined with arbitrary constraints.
Ada also allows and encourages naming function parameters at the call-site for increased readability and avoiding mistakes in the order and meaning, especially of same-type parameters.
The semantic type safety is achieved by up-front definition, which is more heavyweight than the arguments in this essay aim for.
F#
F# is designed to be a practical language and the designer, Don Syme, has resisted adding things that could be nice in theory but complex in practice.
F# has pipelines, although not streams, and units of measure, although requiring pre-definition.
Active patterns are an interesting idea to extend pattern matching to apply to values rather than types. Unfortunately, the definition step and the syntax for it feel somewhat disjointed.
Another interesting idea is type providers, which allow you to view any file as structured, typed data. Again, unfortunately, the syntax for this is extremely complex. It may have been more successful to have this as strongly typed, dynamically discovered and runtime checked, rather than going for static compile-time type-checking.
To sum up, the right intent, but still tied to the traditional models.
Golang
Go is also a language designed for practical use, where the designers resist adding features unless they are convinced of a practical benefit. Go also has a philosophy that there should only be one way to do a particular thing, which seems to be beneficial for maintainability even if it makes code boring, repetitive and boiler-platey at times.
The defer mechanism in Go is a clever way to sequentialize intent: you can close the file "right after" you open it instead of relying on a finally block far away. The pattern makes it easy on reviewers as well.
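A short Go sketch of that locality (the file-printing function is just an example scenario):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
)

func printLines(name string) error {
	f, err := os.Open(name)
	if err != nil {
		return err
	}
	defer f.Close() // the cleanup is written right after the open, but runs when the function returns

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fmt.Println(scanner.Text())
	}
	return scanner.Err()
}

func main() {
	if err := printLines("example.txt"); err != nil {
		fmt.Println(err)
	}
}
```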
Defining interfaces where they are used is a great way to keep things local. Interfaces are structurally typed, which means you only need to implement the methods corresponding to that interface when you call a function that needs it for a parameter.
The select mechanism in Go is a neat construct to sequentially express intent when handling asynchronous events. Together with the channel communication and easy spawning of concurrent processes, it provides a decent, but somewhat restricted, approximation of the actor model. However, there seems to be a jarring switch needed when the communication model doesn't fit well.
Pyret
Pyret is designed at Brown University for teaching practical programming through data analysis.
Pyret allows adding "examples" in a check block at the end of a function declaration. These are like mini-tests that verify the answer for some example sets of parameters. Nice and localized, providing both an executable verification and a documentation of intent.
There is also a nice literal-ish way to declare table values and some interesting table manipulation constructs, but not, as far as I can tell, a full relational algebra.
There are pipelines and a nice syntax using _ to show where the hole that needs to be filled is. Unfortunately no streams, so the many varieties of mechanism verbs (map, filter, fold and the like) need to be used.
Ballerina
Perhaps not surprising to see this language here, considering later versions are designed by James Clark who designed XSLT. The language is designed to facilitate business integrations by providing a more seamless experience for adding custom functionality than configuration-driven ESB software.
Ballerina has a clean special declaration for a service with resources, typically an http service, mirroring intent.
Error handling is at point of use by result type as a union of value | error. Checking for errors is mandatory before using the result, but you can conveniently pass any error up to the caller with the check keyword.
There is a pretty good match construct with binding of variables to the parts of the matched value that must exist, and the values of these variables can be checked by a following conditional statement.
JSON and XML values can be created with literal syntax. Tables are declared as lists of same type records, with a key field, tuple or structure designated.
All iterables can be queried with a declarative FLWOR-like syntax and tables can be joined relationally. Unfortunately, as far as I can tell, it doesn't seem like actual database integration entirely plugs into the same mechanism.
Spawning a concurrent worker is dead-simple and declarative. Communication channels are automatically set up, just send a message to the name of the worker.
Records are structurally typed, with a convenient ...rest syntax to refer to the parts you don't know about.
Tailspin
Tailspin is the language I am designing and I have already implemented some of these ideas, but other things can still be improved.
Pipelines handle streams of objects where each step takes one input and outputs zero or more outputs. No mechanism verbs needed.
Arrays and structures (records) are constructed literally, as are strings with interpolations. The $ sign is used to refer to the input value of the current pipeline step.
Streams are created by applying the streaming operator ... to a collection and recombined by an aggregator, e.g. ..=Sum. Aggregators unfortunately need to be pre-defined. Creating lists can be done by simply surrounding a stream pipeline with [ and ].
Tailspin implements units of measure, tagged values, autotyping and autotagging in the simplest ways described above. No pre-definitions needed.
Pattern matching is done by a syntax identical to the literal constructors, but with nested matchers instead of values. Strings are matched by regex by default. A templates instance (a function with one input and zero or more outputs) can consist of just a list of match expressions with corresponding blocks.
There exists a logical projection syntax, with an optional transform step with the regular transform pipeline syntax. A projection path is a thing that can be passed around.
Claude's summary
We began with SQL, XSLT and spreadsheets being successfully used by non-programmers. The reason is simple: they express relationships and transformations in terms that match how people think, rather than in computational primitives.
The intent-to-implementation gap is a consequence of historical choices about which primitives to expose — choices made in the 1950s and inherited ever since. They were never inevitable.
The key property that has emerged is sequential readability with local zoom: code that reads top to bottom, where any abstraction can be examined in isolation and returned from without losing the thread. This matters not just for human readers but directly for the quality of LLM-assisted coding.
The mechanisms that support this property are not speculative. Streams with implicit flatMap steps eliminate mechanism verbs. Template literals express output shapes directly. Pattern matching on input shapes, using the same notation as output templates, completes the symmetry. First-class projections eliminate wrapping functions for structural navigation. Semantic types through tagging, autotyping and units of measure deliver better practical type safety with less ceremony than formal type systems.
The sightings in the wild confirm these ideas are convergent — language designers working independently have found their way to similar conclusions. The direction is consistent enough to constitute evidence.
The question is whether a language can assemble these into a coherent whole without the friction of bolting intent-expressive features onto a foundation of computational primitives. That is what Tailspin is attempting.
For more thoughts about programming and programming languages, read more of my blog posts or check out my GitHub repo.