Beyond Big-O in AdventOfCode
Many of the problems in AdventOfCode involve handling an input that is suddenly too large for the naïve approach (which part1 often cleverly lulls you into).
Most fundamentally, the first thing to do is to find an algorithm with a better Big-O complexity. Simplistically, that asks: given an input of size N, how does the number of operations grow as a power of N? A simple math formula is O(1), or constant time; an iteration through the list is O(N), or linear time; comparing each element against every other is O(N^2), or quadratic time; and so on. Sorting is O(N log N), so somewhere between O(N) and O(N^2). Some problems are very bad and you end up with N in the exponent, which is exponential time.
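As a tiny illustration (I'll use Python for the sketches in this post, since they are meant as language-agnostic pseudocode rather than Tailspin), checking whether a list contains a duplicate can be done by comparing every pair, or in a single pass with a set:

```python
def has_duplicate_quadratic(values):
    # Compare every unique pair: O(N^2).
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[i] == values[j]:
                return True
    return False

def has_duplicate_linear(values):
    # One pass with a set of things seen so far: O(N).
    seen = set()
    for v in values:
        if v in seen:
            return True
        seen.add(v)
    return False
```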
In this post I want to look at what can be done beyond finding the "optimal" algorithm in Big-O terms. It will mostly be about shaving a little bit more performance, but I also want to look at aspects of beauty, simplicity and cleverness.
First I will look at some general principles, and then dive into actual AoC puzzles to see how they apply.
None of this is (usually) necessary for solving AdventOfCode puzzles, but I work in an interpreted language I created myself called Tailspin (manual, 5-minute tour). Since Tailspin is about 1000 times slower than Java, I will take all the time savings I can get.
Note, though, that while these are general guidelines, there may be other considerations in each specific case.
Considering the Constants
When it comes to Big-O, one thing to note is that constants do not matter. So comparing each unique pair in the list is half the time of comparing all pairs, but it is still O(N^2).
For a long time, sorting was considered sorted (ha!), with quicksort as the optimal O(N log N) implementation included in standard libraries. Then someone noticed that in practice, the O(N^2) bubblesort was faster than quicksort for the type of small arrays that we usually sort. (Which has led to standard libraries containing immensely complicated hybrid algorithms. On a side note, I like to code myself a bubblesort or a quicksort now and again as a kata to flex my coding muscles.)
So what is going on there?
There are a lot of things that can affect this, from the programming language used, to the specific compiler, to the processor architecture, so I cannot answer specifically, but there are a few general things to observe.
Sequential access beats random access
For the processor to do anything with a value, it needs to load it from main memory to L2 cache, to L1 cache and finally to a register. Caching is done in blocks (cache lines), so things that are close together in memory get cached together.
Beyond that, the processor tries to predict which value you will want next. Accessing things sequentially is easy to predict and the value can be prefetched to be ready when you need it. Otherwise you may have to wait a little bit.
The most ridiculous thing I have worked with in this vein was a vector processor architecture. The basic idea is that if a multiplication takes 6 machine cycles, say, you can keep loading a new pair of values in every cycle. The more multiplications you do, the more time you save. Instead of taking 6000 cycles, you can do 1000 multiplications in 1005 cycles.
Anyway, when going sequentially in an array, you can do a lot of stuff "for free", so doing more could actually be faster.
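A rough way to see this even from a high-level language is to do the exact same summation with two different access orders. The numbers will vary wildly by machine and runtime, and in a lower-level language the gap is much larger, but the sequential walk usually wins:

```python
import random
import time

N = 5_000_000
data = list(range(N))
indices = list(range(N))
random.shuffle(indices)

start = time.perf_counter()
total = sum(data[i] for i in range(N))        # sequential walk
print("sequential:", time.perf_counter() - start)

start = time.perf_counter()
total = sum(data[i] for i in indices)         # same work, random order
print("shuffled:  ", time.perf_counter() - start)
```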
Direct storage beats indirect storage
How much control of this you have depends a lot on the language you are using. This is mostly just a restatement of the above principle, but from a different angle.
A simple number is usually storable in place, so once you find the container that holds it, you have pretty much direct access to the value.
A more complex value that is fixed in size can also often be laid out inline in a container (like an array or a struct).
But once the size can vary, you need to store a reference to the object and jump to a separate location in heap memory to read the value. And that can wreak havoc on the caching.
Arrays beat Hash maps (aka Dicts)
While lookup in an array and lookup in a hash map are both O(1), there is a significant difference in the constant.
For the hash map, it first computes a hash of the key (which could just be the key itself if it is an integer, or a more complex calculation over thousands of bytes of a large key object). Then it needs to check whether the key stored at that hash location is the same key, or whether it got a hash collision and must use a backup strategy to find another spot. Note that comparing keys can sometimes also take significant time, for example if they are all like "Long-common-supercalifragilistic-prefix-0".
Of course, both the key and the value must be stored somewhere, so probably at least one indirection, perhaps more.
An array, on the other hand, is essentially a perfect hash. You know exactly where to go right away and if the value itself is simple, it's right there.
Iteration in the array is of course fast, according to the above, but if the array only has valid values at a few locations it might not be worth it.
In the hash map, iteration usually involves walking the full backing store, which may have lots of empty space, so it is not a given that it is faster. In Python, dict entries are kept in a separate dense array, so iteration generally only touches existing entries and is faster. However, that changes if entries are frequently deleted, because they are not actually removed, just marked as dead. The separate entry array is also yet another indirection, with whatever costs that accrues for other uses.
Hash maps and lists (dynamic arrays) can grow, and resizes can take significant time. In both cases it is good if you know up front how much space you need. And of course, you always need to know the space needed for a fixed array to begin with.
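As a concrete sketch of the difference in constants, counting occurrences of small integer keys with a pre-sized list versus a dict (exact timings depend on the runtime, but the list skips hashing, probing and key comparison entirely):

```python
from collections import defaultdict

keys = [i % 1000 for i in range(1_000_000)]

counts_dict = defaultdict(int)          # hash the key, probe, handle collisions
for k in keys:
    counts_dict[k] += 1

counts_list = [0] * 1000                # the key *is* the location
for k in keys:
    counts_list[k] += 1
```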
Iteration beats recursion
While iteration and recursion are considered logically equivalent, and one can automatically be transformed into the other, recursion usually comes with more overhead because another stack frame needs to be created on the call stack.
The exception is in cases where it is possible to write in a tail-recursive form and the language you are using actually does tail-recursion optimization, in which case they are equivalent performance-wise.
Of course, recursion often comes with elegance and nice benefits of saving intermediate state in the call stack instead of needing some messy handling in the iterative case.
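A trivial illustration in Python, which does no tail-call optimization, so the recursive version pays for every stack frame and eventually hits the recursion limit:

```python
def total_recursive(values, i=0):
    # One stack frame per element.
    if i == len(values):
        return 0
    return values[i] + total_recursive(values, i + 1)

def total_iterative(values):
    # Same logic, no call-stack overhead.
    acc = 0
    for v in values:
        acc += v
    return acc

print(total_iterative(range(200_000)))            # fine
# print(total_recursive(list(range(200_000))))    # RecursionError by default
```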
Gathering beats scattering
By gathering, I mean reading some values and combining them to a new complete value.
By scattering, I mean taking a partial value and combining it into several locations where it will be used.
The simple heuristic for this is that reading is faster than writing.
In scattering, you have to initialize the scatter locations, read them, combine the new part and write again.
In gathering, you don't even need to initialize the location, you simply write the combined value there.
There is also a whole slew of improvements that become applicable if you only read and never write a value after it is created (immutability).
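A small illustration of the two directions, smoothing a list by combining each value with its neighbours:

```python
values = [3, 1, 4, 1, 5, 9, 2, 6]
n = len(values)

# Scatter: every input contributes to several output cells, so the output
# must be initialized first and each cell is written several times.
out_scatter = [0] * n
for i, v in enumerate(values):
    for j in (i - 1, i, i + 1):
        if 0 <= j < n:
            out_scatter[j] += v

# Gather: each output cell is computed once from the inputs it needs,
# written once, and never touched again (so it could even be immutable).
out_gather = [
    sum(values[j] for j in (i - 1, i, i + 1) if 0 <= j < n)
    for i in range(n)
]

assert out_scatter == out_gather
```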
Padding and sentinel values beat bounds checking
One upside of using hash maps over arrays is that you don't have to worry so much about out-of-bounds access.
With arrays, you might have to check every access to see if it is in bounds, which can be a drain. And it makes the code clunky and inelegant.
There is a simple and effective technique for use with arrays, though: add some padding so out-of-bounds accesses are still valid. Put in whatever is the equivalent of a no-op value for the puzzle at hand.
A perfect case study for this is finding movable paper rolls in 2025 Day 4, where just adding a row of dots above and below, and a column of dots on either side, eliminates the need for bounds checking altogether. While you're at it, just go ahead and include the paper roll itself, for a more elegant -1,0,+1 iteration. You just have to compare to 5 instead of 4.
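Here is the general shape of the trick in Python, with a made-up grid and a made-up movability rule (fewer than 5 rolls in the 3x3 neighbourhood, the roll itself included) standing in for the real puzzle:

```python
raw = [            # made-up grid: '#' is a paper roll, '.' is empty floor
    "#..#",
    ".##.",
    "#...",
]

width = len(raw[0])
# Pad with a border of no-op cells so every neighbour access stays in bounds.
grid = ["." * (width + 2)] + ["." + row + "." for row in raw] + ["." * (width + 2)]

movable = 0
for r in range(1, len(grid) - 1):
    for c in range(1, width + 1):
        if grid[r][c] != "#":
            continue
        # Plain -1,0,+1 iteration, no bounds checks; the cell itself is
        # included, which just shifts the threshold from 4 to 5.
        neighbours = sum(grid[r + dr][c + dc] == "#"
                         for dr in (-1, 0, 1) for dc in (-1, 0, 1))
        if neighbours < 5:
            movable += 1
print(movable)
```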
Adding in sentinel values as a matter of routine can also work to protect you against off-by-one errors. So what does it matter if you add in another zero?
Reusing results
The art of reusing the results of smaller sub-problems is an essential part of reducing that pesky exponential Big-O complexity into polynomial complexity. The name for this practice is Dynamic Programming.
You may have heard of it simply as "memoization", and in some languages there are simple ways to mark a function so that previous results are cached, keyed by the inputs. In Python there is a simple @functools.lru_cache decorator to memoize any function.
Beyond Big-O, there are multiple ways to change the constant factor of the Dynamic Programming algorithm. Memoizing in an array instead of a hash map would be a simple example.
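For example, memoizing a simple counting recurrence both ways; the recurrence itself is just a stand-in:

```python
from functools import lru_cache

@lru_cache(maxsize=None)                # dict-backed memo keyed by the argument
def ways(n):
    if n <= 1:
        return 1
    return ways(n - 1) + ways(n - 2)

def ways_array(n):
    memo = [1, 1] + [0] * (n - 1)       # the same memo as a plain array
    def go(k):
        if memo[k] == 0:
            memo[k] = go(k - 1) + go(k - 2)
        return memo[k]
    return go(n)

print(ways(90), ways_array(90))
```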
Memoization versus tabulation
Memoization is viewed as a "top-down" technique: you start with the big result and cache sub-results as you go. There is another, equivalent technique called tabulation, normally viewed as "bottom-up", where you calculate the small results first and then use them to calculate bigger results.
Interestingly, tabulation usually ends up calculating "all the values" instead of just the required values. Despite that, tabulation is generally faster for all the reasons given in the section about considering the constants.
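The tabulated version of the same recurrence from the sketch above fills in the whole table, but it does so with a plain sequential array walk, no recursion and no hashing:

```python
def ways_tabulated(n):
    table = [1] * (n + 1)               # table[0] and table[1] are the base cases
    for k in range(2, n + 1):
        table[k] = table[k - 1] + table[k - 2]
    return table[n]

print(ways_tabulated(90))               # same answer, computed bottom-up
```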
Turning the table in 2025 day 7
The puzzle asks us to follow a beam across a grid and count how it is affected by splitters at given positions.
A first attempt at solving this would be to just do a DFS following each beam to the bottom and adding it up. While this works fine for part1, it bogs down in part2 with the exponential complexity of recalculating the same paths over and over.
It's easy enough to fix, just smack a memoizing cache in the appropriate place. Big-O solved.
Beyond Big-O, we might observe that we could just keep track of active x-locations and move down row-by-row instead. I guess this is really a BFS, but it also looks a bit like a tabulation.
We keep a count per x-location in a hash map, and when a beam hits a splitter moving down, we add its count to the x-1 and x+1 locations instead. Then we sum up the last row. Here is how I did it in Tailspin, except I use a relation (something like a database table) instead of a hash map.
More elegant than a DFS, perhaps, with less state to keep track of. But what if we went the other way?
We can work in an array, start from the bottom, counting each position as one possible timeline, adding two of them together on a splitter and finally just reading the result from position S. Even better, we also know what would happen from any starting position, should we need it.
Sequential access in an array and gathering data instead of scattering could improve performance. Another strong indicator that this is the "right" way to do it is that each position holds the answer for what would happen if the start had been there, while the previous approach still needs a sum along the final row.
This is how it looks in Tailspin, and it runs 20% faster, despite calculating "all the things".
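For readers who do not speak Tailspin, here is a rough Python sketch of the same bottom-up idea. It is not a translation of the linked code, and the input format ('^' marking a splitter, the beam entering at column start) is assumed purely for illustration:

```python
def timelines(rows, start):
    """Bottom-up count; rows is the grid as a list of equally long strings."""
    width = len(rows[0])
    counts = [1] * width                  # below the last row, every column is one timeline
    for row in reversed(rows):
        nxt = [0] * width
        for x in range(width):
            if row[x] == "^":
                # A beam hitting a splitter continues at x-1 and x+1 instead,
                # so the counts of those two positions are added together.
                left = counts[x - 1] if x > 0 else 0
                right = counts[x + 1] if x + 1 < width else 0
                nxt[x] = left + right
            else:
                nxt[x] = counts[x]
        counts = nxt
    return counts[start]                  # counts[x] also answers any other start
```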
Preprocessing the puzzle
Changing the way the puzzle is represented by doing some preparatory work can lead to quicker execution. In some cases, that might even be what is needed to make the solution run in a reasonable time.
A simple example is in 2025 day 5, where sorting the ranges on start value ascending transforms the O(N^2) comparison into O(N log N) for the sort and a sleek O(N) to merge them. It wasn't strictly necessary for the time factor, but it also made for a prettier algorithm. (I did not see it at the time, unfortunately)
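The merge step, sketched in Python. Whether adjacent ranges should also be merged is a puzzle detail, assumed here:

```python
def merge_ranges(ranges):
    # Sort by start, then a single linear pass merges anything
    # that overlaps or touches the range being built up.
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

print(merge_ranges([(5, 9), (1, 3), (2, 6), (15, 18)]))
# [[1, 9], [15, 18]]
```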
Testing tiles in 2025 day 9
In part 2 we are asked to find the largest rectangle inside a surrounding polygon, where all tiles inside the polygon are green and the corners red. The problem is that the whole thing is a little too big to just count tiles, so you need to look at it another way.
I traced out the outline, figured out what side of it was inside and which corners were concave and which were convex. This allowed me to determine the answer by just looking at corners and checking if any lines intersected the outline of my rectangle. If you're interested, here is my code.
Then I heard of people who had looked at it a different way and very cleverly avoided having to count a lot of "useless" space by simply downscaling the coordinates, or doing more advanced coordinate compression or space distortion.
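Coordinate compression in a nutshell, with made-up polygon corners: only the x and y values that actually occur (plus their immediate neighbours) matter, so map them to small indices, flood-fill the much smaller compressed grid, and multiply the real cell sizes back in when counting:

```python
corners = [(0, 0), (8, 0), (8, 5), (3, 5), (3, 9), (0, 9)]   # hypothetical corners

xs = sorted({x for x, _ in corners} | {x + 1 for x, _ in corners})
ys = sorted({y for _, y in corners} | {y + 1 for _, y in corners})
x_index = {x: i for i, x in enumerate(xs)}
y_index = {y: i for i, y in enumerate(ys)}

# Compressed cell (i, j) stands for the rectangle [xs[i], xs[i+1]) x [ys[j], ys[j+1]),
# so a flood fill on the tiny grid can recover real areas afterwards.
print(len(xs) * len(ys), "cells instead of",
      (max(x for x, _ in corners) + 1) * (max(y for _, y in corners) + 1))
```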
Opinions may vary on whether it is more elegant to look at lines and corners or to simply distort, flood-fill and count tiles.
Failing faster in 2021 day 19
The problem involved identifying and arranging some very confused space scanners in what was an at least O(N^3) algorithm, maybe O(N^4).
While I managed to muddle through it in F#, I was blown away by a clever transformation by Gunnar Farnebäck in the Julia community. Julia is a very nice language and the community consists of scientifically minded people who really care about performance, so they tend to know the good tricks.
What Gunnar did was to calculate an O(N^2) "signature" of the scans to compare. While the signature was not enough to positively identify a match, it was enough to quickly sort out a lot of non-matches so that more time could be spent verifying probable matches.
You can do a lot of work in O(N^2) before you become worse than O(N^3).
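One common form of such a signature for this puzzle, not necessarily exactly what Gunnar used, is the set of pairwise squared distances between a scanner's beacons, which is invariant under rotation and translation:

```python
from itertools import combinations

def signature(beacons):
    # O(N^2) per scanner: the set of squared pairwise distances.
    return {
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in combinations(beacons, 2)
    }

def might_overlap(sig_a, sig_b, needed=66):
    # Twelve shared beacons give 12 * 11 / 2 = 66 shared pairwise distances,
    # so scanner pairs below that threshold can be skipped before attempting
    # the expensive rotation-and-match step.
    return len(sig_a & sig_b) >= needed
```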
Embracing elegance
Some algorithmic choices might not really be worth it for the time they save, but I can find them worthwhile just for the sake of elegance.
Handsome heaps and dashing disjoint-sets in 2025 day 8
Day 8 was about connecting strings of lights to junction boxes and seeing how many circuits were formed. A crucial instruction was to connect the boxes closest together first.
To find the 1000 closest connections, it is perfectly reasonable to just sort the connections by size and take the first 1000.
However, I have been fascinated by the elegance of the binary heap since I learned it a long time ago. A heapsort is also an optimal O(N log N) sorting algorithm, but has a bit more movement in memory so it ends up worse than quicksort on the constant factors.
A nice property of a heap is that you don't have to do all the work of a full sort in order to retrieve the elements in sorted order. You just pop them off as you need them. If you only need the 1000 smallest elements, you only have to keep a heap of size 1000, in the K-Select algorithm.
Needless to say, I jumped on the chance to implement a heap, as a kata to flex the coding muscles and relive the beauty. I prefer to put the root in position 1 to simplify child and parent calculations.
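A minimal sketch of such a heap in Python, with the root at index 1 so the children of node i sit at 2i and 2i+1 and its parent at i//2 (my own illustration, not the Tailspin code):

```python
class Heap:
    """Minimal binary min-heap with the root at index 1."""

    def __init__(self):
        self.items = [None]                       # index 0 is unused padding

    def push(self, value):
        self.items.append(value)
        i = len(self.items) - 1
        while i > 1 and value < self.items[i // 2]:
            self.items[i] = self.items[i // 2]    # sift up: pull the parent down
            i //= 2
        self.items[i] = value

    def pop(self):
        top = self.items[1]
        last = self.items.pop()
        n = len(self.items)
        if n > 1:
            i = 1
            while True:
                child = 2 * i
                if child + 1 < n and self.items[child + 1] < self.items[child]:
                    child += 1                    # pick the smaller child
                if child < n and self.items[child] < last:
                    self.items[i] = self.items[child]   # sift down
                    i = child
                else:
                    break
            self.items[i] = last
        return top

h = Heap()
for d in [5, 1, 4, 2, 3]:
    h.push(d)
print([h.pop() for _ in range(5)])                # [1, 2, 3, 4, 5]
```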
As a huge bonus, day 8 contained another pearl, the disjoint-set (or union-find), which, again, is not strictly necessary. You can pull it off in other ways, but it is an exercise in elegance and another great kata.
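A minimal disjoint-set sketch (path compression plus union by size), with the number of junction boxes assumed known up front:

```python
n_boxes = 10                              # hypothetical number of junction boxes
parent = list(range(n_boxes))
size = [1] * n_boxes

def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]     # path compression (halving)
        x = parent[x]
    return x

def union(a, b):
    ra, rb = find(a), find(b)
    if ra == rb:
        return False                      # already on the same circuit
    if size[ra] < size[rb]:
        ra, rb = rb, ra
    parent[rb] = ra                       # attach the smaller tree under the larger
    size[ra] += size[rb]
    return True

union(0, 1); union(2, 3); union(1, 3)
print(find(0) == find(2))                 # True: 0, 1, 2, 3 form one circuit
```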
If you're interested in how I implemented all this in Tailspin, here is the code.
Finding fixed-points in 2025 day 11
Day 11 asks us to find how many paths there are between two nodes in a graph. In part 2 we are restricted to paths passing through some particular nodes along the way.
Again a DFS along paths, with a little memoization at appropriate points will do the trick.
One question to solve in the algorithm is when to stop looking, and a big advantage of the DFS is that you can detect loops and stop searching along them.
If you assume no loops, there are more elegant solutions. What about stopping when you have the right answer?
If you have an initial guess of the right answer and a function that can take a guess as input and return a better guess as output, you can stop when the function returns the same value as you input. This is called a fixed-point of the function.
Here, we can start with a set of nodes (or just one node, really) that we can reach and have a function that adds all nodes that can be directly reached from those. Once the set contains all the reachable nodes, the function will just return the input. The result of this particular fixed-point algorithm is called a Transitive Closure.
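In Python terms, the fixed-point might look like this, with a made-up graph:

```python
def reachable(graph, start):
    current = {start}
    while True:
        # The "better guess" function: add everything directly reachable.
        expanded = current | {v for u in current for v in graph.get(u, ())}
        if expanded == current:           # fixed point: no new nodes appeared
            return current
        current = expanded

graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}   # made-up graph
print(reachable(graph, "a"))              # {'a', 'b', 'c', 'd'}
```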
With a small modification, we can keep track of the number of ways to reach each node from the start node. Strikingly there is almost no state to keep track of at all. This is how it comes out in Tailspin.
Again we are actually finding out the number of paths to all other nodes that can be reached, but it still seems to go faster. Other versions of Transitive Closure fixed-point algorithms are Dijkstra's algorithm to find the shortest path from a node to all others, and the Floyd-Warshall algorithm to find all shortest paths between nodes. These are also considered to be dynamic programming algorithms, as well as being characterizable as fixed-point algorithms.
Putting it all together
I hope you enjoyed looking at ways to polish the code even after the answer has been found.
As a final example, I found a solution by François Févotte to 2025 day 11 in the Julia community that illustrates a lot of the things I have been talking about.
The solution starts by giving the nodes integer aliases, so that arrays and grids can be used instead of dicts. The dict is only used to look up the integer alias as needed.
The list of neighbours of all nodes is converted to an adjacency matrix for nice and fast expansion in finding the transitive closure. Note, an NxN array rather than a dict, for maximum speed.
As an added little optimization, a topological sort order is prepared for the nodes. This allows doing the minimum amount of adjacency computations without having to go all the way to out. Since we are assuming no loops, either dac always comes before fft, or fft comes before dac, and the sort tells us which.
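The same ingredients sketched in Python rather than Julia (a sketch of the idea, not François's actual code): integer aliases, an NxN adjacency matrix, and a topological order so each node's count is finalized before it is propagated. The node names here are made up to echo the ones above:

```python
from graphlib import TopologicalSorter

# Hypothetical graph: node -> list of successors.
edges = {"svr": ["dac", "fft"], "dac": ["out"], "fft": ["out"], "out": []}

index = {name: i for i, name in enumerate(edges)}       # integer aliases
n = len(index)

adj = [[False] * n for _ in range(n)]                    # NxN adjacency matrix
for u, vs in edges.items():
    for v in vs:
        adj[index[u]][index[v]] = True

# Topological order: every node comes after all of its predecessors.
preds = {v: [] for v in edges}
for u, vs in edges.items():
    for v in vs:
        preds[v].append(u)
order = TopologicalSorter(preds).static_order()

# Count paths from "svr" in one pass over the sorted nodes.
counts = [0] * n
counts[index["svr"]] = 1
for name in order:
    u = index[name]
    for v in range(n):
        if adj[u][v]:
            counts[v] += counts[u]
print(counts[index["out"]])                              # 2 paths: via dac and via fft
```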
On top of it all, both topological sorts and adjacency matrices are kind of elegant, wouldn't you agree?