'Entity' is the wrong idea

Joe Armstrong, creator of the Erlang programming language, once said:

"You wanted a banana but what you got was a gorilla holding the banana and the entire jungle."

He meant it as a criticism of object-oriented (OO) programming languages, but ironically Erlang is one of the most OO languages in existence, having independently rediscovered the actor model, which is the essence of what Alan Kay meant by object-oriented (computers all the way down).

The criticism is still justified even if it is not really about OO. Rather, it is about the idea of "entities", which I suspect is just as easy to fall prey to with the Hindley-Milner product types favoured by some functional languages as it is when representing data as objects. (Although I will concede that the object idea is perhaps easier to confuse with the entity idea.)

Now I am not talking about the abstract idea of an Entity as being something that has an identity, which may or may not be associated with various attributes. The problem is rather when an attempt is made to model an entity as a data type with all possible attributes and relations it can have to other entities.

Consider a Car entity: it is of a certain make, model and year. So far so good, but then you might need to reflect that it has an owner. Maybe the owner is just represented as a name at first. Later it might become a Person entity, but it is still logically represented as a part of, or belonging to, the Car entity. And then it actually has to be a list of owners, and it just goes on and on, creating logical entanglement.

Whenever you get a Car entity, you get a representation of everything that could be known about a car.
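To make the entanglement concrete, here is a minimal sketch of how such a Car entity tends to end up. All the type and field names (`Address`, `Person`, `owners` and so on) are my own assumptions for illustration, not taken from any particular system.

```typescript
// Hypothetical "entity" types: each one drags its relations along with it.
interface Address {
  street: string;
  city: string;
}

interface Person {
  name: string;
  address: Address;
  cars: Car[]; // the person knows about its cars...
}

interface Car {
  make: string;
  model: string;
  year: number;
  owners: Person[]; // ...and the car knows about its owners
}

// This function only needs make, model and year,
// yet its signature depends on Person and Address as well.
function describeCar(car: Car): string {
  return `${car.year} ${car.make} ${car.model}`;
}

// Even a caller that knows nothing about ownership must supply the owners field.
const car: Car = { make: "Volvo", model: "240", year: 1988, owners: [] };
const description = describeCar(car);
```

Any change to `Person` or `Address` now ripples into every piece of code that touches a `Car`, whether that code cares about owners or not.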

What you actually need to know depends on the situation and use case. What is really needed is a way to represent the relations between different pieces of data so that they can be put together in arbitrary combinations as needed. Now the Lisp programmers will slowly nod their heads, because they do that all the time: it's just a list of things. But that's not necessarily what I'm advocating. There is still value in defining what each relevant combination of data is and having it type-checked.

When there is a need to reflect the ownership relation of the car, the entity trap manifests whether an owner entity is attached to the car or a car entity is attached to a person (owner). Either way creates a dependency from any user of one of the entities to the other entity. The correct way is to always keep those entities separate from each other and, when needed, represent the relationship as a data value object containing both the car and the owner (maybe an OwnedCar or a CarOwnership if you have to name it).
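A minimal sketch of that separation could look like the following. The `id` fields and the `ownersOf` helper are assumptions I have added for illustration; the point is only that `Car` and `Person` never mention each other, and the relationship lives in its own value.

```typescript
// Car and Person are independent: neither type refers to the other.
interface Car {
  readonly id: string; // hypothetical identifier
  readonly make: string;
  readonly model: string;
  readonly year: number;
}

interface Person {
  readonly id: string; // hypothetical identifier
  readonly name: string;
}

// The relationship is a separate value that holds both sides.
interface CarOwnership {
  readonly car: Car;
  readonly owner: Person;
}

// Code that only cares about cars depends on Car alone.
function describeCar(car: Car): string {
  return `${car.year} ${car.make} ${car.model}`;
}

// Code that cares about ownership asks for the relation explicitly.
function ownersOf(car: Car, ownerships: readonly CarOwnership[]): Person[] {
  return ownerships.filter((o) => o.car.id === car.id).map((o) => o.owner);
}

const volvo: Car = { id: "car-1", make: "Volvo", model: "240", year: 1988 };
const saab: Car = { id: "car-2", make: "Saab", model: "900", year: 1990 };
const ada: Person = { id: "p-1", name: "Ada" };
const grace: Person = { id: "p-2", name: "Grace" };

const ownerships: CarOwnership[] = [
  { car: volvo, owner: ada },
  { car: volvo, owner: grace },
  { car: saab, owner: ada },
];

const volvoOwners = ownersOf(volvo, ownerships).map((p) => p.name);
```

A list of owners, or a person with several cars, needs no change to either `Car` or `Person`. It is just more `CarOwnership` values.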

So if you've ever wondered why it's such a horror to maintain and extend a system built with an ORM, it's mostly because of "entities". Of course, it doesn't help that all the annotations and whatever else is needed to describe the entity and its relationships will, by necessity, form a language that is a buggy and incomplete implementation of SQL, but in a proprietary syntax that nobody knows.

In some NoSQL databases it's called a document instead of an entity, but it tends to get bogged down in the same way.

Query builder libraries at least mostly avoid the entity trap, but they still suffer from being a buggy and incomplete implementation of SQL in a proprietary syntax that nobody knows.

So the best you can do is to embrace the relational idea of data and get tools that amplify your SQL usage. In TypeScript or JavaScript, Slonik is a good choice. In Java, have a look at Wrapd. For Go there is sqlc.

Perhaps you might want to take the relational idea even one step further and go full Datomic, leaving SQL for Datalog. That uses the term "entity", but in the abstract sense as just an ID that is related to an arbitrary set of attributes, so all is still good.
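To illustrate that abstract sense of "entity", here is a small sketch in plain TypeScript. It is not Datomic's API or Datalog syntax, just my own approximation of the idea: facts are (entity, attribute, value) triples, and the attribute names and the `pull` and `ownerNames` helpers are hypothetical.

```typescript
type EntityId = number;
type Value = string | number;

// A fact is an (entity, attribute, value) triple. The entity is nothing but an ID.
type Fact = readonly [EntityId, string, Value];

const facts: readonly Fact[] = [
  [1, "car/make", "Volvo"],
  [1, "car/model", "240"],
  [1, "car/year", 1988],
  [2, "person/name", "Ada"],
  [3, "ownership/car", 1], // entity 3 relates car 1...
  [3, "ownership/owner", 2], // ...to person 2
];

// Collect only the requested attributes of one entity.
function pull(entity: EntityId, attributes: readonly string[]): Record<string, Value> {
  const result: Record<string, Value> = {};
  for (const [e, a, v] of facts) {
    if (e === entity && attributes.some((wanted) => wanted === a)) {
      result[a] = v;
    }
  }
  return result;
}

// Join the ownership facts to find the names of a car's owners.
function ownerNames(carId: EntityId): string[] {
  const ownershipIds = facts
    .filter(([, a, v]) => a === "ownership/car" && v === carId)
    .map(([e]) => e);
  const ownerIds = facts
    .filter(([e, a]) => a === "ownership/owner" && ownershipIds.some((id) => id === e))
    .map(([, , v]) => v);
  return facts
    .filter(([e, a]) => a === "person/name" && ownerIds.some((id) => id === e))
    .map(([, , v]) => String(v));
}

const carSummary = pull(1, ["car/make", "car/year"]);
const names = ownerNames(1);
```

Each use case pulls exactly the attributes it needs, and nothing about the car forces the owner to come along.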
