Overdetermination in software development
Overdetermination can happen in a lot of places: not only in mathematics and in theory, but in real-life practice too. Since I studied mechatronics engineering, I know this topic mostly from mechanics, where over-constraints can cause a lot of issues.
I’m writing this post because I stumble far too often upon code that shows signs of overdetermination, and for some reason the authors of such code don’t see the problem. According to some of my friends, overdetermination is unfortunately not discussed in computer science courses, so I guess that’s the cause.
What is overdetermination?
In mathematics it’s called an overdetermined system; in mechanics, an overconstrained mechanism. Unfortunately, I couldn’t find an official definition for software development, but if you read those two definitions, you’ll see what they have in common:
They contain more statements / constraints / information than necessary to fully describe a system.
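In the algebraic case, a system is overdetermined if it has more equations than unknowns. Such a system usually has no solution, because the extra equations would have to happen to agree with the rest.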
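A minimal worked example may help here. The numbers are purely illustrative and not taken from any real case: two unknowns, three equations.

```latex
\begin{aligned}
x + y  &= 2 \\
x - y  &= 0 \\
x + 2y &= 4
\end{aligned}
% The first two equations already fix x = 1 and y = 1.
% Substituting into the third gives 1 + 2 \cdot 1 = 3 \neq 4, so no solution exists.
% With a right-hand side of 3 instead of 4 the system would be consistent,
% but the third equation would then carry no new information.
```

Either way, the third equation is at best redundant and at worst contradictory, which is exactly the pattern in the software examples.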
In mechanics, an overconstrained mechanism develops unwanted mechanical stresses, and defects appear sooner than the calculations predicted. (If you’re interested in some detailed examples, read this post.)
Examples from the software field
The following examples are real cases I have encountered, but I obfuscated them for ethical and legal reasons.
Totals
A restaurant saves its orders in a database, and not only the orders but also all the items (dishes, drinks) in them.
Although there are 2 separate entities (orders, items), the backend service only accepts a PUT request to the /orders/ endpoint.
The request body is the following:
{
  "order": {
    "customer_id": "customer-1",
    "total": "16.65",
    "items": [
      {
        "name": "wheat beer",
        "total": "3.25"
      },
      {
        "name": "wiener schnitzel",
        "total": "10.9"
      },
      {
        "name": "coffee",
        "total": "2.5"
      }
    ]
  }
}
Let’s dive into it.
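The unknowns: the order total and the totals of the three items (wheat beer, wiener schnitzel, coffee).
The equations: one stated value for each of these four totals, plus the implicit rule that the order total equals the sum of the item totals (3.25 + 10.9 + 2.5 = 16.65).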
As you can see, we have 4 unknowns and 5 equations, so this system is overdetermined!
It’s enough to forget to update either the order total or a single item total anywhere down the process, and you’ll end up with an inconsistency.
Solution: don’t send the total, calculate it when it’s needed.
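As a minimal sketch of that solution, the order total can be derived from the item totals whenever it is needed. The function name order_total is hypothetical, and using Decimal for the amounts is an assumption on my side (it avoids floating-point rounding on prices); the payload mirrors the request body above, minus the redundant field.

```python
from decimal import Decimal


def order_total(order: dict) -> Decimal:
    """Derive the order total from the item totals instead of storing it."""
    return sum(
        (Decimal(item["total"]) for item in order["items"]),
        Decimal("0"),
    )


# Same order as in the request body, but without the redundant "total" field.
order = {
    "customer_id": "customer-1",
    "items": [
        {"name": "wheat beer", "total": "3.25"},
        {"name": "wiener schnitzel", "total": "10.9"},
        {"name": "coffee", "total": "2.5"},
    ],
}

print(order_total(order))  # 16.65
```

With only the item totals stored, there are 3 unknowns and 3 equations left, so no update can make the data contradict itself.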
Database relations
A file service company groups its files into sets, and it also needs to export these files to various customers.
When an export happens, a new export_job record and multiple export_file records are created in the database. The DB structure is the following:
The database can be defined as follows:
So, the export_file -> file_set relation is defined in two ways, one of which goes through the export_job.file_set_id column.
And here is the problem: an export_file can only belong to a single file_set, which means there should be only 1 relation!
Nothing guarantees that both relations will link the export_file to the same file_set.
To prevent inconsistency, you have to implement a lot of extra logic in the code, which can and should be avoided.
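Here is a minimal sketch of how such a mismatch can slip in, using Python's built-in sqlite3 module. The schema is an assumption for illustration: only export_job, export_file, file_set and export_job.file_set_id come from the case above, while the file table and the other columns are hypothetical.

```python
import sqlite3

# Hypothetical schema. The file table and every column except
# export_job.file_set_id are assumptions made for this sketch.
SCHEMA = """
CREATE TABLE file_set    (id INTEGER PRIMARY KEY);
CREATE TABLE file        (id INTEGER PRIMARY KEY,
                          file_set_id INTEGER NOT NULL REFERENCES file_set(id));
CREATE TABLE export_job  (id INTEGER PRIMARY KEY,
                          file_set_id INTEGER NOT NULL REFERENCES file_set(id));
CREATE TABLE export_file (id INTEGER PRIMARY KEY,
                          export_job_id INTEGER NOT NULL REFERENCES export_job(id),
                          file_id INTEGER NOT NULL REFERENCES file(id));
"""


def file_sets_for_export_file(conn, export_file_id):
    """Return the file_set id reached via the file and via the export_job."""
    return conn.execute(
        """
        SELECT f.file_set_id, j.file_set_id
        FROM export_file AS ef
        JOIN file AS f ON f.id = ef.file_id
        JOIN export_job AS j ON j.id = ef.export_job_id
        WHERE ef.id = ?
        """,
        (export_file_id,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

# Every foreign key below is valid, yet the two paths disagree.
conn.executescript("""
INSERT INTO file_set (id) VALUES (1), (2);
INSERT INTO file (id, file_set_id) VALUES (10, 1);
INSERT INTO export_job (id, file_set_id) VALUES (100, 2);
INSERT INTO export_file (id, export_job_id, file_id) VALUES (1000, 100, 10);
""")

via_file, via_job = file_sets_for_export_file(conn, 1000)
print(via_file, via_job)  # 1 2
```

The database accepts these rows without complaint, so the only thing standing between you and this state is application code that remembers to check both paths on every write.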
Solution: delete the export_job.file_set_id column.
How to avoid overdetermination
If you want to avoid creating overdetermined systems, always strive for simplicity and include only the necessary information in the definitions. Keep it simple, stupid!
Always ask yourself: do I really need that additional field / constraint / variable, or is the information it provides already available to me?