
Overdetermination in software development


Overdetermination can happen in a lot of places: not only in mathematics and in theory, but in real-life practice too. Since I studied mechatronics engineering, I know this topic mostly from mechanics, where over-constraints can cause a lot of issues.


I’m writing this post because I stumble upon code that shows signs of overdetermination far too often, and the authors of such code for some reason don’t see the problem. According to some of my friends, computer science courses unfortunately don’t discuss overdetermination, so I guess that’s the cause.

What is overdetermination?

In mathematics it’s called an overdetermined system; in mechanics, an overconstrained mechanism. In software development I unfortunately didn’t find any official definition, but if you read the existing definitions, you’ll see what they have in common:


They contain more statements / constraints / information than necessary to fully describe a system.


Take the algebraic example: if you have $n$ unknowns (variables), you need $n$ independent equations to describe your system. As soon as you have $m$ equations where $m > n$, you have an overdetermined system, and it can happen that one or more of the equations are incorrect. Then your definition of the system is incorrect too, and it can contain contradictions.
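A small worked illustration of my own (not one of the original examples): two unknowns, three equations.

$$
\begin{aligned}
x + y &= 3\\
x - y &= 1\\
x + 2y &= 5
\end{aligned}
$$

The first two equations already fix $x = 2$ and $y = 1$; the third then demands $2 + 2 \cdot 1 = 4 = 5$, which is a contradiction. Had it read $x + 2y = 4$, it would have been consistent but redundant: it adds no information, only one more opportunity for a mistake.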


In mechanics, an overconstrained mechanism develops unwanted mechanical stresses, and defects appear sooner than the calculations probably predicted. (If you’re interested in some detailed examples, read this post.)

Examples from the software field

The following examples are real cases I have encountered, but I obfuscated them for ethical and legal reasons.

Totals

A restaurant saves its orders into a database, and not only the orders but every item (dishes, drinks) too.
Even though there are 2 separate entities (orders and items), the backend service only accepts a PUT request to the /orders/ endpoint.


The request body is the following:

{
    "order": {
        "customer_id": "customer-1",
        "total": "16.65",
        "items": [
            {
                "name": "wheat beer",
                "total": "3.25"
            },
            {
                "name": "wiener schnitzel",
                "total": "10.9"
            },
            {
                "name": "coffee",
                "total": "2.5"
            }
        ]
    }
}

Let’s dive into it.


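The unknowns: the order total $t$ and the item totals $t_1$, $t_2$, $t_3$.

The equations:

$$
\begin{aligned}
t &= 16.65\\
t_1 &= 3.25\\
t_2 &= 10.9\\
t_3 &= 2.5\\
t &= t_1 + t_2 + t_3
\end{aligned}
$$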


As you can see, we have 4 unknowns and 5 equations, so this system is overdetermined!
It’s enough to forget to update either the order total or a single item total anywhere down the line, and you’ll end up with an inconsistency.
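To make the fifth equation tangible, here is a minimal Python sketch (my own illustration, not code from the real service) that checks whether a request body like the one above is consistent. The function name `is_consistent` is hypothetical, and the amounts are parsed with `Decimal` rather than `float` so that 3.25 + 10.9 + 2.5 compares exactly to 16.65.

```python
from decimal import Decimal

# The request body from above as a Python dict (amounts kept as strings, like in the JSON).
order = {
    "customer_id": "customer-1",
    "total": "16.65",
    "items": [
        {"name": "wheat beer", "total": "3.25"},
        {"name": "wiener schnitzel", "total": "10.9"},
        {"name": "coffee", "total": "2.5"},
    ],
}


def is_consistent(order: dict) -> bool:
    """Check the fifth equation: the order total must equal the sum of the item totals."""
    items_sum = sum((Decimal(item["total"]) for item in order["items"]), Decimal("0"))
    return Decimal(order["total"]) == items_sum


print(is_consistent(order))  # True

# Someone changes the price of the coffee but forgets to update the order total.
order["items"][2]["total"] = "2.8"
print(is_consistent(order))  # False: the payload now contradicts itself
```

Every consumer of this payload has to run a check like this, or silently trust one of the two conflicting values.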


Solution: don’t send the total, calculate it when it’s needed.
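A minimal sketch of that solution in Python, assuming the payload simply drops the order-level `total` field. The `Order` and `Item` types and the `order_total` helper are hypothetical names for illustration.

```python
from decimal import Decimal
from typing import TypedDict


class Item(TypedDict):
    name: str
    total: str  # kept as a string, like in the original request body


class Order(TypedDict):
    customer_id: str
    items: list[Item]  # no order-level "total" field any more


def order_total(order: Order) -> Decimal:
    """Derive the order total from the items whenever it is needed."""
    return sum((Decimal(item["total"]) for item in order["items"]), Decimal("0"))


order: Order = {
    "customer_id": "customer-1",
    "items": [
        {"name": "wheat beer", "total": "3.25"},
        {"name": "wiener schnitzel", "total": "10.9"},
        {"name": "coffee", "total": "2.5"},
    ],
}
print(order_total(order))  # 16.65
```

Because the order total is never stored or transmitted, there is no second source of truth that can drift away from the items. If recomputing it ever becomes a performance concern, any cached copy should be treated as derived data and rebuilt from the items, never edited on its own.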

Database relations

A file service company groups its files into sets, and it also needs to export these files to various customers.
When an export happens, a new export_job record and multiple export_file records are created in the database. The DB structure is the following:


The database can be defined as follows:


So, the export_file -> file_set relation is defined via two separate paths, one of which runs through the export_job.file_set_id column.


And here is the problem: an export_file can only belong to a single file_set, which means there should be only 1 relation between them! Nothing guarantees that both relations will link the export_file to the same file_set.
To prevent inconsistencies you would have to implement a lot of extra logic in the code, which can and should be avoided.
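The original schema isn't reproduced here, so the following SQLite sketch assumes one plausible layout (hypothetical, not the company's real schema): the second path runs through a `file` table that carries its own `file_set_id`. It shows that the database happily accepts an export file whose two paths disagree, and the kind of extra consistency query you would then have to maintain.

```python
import sqlite3

# Hypothetical schema: export_file reaches file_set both via export_job and via file.
SCHEMA = """
CREATE TABLE file_set    (id INTEGER PRIMARY KEY);
CREATE TABLE file        (id INTEGER PRIMARY KEY,
                          file_set_id INTEGER NOT NULL REFERENCES file_set(id));
CREATE TABLE export_job  (id INTEGER PRIMARY KEY,
                          file_set_id INTEGER NOT NULL REFERENCES file_set(id));
CREATE TABLE export_file (id INTEGER PRIMARY KEY,
                          export_job_id INTEGER NOT NULL REFERENCES export_job(id),
                          file_id INTEGER NOT NULL REFERENCES file(id));
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executescript("""
INSERT INTO file_set (id) VALUES (1), (2);
INSERT INTO file (id, file_set_id) VALUES (10, 1), (20, 2);
INSERT INTO export_job (id, file_set_id) VALUES (100, 1);
-- 1000 is fine; 1001 puts a file from set 2 into a job that claims set 1.
INSERT INTO export_file (id, export_job_id, file_id) VALUES (1000, 100, 10), (1001, 100, 20);
""")

# The "extra logic": find export files whose two paths disagree about the file set.
CONFLICTS = """
SELECT ef.id, j.file_set_id AS via_job, f.file_set_id AS via_file
FROM export_file ef
JOIN export_job j ON j.id = ef.export_job_id
JOIN file f ON f.id = ef.file_id
WHERE j.file_set_id <> f.file_set_id
"""
conflicts = conn.execute(CONFLICTS).fetchall()
print(conflicts)  # [(1001, 1, 2)]
```

Every foreign key here is valid on its own, so no constraint stops the bad row; only a check like this one can catch it.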


Solution: delete the export_job.file_set_id column.
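A sketch of the schema after the fix, assuming a hypothetical `file` table that carries `file_set_id` (the original post does not confirm this layout): `export_job` no longer stores a file set, and the file set of an export job is derived from its export files with a join.

```python
import sqlite3

# Hypothetical schema without export_job.file_set_id: only one path to file_set remains.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE file_set    (id INTEGER PRIMARY KEY);
CREATE TABLE file        (id INTEGER PRIMARY KEY,
                          file_set_id INTEGER NOT NULL REFERENCES file_set(id));
CREATE TABLE export_job  (id INTEGER PRIMARY KEY);
CREATE TABLE export_file (id INTEGER PRIMARY KEY,
                          export_job_id INTEGER NOT NULL REFERENCES export_job(id),
                          file_id INTEGER NOT NULL REFERENCES file(id));

INSERT INTO file_set (id) VALUES (1);
INSERT INTO file (id, file_set_id) VALUES (10, 1), (11, 1);
INSERT INTO export_job (id) VALUES (100);
INSERT INTO export_file (id, export_job_id, file_id) VALUES (1000, 100, 10), (1001, 100, 11);
""")

# What export_job.file_set_id used to store is now derived from the export files.
job_sets = conn.execute("""
SELECT DISTINCT f.file_set_id
FROM export_file ef
JOIN file f ON f.id = ef.file_id
WHERE ef.export_job_id = ?
""", (100,)).fetchall()
print(job_sets)  # [(1,)]
```

With only one path left, there is no second relation that could contradict it, so the consistency check becomes unnecessary.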

How to avoid overdetermination

If you want to avoid creating overdetermined systems, always strive for simplicity and include only the necessary information in your definitions. Keep it simple, stupid!


Always ask yourself: do I really need that additional field / constraint / variable, or is the information it provides already available to me?