On the topic of duplication

profezzorn · September 28, 2024, 5:10am

If you’ve ever had me review your code, you may have heard me complain about redundant code, cut-n-paste, or duplicate states, or simply ask you to create a helper function. These are all forms of duplication, but why is that bad exactly?

Cut-n-paste coding

This is probably the most common form of duplication. It happens when you need something similar to some code you already wrote, so you just copy, paste, make a few modifications, and you’re done, right?

wrong

The life of a piece of code doesn’t end when you finish writing it. That’s when it begins. Lots of people will read and try to understand the code. It will get modified, copied, printed, read, abused, and, when all else fails… compiled. The life of a piece of code ends when the last copy is deleted.

At its very simplest, when you cut-and-paste a piece of code, it means that when you want to change some aspect of that code in the future, you now have two places to change. That might not sound so bad, but maybe someone pasted 10 times. Or a hundred? If this happens a lot, suddenly nobody wants to ever touch the code again.

Normally, the solution is to move the repeated code into a function or class, and then replace those bits that change with variables so that you can just call a function 10 times instead of pasting the same code 10 times. This still causes some amount of duplication, but learning how to minimize the duplicated part is part of becoming a good programmer.
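As a rough sketch of what that looks like (the names and the brightness example are made up for illustration, not taken from any real codebase): instead of pasting the same scale-and-clamp arithmetic once per color channel, the part that varies becomes a parameter of a helper function.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: scale one 8-bit channel by a percentage and clamp
// the result to 0-255. The pasted version would repeat these two lines
// for r, g and b; here the channel value is the only thing that varies.
inline uint8_t ScaleChannel(uint8_t value, int percent) {
  int scaled = static_cast<int>(value) * percent / 100;
  return static_cast<uint8_t>(std::min(255, std::max(0, scaled)));
}

// Hypothetical color type, only here to give the helper something to do.
struct Color {
  uint8_t r, g, b;
};

// Three calls instead of three pasted copies. If the scaling rule ever
// changes, there is exactly one place to change it.
inline Color ScaleColor(Color c, int percent) {
  return Color{ScaleChannel(c.r, percent),
               ScaleChannel(c.g, percent),
               ScaleChannel(c.b, percent)};
}
```

Note that the three calls in ScaleColor are still a small amount of duplication. That is the part you learn to minimize over time.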

Redundant code

Another form of duplication is redundant code. This happens when the programmer doesn’t know the code well enough to say “this cannot happen”, so they insert code that deals with those cases. A lot of times, people later learn that those things cannot happen, but are too lazy to actually remove the redundant code. I mean, what’s the harm? It can’t happen, so the code never runs anyways, right?

while this is true, it is also wrong

There is a saying that code gets read more often by people than by compilers. I don’t actually think that is true, but I care a lot more about people than I do about compilers.

When you read code that has redundancies in it, it makes you go: what? why is this here? can this happen? Let me go back and read it again to see if I understand it correctly… Often the offending code is small, maybe a single line, but it can slow down the reader significantly. Perfect code (which doesn’t exist, but let’s pretend for now…) should be easy to read. A reader should be able to read the file from top to bottom, without having to go read a bunch of other files, and still get a good grasp of what the code does and why. (The why needs to be documented in comments, since the code itself can’t really explain that.) There is a word for this: “readability”, and it’s one of the most important aspects of well-written code. Redundant code hurts readability.
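Here is a small made-up example of the kind of line I mean (function names are hypothetical). Both functions return the same result for every input, but the first one makes the reader stop and wonder whether an empty vector is somehow special.

```cpp
#include <vector>

// With a redundant guard. The loop below already runs zero times for an
// empty vector, so the early return handles a case that needs no handling.
inline int SumWithRedundantCheck(const std::vector<int>& values) {
  if (values.empty()) return 0;  // redundant: the loop covers this case
  int sum = 0;
  for (int v : values) sum += v;
  return sum;
}

// Without the guard. Same behavior, one less thing to read and question.
inline int Sum(const std::vector<int>& values) {
  int sum = 0;
  for (int v : values) sum += v;
  return sum;
}
```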

Linked states / denormalized data

If you’ve ever taken a database class, you already know about this. If you’re a quantum physicist, you would probably call this entanglement. If you’re a regular programmer, you would just call this “bad”.

Basically, what I’m talking about is having two variables that represent the same thing. An example might be having a bool which says if a set of files has been found, and another variable which says how many files were found. Unless you screw something up, the bool should only be false if the number of files is zero, and true if the number of files is anything other than zero. The bool is not needed, because you can just check how many files were found.

Another example might be having one bool for when a saber is on, and another one for when it is off. They should never both be true, right?

For anybody who has done any Proffieboard programming, your first thought is probably that extra variables take up unnecessary RAM, and we don’t have that much of it. While this is the case, it’s not the primary reason why duplicated state is bad…

To quote myself two paragraphs ago: “Unless you screw something up…”. Duplicated state is an opportunity to screw up. What happens if you do something in the code that breaks the assumptions? What happens if the saber is both on and off at the same time? Who knows? If your state is stored in one place (which is called “normalized”) then there is no chance of messing up this way.
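A minimal sketch of the two examples above, with names invented for illustration (this is not actual ProffieOS code). In each case the state is stored once, and the second "variable" becomes a function that derives its answer from it.

```cpp
#include <cstddef>

// Duplicated state: two variables that must be kept in sync by hand.
// Nothing stops found == false while num_files == 3.
struct FileScanDuplicated {
  bool found = false;
  size_t num_files = 0;
};

// Normalized: only the count is stored, and found() is computed from it,
// so the two can never disagree.
class FileScan {
 public:
  void AddFile() { ++num_files_; }
  size_t num_files() const { return num_files_; }
  bool found() const { return num_files_ != 0; }

 private:
  size_t num_files_ = 0;
};

// Same idea for the saber: one bool, and "off" is simply "not on".
class Saber {
 public:
  void TurnOn() { on_ = true; }
  void TurnOff() { on_ = false; }
  bool IsOn() const { return on_; }
  bool IsOff() const { return !on_; }

 private:
  bool on_ = false;
};
```

With this layout there is no way to write code that leaves the saber both on and off, because the contradictory state cannot be represented at all.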

When to break the rules

So these rules are guidelines, not hard and fast rules. However, breaking the rules should be done thoughtfully and deliberately, and only when you get something for it. If the code becomes smaller, faster, or easier to read, then it might be worth breaking the rules… and if it isn’t obvious, write a comment that explains why you broke the rule, to show the reader that you thought about it and determined that it would be worth breaking in this particular case.