Welcome to the first of a two-part blog series on duplicate code. In this first post, we’re looking at the deceptively simple question of what duplicate code actually is, before getting into why it poses problems.
What is Duplicate Code?
Surely duplicate code is as simple as it sounds? Code which is literally duplicated, written with identical characters? Well yes, it can be – but code, at its purest level, is a set of instructions for a computer. With this in mind, you realise that ‘duplicate code’ can have far broader meanings, because the same instructions can be written in several ways. Two pieces of code can use different characters but the same sequence of tokens (the basic units of a language, such as keywords, names and operators), for example when only the variable names differ. In some cases, code can even look entirely different yet deliver the same end result.
What initially seems like quite a straightforward issue becomes a lot trickier to unpick, especially when you consider the scale of some codebases and the manual clean-up that would need to be carried out.
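To make these forms of duplication concrete, here is a minimal sketch in Python. The function names and the summing task are hypothetical, chosen purely for illustration. The second function differs from the first only in its names, and the third looks nothing like either, yet all three give the same result.

```python
# Hypothetical example functions; the names and the task are illustrative only.

# The original: add up a list of prices with an explicit loop.
def total_price(prices):
    total = 0
    for price in prices:
        total += price
    return total


# Different characters, same tokens: only the names have changed,
# so the structure is identical to total_price.
def sum_scores(scores):
    result = 0
    for score in scores:
        result += score
    return result


# Looks entirely different, but delivers the same end result
# by using Python's built-in sum().
def add_up(values):
    return sum(values)
```

A plain text search for the first function’s body would find neither of the other two, which is part of what makes duplicates hard to track down by hand.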
Human Thinking vs Computer ‘Thinking’
To understand why code duplication causes issues, you first need to think like a computer.
Imagine you have a process starting with step ‘X’. You want to write instructions for a second process which starts identically with step ‘X’. Logically you might create the second process in a separate place and copy and paste your previous work. The same step then exists in two different physical locations, but after ‘X’ the processes take different paths.
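As a rough illustration of that copy and paste, here is a small Python sketch. The order-handling functions are hypothetical stand-ins: step ‘X’ is assumed to be a tidy-up of the input, and the two processes then go their separate ways.

```python
# Hypothetical processes; 'X' is assumed to be a simple input tidy-up.

def ship_order(order_code):
    # Step 'X', written out in full.
    cleaned = order_code.strip().upper()
    # After 'X', this process carries on down its own path.
    return f"SHIP {cleaned}"


def refund_order(order_code):
    # Step 'X' again, copied and pasted from ship_order.
    cleaned = order_code.strip().upper()
    # After 'X', this process takes a different path.
    return f"REFUND {cleaned}"
```

Both functions work, and each one reads clearly on its own. The cost is that step ‘X’ now lives in two separate places.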

It’s usual for humans to define processes in terms of results. In instructions, it’s a matter of how you get from start to finish. When presented with irrelevant options, people can’t disregard them immediately with a logic filter. Instead, each option must be considered and ruled out. After ‘X’ in the above example, it may be possible to do ‘Y’ or ‘A’, but if you’re aiming for ‘Z’ then the instructions need to quickly and clearly identify the step in between. ‘A’ is part of a separate path, irrelevant for the moment, and the visual ‘noise’ will serve only to distract and confuse.
Computers work differently. Firstly, they process complicated data flows a lot quicker than any person could. What would be confusing to a human is instantly comprehensible to a computer. Through logic, they can disregard an irrelevant path in an instant. This ability to ignore visual ‘noise’ means the most efficient processes can look quite different for a computer. A process that seems overwhelming to a human may well be the most scalable one when run with computer logic.

Why then is Duplicate Code a Problem?
If we have concluded above that ‘human’ thinking is not efficient for a computer, what does that mean in real terms? Well, the first and most obvious consequence is in terms of performance. If a computer has to carry out the same step separately in several processes, rather than just once, it does redundant work and puts a greater strain on its processing power.
Another, larger concern comes into play if step ‘X’ ever changes. Imagine it refers to a person who is no longer at the company, or there’s simply a mistake. You will then have to fix ‘X’ in every single place it appears. Humans can use common sense and discretion if they come across an instance where someone has forgotten to change ‘X’. Computers cannot do the same; they take their instructions literally.
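Here is a minimal sketch of how that can go wrong, again in Python. The two email functions and the addresses are made up for the example: step ‘X’ is assumed to be looking up a contact, and the contact has changed because the original person left the company.

```python
# Hypothetical example; the functions and addresses are invented.

def welcome_email(name):
    # Step 'X', updated to point at the new contact.
    contact = "sam@example.com"
    return f"Hi {name}, please email {contact} with any questions."


def renewal_email(name):
    # Step 'X', the forgotten copy: it still names the person who left.
    contact = "alex@example.com"
    return f"Hi {name}, please email {contact} to renew."
```

Nothing here fails or raises an error. The second function simply carries on sending people to the old address until somebody notices.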
In practical terms, this means duplicates can cause real headaches when you start altering or building upon them. What, for instance, happens if someone accidentally starts developing something new on top of an ‘X’ which wasn’t changed to the new version? Bugs, issues and technical debt are almost inevitable if you lose track of duplicate code.
There are security implications here too – if your code has a security vulnerability, then duplicating it will spread that same vulnerability around the system. Once the vulnerability is discovered, if you don’t know where else the code appears, your platform is left wide open.
Conclusion
Long story short, code duplication matters. It might not be an imminent threat, but it shouldn’t be taken lightly.
Next time, we’ll look into how duplicate code is created and how to solve the issue!