In the second part of this series, we look at how code duplicates are created and what you can do to solve duplicate code issues. In the first part we covered what code duplication actually is and why it’s an issue.
How are Code Duplicates Created?
We’ve already looked at why code duplication is an issue, but what causes it in the first place? There are a number of causes, some more worrying than others. The most obvious is copying and pasting, which is generally frowned upon in development. There are some exceptions where it is closer to standard practice, such as boilerplate, loop unrolling or certain programming idioms. A slight variation on pure copying and pasting is copying code and altering it slightly to fit the new need, for example by inserting extra segments or substituting different variables.
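As a small illustration of the copy-and-alter pattern, here is a minimal Python sketch. The function names and tax rates are hypothetical, invented purely for this example.

```python
# Hypothetical example: the second function was copied from the first
# and then altered slightly (a different rate and one extra step).

def price_with_standard_tax(net_price):
    rate = 0.20
    gross = net_price + net_price * rate
    return round(gross, 2)


def price_with_reduced_tax(net_price):
    rate = 0.05  # the value that was changed after copying
    gross = net_price + net_price * rate
    gross = max(gross, 0.0)  # extra segment inserted into the copy
    return round(gross, 2)
```

The two bodies share almost every line, so a fix to the calculation in one has to be remembered and repeated in the other.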
Another cause, which is harder to anticipate, is when a developer independently writes code that is functionally similar to code that already exists elsewhere in the program. Studies suggest that such code is typically not syntactically similar, making it harder to identify once created.
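To make this concrete, the sketch below shows two hypothetical Python functions that could have been written by different developers. They do the same job, but share almost no text, so a search for matching lines would not pair them up.

```python
# Hypothetical example: two independent solutions to the same problem.

def total_of_even_numbers(values):
    # Written as an explicit loop with a modulo check.
    total = 0
    for value in values:
        if value % 2 == 0:
            total += value
    return total


def sum_evens(numbers):
    # Written as a generator expression with a bitwise parity check.
    return sum(n for n in numbers if not n & 1)
```

Both functions return the same result for any list of integers, which is what makes them duplicates despite looking nothing alike.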
Automated code generators can also produce duplicated code. Such tools can make development both easier and quicker. While the generator’s own source code may not contain duplicates, they may appear in its output.
Why are Code Duplicates Created?
So now we’ve looked at the ways duplication occurs, let’s look at the root causes:
Lack of time – often with pressing deadlines it can be tempting to take a shortcut rather than figuring out the optimal way to do something. Sometimes it’s unavoidable. But doing this will create technical debt which will have to be addressed down the line.
Developer inexperience – one reason a developer may copy code is that they lack training or knowledge of a language. While not ideal, this is only problematic if left unchecked or, worse, if it remains hidden. Use tools that give an overview of where your developers’ strengths lie and where they need additional support. This can be invaluable in making sure you provide the right training for your team.
Lack of care – perhaps the most worrying reason why code duplication occurs. This can occur when developers become isolated in their own role and are only concerned with delivering their immediate objective, not the overall platform health. One way of resolving this issue is to establish a strong team dynamic. Have regular team meetings where you discuss the overall platform vision, ensuring developers understand their role as part of a whole and how their work connects with others.
Tackling these issues at the source will be far more cost-effective and less time-consuming than rectifying them later.
But I Already Have Duplicate Code – What Can I Do?
What you certainly shouldn’t do is just delete the offending code blocks straight away. Other parts of the system may depend on them, so removing them could cause significant disruption to your platform.
Instead, to properly fix code duplication you’ll need to refactor. The refactoring process essentially means consolidating, reformatting and rewriting the code you currently have in order to create a more maintainable and scalable codebase.
One of the trickiest things about refactoring is that it requires knowledge of how a system fits together. Insufficient documentation can make it even harder to do. Having a tool that properly identifies where code duplicates occur makes the process significantly easier. With that knowledge in place, you can start to remove duplicated blocks and consolidate them. In most cases, the best practice is to have the sites of the previous duplications call the consolidated code, though there is more nuance to this and different setups may require different ways of refactoring. Once all the testing is complete, you should end up with an improved and healthier codebase which will be easier to manage and build on in the future.
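The consolidation step might look something like the following Python sketch. All of the names here are hypothetical. It assumes two functions that each used to contain their own copy of the same email clean-up logic, and which now call a single shared helper instead.

```python
def normalise_email(raw_email):
    """Consolidated helper: the one place the shared logic now lives."""
    return raw_email.strip().lower()


def register_user(raw_email):
    # Previously held its own copy of the strip/lower logic.
    return {"action": "register", "email": normalise_email(raw_email)}


def update_profile(raw_email):
    # Previously held a second copy; now it calls the same helper.
    return {"action": "update", "email": normalise_email(raw_email)}


# A simple check that behaviour is unchanged after consolidation.
assert register_user("  Ada@Example.COM ")["email"] == "ada@example.com"
assert update_profile("ADA@example.com")["email"] == "ada@example.com"
```

Keeping the original function names means existing callers are not affected, and any future change to the clean-up rules only needs to be made in one place.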
The first step to solving any problem is awareness. Making sure you have the tools to identify and rectify duplication is essential to prevent it from spiralling out of control. Code duplication can often be hidden from view, but it can have serious manageability, performance, scalability and security implications if left unchecked.