Refactoring in programming is the process of restructuring existing code without changing its external behavior. It is done to improve the readability, maintainability, and performance of the code. Refactoring can involve reorganizing code, renaming, and replacing complex code with simpler alternatives.
There are a few general things to look for when identifying 'poor' code:
- Poor readability: Poorly written code can be difficult to read and understand, making it difficult to maintain and debug.
- Poor structure: Code that is not organized properly can be difficult to maintain and debug.
- Unnecessary complexity: Code that is overly complex can be difficult to understand and maintain.
- Duplicate code: Code that is repeated multiple times can be inefficient and difficult to maintain.
- Unused code: Unused code can be inefficient and difficult to maintain.
However, this doesn't tell us much about 'how' to refactor.
That is somewhat different from small code-refactorings, where it is often obvious how code could be improved. On a more atomic level there are some (Pythonic) patterns to identify:
- small loops can often be rewritten as list comprehensions.
- 'old' style lookups in loops should be replaced by
- poor function and variable names should be replaced with obvious, easy to understand names
- getter and setter methods could be replaced by
- use (built-in) special methods in classes instead of custom methods
- use decorators when useful
Modern tools help to identify (and fix) most of these (and many more). These types of refactoring will probably be automated away in the near future.
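Two of the patterns above can be sketched as follows. This is a minimal illustration only: the data and the Basket class are hypothetical and not taken from any real codebase.

```python
# Hypothetical data (prices in cents), used only for illustration.
prices = [1250, 300, 4800, 725]

# Before: a small loop that builds a list step by step.
doubled_loop = []
for price in prices:
    if price > 500:
        doubled_loop.append(price * 2)

# After: the same logic as a list comprehension.
doubled = [price * 2 for price in prices if price > 500]


class Basket:
    """Hypothetical container that relies on special methods."""

    def __init__(self, items):
        self._items = list(items)

    # Instead of a custom size() method, support len(basket).
    def __len__(self):
        return len(self._items)

    # Instead of a custom has(item) method, support `item in basket`.
    def __contains__(self, item):
        return item in self._items
```

With the special methods in place, callers can write len(basket) and "pen" in basket, which reads like any other Python container.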
Larger refactoring, however, requires a somewhat different approach. What is involved in refactoring complex code?
0. The Art of Continuous Refactoring
Refactoring should not be a one-time event but rather an ongoing process. The key is to establish a culture of continuous refactoring within the development team. Regularly revisiting and improving existing code maintains code quality, prevents technical debt from accumulating, and promotes a healthy and maintainable codebase.
The first step in refactoring is checking the tests. Tests are essential when refactoring code. They provide a safety net for developers, allowing them to change the code without fear of breaking existing functionality. This boosts the developer's confidence, which leads to more frequent and more experimental refactoring. Tests also reveal errors or bugs introduced during the refactoring process. High test coverage speeds up refactoring, but only if the tests run quickly; if they take hours to finish, a lot of development time is wasted and the refactoring process becomes frustrating.
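One way to build such a safety net is to pin down the current behaviour with a few fast tests before touching the code. The sketch below uses the standard unittest module; legacy_discount is a hypothetical function invented for this example.

```python
import unittest


def legacy_discount(total, is_member):
    """Hypothetical legacy function that is about to be refactored."""
    if is_member:
        if total > 100:
            return total - total * 0.1
        else:
            return total - total * 0.05
    else:
        return total


class LegacyDiscountTest(unittest.TestCase):
    """Pins down the current behaviour before any refactoring starts."""

    def test_member_large_order(self):
        self.assertAlmostEqual(legacy_discount(200, True), 180.0)

    def test_member_small_order(self):
        self.assertAlmostEqual(legacy_discount(100, True), 95.0)

    def test_non_member(self):
        self.assertEqual(legacy_discount(200, False), 200)
```

Tests like these run in milliseconds, so they can be executed after every small change.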
Make a new function or module instead of trying to update the existing code. This makes it easy to compare the old and new code, possibly run A/B tests, and find where the old code is still used by means of logging or warnings.
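A rough sketch of this side-by-side approach is shown below. Both functions and the sample inputs are hypothetical; the point is only that the old code stays untouched while the new code is checked against it.

```python
def total_length_old(words):
    """Hypothetical existing function, left untouched."""
    total = 0
    for i in range(len(words)):
        total = total + len(words[i])
    return total


def total_length_new(words):
    """Hypothetical replacement, written next to the old one."""
    return sum(len(word) for word in words)


def compare_implementations(samples):
    """Return the samples on which old and new disagree."""
    return [s for s in samples if total_length_old(s) != total_length_new(s)]


samples = [[], ["a"], ["refactor", "in", "small", "steps"]]
mismatches = compare_implementations(samples)
```

An empty mismatches list suggests, for these samples at least, that the new function can take over.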
3. import warnings
The warnings module in Python is used to alert developers to potential problems in their code.
This is especially useful when refactoring code, as it can help to identify whether code is still used somewhere.
Additionally, it can be used to suppress specific warnings that are not relevant to the current refactoring task.
Pro tip: Python can run in a mode where all warnings become errors:
python -W error my_code.py
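As a minimal sketch, an old function can emit a DeprecationWarning while delegating to its replacement. The names old_parse and new_parse are hypothetical; warnings.warn, warnings.catch_warnings and warnings.simplefilter are part of the standard library.

```python
import warnings


def new_parse(text):
    """Hypothetical replacement function."""
    return text.strip().split(",")


def old_parse(text):
    """Hypothetical old entry point, kept during the transition."""
    warnings.warn(
        "old_parse() is deprecated, use new_parse() instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this line
    )
    return new_parse(text)


# Record warnings to check that the old function is still being called.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_parse(" a,b ")
```

To silence a category that is not relevant to the task at hand, warnings.filterwarnings("ignore", category=DeprecationWarning) can be used in the same way.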
4. Performance tests
Refactoring does not automatically translate to improved performance. In certain cases, ill-advised refactoring attempts can result in performance degradation. Therefore, create some speed tests before any refactoring, so you can compare results later on. Python's timeit module is a great tool for this and makes it easy to temporarily add performance tests. Having these performance tests available will help to identify bottlenecks over time.
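A temporary speed test could look roughly like this. The two functions are hypothetical stand-ins for the code before and after refactoring, and the numbers depend entirely on the machine, so no timings are claimed here.

```python
import timeit


def squares_loop(n):
    """Hypothetical 'before' version."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_comprehension(n):
    """Hypothetical 'after' version."""
    return [i * i for i in range(n)]


# Check that behaviour is unchanged first, then measure.
assert squares_loop(1000) == squares_comprehension(1000)

# repeat() returns one total time per run; the minimum is usually the least noisy.
loop_time = min(timeit.repeat(lambda: squares_loop(1000), number=200, repeat=3))
comp_time = min(timeit.repeat(lambda: squares_comprehension(1000), number=200, repeat=3))
print(f"loop: {loop_time:.4f}s  comprehension: {comp_time:.4f}s")
```

Keeping the measurement next to the equality check makes it harder to accidentally compare two functions that no longer do the same thing.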
5. Small steps
Many incremental changes are better than one massive change. In the end, after many smaller steps, you might end up with a massive change after all. These small steps allow you to continually test whether any of the changes broke anything.
6. Know the libraries
Quite often, complex code is the result of trying to re-invent some fancy algorithm. Chances are that it already exists in some form as a design pattern or as a function in a built-in library. The standard-library modules itertools, functools and abc, and the third-party NumPy package, are packed with useful patterns and functions.
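A few examples of hand-written logic that the standard library already covers are sketched below. The data is made up; chain.from_iterable, groupby and lru_cache are real functions from itertools and functools.

```python
from functools import lru_cache
from itertools import chain, groupby

# Hand-rolled flattening with nested loops ...
nested = [[1, 2], [3], [], [4, 5]]
flat_manual = []
for inner in nested:
    for value in inner:
        flat_manual.append(value)

# ... versus itertools.chain.from_iterable.
flat = list(chain.from_iterable(nested))

# groupby groups consecutive items, so the input is sorted first.
words = sorted(["apple", "avocado", "banana", "blueberry", "cherry"])
by_letter = {
    letter: list(group) for letter, group in groupby(words, key=lambda w: w[0])
}


# functools.lru_cache replaces a hand-written memoization dictionary.
@lru_cache(maxsize=None)
def fibonacci(n):
    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)
```

Each replacement is shorter than the hand-written version and has well-documented behaviour.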
Nested loops (with if-statements) can result in complex code.
Therefore, it sometimes makes sense to move a loop into a function of its own. (Interestingly, this sometimes even speeds things up a bit, but not always.)
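A small sketch of moving an inner loop into a function follows. The orders data and the has_items_to_ship helper are hypothetical and exist only to show the shape of the change.

```python
# Hypothetical data: orders, each with a list of (name, quantity) lines.
orders = [
    {"id": 1, "lines": [("pen", 2), ("ink", 0)]},
    {"id": 2, "lines": [("paper", 0)]},
    {"id": 3, "lines": [("stapler", 1)]},
]

# Before: a nested loop with if-statements.
shippable_before = []
for order in orders:
    has_stock = False
    for _name, quantity in order["lines"]:
        if quantity > 0:
            has_stock = True
            break
    if has_stock:
        shippable_before.append(order["id"])


# After: the inner loop becomes a small, named function.
def has_items_to_ship(order):
    return any(quantity > 0 for _name, quantity in order["lines"])


shippable_after = [order["id"] for order in orders if has_items_to_ship(order)]
```

The named function documents the intent of the inner loop and can be tested on its own.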
But sometimes a better way is to create an iterable object. To make an iterable object, create a class that has at least the two functions:
This forces the developer to think about the object and its purpose. When implemented correctly, this results in more understandable and (re)usable objects.
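The text above does not name the two methods. Assuming it refers to Python's iterator protocol, they would be __iter__ and __next__. The Countdown class below is a hypothetical sketch under that assumption, not the author's own example.

```python
class Countdown:
    """Hypothetical iterable that counts down from `start` to 1."""

    def __init__(self, start):
        self._current = start

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        # Signal the end of the iteration once the counter reaches zero.
        if self._current <= 0:
            raise StopIteration
        value = self._current
        self._current -= 1
        return value
```

An instance can be used directly in a for loop, for example for n in Countdown(3). Note that this object can only be iterated once; if repeated iteration is needed, __iter__ could return a fresh iterator instead.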
8. Focus on Modularity
Modularity is a key aspect of usability. Creating a simple API around some more complex code often results in more developers using the code. Modular code also allows for greater flexibility and extensibility in the future.
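As a rough illustration of a simple API around more complex code, the sketch below hides three internal steps behind one public function. All names are hypothetical; only json.loads and statistics.mean are real library calls.

```python
import json
from statistics import mean


# Internal helpers (hypothetical): details the caller should not need to know.
def _parse(raw):
    return json.loads(raw)


def _clean(records):
    return [r for r in records if r.get("value") is not None]


def _summarise(records):
    values = [r["value"] for r in records]
    return {"count": len(values), "mean": mean(values) if values else None}


# Public API: one function with a simple signature.
def summarise_json(raw):
    """Return count and mean of the non-missing `value` fields in a JSON list."""
    return _summarise(_clean(_parse(raw)))
```

Callers only need summarise_json; the helpers behind it can be refactored again later without changing any calling code.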
Refactoring does not lead to self-documenting code. The best moment to update documentation is right after refactoring. By reserving some time for writing documentation, as part of the refactoring planning, you end up with more tangible results.