Keeping track of Python Code Metrics

In your career as a software developer, there will be times when you work on projects that have no daily build, no unit tests, and a mess of legacy spaghetti code. Your goal is to tame that legacy project, and the only way to bring a project back under control is to measure how the code base fares, then track those metrics daily (or weekly) to see how they change. Some metrics, like lines of code, are only indicators and warnings: large, complex projects will have hundreds of thousands of lines of code, and that’s okay if the project is well maintained. Here, however, we’re talking about a legacy project that is out of control, and any metrics are better than none.

The Metrics

These are the metrics we will be checking for our project:

  • number of lines of code
  • test coverage
  • number of flake8 errors/warnings
  • number of clone/duplicate code snippets
  • cyclomatic complexity

The Tools

These are the tools in our toolbox for measuring and keeping track of these metrics:

  • radon, measures lines of code and cyclomatic complexity (its maintainability index also uses the Halstead volume)
  • CloneDigger, for detecting copy/paste source code or any source code that is similar
  • coverage, for running unit tests and measuring the lines of code that are covered by them
  • flake8, for checking the style of the code

Installing the Measurement Tools

For Python, you can install these tools with the following line:

pip install radon clonedigger coverage flake8

For Django, you’ll want the Django version of coverage:

pip install django-coverage

Tracking The Metrics

When first starting work on the legacy project, we need a baseline to know what kind of code we have inherited. The baseline tells us whether we’re moving towards maintainable, high-quality code that encourages rapid development, or sliding towards messier code and slower development. If the project is in a really good state, the baseline will be high, and at a certain point further improvements or bug fixes will have diminishing returns. If the project is in a brittle state, the baseline will be lower, and each improvement we make gives us a large return on the time invested.
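One way to keep a baseline and see the trend is to append a dated row of numbers to a CSV file after every measurement. The following is a minimal sketch, not a prescribed workflow: the column names, the `record_metrics` and `change_since_baseline` helpers, and the CSV layout are all hypothetical choices made for illustration.

```python
import csv
from datetime import date
from pathlib import Path

# Hypothetical column set; extend it with whatever metrics you track.
FIELDS = ["date", "loc", "flake8_warnings", "coverage_percent"]


def record_metrics(csv_path, metrics, day=None):
    """Append one dated row of metrics, writing the header on first use."""
    path = Path(csv_path)
    is_new = not path.exists()
    row = {"date": (day or date.today()).isoformat(), **metrics}
    with path.open("a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)


def change_since_baseline(csv_path):
    """Return latest-minus-baseline for every numeric column.

    The first row in the file is treated as the baseline.
    """
    with Path(csv_path).open(newline="") as handle:
        rows = list(csv.DictReader(handle))
    baseline, latest = rows[0], rows[-1]
    return {
        name: float(latest[name]) - float(baseline[name])
        for name in FIELDS
        if name != "date"
    }
```

A falling `flake8_warnings` value and a rising `coverage_percent` value relative to the first row would indicate that the project is moving in the right direction.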

Measuring Consistent Style and PEP8

Programs must be written for people to read, and only incidentally for machines to execute.

— Harold Abelson, Structure and Interpretation of Computer Programs

When confronting a legacy codebase, there will be many lines of code that you need to become familiar with, and how readable they are affects how quickly you can fix bugs and develop new features. When the code has a poor, inconsistent style, it is difficult to read, and your eyes can glaze over and miss particular details.

Using flake8 we can get the number of style errors/warnings:

flake8 --count /path/to/file.py
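The total count is a useful headline number, but a breakdown per error code shows which problems dominate. The sketch below tallies codes from flake8's default report format (`path:row:col: CODE message`); the `tally_codes` helper and the sample file names are hypothetical, and the report text is assumed to have been captured already, for example by redirecting flake8's output to a file.

```python
import re
from collections import Counter

# Default flake8 line format: path:row:col: CODE message
FLAKE8_LINE = re.compile(
    r"^(?P<path>.+?):(?P<row>\d+):(?P<col>\d+): (?P<code>[A-Z]+\d+) "
)


def tally_codes(report):
    """Count how often each error/warning code appears in a flake8 report."""
    counts = Counter()
    for line in report.splitlines():
        match = FLAKE8_LINE.match(line)
        if match:  # lines such as the trailing --count total are skipped
            counts[match.group("code")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical report text, for illustration only.
    sample = (
        "app/views.py:10:80: E501 line too long (95 > 79 characters)\n"
        "app/views.py:12:1: F401 'os' imported but unused\n"
    )
    for code, number in tally_codes(sample).most_common():
        print(code, number)
```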

Note on Line Length

One particular issue that comes up often is the length of a source code line. The Python PEP8 standard suggests that lines stop at 79 characters. If you have a larger monitor, you can increase that to 110 or 130 characters and have enough space when looking at two pieces of code side by side. The long length of some lines of code will be the first broken window you see in a codebase.

If this error keeps coming up, feel free to ignore it for now, but remember that eventually you’ll have to fix it, as long lines get in the way of reading and can hide bad code.

Ignore the error (E501 is the error number for long lines):

flake8 --ignore E501 /path/to/file.py

Fix the error using autopep8 (the -i flag changes the file in place):

autopep8 -i /path/to/file.py

Measuring Lines of Code

The reason for measuring lines of code is that the size of the code tends to correlate with the defect rate and the complexity. Roughly speaking, a large number of lines of code can indicate a highly complex application and gives us an idea of how much effort will be required to understand and maintain the code. After all, the easiest code base to understand is one that’s only a few lines long.

To measure the lines of code, we use the radon tool:

radon raw /path/to/script.py

It will show us:

  • the total number of lines of code (LOC)
  • the number of logical lines of code (LLOC)
  • the number of source lines of code (SLOC)
  • the number of comment lines
  • the number of lines in multi-line strings (Multi)
  • the number of blank lines
  • the ratio of comments to total lines of code (C % L)
  • the ratio of comments to source lines of code (C % S)

(See the radon documentation for more information on the metrics available in the tool.)
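To keep these numbers over time, the text report can be turned into a dictionary. The sketch below assumes the plain-text layout that `radon raw` prints (a file path on its own line, followed by indented `Name: value` lines); the `parse_radon_raw` helper is hypothetical and ignores the percentage lines under "Comment Stats".

```python
import re

# Indented "Name: 123" lines such as "    LOC: 1890".
RAW_LINE = re.compile(r"^\s+([A-Za-z ]+):\s+(\d+)\s*$")


def parse_radon_raw(output):
    """Map each file path in a `radon raw` report to its integer metrics."""
    metrics = {}
    current = None
    for line in output.splitlines():
        match = RAW_LINE.match(line)
        if match and current is not None:
            metrics[current][match.group(1).strip()] = int(match.group(2))
        elif line and not line[0].isspace():
            # An unindented line starts the report for a new file.
            current = line.strip()
            metrics[current] = {}
    return metrics
```

Calling `parse_radon_raw` on a saved report returns one dictionary per file, keyed by the metric names exactly as radon prints them (`LOC`, `LLOC`, `SLOC`, and so on).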

Example: Lines of Code in the Django code base

We can take a look at the Django code base and measure some of the lines of code to give us a rough measure of maintainability and complexity.

One file that I remember as complex was django/contrib/admin/options.py. The file contains a base class for the ModelAdmin (frontend ORM model interface) and related classes for displaying it.

Let’s run radon on it and see how many lines of code it is (as of 29 March 2015):

radon raw django/contrib/admin/options.py

django/contrib/admin/options.py
    LOC: 1890
    LLOC: 1079
    SLOC: 1653
    Comments: 132
    Multi: 237
    Blank: 237
    - Comment Stats
        (C % L): 7%
        (C % S): 8%
        (C + M % L): 20%

The ratio of comments (including multi-line docstrings) to total lines of code is 20%. This is fairly good, and we can expect some of those comments to help us understand the code. However, there are 1079 logical lines of code, which suggests that there may be too many classes defined in the file, classes that do too much, or methods and functions that are too long.
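The percentages in the "Comment Stats" block can be reproduced from the counts above them. The sketch below assumes that L stands for LOC (total lines) and S for SLOC, an assumption that matches the figures reported for both files in this article; the `comment_stats` helper is hypothetical and simply rounds to whole percentages.

```python
def comment_stats(loc, sloc, comments, multi):
    """Recompute radon's comment ratios as whole percentages.

    Assumes L is LOC (total lines) and S is SLOC (source lines).
    """
    return {
        "C % L": round(100 * comments / loc),
        "C % S": round(100 * comments / sloc),
        "C + M % L": round(100 * (comments + multi) / loc),
    }


# Counts reported for django/contrib/admin/options.py
print(comment_stats(loc=1890, sloc=1653, comments=132, multi=237))
```

For admin/options.py this gives 132 / 1890 ≈ 7%, 132 / 1653 ≈ 8%, and (132 + 237) / 1890 ≈ 20%.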

Let’s check out another file, django/db/models/fields/__init__.py, which defines the ORM fields that can be used in our Django model classes:

radon raw django/db/models/fields/__init__.py

django/db/models/fields/__init__.py
    LOC: 2410
    LLOC: 1446
    SLOC: 2098
    Comments: 150
    Multi: 146
    Blank: 312
    - Comment Stats
        (C % L): 6%
        (C % S): 7%
        (C + M % L): 12%

The ratio of comments (including multi-line docstrings) to total lines of code is 12%. This is lower than in admin/options.py, so we can’t expect the comments to explain as much of the code when we’re trying to understand it.

When we compare the number of logical lines of code, 1446 in fields/__init__.py versus 1079 in admin/options.py, we can see that fields/__init__.py may be harder to understand when debugging it or adding new features. Both files are long enough that, when it’s time to add new features, it may be a good idea to create a new file and reconsider where the new code should live.

A quick check of the commit log for admin/options.py shows lots of bug fixes; in fact, a bug fix is committed almost every month.
