Surgical Time Tracking in Python

Machine Learning 4 min read

How to profile your python code to improve performance

When your python scripts take too much time to execute, it’s crucial to profile which part of your code is responsible. It becomes increasingly tricky as your code grows both in size and complexity.

Let’s take an example:

One part of this code takes too much time

The function run loads the Titanic csv file with pandas and does some computation on the loaded data. I want to know which part takes more time: the loading or the computation.

I tried the standard python profiler but it turned out that it does not help me much to address the performance issue. Instead, I wrote a python decorator to time my code by allowing me to profile exactly what I need.

Why the standard python profiler is not what I need?

Let’s profile our piece of python code with cProfile. To do so, load the cProfile module when you run your script:

python -m cProfile

Alternatively, you can profile the “run” function directly in the python code:

import cProfile'run()')

In the end, you’ll get an output that looks like this:

Results from cProfile

Why it doesn’t help me:

  • it gives way too much information: it tracks the time of all called functions, including the one called deep in pandas. Hence I got more ~300 function calls. In my example, I am only interested in the execution time of my own functions i.e., run, load, normalize_ticket_price
  • “tottime” is not relevant: this is the actual time spent in the function excluding the time spent in subcalls. As all calls are tracked, it is almost always 0.000.
  • exploring the profiler results is hard: time results are either print to the console or written in a file. You may use pstats or tools such as cProfilev(basic display with sorting facilities), kcachegring or gprof2dot (show call graph) to explore results, but it is heavy.
  • exporting the result is not easy: in my case, I’d like to store the profiler results in a comprehensive way in an external database.

A simple python decorator for tracking time

Instead, I’ll use this simple python decorator for profiling:

This decorator can equip a function you want to profile

It computes the execution time of the decorated function fn and logs the result right after fn finished to execute:

  • the @wrap (line 6) decorator sets the meta-attributes (name, docstring, …) of the decorated function from the original function ones.
  • the try … finally ensures the execution time is always logged, even if the decorated function raised an exception.

To use it, simply decorate the functions that you decided to profile:

The functions with the profiling decorator

Executing the function run will print the following:

[SimpleTimeTracker] load 0.006
[SimpleTimeTracker] compute_max_ticket_price_per_class 0.003
[SimpleTimeTracker] normalize_ticket_price_per_class 0.004
[SimpleTimeTracker] run 0.010
  • it is simple: the decorator is 24 lines long and has no dependencies.
  • it is surgical: I only got timings of the function I decided to profile. This allows you to quickly understand which part of your script is slower.
  • it is easy to export the results: the logging function _log passed to the decorator handles the profiler results. In the example above, it simply prints timings to the console, but I could have store the profiler results to a database.

A better python decorator that profiles partial times

The partial time of a function foo consists of its total execution time i.e., the time delta between entering and exiting foo, minus the execution times of the functions that are called inside the body of foo.

In the example above, the partial execution time of the function run is zero: its total execution time (10ms) minus execution times of normalize_ticket_price (4ms) and load (6ms). It means that if I want to improve the performance of the function run, I may focus on improving load and normalize_ticket_price, as literally no time is spent in the body of the function run (which is obvious when looking at the code).

To profile partial execution times, I improved a little bit the decorator:

Profiling the real time taken by each portion of the functions

Note that we do need a global variable _total_time_call_stack to compute partial execution times.

Our functions, equipped with the newer profiling decorator

Here is the profiling results of when running the function run.

[BetterTimeTracker] load 0.006 0.006
[BetterTimeTracker] compute_max_ticket_price_per_class 0.003 0.003
[BetterTimeTracker] normalize_ticket_price_per_class 0.004 0.001
[BetterTimeTracker] run 0.010 0.000

Full code, documentation and examples are available here.


Following this comment on Reddit, I tried pyflame + flamegraph tools for profiling. It’s amazing: it shows which part of the python code takes more time in a concise way and without modifying the code (e.g., adding decorators).

To do so, you need to install pyflame and flamegraph on your computer, and to run the following command:

pyflame -t python | flamegraph > profile.svg

You’ll get an SVG file that looks like this:

You can download the SVG file here. In a nutshell:

  • Each bar represents a call to a function;
  • Colors are randomly picked among warm colors :)
  • The Y-axis represent call stacks;
  • The X-axis is not the time: pyflame checks every milisecond the current call stack and the bar width represents the overall number of ticks it was in the function.

In this example, you can see most of the time (91.17%) is spent loading pandas. To get more detail about the run function, do ctrl+f to search the SVG and click on the matching element (the tiny purple bar in the right part), you should get something like:

In the end, both pyflame and the profiling decorators allow you retrieve the execution time ratio spent in load and normalize_ticket_price_per_class functions which is about 60/40.

Thanks to Reda Boumahdi, Adil Baaj, Tristan Roussel, Adrien Lina, Louis Nicolle, and Flavian Hautbois.

Get our Battle-Tested Tutorials Delivered Straight Into Your Inbox Every Week
Sign up for our newsletter