A guide to analyzing Python performance
While not every Python program you write will require a rigorous performance analysis, it is reassuring to know that there is a wide variety of tools in Python’s ecosystem one can turn to when the time arises.
Analyzing a program’s performance boils down to answering four basic questions:
1. How fast is it running?
2. Where are the speed bottlenecks?
3. How much memory is it using?
4. Where is memory leaking?
Below, we’ll dive into the details of answering these questions using some awesome tools.
Coarse grain timing with time
Let’s begin by using a quick and dirty method of timing our code: the good old Unix utility time.
$ time python yourprogram.py
real 0m1.028s
user 0m0.001s
sys 0m0.003s
The meanings of the three output measurements are detailed in this Stack Overflow article, but in short:
· real - refers to the actual elapsed time
· user - refers to the amount of CPU time spent outside of the kernel
· sys - refers to the amount of CPU time spent inside kernel-specific functions
You can get a sense of how much CPU time your program used up, regardless of other programs running on the system, by adding together the sys and user times.
If the sum of the sys and user times is much less than the real time, you can guess that most of your program’s performance issues are related to IO waits.
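For the sample output above, for instance, user + sys comes to just 0.004s against 1.028s of real time, so that run was almost certainly dominated by IO waits.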
Fine grain timing with a timing context manager
Our next technique involves direct instrumentation of the code to get access to finer grain timing information. Here’s a small snippet I’ve found invaluable for making ad-hoc timing measurements:
timer.py
import time

class Timer(object):
    def __init__(self, verbose=False):
        self.verbose = verbose

    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        self.end = time.time()
        self.secs = self.end - self.start
        self.msecs = self.secs * 1000  # millisecs
        if self.verbose:
            print 'elapsed time: %f ms' % self.msecs
In order to use it, wrap blocks of code that you want to time with Python’s with keyword and this Timer context manager. It will take care of starting the timer when your code block begins execution and stopping the timer when your code block ends.
Here’s an example use of the snippet:
from timer import Timer
from redis import Redis

rdb = Redis()

with Timer() as t:
    rdb.lpush("foo", "bar")
print "=> elapsed lpush: %s s" % t.secs

with Timer() as t:
    rdb.lpop("foo")
print "=> elapsed lpop: %s s" % t.secs
I’ll often log the outputs of these timers to a file in order to see how my program’s performance evolves over time.
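Here’s a minimal sketch of one way to do that, building on the Timer class above together with Python’s standard logging module (the log file name and the do_expensive_work function are placeholders invented for this example):

import logging
from timer import Timer

# append timing entries to a file so runs can be compared over time
logging.basicConfig(filename="timings.log", level=logging.INFO)

def do_expensive_work():
    # stand-in workload for whatever block you actually want to time
    sum(range(10 ** 6))

with Timer() as t:
    do_expensive_work()
logging.info("elapsed do_expensive_work: %f s", t.secs)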
Line-by-line timing and execution frequency with a profiler
Robert Kern has a nice project called line_profiler which I often use to see how fast and how often each line of code is running in my scripts.
To use it, you’ll need to install the Python package via pip:
$ pip install line_profiler
Once installed you’ll have access to a new module called "line_profiler" as well as an executable script "kernprof.py".
To use this tool, first modify your source code by decorating the function you want to measure with the @profile decorator. Don’t worry, you don’t have to import anything in order to use this decorator. The kernprof.py script automatically injects it into your script’s runtime during execution.
primes.py
@profile
def primes(n):
    if n==2:
        return [2]
    elif n<2:
        return []
    s=range(3,n+1,2)
    mroot = n ** 0.5
    half=(n+1)/2-1
    i=0
    m=3
    while m <= mroot:
        if s[i]:
            j=(m*m-3)/2
            s[j]=0
            while j<half:
                s[j]=0
                j+=m
        i=i+1
        m=2*i+3
    return [2]+[x for x in s if x]

primes(100)
Once you’ve gotten your code set up with the @profile decorator, use kernprof.py to run your script.
$ kernprof.py -l -v primes.py
The -l option tells kernprof to inject the @profile decorator into your script’s builtins, and -v tells kernprof to display timing information once your script finishes. Here’s what the output should look like for the above script:
Wrote profile resultsto primes.py.lprof
Timer unit: 1e-06 s
File: primes.py
Function: primes atline 2
Total time: 0.00019 s
Line # Hits Time Per Hit % Time Line Contents
==============================================================
2 @profile
3 def primes(n):
4 1 2 2.0 1.1 if n==2:
5 return [2]
6 1 1 1.0 0.5 elif n<2:
7 return []
8 1 4 4.0 2.1 s=range(3,n+1,2)
9 1 10 10.0 5.3 mroot = n ** 0.5
10 1 2 2.0 1.1 half=(n+1)/2-1
11 1 1 1.0 0.5 i=0
12 1 1 1.0 0.5 m=3
13 5 7 1.4 3.7 while m <= mroot:
14 4 4 1.0 2.1 if s[i]:
15 3 4 1.3 2.1 j=(m*m-3)/2
16 3 4 1.3 2.1 s[j]=0
17 31 31 1.0 16.3 while j<half:
18 28 28 1.0 14.7 s[j]=0
19 28 29 1.0 15.3 j+=m
20 4 4 1.0 2.1 i=i+1
21 4 4 1.0 2.1 m=2*i+3
22 50 54 1.1 28.4 return [2]+[x for x in s if x]
Look for lines with a high number of hits or a high time interval. These are the areas where optimizations can yield the greatest improvements.
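In the output above, for example, the inner while loop (lines 17-19) and the list comprehension on line 22 together account for roughly 75% of the total time, making them the natural first candidates for optimization.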
How much memory does it use?
Now that we have a good grasp on timing our code, let’s move on to figuring out how much memory our programs are using. Fortunately for us, Fabian Pedregosa has implemented a nice memory profiler modeled after Robert Kern’s line_profiler.
First install it via pip:
$ pip install -U memory_profiler
$ pip install psutil
(Installing the psutil package here is recommended because it greatly improves the performance of memory_profiler.)
Like line_profiler, memory_profiler requires that you decorate your function of interest with an @profile decorator like so:
@profile
def primes(n):
...
...
To see how much memory your function uses, run the following:
$ python -m memory_profiler primes.py
You should see output that looks like this once your program exits:
Filename: primes.py
Line # Mem usage Increment Line Contents
==============================================
2 @profile
3 7.9219 MB 0.0000 MB def primes(n):
4 7.9219 MB 0.0000 MB if n==2:
5 return [2]
6 7.9219 MB 0.0000 MB elif n<2:
7 return []
8 7.9219 MB 0.0000 MB s=range(3,n+1,2)
9 7.9258 MB 0.0039 MB mroot = n ** 0.5
10 7.9258 MB 0.0000 MB half=(n+1)/2-1
11 7.9258 MB 0.0000 MB i=0
12 7.9258 MB 0.0000 MB m=3
13 7.9297 MB 0.0039 MB while m <= mroot:
14 7.9297 MB 0.0000 MB if s[i]:
15 7.9297 MB 0.0000 MB j=(m*m-3)/2
16 7.9258 MB -0.0039 MB s[j]=0
17 7.9297 MB 0.0039 MB while j<half:
18 7.9297 MB 0.0000 MB s[j]=0
19 7.9297 MB 0.0000 MB j+=m
20 7.9297 MB 0.0000 MB i=i+1
21 7.9297 MB 0.0000 MB m=2*i+3
22 7.9297 MB 0.0000 MB return [2]+[x for x in s if x]
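Beyond the line-by-line decorator, memory_profiler also provides a memory_usage function for sampling memory from within a program. Here’s a minimal sketch, with build_list as a made-up workload (treat the exact call signature as an assumption worth checking against the memory_profiler docs):

from memory_profiler import memory_usage

def build_list():
    # made-up workload: allocate a list of a million integers
    return [0] * (10 ** 6)

# run build_list() in this process, sampling memory as it executes;
# memory_usage returns a list of readings in MB
readings = memory_usage((build_list, ()))
print("peak memory: %.4f MB" % max(readings))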
IPython shortcuts for line_profiler and memory_profiler
A little known feature of line_profiler and memory_profiler is that both programs have shortcut commands accessible from within IPython. All you have to do is type the following within an IPython session:
%load_ext memory_profiler
%load_ext line_profiler
Upon doing so you’ll have access to the magic commands %lprun and %mprun which behave similarly to their command-line counterparts. The major difference here is that you won’t need to decorate your to-be-profiled functions with the @profile decorator. Just go ahead and run the profiling directly within your IPython session like so:
In [1]: from primes import primes
In [2]: %mprun -f primes primes(1000)
In [3]: %lprun -f primes primes(1000)
This can save you a lot of time and effort since none of your source code needs to be modified in order to use these profiling commands.
Where’s the memory leak?
The CPython interpreter uses reference counting as its main method of keeping track of memory. This means that every object contains a counter, which is incremented when a reference to the object is stored somewhere and decremented when a reference to it is deleted. When the counter reaches zero, the CPython interpreter knows that the object is no longer in use, so it deletes the object and deallocates the occupied memory.
A memory leak can often occur in your program if references to objects are held even though the object is no longer in use.
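You can watch this reference counter in action with sys.getrefcount, which always reports one extra count for the temporary reference held by its own argument; a quick illustrative sketch:

import sys

x = []
print(sys.getrefcount(x))  # 2: the name x, plus getrefcount's own argument
cache = {"a": x}           # storing x somewhere increments its counter
print(sys.getrefcount(x))  # 3
del cache                  # deleting that reference decrements it again
print(sys.getrefcount(x))  # back to 2

A leak is simply what happens when a reference like cache is never deleted.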
The quickest way to find these "memory leaks" is to use an awesome tool called objgraph written by Marius Gedminas. This tool allows you to see the number of objects in memory and also locate all the different places in your code that hold references to these objects.
To get started, first install objgraph:
$ pip install objgraph
Once you have this tool installed, insert a statement into your code to invoke the debugger:
import pdb; pdb.set_trace()
Which objects are the most common?
At run time, you can inspect the top 20 most prevalent objects in your program by running:
(pdb) import objgraph
(pdb) objgraph.show_most_common_types()
MyBigFatObject 20000
tuple 16938
function 4310
dict 2790
wrapper_descriptor 1181
builtin_function_or_method 934
weakref 764
list 634
method_descriptor 507
getset_descriptor 451
type 439
Which objects have been added or deleted?
We can also see which objects have been added or deleted between two points in time:
(pdb) import objgraph
(pdb) objgraph.show_growth()
.
.
.
(pdb) objgraph.show_growth() # this only shows objects that have been added or deleted since the last show_growth() call
traceback 4 +2
KeyboardInterrupt 1 +1
frame 24 +1
list 667 +1
tuple 16969 +1
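To make this concrete, here’s a contrived sketch of the kind of leak that show_growth surfaces; the module-level _cache list and the leaky function are invented for this example:

import objgraph

_cache = []  # never cleared, so everything appended here stays alive

def leaky(n):
    _cache.append([0] * n)  # each call parks another list in _cache

objgraph.show_growth()  # establish a baseline
for _ in range(3):
    leaky(1000)
objgraph.show_growth()  # the list count grows by 3, pointing at the leak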
What is referencing this leaky object?
Continuing down this route, we can also see where references to any given object are being held. Let’s take as an example the simple program below:
x=[1]
y=[x,[x],{"a":x}]
import pdb; pdb.set_trace()
To see what is holding a reference to the variable x, run the objgraph.show_backrefs() function:
(pdb) import objgraph
(pdb) objgraph.show_backrefs([x], filename="/tmp/backrefs.png")
The output of that command should be a PNG image stored at /tmp/backrefs.png and it should look something like this:
The box at the bottom with red lettering is our object of interest. We can see that it’s referenced by the symbol x once and by the list y three times. If x is the object causing a memory leak, we can use this method to see why it’s not automatically being deallocated by tracking down all of its references.
So to review, objgraph allows us to:
· show the top N objects occupying our Python program’s memory
· show what objects have been deleted or added over a period of time
· show all references to a given object in our script
Effort vs precision
In this post, I’ve shown you how to use several tools to analyze a Python program’s performance. Armed with these tools and techniques you should have all the information required to track down most memory leaks as well as identify speed bottlenecks in a Python program.
As with many other topics, running a performance analysis means balancing the tradeoffs between effort and precision. When in doubt, implement the simplest solution that will suit your current needs.
References
· Stack Overflow - time explained
· objgraph