A guide to analyzing Python performance
While not every Python program you write will require a rigorous performance analysis, it is reassuring to know that there is a wide variety of tools in Python’s ecosystem one can turn to when the time arises.
Analyzing a program’s performance boils down to answering four basic questions:
1. How fast is it running?
2. Where are the speed bottlenecks?
3. How much memory is it using?
4. Where is memory leaking?
Below, we’ll dive into the details of answering these questions using some awesome tools.
Coarse grain timing with time
Let’s begin by using a quick and dirty method of timing our code: the good old Unix utility time.
$ time python yourprogram.py
real 0m1.028s
user 0m0.001s
sys 0m0.003s
The meanings of the three output measurements are detailed in this Stack Overflow article, but in short:
· real - refers to the actual elapsed time
· user - refers to the amount of CPU time spent outside of the kernel
· sys - refers to the amount of CPU time spent inside kernel-specific functions
You can get a sense of how much CPU time your program used up, regardless of other programs running on the system, by adding together the sys and user times.
If the sum of the sys and user times is much less than the real time, you can guess that most of your program’s performance issues are related to IO waits.
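For the sample output above, for instance, user + sys comes to just 0.004s against 1.028s of real time, so that run was almost certainly dominated by IO waits.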
Fine grain timing with a timing context manager
Our next technique involves direct instrumentation of the code to get access to finer grain timing information. Here’s a small snippet I’ve found invaluable for making ad-hoc timing measurements:
timer.py
import time

class Timer(object):
    def __init__(self, verbose=False):
        self.verbose = verbose

    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        self.end = time.time()
        self.secs = self.end - self.start
        self.msecs = self.secs * 1000  # millisecs
        if self.verbose:
            print 'elapsed time: %f ms' % self.msecs
In order to use it, wrap blocks of code that you want to time with Python’s with keyword and this Timer context manager. It will take care of starting the timer when your code block begins execution and stopping the timer when your code block ends.
Here’s an example use of the snippet:
from timer import Timer
from redis import Redis

rdb = Redis()

with Timer() as t:
    rdb.lpush("foo", "bar")
print "=> elapsed lpush: %s s" % t.secs

with Timer() as t:
    rdb.lpop("foo")
print "=> elapsed lpop: %s s" % t.secs
I’ll often log the outputs of these timers to a file in order to see how my program’s performance evolves over time.
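Here’s a minimal sketch of one way to do that, building on the Timer class above together with Python’s standard logging module (the log file name and the do_expensive_work function are placeholders invented for this example):

import logging
from timer import Timer

# append timing entries to a file so runs can be compared over time
logging.basicConfig(filename="timings.log", level=logging.INFO)

def do_expensive_work():
    # stand-in workload for whatever block you actually want to time
    sum(range(10 ** 6))

with Timer() as t:
    do_expensive_work()
logging.info("elapsed do_expensive_work: %f s", t.secs)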
Line-by-line timing and execution frequency with a profiler
Robert Kern has a nice project called line_profiler which I often use to see how fast and how often each line of code is running in my scripts.
To use it, you’ll need to install the Python package via pip:
$ pip install line_profiler
Once installed you’ll have access to a new module called "line_profiler" as well as an executable script "kernprof.py".
To use this tool, first modify your source code by decorating the function you want to measure with the @profile decorator. Don’t worry, you don’t have to import anything in order to use this decorator. The kernprof.py script automatically injects it into your script’s runtime during execution.
primes.py
@profile
def primes(n):
    if n==2:
        return [2]
    elif n<2:
        return []
    s=range(3,n+1,2)
    mroot = n ** 0.5
    half=(n+1)/2-1
    i=0
    m=3
    while m <= mroot:
        if s[i]:
            j=(m*m-3)/2
            s[j]=0
            while j<half:
                s[j]=0
                j+=m
        i=i+1
        m=2*i+3
    return [2]+[x for x in s if x]

primes(100)
Once you’ve gotten your code set up with the @profile decorator, use kernprof.py to run your script.
$ kernprof.py -l -v primes.py
The -l option tells kernprof to inject the @profile decorator into your script’s builtins, and -v tells kernprof to display timing information once your script finishes. Here’s what the output should look like for the above script:
Wrote profile resultsto primes.py.lprof
Timer unit: 1e-06 s
File: primes.py
Function: primes atline 2
Total time: 0.00019 s
Line # Hits Time Per Hit % Time Line Contents
==============================================================
2 @profile
3 def primes(n):
4 1 2 2.0 1.1 if n==2:
5 return [2]
6 1 1 1.0 0.5 elif n<2:
7 return []
8 1 4 4.0 2.1 s=range(3,n+1,2)
9 1 10 10.0 5.3 mroot = n ** 0.5
10 1 2 2.0 1.1 half=(n+1)/2-1
11 1 1 1.0 0.5 i=0
12 1 1 1.0 0.5 m=3
13 5 7 1.4 3.7 while m <= mroot:
14 4 4 1.0 2.1 if s[i]:
15 3 4 1.3 2.1 j=(m*m-3)/2
16 3 4 1.3 2.1 s[j]=0
17 31 31 1.0 16.3 while j<half:
18 28 28 1.0 14.7 s[j]=0
19 28 29 1.0 15.3 j+=m
20 4 4 1.0 2.1 i=i+1
21 4 4 1.0 2.1 m=2*i+3
22 50 54 1.1 28.4 return [2]+[x for x in s if x]
Look for lines with a high number of hits or a high time interval. These are the areas where optimizations can yield the greatest improvements.
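In the output above, for example, the inner while loop (lines 17-19) and the list comprehension on line 22 together account for roughly 75% of the total time, making them the natural first candidates for optimization.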
How much memory does it use?
Now that we have a good grasp on timing our code, let’s move on to figuring out how much memory our programs are using. Fortunately for us, Fabian Pedregosa has implemented a nice memory profiler modeled after Robert Kern’s line_profiler.
First install it via pip:
$ pip install -U memory_profiler
$ pip install psutil
(Installing the psutil package here is recommended because it greatly improves the performance of memory_profiler.)
Like line_profiler, memory_profiler requires that you decorate your function of interest with an @profile decorator like so:
@profile
def primes(n):
...
...
To see how much memory your function uses, run the following:
$ python -m memory_profiler primes.py
You should see output that looks like this once your program exits:
Filename: primes.py
Line # Mem usage Increment Line Contents
==============================================
2 @profile
3 7.9219 MB 0.0000 MB def primes(n):
4 7.9219 MB 0.0000 MB if n==2:
5 return [2]
6 7.9219 MB 0.0000 MB elif n<2:
7 return []
8 7.9219 MB 0.0000 MB s=range(3,n+1,2)
9 7.9258 MB 0.0039 MB mroot = n ** 0.5
10 7.9258 MB 0.0000 MB half=(n+1)/2-1
11 7.9258 MB 0.0000 MB i=0
12 7.9258 MB 0.0000 MB m=3
13 7.9297 MB 0.0039 MB while m <= mroot:
14 7.9297 MB 0.0000 MB if s[i]:
15 7.9297 MB 0.0000 MB j=(m*m-3)/2
16 7.9258 MB -0.0039 MB s[j]=0
17 7.9297 MB 0.0039 MB while j<half:
18 7.9297 MB 0.0000 MB s[j]=0
19 7.9297 MB 0.0000 MB j+=m
20 7.9297 MB 0.0000 MB i=i+1
21 7.9297 MB 0.0000 MB m=2*i+3
22 7.9297 MB 0.0000 MB return [2]+[x for x in s if x]
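Beyond the line-by-line decorator, memory_profiler also provides a memory_usage function for sampling memory from within a program. Here’s a minimal sketch, with build_list as a made-up workload (treat the exact call signature as an assumption worth checking against the memory_profiler docs):

from memory_profiler import memory_usage

def build_list():
    # made-up workload: allocate a list of a million integers
    return [0] * (10 ** 6)

# run build_list() in this process, sampling memory as it executes;
# memory_usage returns a list of readings in MB
readings = memory_usage((build_list, ()))
print("peak memory: %.4f MB" % max(readings))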
IPython shortcuts for line_profiler and memory_profiler
A little known feature of line_profiler and memory_profiler is that both programs have shortcut commands accessible from within IPython. All you have to do is type the following within an IPython session:
%load_ext memory_profiler
%load_ext line_profiler
Upon doing so you’ll have access to the magic commands %lprun and %mprun which behave similarly to their command-line counterparts. The major difference here is that you won’t need to decorate your to-be-profiled functions with the @profile decorator. Just go ahead and run the profiling directly within your IPython session like so:
In [1]: from primes import primes
In [2]: %mprun -f primes primes(1000)
In [3]: %lprun -f primes primes(1000)
This can save you a lot of time and effort since none of your source code needs to be modified in order to use these profiling commands.
Where’s the memory leak?
The CPython interpreter uses reference counting as its main method of keeping track of memory. This means that every object contains a counter, which is incremented when a reference to the object is stored somewhere and decremented when a reference to it is deleted. When the counter reaches zero, the CPython interpreter knows that the object is no longer in use, so it deletes the object and deallocates the occupied memory.
A memory leak can often occur in your program if references to objects are held even though the object is no longer in use.
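You can watch this reference counter in action with sys.getrefcount, which always reports one extra count for the temporary reference held by its own argument; a quick illustrative sketch:

import sys

x = []
print(sys.getrefcount(x))  # 2: the name x, plus getrefcount's own argument
cache = {"a": x}           # storing x somewhere increments its counter
print(sys.getrefcount(x))  # 3
del cache                  # deleting that reference decrements it again
print(sys.getrefcount(x))  # back to 2

A leak is simply what happens when a reference like cache is never deleted.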
The quickest way to find these "memory leaks" is to use an awesome tool called objgraph written by Marius Gedminas. This tool allows you to see the number of objects in memory and also locate all the different places in your code that hold references to these objects.
To get started, first install objgraph:
$ pip install objgraph
Once you have this tool installed, insert a statement into your code to invoke the debugger:
import pdb; pdb.set_trace()
Which objects are the most common?
At run time, you can inspect the top 20 most prevalent objects in your program by running:
(pdb) import objgraph
(pdb) objgraph.show_most_common_types()
MyBigFatObject 20000
tuple 16938
function 4310
dict 2790
wrapper_descriptor 1181
builtin_function_or_method 934
weakref 764
list 634
method_descriptor 507
getset_descriptor 451
type 439
Which objects have been added or deleted?
We can also see which objects have been added or deleted between two points in time:
(pdb) import objgraph
(pdb) objgraph.show_growth()
.
.
.
(pdb) objgraph.show_growth() # this only shows objects that have been added or deleted since the last show_growth() call
traceback 4 +2
KeyboardInterrupt 1 +1
frame 24 +1
list 667 +1
tuple 16969 +1
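To make this concrete, here’s a contrived sketch of the kind of leak that show_growth surfaces; the module-level _cache list and the leaky function are invented for this example:

import objgraph

_cache = []  # never cleared, so everything appended here stays alive

def leaky(n):
    _cache.append([0] * n)  # each call parks another list in _cache

objgraph.show_growth()  # establish a baseline
for _ in range(3):
    leaky(1000)
objgraph.show_growth()  # the list count grows by 3, pointing at the leak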
What is referencing this leaky object?
Continuing down this route, we can also see where references to any given object are being held. Let’s take as an example the simple program below:
x=[1]
y=[x,[x],{"a":x}]
import pdb; pdb.set_trace()
To see what is holding a reference to the variable x, run the objgraph.show_backrefs() function:
(pdb) import objgraph
(pdb) objgraph.show_backrefs([x], filename="/tmp/backrefs.png")
The output of that command should be a PNG image stored at /tmp/backrefs.png and it should look something like this:
The box at the bottom with red lettering is our object of interest. We can see that it’s referenced by the symbol x once and by the list y three times. If x is the object causing a memory leak, we can use this method to see why it’s not automatically being deallocated by tracking down all of its references.
So to review, objgraph allows us to:
· show the top N objects occupying our Python program’s memory
· show what objects have been deleted or added over a period of time
· show all references to a given object in our script
Effort vs precision
In this post, I’ve shown you how to use several tools to analyze a Python program’s performance. Armed with these tools and techniques you should have all the information required to track down most memory leaks as well as identify speed bottlenecks in a Python program.
As with many other topics, running a performance analysis means balancing the tradeoffs between effort and precision. When in doubt, implement the simplest solution that will suit your current needs.
References
· Stack Overflow - time explained
· objgraph