
A guide to analyzing Python performance

While it’s not always the case that every Python program you write will require a rigorous performance analysis, it is reassuring to know that there is a wide variety of tools in Python’s ecosystem that one can turn to when the time arises.

Analyzing a program’s performance boils down to answering four basic questions:

1. How fast is it running?
2. Where are the speed bottlenecks?
3. How much memory is it using?
4. Where is memory leaking?

Below, we’ll dive into the details of answering these questions using some awesome tools.

Coarse-grained timing with time

Let’s begin by using a quick and dirty method of timing our code: the good old Unix utility time.

$ time python yourprogram.py

real    0m1.028s
user    0m0.001s
sys     0m0.003s

The meanings of the three output measurements are detailed in this Stack Overflow article, but in short:

· real - refers to the actual elapsed wall-clock time
· user - refers to the amount of CPU time spent outside of the kernel
· sys - refers to the amount of CPU time spent inside kernel-specific functions

You can get a sense of how many CPU cycles your program used up, regardless of other programs running on the system, by adding together the sys and user times.

If the sum of the sys and user times is much less than the real time, then you can guess that most of your program’s performance issues are likely related to IO waits.
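To make this concrete, here’s a minimal sketch (my own illustration; the filename io_wait.py is hypothetical) of an IO-bound script whose real time will dwarf its user and sys times:

io_wait.py

import time

# Block on a simulated IO wait: wall-clock time passes, but almost no CPU is used.
time.sleep(1)

Running time python io_wait.py on such a script should report a real time of roughly one second, with user and sys both close to zero.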

Fine-grained timing with a timing context manager

Our next technique involves direct instrumentation of the code to get access to finer-grained timing information. Here’s a small snippet I’ve found invaluable for making ad-hoc timing measurements:

timer.py

import time

class Timer(object):
    """Context manager that times the enclosed block."""
    def __init__(self, verbose=False):
        self.verbose = verbose

    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        self.end = time.time()
        self.secs = self.end - self.start
        self.msecs = self.secs * 1000  # millisecs
        if self.verbose:
            print 'elapsed time: %f ms' % self.msecs

In order to use it, wrap blocks of code that you want to time with Python’s with keyword and this Timer context manager. It will take care of starting the timer when your code block begins execution and stopping the timer when your code block ends.

Here’s an example use of the snippet:

from timer import Timer
from redis import Redis

rdb = Redis()

with Timer() as t:
    rdb.lpush("foo", "bar")
print "=> elapsed lpush: %s s" % t.secs

with Timer() as t:
    rdb.lpop("foo")
print "=> elapsed lpop: %s s" % t.secs

I’ll often log the outputs of these timers to a file in order to see how my program’s performance evolves over time.
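Here’s a minimal sketch of that logging pattern (the logging setup and the do_work function are my own assumptions, not part of the original snippet):

import logging
from timer import Timer

logging.basicConfig(filename="timings.log", level=logging.INFO)

def do_work():
    # stand-in for the block you actually want to measure
    sum(i * i for i in range(100000))

with Timer() as t:
    do_work()
logging.info("do_work elapsed: %f s", t.secs)

Each run appends a line to timings.log, making it easy to compare measurements over time.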

Line-by-line timing and execution frequency with a profiler

Robert Kern has a nice project called line_profiler which I often use to see how fast and how often each line of code is running in my scripts.

To use it, you’ll need to install the Python package via pip:

$ pip install line_profiler

Once installed you’ll have access to a new module called "line_profiler" as well as an executable script "kernprof.py".

To use this tool, first modify your source code by decorating the function you want to measure with the @profile decorator. Don’t worry, you don’t have to import anything in order to use this decorator. The kernprof.py script automatically injects it into your script’s runtime during execution.

primes.py

@profile
def primes(n):
    if n == 2:
        return [2]
    elif n < 2:
        return []
    s = range(3, n + 1, 2)
    mroot = n ** 0.5
    half = (n + 1) / 2 - 1
    i = 0
    m = 3
    while m <= mroot:
        if s[i]:
            j = (m * m - 3) / 2
            s[j] = 0
            while j < half:
                s[j] = 0
                j += m
        i = i + 1
        m = 2 * i + 3
    return [2] + [x for x in s if x]

primes(100)

Once you’ve gotten your code set up with the @profile decorator, use kernprof.py to run your script.

$ kernprof.py -l -v primes.py

The -l option tells kernprof to inject the @profile decorator into your script’s builtins, and -v tells kernprof to display timing information once your script finishes. Here’s what the output should look like for the above script:

Wrote profile results to primes.py.lprof
Timer unit: 1e-06 s

File: primes.py
Function: primes at line 2
Total time: 0.00019 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     2                                           @profile
     3                                           def primes(n):
     4         1            2      2.0      1.1      if n == 2:
     5                                                   return [2]
     6         1            1      1.0      0.5      elif n < 2:
     7                                                   return []
     8         1            4      4.0      2.1      s = range(3, n + 1, 2)
     9         1           10     10.0      5.3      mroot = n ** 0.5
    10         1            2      2.0      1.1      half = (n + 1) / 2 - 1
    11         1            1      1.0      0.5      i = 0
    12         1            1      1.0      0.5      m = 3
    13         5            7      1.4      3.7      while m <= mroot:
    14         4            4      1.0      2.1          if s[i]:
    15         3            4      1.3      2.1              j = (m * m - 3) / 2
    16         3            4      1.3      2.1              s[j] = 0
    17        31           31      1.0     16.3              while j < half:
    18        28           28      1.0     14.7                  s[j] = 0
    19        28           29      1.0     15.3                  j += m
    20         4            4      1.0      2.1          i = i + 1
    21         4            4      1.0      2.1          m = 2 * i + 3
    22        50           54      1.1     28.4      return [2] + [x for x in s if x]

Look for lines with a high number of hits or a high time interval. These are the areas where optimizations can yield the greatest improvements.

How much memory does it use?

Now that we have a good grasp on timing our code, let’s move on to figuring out how much memory our programs are using. Fortunately for us, Fabian Pedregosa has implemented a nice memory profiler modeled after Robert Kern’s line_profiler.

First install it via pip:

$ pip install -U memory_profiler
$ pip install psutil

(Installing the psutil package here is recommended because it greatly improves the performance of memory_profiler.)

Like line_profiler, memory_profiler requires that you decorate your function of interest with an @profile decorator like so:

@profile
def primes(n):
    ...
    ...

To see how much memory your function uses, run the following:

$ python -m memory_profiler primes.py

You should see output that looks like this once your program exits:

Filename: primes.py

Line #    Mem usage  Increment   Line Contents
==============================================
     2                           @profile
     3    7.9219 MB  0.0000 MB   def primes(n):
     4    7.9219 MB  0.0000 MB       if n == 2:
     5                                   return [2]
     6    7.9219 MB  0.0000 MB       elif n < 2:
     7                                   return []
     8    7.9219 MB  0.0000 MB       s = range(3, n + 1, 2)
     9    7.9258 MB  0.0039 MB       mroot = n ** 0.5
    10    7.9258 MB  0.0000 MB       half = (n + 1) / 2 - 1
    11    7.9258 MB  0.0000 MB       i = 0
    12    7.9258 MB  0.0000 MB       m = 3
    13    7.9297 MB  0.0039 MB       while m <= mroot:
    14    7.9297 MB  0.0000 MB           if s[i]:
    15    7.9297 MB  0.0000 MB               j = (m * m - 3) / 2
    16    7.9258 MB -0.0039 MB               s[j] = 0
    17    7.9297 MB  0.0039 MB               while j < half:
    18    7.9297 MB  0.0000 MB                   s[j] = 0
    19    7.9297 MB  0.0000 MB                   j += m
    20    7.9297 MB  0.0000 MB           i = i + 1
    21    7.9297 MB  0.0000 MB           m = 2 * i + 3
    22    7.9297 MB  0.0000 MB       return [2] + [x for x in s if x]

IPython shortcuts for line_profiler and memory_profiler

A little-known feature of line_profiler and memory_profiler is that both programs have shortcut commands accessible from within IPython. All you have to do is type the following within an IPython session:

%load_ext memory_profiler
%load_ext line_profiler

Upon doing so you’ll have access to the magic commands %lprun and %mprun, which behave similarly to their command-line counterparts. The major difference here is that you won’t need to decorate your to-be-profiled functions with the @profile decorator. Just go ahead and run the profiling directly within your IPython session like so:

In [1]: from primes import primes

In [2]: %mprun -f primes primes(1000)

In [3]: %lprun -f primes primes(1000)

This can save you a lot of time and effort since none of your source code needs to be modified in order to use these profiling commands.

Where’s the memory leak?

The CPython interpreter uses reference counting as its main method of keeping track of memory. This means that every object contains a counter, which is incremented when a reference to the object is stored somewhere, and decremented when a reference to it is deleted. When the counter reaches zero, the CPython interpreter knows that the object is no longer in use, so it deletes the object and deallocates the occupied memory.
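You can watch the counter move with sys.getrefcount (a quick illustration of my own; note that passing an object to getrefcount temporarily adds one reference of its own):

import sys

x = []
print(sys.getrefcount(x))  # 2: the name x, plus getrefcount's own argument
y = x                      # a second name now references the same list
print(sys.getrefcount(x))  # 3
del y                      # deleting that name decrements the counter
print(sys.getrefcount(x))  # back to 2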

A memory leak can often occur in your program if references to objects are held even though the object is no longer in use.
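A contrived example of that pattern (my own illustration, not from the original post): a module-level cache that appends results and never evicts them keeps every result alive for the life of the program.

_cache = []

def process(data):
    result = [d * 2 for d in data]  # stand-in for real work
    _cache.append(result)           # this reference outlives any use of result,
                                    # so it can never be garbage collected
    return result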

The quickest way to find these "memory leaks" is to use an awesome tool called objgraph, written by Marius Gedminas. This tool allows you to see the number of objects in memory and also locate all the different places in your code that hold references to these objects.

To get started, first install objgraph:

$ pip install objgraph

Once you have this tool installed, insert into your code a statement to invoke the debugger:

import pdb; pdb.set_trace()

Which objects are the most common?

At run time, you can inspect the top 20 most prevalent objects in your program by running:

(pdb) import objgraph
(pdb) objgraph.show_most_common_types()

MyBigFatObject             20000
tuple                      16938
function                    4310
dict                        2790
wrapper_descriptor          1181
builtin_function_or_method   934
weakref                      764
list                         634
method_descriptor           507
getset_descriptor           451
type                        439

Which objects have been added or deleted?

We can also see which objects have been added or deleted between two points in time:

(pdb) import objgraph
(pdb) objgraph.show_growth()
.
.
.
(pdb) objgraph.show_growth()   # this only shows objects that have been added or deleted since the last show_growth() call

traceback                4        +2
KeyboardInterrupt        1        +1
frame                   24        +1
list                   667        +1
tuple                16969        +1

What is referencing this leaky object?

Continuing down this route, we can also see where references to any given object are being held. Let’s take as an example the simple program below:

x = [1]
y = [x, [x], {"a": x}]
import pdb; pdb.set_trace()

To see what is holding a reference to the variable x, run the objgraph.show_backrefs() function:

(pdb) import objgraph
(pdb) objgraph.show_backrefs([x], filename="/tmp/backrefs.png")

The output of that command should be a PNG image stored at /tmp/backrefs.png and it should look something like this:


The box at the bottom with red lettering is our object of interest. We can see that it’s referenced by the symbol x once and by the list y three times. If x is the object causing a memory leak, we can use this method to see why it’s not automatically being deallocated by tracking down all of its references.
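objgraph can also walk the graph in the other direction: show_refs draws the objects that a given object itself references. A quick sketch continuing the same session (show_refs is part of objgraph’s API; the output filename is my own choice):

(pdb) objgraph.show_refs([y], filename="/tmp/refs.png")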

So to review, objgraph allows us to:

· show the top N objects occupying our Python program’s memory (see the quick check after this list)
· show what objects have been deleted or added over a period of time
· show all references to a given object in our script
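If you already know which type you’re worried about, objgraph.count gives a targeted tally for a single type name. A quick sketch using the MyBigFatObject class from the example above:

(pdb) import objgraph
(pdb) objgraph.count('MyBigFatObject')
20000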

Effort vs precision

In this post, I’ve shown you how to use several tools to analyze a Python program’s performance. Armed with these tools and techniques, you should have all the information required to track down most memory leaks as well as identify speed bottlenecks in a Python program.

As with many other topics, running a performance analysis means balancing the tradeoffs between effort and precision. When in doubt, implement the simplest solution that will suit your current needs.

References

· Stack Overflow - time explained
· line_profiler
· memory_profiler
· objgraph

 

