After reading Ned Batchelder's post on his experience hunting a memory leak (which turned out to be a reference counting error), it occurred to me that even though I have a script to check memory usage, I should really also be checking reference counts with sys.gettotalrefcount(). And indeed, after adding this to my script I found one reference count leak. I still have faith in my script as it was before, since the reference leak in question was not making me lose memory. Subtle bugs, eh?
But how do you check an extension module for memory leaks? This seems pretty undocumented, so here is my approach:
First you really need a debug build of Python. This helps a lot, since you get to use sys.gettotalrefcount() and get more predictable memory behaviour. The most complete way to build this is something like this (the MAXFREELIST stuff adapted from this):
    s="_MAXFREELIST=0"
    ./configure --with-pydebug --without-pymalloc --prefix=/opt/pydebug \
        CPPFLAGS="-DPyDict$s -DPyTuple$s -DPyUnicode$s -DPySet$s -DPyCFunction$s -DPyList$s -DPyFrame$s -DPyMethod$s"
    make
    make install
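Before going further it is worth confirming that the interpreter you are running really is the debug build. The following is only a minimal sketch: the helper names `is_debug_build` and `total_refcount` are made up for illustration, and the only thing it relies on is that sys.gettotalrefcount() exists in debug builds and is absent otherwise.

```python
import sys


def is_debug_build():
    """Return True if this interpreter exposes sys.gettotalrefcount()."""
    # sys.gettotalrefcount() is only present in a --with-pydebug build.
    return hasattr(sys, "gettotalrefcount")


def total_refcount():
    """Return the total reference count, or None on a non-debug build."""
    getter = getattr(sys, "gettotalrefcount", None)
    return getter() if getter is not None else None


if __name__ == "__main__":
    print("debug build: %s" % is_debug_build())
    print("total refcount: %s" % total_refcount())
```

Run with /opt/pydebug/bin/python this should report a debug build and an integer count; with a regular interpreter the count is simply None.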
Now run the test suite under valgrind. This is troublesome but very useful: the valgrind memory checker will help you identify problems pretty quickly. It can get confused by Python, however, and since you only care about your extension module you need to filter most of that noise out. Luckily the Python distribution ships with a valgrind suppression file in Misc/valgrind-python.supp that you can use; it's not perfect but it helps. This is how I invoke valgrind:
    $ /opt/pydebug/bin/python setup.py build
    $ valgrind --tool=memcheck \
        --suppressions=~/python-trunk/Misc/valgrind-python.supp \
        --leak-check=full /opt/pydebug/bin/python -E -tt setup.py test
    ==8599== Memcheck, a memory error detector
    ==8599== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
    ==8599== Using Valgrind-3.5.0-Debian and LibVEX; rerun with -h for copyright info
    ==8599== Command: /opt/pydebug/bin/python -E -tt setup.py test
    ==8599==
    ==8599== Conditional jump or move depends on uninitialised value(s)
    ==8599==    at 0x400A66E: _dl_relocate_object (do-rel.h:65)
    ==8599==    by 0x4012492: dl_open_worker (dl-open.c:402)
    ==8599==    by 0x400E155: _dl_catch_error (dl-error.c:178)
    ==8599==    by 0x4011D0D: _dl_open (dl-open.c:616)
    ==8599==    by 0x405AC0E: dlopen_doit (dlopen.c:67)
    ==8599==    by 0x400E155: _dl_catch_error (dl-error.c:178)
    ==8599==    by 0x405B0DB: _dlerror_run (dlerror.c:164)
    ==8599==    by 0x405AB40: dlopen@@GLIBC_2.1 (dlopen.c:88)
    ==8599==    by 0x8132727: _PyImport_GetDynLoadFunc (dynload_shlib.c:130)
    ==8599==    by 0x81199D9: _PyImport_LoadDynamicModule (importdl.c:42)
    ==8599==    by 0x81161FE: load_module (import.c:1828)
    ==8599==    by 0x8117FAF: import_submodule (import.c:2589)
    ...
    running test
    ...
    FAILED (failures=4, errors=2)
    ==8599==
    ==8599== HEAP SUMMARY:
    ==8599==     in use at exit: 1,228,588 bytes in 13,293 blocks
    ==8599==   total heap usage: 280,726 allocs, 267,433 frees, 70,473,201 bytes allocated
    ==8599==
    ==8599== LEAK SUMMARY:
    ==8599==    definitely lost: 0 bytes in 0 blocks
    ==8599==    indirectly lost: 0 bytes in 0 blocks
    ==8599==      possibly lost: 1,201,420 bytes in 13,014 blocks
    ==8599==    still reachable: 27,168 bytes in 279 blocks
    ==8599==         suppressed: 0 bytes in 0 blocks
    ==8599== Rerun with --leak-check=full to see details of leaked memory
    ==8599==
    ==8599== For counts of detected and suppressed errors, rerun with: -v
    ==8599== Use --track-origins=yes to see where uninitialised values come from
    ==8599== ERROR SUMMARY: 75 errors from 5 contexts (suppressed: 19 from 6)
Note that the output is very verbose; I usually start with --leak-check=summary. First, notice that valgrind gives a lot of warnings before your extension module even gets loaded. Those are Python's problems and not yours, so skip over them. The output that appears after (and during) the test suite's output is what interests you. Most importantly, look at the "definitely lost" line: if that's not zero then you have a leak. The "possibly lost" line is just Python's problem (which sadly might hide problems you created too). When you do have lost blocks valgrind will give you a stack trace to pinpoint them, but you'll have to swim through lots of "possibly lost" stack traces from Python to find it. The best approach is probably to grep for your source files in the output.
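A plain grep only shows the matching line, not the whole stack trace around it. As a rough sketch of the same idea, the snippet below splits a saved valgrind log into records (runs of "==PID==" lines separated by an empty "==PID==" line) and keeps only those that mention one of your source files. The function names and the file name mymodule.c are hypothetical, and the record-splitting rule is an assumption based on the output shown above.

```python
import re

# Matches the "==8599== " prefix valgrind puts in front of every line.
PREFIX = re.compile(r"^==\d+==\s?")


def split_records(lines):
    """Yield valgrind records as lists of lines with the prefix stripped."""
    record = []
    for line in lines:
        match = PREFIX.match(line)
        if match is None:
            continue  # output of the test suite itself, not valgrind
        text = line[match.end():].rstrip()
        if text:
            record.append(text)
        elif record:
            # An empty "==PID==" line closes the current record.
            yield record
            record = []
    if record:
        yield record


def records_mentioning(lines, needles):
    """Return the records in which any line contains one of the needles."""
    return [
        record
        for record in split_records(lines)
        if any(needle in text for text in record for needle in needles)
    ]


if __name__ == "__main__":
    import sys

    # Usage: python filter_valgrind.py valgrind.log mymodule.c other.c
    with open(sys.argv[1]) as log:
        for record in records_mentioning(log, sys.argv[2:]):
            print("\n".join(record))
            print("")
```

This assumes the valgrind output was saved to a file first, for example by redirecting stderr.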
Next you should create a function that you want to execute in a loop; it should exercise the code you want to test for leaks. If you're really thorough, the entire test suite wrapped up in a function call would be good.
Wrap it all up in a script that checks the memory usage and reference counts on each loop and compares the start and end values. Getting memory usage might be tricky from Python (or you can use PSI of course), so depending on your situation you might prefer to do this with a script from your operating system.
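One way to wrap a whole test suite in a single function call is sketched below with the standard unittest module. The name `make_suite_runner` is hypothetical, and the sketch assumes your tests are ordinary unittest.TestCase classes. The suite is rebuilt on every call because a unittest suite drops its tests once it has run them.

```python
import unittest


def make_suite_runner(*test_cases):
    """Return a no-argument function that runs the given TestCase classes."""
    loader = unittest.defaultTestLoader

    def run_once():
        # Build a fresh suite each time; a suite cannot be re-run as-is.
        suite = unittest.TestSuite(
            loader.loadTestsFromTestCase(case) for case in test_cases
        )
        result = unittest.TestResult()
        suite.run(result)
        return result

    return run_once


if __name__ == "__main__":
    class DemoTest(unittest.TestCase):
        """Stand-in for the tests of your extension module."""

        def test_addition(self):
            self.assertEqual(1 + 1, 2)

    run_once = make_suite_runner(DemoTest)
    for _ in range(3):
        result = run_once()
        print("ran %d tests, ok=%s" % (result.testsRun, result.wasSuccessful()))
```

Collecting results in a TestResult instead of printing them keeps the loop quiet, which makes it easier to read the memory and reference count figures printed around it.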
For PSI this is the script I currently use. I clearly have it easy since I can be sure PSI will be available :-). The reason I don't automate this script further (you could turn it into a unittest) is that I prefer to look at the output manually. Both memory and reference counting are funny and will most likely grow a little bit anyway. By looking at the output I can easily spot whether it keeps growing or stabilises; there is only a problem if it keeps growing with every iteration (don't be afraid to run with many, many iterations from time to time). When automating this you will probably end up allowing some margin and might miss small leaks.
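If you don't have PSI at hand, the general shape of such a script can be sketched with the standard library alone. This is not the script linked above. It swaps in resource.getrusage() for PSI, which reports the peak resident set size rather than the current one (and in platform-dependent units), so treat the memory column as a rough indicator. The names `snapshot`, `watch` and `report` are made up for illustration, and the reference count column is only filled in on a debug build.

```python
import gc
import sys

try:
    import resource  # Unix only; stand-in for PSI in this sketch
except ImportError:
    resource = None


def snapshot():
    """Return (total refcount, peak RSS); either may be None if unavailable."""
    gc.collect()  # drop collectable garbage so it is not counted as growth
    refs = sys.gettotalrefcount() if hasattr(sys, "gettotalrefcount") else None
    rss = None
    if resource is not None:
        rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return refs, rss


def watch(func, iterations=10):
    """Call func repeatedly, taking a snapshot before the loop and after each call."""
    samples = [snapshot()]
    for _ in range(iterations):
        func()
        samples.append(snapshot())
    return samples


def report(samples):
    """Print one line per snapshot so growth can be judged by eye."""
    for index, (refs, rss) in enumerate(samples):
        print("iter %3d  refs=%s  maxrss=%s" % (index, refs, rss))


if __name__ == "__main__":
    def workload():
        # Replace with the function exercising your extension module.
        return [str(number) for number in range(1000)]

    report(watch(workload, iterations=20))
```

The output is deliberately left for a human to read: a few iterations of growth followed by a plateau is normal, while numbers that climb on every line point to a leak.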
Hopefully some of this was useful for someone.