Python has accumulated a lot of… character over the years. We’ve got no fewer than three profiling libraries for single-threaded execution, plus a multithreaded profiler with an incompatible interface (Yappi). Since many applications use more than one thread, this can be a bit annoying.
Yappi works most of the time. Except sometimes it doesn’t, and your application randomly hangs (I blame signals, personally). The other issue is that Yappi doesn’t have a way of collecting call-stack information. I don’t necessarily care that memcpy takes all of the time; I want to know who called memcpy. That matters because the lovely gprof2dot can take in pstats dumps, which record caller information, and output a very nice profile graph.
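For comparison, here is roughly what the plain single-threaded cProfile/pstats route gives you. It's a small sketch, not taken from any real application: `leaf`, `caller_a` and `caller_b` are made-up stand-ins, and `single_thread.pstats` is an arbitrary file name. It records who called what, prints the callers of one function, and writes a pstats dump.

```python
import cProfile
import pstats


def leaf():
    # Hypothetical stand-in for an expensive low-level call (think memcpy).
    return sum(range(10_000))


def caller_a():
    # Calls leaf() five times.
    for _ in range(5):
        leaf()


def caller_b():
    # Calls leaf() once.
    leaf()


profiler = cProfile.Profile()
profiler.enable()
caller_a()
caller_b()
profiler.disable()

stats = pstats.Stats(profiler)
# Write a pstats dump that gprof2dot can read.
stats.dump_stats("single_thread.pstats")
# Show which functions called leaf() and how many times.
stats.print_callers("leaf")
```

The dump can then be rendered with something like `gprof2dot -f pstats single_thread.pstats | dot -Tpng -o profile.png`.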
To work around this, I glom together cProfile runs from multiple threads: each thread gets its own cProfile run, and the resulting stats are merged into one. In case it's useful to anyone else, I wrote a quick gist illustrating how to do it. To make it easy to drop in, I monkey-patch the Thread.run method, but you can use a more maintainable approach if you like (in my own applications I create a ProfileThread subclass).

