GNU pth instead of pthread: hardcore python tuning
Problem
I've been working on the speed of an application written in python. While running strace I found out that most of the syscalls were futex(). These futexes came from python's internal synchronization code. But the application never used any kind of threading!
Most users never realize that python does synchronization between threads even when threads aren't used. It's not so bad, as those futex() calls are very quick, about 7 usecs each. But they are executed thousands of times!
Sample "strace -c":
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 67.62    0.000570           0     15192           futex
 10.91    0.000092           0       896           sendto
  7.00    0.000059           7         8           stat
  5.34    0.000045           0      1936           poll
  3.32    0.000028           0      1076           recvfrom
  2.73    0.000023           1        28           open
  2.02    0.000017           3         6           write
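For reference, a summary like this can be collected by attaching strace to the already-running process and pressing Ctrl-C after a while; the PID below is just a placeholder:
$ strace -c -f -p 1234    # -c: per-syscall counters, -f: follow forked children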
Workaround
To get rid of the futex() calls one can compile python without support for threading (./configure --without-threads). The downside of this method is that some modules depend on threading, like the postgresql wrapper psycopg.
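A quick way to see this limitation on a build configured with --without-threads (the interpreter path is only an example):
$ ./python -c "import thread"    # fails with ImportError on a --without-threads build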
There is also another method. It seems that python supports the GNU pth library. This library emulates threads in userspace, using a pthread-compatible interface. The only non-trivial thing is to get rid of pthread and use pth during python compilation (even though pth should be supported by python out of the box).
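To see which implementation a given interpreter actually links against, ldd is enough (the binary path is just an example):
$ ldd /usr/bin/python2.5 | grep pth    # stock builds show libpthread, a pth build shows libpth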
Enabling pth is not easy, but doable. Here's the compilation procedure, thanks to my friend Primitive:
$ sudo apt-get install libpth-dev
$ wget http://python.org/ftp/python/2.5.1/Python-2.5.1.tar.bz2
$ wget http://ai.pjwstk.edu.pl/~majek/dump/pyconfig.h.in-pth-patch.diff
$ tar xjf Python-2.5.1.tar.bz2
$ cd Python-2.5.1
Python-2.5.1$ cat ../pyconfig.h.in-pth-patch.diff|patch -p0
patching file pyconfig.h.in
Python-2.5.1$ ./configure --with-pth --prefix=/usr
Python-2.5.1$ find . -name Makefile -exec sed -i "s/-lpthread/-lpth/" {} \;
Python-2.5.1$ sed -i "s/-pthread//" Makefile
Python-2.5.1$ sed -i 's/$(srcdir)\/Python\/thread_pthread.h//' Makefile
Python-2.5.1$ make
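The following is not part of Primitive's procedure, but it's a cheap way to confirm the switch worked before installing: the fresh binary should pull in libpth and the import loop should produce no futex() calls.
Python-2.5.1$ ldd ./python | grep pth                              # expect libpth.so, no libpthread
Python-2.5.1$ strace -c -e trace=futex ./python -c "import os"     # futex summary should be empty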
Benchmarks
In this corner case the time gained is about 25-30%. The test program is not complicated at all. We know that importing modules needs synchronization, so let's do only imports:
#!/usr/bin/python
for i in xrange(1000000):
    import os
Average results for standard python 2.5.1, using pthread:
real 0m2.061s
user 0m1.728s
sys 0m0.332s
Average results for patched python 2.5.1, using pth:
real 0m1.572s
user 0m1.560s
sys 0m0.008s
What's with those futexes?
Well, as I suggested, the futexes are gone when using pth. Here are the strace results for python with pthreads:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.99    1.014676           1   1000020           futex
  0.01    0.000067           0       142           read
  0.01    0.000057           0      4504           _llseek
And for fixed python with pth:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.000050           0      4504           _llseek
  0.00    0.000000           0       142           read
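For completeness, both summaries can be reproduced by running the import loop under strace against each binary; bench.py here stands for the three-line test program shown above:
$ strace -c -f /usr/bin/python2.5 bench.py     # stock build, linked with pthread
$ strace -c -f ./python bench.py               # patched build, linked with pth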
5 comments:
you might try something like this:
LD_PRELOAD=/path/to/pth.so ./single_threaded_python_script.py
Since the API is supposed to be the same, it should be possible to substitute libraries at startup time.
LD_PRELOAD didn't work ;)
Didn't work for me either.
So I know this post is old, but I am very intrigued by this idea. Seems like a big win for Python and a very logical choice.
I was browsing the Python source to see if system call mapping is enabled, because if it isn't then any I/O will block the entire process.
It doesn't appear to be enabled. Since you have this up and running, wanna give it a shot? I'm going to try it myself soon, with the hopes of speeding up my FastCGI/WSGI processes :)
> I was browsing the Python source to
> see if system call mapping is
> enabled, because if it isn't then
> any I/O will block the entire
> process.
I only tried to avoid the futex() overhead. I never really thought about wrapping read(). But it would be very interesting to see whether this really works.
I'm not going to try it myself in the near future, but I'd love to hear your results.