Problem
I've been working on speed of an application written in python. While doing strace I found out that the most of syscalls were futex(). This futexes came from python's internal synchronization code. But the application never used any kind of threading!
Most of the users never actually realize that python is doing synchronization between threads even when threads aren't used. It's not so bad, those futex() calls are very quick about 7usecs one. But they are executed thousands times!
Sample "strace -c":
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
67.62 0.000570 0 15192 futex
10.91 0.000092 0 896 sendto
7.00 0.000059 7 8 stat
5.34 0.000045 0 1936 poll
3.32 0.000028 0 1076 recvfrom
2.73 0.000023 1 28 open
2.02 0.000017 3 6 write
WorkaroundTo get rid of
futex() calls one can compile python without support for threading (
./configure --without-threads). The fall of this method is that some modules depend on threading, like postgresql wrapper
psycopg.
There is also other method. It seems that python supports
GNU pth library. This library emulates threads in userspace, using
pthread compatible interface. The only non-trivial thing is to get rid of
pthread and use
pth while python compilation (even though
pth should be supported by python out of the box).
Enabling
pth is not easy, but doable. Here's the compilation procedure thanks to my friend Primitive:
$ sudo apt-get install libpth-dev
$ wget http://python.org/ftp/python/2.5.1/Python-2.5.1.tar.bz2
$ wget http://ai.pjwstk.edu.pl/~majek/dump/pyconfig.h.in-pth-patch.diff
$ tar xjf Python-2.5.1.tar.bz2
$ cd Python-2.5.1
Python-2.5.1$ cat ../pyconfig.h.in-pth-patch.diff|patch -p0
patching file pyconfig.h.in
Python-2.5.1$ ./configure --with-pth --prefix=/usr
Python-2.5.1$ find . -name Makefile -exec sed -i "s/-lpthread/-lpth/" {} \;
Python-2.5.1$ sed -i "s/-pthread//" Makefile
Python-2.5.1$ sed -i 's/$(srcdir)\/Python\/thread_pthread.h//' Makefile
Python-2.5.1$ make
BenchmarksIn corner case the time gained is about 25%-30%. The test program is not complicated at all. We know that importing modules needs synchronization. So let's do only imports:
#!/usr/bin/python
for i in xrange(1000000):
import os
Average results for standard python 2.5.1, using
pthread:
real 0m2.061s
user 0m1.728s
sys 0m0.332s
Average results for patched python 2.5.1 using
pth:
real 0m1.572s
user 0m1.560s
sys 0m0.008s
What's with those futexes?Well, as I suggested futexes are gone when using pth. Here are strace results for python with
pthreads:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
99.99 1.014676 1 1000020 futex
0.01 0.000067 0 142 read
0.01 0.000057 0 4504 _llseek
And for fixed python with
pth:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.000050 0 4504 _llseek
0.00 0.000000 0 142 read