Problem
I've been working on the speed of an application written in Python. While running strace I found that most of the syscalls were futex(). These futexes came from Python's internal synchronization code. But the application never used any kind of threading!
Most users never realize that Python synchronizes between threads even when threads aren't used. It's not so bad: those futex() calls are very quick, about 7 usecs each. But they are executed thousands of times!
Sample "strace -c":
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 67.62    0.000570           0     15192           futex
 10.91    0.000092           0       896           sendto
  7.00    0.000059           7         8           stat
  5.34    0.000045           0      1936           poll
  3.32    0.000028           0      1076           recvfrom
  2.73    0.000023           1        28           open
  2.02    0.000017           3         6           write
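If you want to compare such summaries between runs, the table is easy to parse. Here is a minimal sketch, assuming the column layout shown above; the function name parse_strace_summary is my own and not part of strace or Python:

```python
def parse_strace_summary(text):
    """Return {syscall: calls} parsed from `strace -c` summary text."""
    calls = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 5 columns, or 6 when the errors column is filled in.
        if len(fields) not in (5, 6) or fields[-1] == "total":
            continue
        try:
            float(fields[0])        # "% time" column
            count = int(fields[3])  # "calls" column
        except ValueError:
            continue                # separator row or anything else non-numeric
        calls[fields[-1]] = count
    return calls


# First three rows of the sample above, used only as demo input.
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 67.62    0.000570           0     15192           futex
 10.91    0.000092           0       896           sendto
  7.00    0.000059           7         8           stat
"""

counts = parse_strace_summary(SAMPLE)
for name, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print("%-10s %d" % (name, n))
```

Sorting by call count rather than by time makes the futex() row stand out even when each call is too short for strace to measure.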
Workaround
To get rid of the futex() calls one can compile Python without support for threading (./configure --without-threads). The downside of this method is that some modules depend on threading, like the PostgreSQL wrapper psycopg.
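To see which kind of interpreter you are running, you can probe for the low-level thread module. This is only a sketch: has_thread_support is a name I made up, and it assumes the module is called thread on Python 2 and _thread on Python 3:

```python
def has_thread_support():
    """Return True if the interpreter exposes its low-level thread module."""
    try:
        import thread  # Python 2 name; missing on builds without threads
    except ImportError:
        try:
            import _thread  # Python 3 name
        except ImportError:
            return False
    return True


if has_thread_support():
    print("this interpreter was built with thread support")
else:
    print("no thread support: modules that need threading will fail to import")
```

Running this with a --without-threads build before installing extension modules shows early whether something like psycopg is likely to break.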
There is also another method. It seems that Python supports the GNU pth library. This library emulates threads in userspace, using a pthread-compatible interface. The only non-trivial part is getting rid of pthread and using pth during Python compilation (even though pth should be supported by Python out of the box).
Enabling pth is not easy, but doable. Here's the compilation procedure, thanks to my friend Primitive:
$ sudo apt-get install libpth-dev
$ wget http://python.org/ftp/python/2.5.1/Python-2.5.1.tar.bz2
$ wget http://ai.pjwstk.edu.pl/~majek/dump/pyconfig.h.in-pth-patch.diff
$ tar xjf Python-2.5.1.tar.bz2
$ cd Python-2.5.1
Python-2.5.1$ cat ../pyconfig.h.in-pth-patch.diff|patch -p0
patching file pyconfig.h.in
Python-2.5.1$ ./configure --with-pth  --prefix=/usr
Python-2.5.1$ find . -name Makefile -exec sed -i "s/-lpthread/-lpth/" {} \;
Python-2.5.1$ sed -i "s/-pthread//" Makefile
Python-2.5.1$ sed -i 's/$(srcdir)\/Python\/thread_pthread.h//' Makefile
Python-2.5.1$ make
Benchmarks
In the corner case the time gained is about 25%-30%. The test program is not complicated at all. We know that importing modules needs synchronization, so let's do only imports:
#!/usr/bin/python
for i in xrange(1000000):
    import os
Average results for standard Python 2.5.1, using pthread:
real    0m2.061s
user    0m1.728s
sys     0m0.332s
Average results for patched Python 2.5.1, using pth:
real    0m1.572s
user    0m1.560s
sys     0m0.008s
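The gain can be worked out directly from the two sets of timings above. The snippet below is just that arithmetic; the variable names are mine:

```python
# Averages reported above, in seconds.
pthread = {"real": 2.061, "user": 1.728, "sys": 0.332}
pth = {"real": 1.572, "user": 1.560, "sys": 0.008}

saved = pthread["real"] - pth["real"]      # wall-clock seconds saved
reduction = saved / pthread["real"]        # fraction of the original run time
speedup = pthread["real"] / pth["real"]    # how many times faster pth is
sys_saved = pthread["sys"] - pth["sys"]    # kernel time that disappeared

print("saved %.3fs, %.1f%% less wall-clock time" % (saved, reduction * 100))
print("speedup %.2fx, sys time saved %.3fs" % (speedup, sys_saved))
```

Measured as a reduction in run time this comes to roughly 24%, and measured as a speedup to roughly 1.31x, so the 25%-30% figure sits between the two ways of counting. Most of the saving is kernel (sys) time.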
What's with those futexes?
Well, as I suggested, the futexes are gone when using pth. Here are the strace results for Python with pthreads:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.99    1.014676           1   1000020           futex
  0.01    0.000067           0       142           read
  0.01    0.000057           0      4504           _llseek
And for the fixed Python with pth:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.000050           0      4504           _llseek
  0.00    0.000000           0       142           read
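Putting the futex count next to the loop count shows how closely they track each other. This is a rough back-of-the-envelope sketch; it assumes the whole difference in sys time between the two benchmark runs is spent in futex(), which I have not verified:

```python
IMPORTS = 1000000       # iterations of the import loop in the test program
FUTEX_CALLS = 1000020   # futex() calls reported by strace for the pthread build

per_import = FUTEX_CALLS / float(IMPORTS)   # futex() calls per import statement
extra = FUTEX_CALLS - IMPORTS               # calls not explained by the loop

# Assumption: the sys time difference between the runs (0.332s vs 0.008s)
# is entirely futex() overhead.
usec_per_call = (0.332 - 0.008) / FUTEX_CALLS * 1e6

print("%.5f futex calls per import, %d extra" % (per_import, extra))
print("about %.2f usec of sys time per futex call" % usec_per_call)
```

So each import statement costs about one futex() call, with only 20 calls left over for everything else the interpreter does.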