2007-09-19

Die Google! - future search engines


While talking with a friend, I realized that Google isn't the last word in web search engines.

From a technical point of view the Google search engine isn't magic; indexing data is nothing revolutionary.

The only really interesting thing about Google's success is their method of promoting good results and punishing bad ones. In Google's nomenclature it's called the (mystified) PageRank. They keep the exact algorithm secret, but in the beginning it was some kind of counting of links pointing to a given website.
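
For illustration, here is a very simplified sketch of that link-counting idea, a toy power iteration in Python (not Google's actual algorithm): a page ranks high when highly ranked pages link to it.

def pagerank(links, damping=0.85, iterations=20):
    # links: dict mapping a page to the list of pages it links to
    pages = set(links) | {p for targets in links.values() for p in targets}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / len(pages) for p in pages}
        for page, targets in links.items():
            for target in targets:
                # a page passes a share of its own rank to every page it links to
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank

print(pagerank({'a': ['b'], 'b': ['a', 'c'], 'c': ['a']}))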


The development of Sergey and Larry's algorithm was constrained by their input data. The only things they had were the source of a web page and the links between sites. Nothing more. (Maybe also the domain name, the link URL and the hosting center, but those say nothing about the content.)

Today, in the era of user-generated content, we have a bit more information.

We have a direct link between the content and the author. In the Web 2.0 era we know exactly who wrote a given text, where and when.

If you search for some information and find someone's post useful, maybe you would like to look for more posts by that person. Maybe you would like to see what threads that person reads. Or, if you read some forum, it could be interesting for you that most of the people active on this forum also like some other forum. (This is already implemented in online shops: you can see that people who bought this item also bought something else.) Or when you search for information, maybe you would like to see content created by your friends first?
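
The shop feature boils down to counting which items appear together. A toy sketch of that idea, with made-up data:

from collections import Counter
from itertools import combinations

# toy purchase data; items and baskets are made up
baskets = [
    ['book', 'lamp'],
    ['book', 'lamp', 'pen'],
    ['book', 'pen'],
]

co_bought = Counter()
for basket in baskets:
    for a, b in combinations(sorted(set(basket)), 2):
        co_bought[(a, b)] += 1
        co_bought[(b, a)] += 1

def also_bought(item, top=3):
    # items most often seen in the same basket as `item`
    related = [(other, n) for (it, other), n in co_bought.items() if it == item]
    return sorted(related, key=lambda pair: -pair[1])[:top]

print(also_bought('book'))   # e.g. [('lamp', 2), ('pen', 2)]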

I want to emphasize that exact information about the author of the content is something Google will never be able to get.

Maybe the idea is to change the meaning of search. Maybe nowadays the point of 'search' is to link the user not with the content, but with the author?

I'm already doing this.
When I find something interesting, I want to see other things by the author. I want to know what he is interested in, so I search for his profile on LinkedIn and for his interesting links on del.icio.us.

How about you?



2007-09-13

Graphical representation of connections between users

Recently, I've been looking for a way to graphically show connections between users in a social network. The basic idea is to show them in 3D, in the simplest possible manner.

I tried to code this myself, but unfortunately I didn't know any good-enough algorithm for computing the layout of the scene.
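
For reference, the usual approach to this kind of problem is a force-directed ("spring") layout: connected nodes attract, all nodes repel, and positions are relaxed step by step. A minimal 2D sketch in Python (the same idea extends to 3D):

import math
import random

def spring_layout(nodes, edges, iterations=100, k=1.0):
    # random initial positions in the unit square
    pos = {n: [random.random(), random.random()] for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # every pair of nodes repels each other
        for a in nodes:
            for b in nodes:
                if a == b:
                    continue
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 0.01
                force[a][0] += (k * k / dist) * (dx / dist)
                force[a][1] += (k * k / dist) * (dy / dist)
        # connected nodes attract each other
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 0.01
            pull = dist * dist / k
            force[a][0] -= pull * dx / dist
            force[a][1] -= pull * dy / dist
            force[b][0] += pull * dx / dist
            force[b][1] += pull * dy / dist
        # move every node a small step along the resulting force
        for n in nodes:
            pos[n][0] += 0.01 * force[n][0]
            pos[n][1] += 0.01 * force[n][1]
    return pos

print(spring_layout(['a', 'b', 'c', 'd'], [('a', 'b'), ('b', 'c'), ('a', 'c')]))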

Other social networks have tried to visualize connections. For example, this graph was created in 2003 at Club Nexus at Stanford University.




A few days ago I found a very interesting application: RTGraph3D, a tool by Philippe Biondi, the author of very successful software such as Scapy.

What I found most surprising is that the results sometimes show something interesting. For example, this image shows someone's friends, with the connections between them. From this image you can easily recognize three groups of friends.


This is an example of the most common view: just a mess of connections between users.


This is yet another attempt. It shows two levels of links, but without connections between users on the same level.


RTGraph3D seems to be a good tool for such graphs. I have the tool, I have the data, but I have no idea what other interesting things could be shown.



2007-09-08

Magic popcount (popcnt) command

From Frank de Groot's blog:
Every serious hacker sooner or later needs the popcount instruction.
This "population count" instruction counts the set bits in a register, and is so useful that the NSA demands that all computers they purchase implement it in hardware.


But this instruction is not present in the x86 architecture. The AMD64 documentation (p. 188) describes a popcount extension, but I have never heard of any processor that implements this feature.

You may wonder why one would need a popcount instruction. I needed it for a very simple task. I wanted to change a data structure from a bitfield to a list of bit positions. Basically I needed something like this:

0x1001 -> [0, 12]
0xF000 -> [12,13,14,15]
The simplest solution is to iterate over every bit and check if it's set. A more complicated algorithm could count the number of set bits first and use that count to index precomputed results. I wonder if that approach is faster. It's not so simple to say, because low-level performance issues can play a great role in this problem.
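
The simplest approach, sketched in Python just to illustrate the transformation:

def bit_positions(x, width=16):
    # test every bit and collect the positions of the set ones
    return [i for i in range(width) if x & (1 << i)]

print(bit_positions(0x1001))   # [0, 12]
print(bit_positions(0xF000))   # [12, 13, 14, 15]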

I wonder what the fastest possible method for this problem is. I bet the popcount instruction is the key to an efficient implementation.

I found some interesting software implementations of popcount.
The GCC approach, summing per-byte counts from a lookup table:
const UQItype __popcount_tab[] =
{
0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,
1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,4,5,5,6,5,6,6,7,5,6,6,7,6,7,7,8,
};

int __popcountSI2 (UWtype x)
{
  UWtype i, ret = 0;
  for (i = 0; i < W_TYPE_SIZE; i += 8)
    ret += __popcount_tab[(x >> i) & 0xff];
  return ret;
}
A method based on subtraction, from Gnulib:

int popcount(unsigned int x)
{
  int pop;
  /* x &= x - 1 clears the lowest set bit, so the loop runs once per set bit */
  for (pop = 0; x != 0; pop++)
    x &= x - 1;
  return pop;
}
The AMD64 Optimization Guide, p. 137 (they also have an MMX version):
unsigned int popcount(unsigned int v)
{
  unsigned int retVal;
  __asm {
    MOV  EAX, [v]         ;v
    MOV  EDX, EAX         ;v
    SHR  EAX, 1           ;v >> 1
    AND  EAX, 055555555h  ;(v >> 1) & 0x55555555
    SUB  EDX, EAX         ;w = v - ((v >> 1) & 0x55555555)
    MOV  EAX, EDX         ;w
    SHR  EDX, 2           ;w >> 2
    AND  EAX, 033333333h  ;w & 0x33333333
    AND  EDX, 033333333h  ;(w >> 2) & 0x33333333
    ADD  EAX, EDX         ;x = (w & 0x33333333) + ((w >> 2) & 0x33333333)
    MOV  EDX, EAX         ;x
    SHR  EAX, 4           ;x >> 4
    ADD  EAX, EDX         ;x + (x >> 4)
    AND  EAX, 00F0F0F0Fh  ;y = (x + (x >> 4)) & 0x0F0F0F0F
    IMUL EAX, 001010101h  ;y * 0x01010101
    SHR  EAX, 24          ;population count = (y * 0x01010101) >> 24
    MOV  retVal, EAX      ;store result
  }
  return retVal;
}
There is also an interesting thread about popcount at programming.reddit.com. Yet another source: John-Mark Gurney's tips about popcount implementation.


2007-09-07

Python memory hack

Django and most Python frameworks eat a lot of RAM. To serve multiple pages at the same time there must be many running Python processes (this is unlike lightweight web daemons such as lighttpd or nginx, which serve all requests from a single process).

So every Python process eats RAM. During its life it imports many libraries and files. These imports are never freed by the garbage collector. They are almost identical in every process, but the problem is that they are not shared between processes. If you import some library in Django, it's going to be imported separately in every Django process.

To reduce memory usage it's possible to import all used libraries before forking the Python processes. Linux fork() uses a copy-on-write technique, so one would expect the preloaded libraries to be shared between the Python processes.
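
A minimal sketch of the preload-then-fork idea (the imported modules are arbitrary examples):

import os
import time, xml.dom.minidom   # preload whatever the workers will need

def worker():
    # the modules imported above are already loaded here; ideally their
    # memory pages would stay shared with the parent via copy-on-write
    time.sleep(1)

children = []
for _ in range(4):
    pid = os.fork()
    if pid == 0:        # child process
        worker()
        os._exit(0)
    children.append(pid)

for pid in children:
    os.waitpid(pid, 0)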

Unfortunately, in Python it doesn't work. That's because immutable and mutable data are stored in the same memory segment! For example, the header structure of an immutable string is modified all the time, because it contains the reference counter.

I created a quick memory hack for CPython. Python imports are mostly code objects, which keep their code in string objects. I hacked the string object so that it allocates memory for immutable data in a separate memory segment from the mutable part. My hack allocates a big memory segment in which the immutable string data is stored. Apart from storing new strings, that memory is not modified during execution. I hope that after fork(2) this memory segment will be successfully shared between processes. The downside is that this memory is never freed.

Example usage:

x.py:
'a'.setmajekhack() # enable hack
# do some imports here
import sys,os,time
'a'.setmajekhack() # disable hack

Sample output; this time only 17 KB of string data was put into the immutable segment:
MAJEKHACK enabled 0x2b6bc0c57000
MAJEKHACK disabled, allocated 17131 bytes


Unfortunately this hack doesn't give results as good as I expected. Still, I think the Python memory allocator could be optimized for cases like this and save our precious memory.


2007-09-04

Crisis with asynchronous programming

I'm currently designing an application. It's going to be a quite simple Linux network daemon which will serve some data from disk.

Looks simple. I can easily predict that the bottleneck will be either disk I/O or network bandwidth. Nowadays it's obvious that a high-performance network daemon must be 100% asynchronous.

The problem is that there isn't yet any technology that allows fully asynchronous programming. Okay, there is libevent or liboop for C, and Twisted for Python. But I mean fully asynchronous, with fully async stat(2) and open(2) syscalls.

There are projects that try to handle these issues. For example, lighttpd has a specialized thread pool just for handling stat(2). I don't want to do it this way. I want a simple-to-use, fully async, portable library that will allow me to code in the 100% right way.
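
For illustration, the thread-pool workaround looks roughly like this (sketched in Python for brevity; this shows the pattern, not lighttpd's actual code): blocking stat(2) calls are handed to a few worker threads and the event loop only drains a queue of finished results.

import os
import queue
import threading
import time

requests = queue.Queue()   # pending (path, callback) pairs
results = queue.Queue()    # finished (callback, stat result or error) pairs

def stat_worker():
    # runs in a worker thread and performs the blocking stat(2) calls
    while True:
        path, callback = requests.get()
        try:
            results.put((callback, os.stat(path)))
        except OSError as error:
            results.put((callback, error))

for _ in range(4):
    threading.Thread(target=stat_worker, daemon=True).start()

def async_stat(path, callback):
    # called from the event loop; returns immediately, never touches the disk
    requests.put((path, callback))

def process_completed_stats():
    # called periodically by the event loop to deliver finished results
    while not results.empty():
        callback, result = results.get()
        callback(result)

# tiny demo in place of a real event loop
async_stat('/etc/passwd', print)
time.sleep(0.1)
process_completed_stats()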

And please don't tell me that there isn't a need for aio_open or aio_stat (example). Disks aren't any faster than five years ago, so access to on-disk metadata is becoming relatively slower. But Linus doesn't think the problem exists.

I can't believe that in the 21st century there isn't any simple solution to such a trivial problem.