2008-03-28

How many web browsers do you have on your computer?


Simple question: how many browsers do I have?

I use Opera for reading reddit and browsing, Webkit for gmail and google docs (it's really fast), Firefox for web developing (FireBug rules).
I use virtual machines, so it's not so easy to count :)
  • main operating system: opera, firefox, safari, webkit
  • secondary os: opera, firefox, safari(comes with itunes), webkit, internet explorer 7
  • 1st virtual machine: konqueror, opera, firefox (i'm not counting links, lynx, elinks...)
  • 2nd virtual machine: opera, firefox, internet explorer 6

... it's only 15 browsers on one computer. Of course most of them aren't used very often.

How many browsers do you have?


2008-03-27

Why I don't buy on ebay.

>> Hi, I want to ask if you provide deliver to Poland (Europe)?

Only after I am certain that good funds have
completely cleared my account,
cannot be reversed,
you agree to accept all shipping charges,
you understand you accept the item "as is" and
have no right to return it.

Reason for this harsh approach to international shipping?
Too many scams coming from your location!!!


2008-03-26

Useful C extensions (gcc specific)

You think that knowing ANSI C is enough? Well, okay, it's enough. But I found that some gcc extensions are very useful.

All gcc C extensions are listed in the manual.

Here are the extensions I use:

Labels as values
I remember the old assembler days, when you could jmp to to an address from an array. Nowadays in C, you can create an array of pointers to functions and call pointer from array. But jmp is faster and sometimes you don't want to create many small, almost useless functions.

In gcc you can use a trick that's called labels as values. It's like that:

int main(){
int value = 2;
const void *labels[] = {&&val_0, &&val_1, &&val_2};
goto *labels[value];
val_0:
printf("The value is 0\n");
goto end;
val_1:
printf("The value is 1\n");
goto end;
val_2:
printf("The value is 2\n");
goto end;
end:
return(0);
}

Macros as functions
Here is a standard example of a macro with statements that returns a value. It's also a great example of typeof operator in gcc.
#define MAX(a,b)                    \
({ \
typeof (a) _a = (a); \
typeof (b) _b = (b); \
_a > _b ? _a : _b; })

Case ranges
Gcc allows to use ranges in case statement. In ANSI C you'd have to enumerate all the values. This code is much prettier:
int main(){
int value = 7;
switch(value){
case 0 ... 5:
printf("The value is 0 - 5\n");
break;
case 6 ... 10:
printf("The value is 6 - 10\n");
break;
default:
printf("The value is ?\n");
}
return(0);
}

Likely
Have you ever wondered how does gcc convert if statement to assembler, how many branches are created by the compiler? For example here, how many jmp instructions processor must do in the foo() function?
int foo(int i){
if(i == 0)
return(1);
return(2);
}

int main(){
foo(0);
}
In this case you know more than a compiler. You know that all callings to the foo() are going to be with "0" as the parameter. You can suggest compiler to optimize this by using __builtin_expect and possibly reduce number of jumps.
int foo(int i){
if(__builtin_expect(i == 0, 1))
return(1);
return(2);
}
There are often defined macros:
#define likely(x)    __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

Vector extensions
This very powerful extension allows to use mmx/sse code in a portable way. Gcc would choose the best possible extensions during compile time. When you compile code without mmx/sse options, the binary will remain compatible with i386. The downside is that this extension doesn't allow to use all of the features of mmx/sse.

From my point of view the most important feature is to use the bitwise commands. For example this would do binary or on 128 bits:
typedef union{
int v __attribute__ ((vector_size (16)));
int i[4];
} __vec_int;


int main(){
__vec_int a, b, c;
a.i[0] = 1; a.i[1] = 1; a.i[2] = 1; a.i[3] = 1;
b.i[0] = 1; b.i[1] = 2; b.i[2] = 3; b.i[3] = 4;

/* do OR here on 128 bits */
c.v = b.v | a.v;
...
}
Sometimes there is a need of using more advanced mmx commands than vector extensions allow. For example you may want to do four compares at a time. Unfortunately this code is specific for sse2 (-msse2).
int main(){
__vec_int a,b;
a.v = set_vec_int(1,2,3,4);
b.v = set_vec_int(4,3,2,1);

/* compare greater than on __m128i */
__vec_int c;
c.v = _mm_cmpgt_epi32(a.v, b.v); /* 0xFFFFFFFF if gt, 0 else*/

printf("a=[%i,%i,%i,%i]\n", a.s[0], a.s[1], a.s[2], a.s[3]);
printf("b=[%i,%i,%i,%i]\n", b.s[0], b.s[1], b.s[2], b.s[3]);
printf("c=[%u,%u,%u,%u]\n", c.s[0] ? 1 : 0, c.s[1] ? 1 : 0, c.s[2] ? 1 : 0, c.s[3] ? 1 : 0);

return(0);
}
Command used here _mm_cmpgt_epi32 is ported from Intel compiler, the best documentation for this dialect I found on Microsoft sites. You may want to use gcc equivalent. By the way, don't forget that the sse2 registers (__m128i) must be aligned to 16 bytes (-mpreferred-stack-boundary=4).


Cache stuff
Sometimes it's possible to optimize code by prefetching data. It's quite simple to prefetch data to one line of cache:
__builtin_prefetch(&data[offset], 1); /* 0 - read only, 1 - rw */
On the other hand you may want to store the data without polluting the caches. If you only want to write data (and it's not currently loaded to cache) it could be faster than standard storing. I use this defs (oh. it's mmx specific):
#ifndef MMX
#define STORE(ptr, val) \
do{ \
*(ptr) = (val); \
}while(0)
#else
#define STORE(ptr, val) \
do{ \
_mm_stream_si32( (ptr), (val) ); \
}while(0)

#endif

More?
There are more interesting extensions. Like advanced assembler operands or macros with variable number of arguments.


2008-03-25

Emulating exit(3) and moving stack on the fly.

Recently, during coding embedded stuff for industrial robots I had very interesting problem.

I needed to implement exit(3) function. But the system on the robot is very simple. There aren't any kinds of syscalls.

So, the functions exit(3) must emulate the only proper way of exiting the program. It should emulate returning/exiting from main() function.

My friend, Michal Luczaj, found clever way of doing this. The basic idea is to do:

void (*exit)();

void user_main(){
int whatever;
/* bla, bla , bla */
exit();
}

int main(){
exit = &&final_quit;

/* execute the application */
user_main();

final_quit:
return(0);
}

Well, this won't work, because the stack after calling exit() in user_main() is not the stack that should be in main().

That's also my other problem. My robots have very limited stack, so I want to move the stack to other memory area.

The final code emulates exit() and moves the stack. It works like this:
int main(){
start();
}

int start()
{
SAVE_STACK_POINTER();
myexit = &&final_quit;

new_sp = malloc(128*1024);
SET_STACK_POINTER(new_sp);

user_main();
/* stack broken here - don't do anything please */
final_quit:
RESTORE_STACK_POINTER();
return(0);
}

void user_main(){
myexit();
}


2008-03-21

Web applications? Think again.

Some time ago I proved that writing comet applications can be very easy.

Now I think, I can put almost anything online. I created chat, I have working proof of concept of ytalk-like web application. I'm thinking about putting Python interpreter online. These are rather standard web applications, someone did it before. What's the next step of "evolution" after comet?

Ideas are free, so I can speak about it, right?

The simplest way of getting new ideas for web startups is to select tool you're working with, and put it online (for example Gmail, Writely, Del.icio.us).

But wait, using comet I can do more than that. I can turn browser to a frontend to any application. Maybe create universal layer for visualization applications. They tried to create it: GWT, XUL. But we don't have to create new visualization layer, we can use existing one and put in on the web.

I believe it's possible to create X11 server online. You will run applications on the remote machine, connect them to X11 tcp port on the server, and view it with the browser. Imagine X11 desktop in your browser. Tabs could be virtual desktops, in one tab you could have multiple application windows. You could even stream bitmaps by emulating overlay mode using flash streaming.

That would be fun.

Advantages of creating web based X server:

  • compatible with standard motif/kde/x11 applications
  • you have desktop application? I can easily put it on the web
  • that would be "the ultimate web application" :P
  • if it works, this could be a new "standard" of writing web apps
Disadvantages:
  • forget about Beryl/Aero (on the other hand we have OpenGL in flash)
  • not so easy to implement, we need to create full implementation of X11 server
  • could use quite a lot of resources (cpu+ram), what about scalability?
  • cross browser compatibility is hard
  • 99% of users on the web doesn't know what X11 is - not compatible with Windows
  • latency is an issue, forget about "online gaming" this way


What's the next step of "evolution" after this?


2008-03-11

Bidirectional search example



Dorian asked for implementation of bidirectional search algorithm. So I created very simple implementation of bidirectional search, not optimized at all. But I hope this is enough to understand the algorithm.

>>> from bidirectionalsearch import *
>>> bidirectional_search(graph,1,3)
[1, 2, 3]
>>> bidirectional_search(graph,1,8)
[1, 2, 3, 4, 7, 8]
>>> bidirectional_search(graph,1,9)
[]
>>> bidirectional_search(graph,1,11)
[]


2008-03-08

Why you should care about synchronous blocking I/O syscalls?

In previous post I noticed that open(2) and stat(2) aren't asynchronous. But why you should care?

Try to view a html file from CD using your favorite browser. I don't know how about you, but I switch to other browser tab while a site is loaded. But wait, when the drive starts to spin the CD the browser is blocked?

That's because the browser did an open(2) on a file from CD and system have to load metadata from it, which is done synchronously.

And your OS can't do it without blocking the whole browser...



Caching at grono

I posted article (in polish) about caching at Grono, with description of the caching decorator called memoize.

It's a very simple decorator, but it's working perfectly. Without it the site wouldn't be working :)



How to deliver an event to multiple processes? - scaling django-evserver

The hardest problem I had to solve during my work on django-evserver, was how to efficiently propagate an event to multiple listener processes.

Python, because of GIL, doesn't scale across multiple processors. One process can only use one processor.

In django-evserver we have multiple python processes and some of them could wait for some specific event. The question is how to deliver an event only to the processes that wait for it.

Signals?
The simplest thing we could do when event occurs, is to send Unix signal (for example SIGUSR1) to every process. But that doesn't scale at all, not every process must be waiting for this event. We could use some kind of central server to manage events and send it only to proper listeners. But this needs quite complicated statefull server.

Multicast?
Next idea is to bind to specific udp port and listen for multicast packets. If a process waits for event 1, it will bind to port 10001. Theoretically multiple threads could listen on one multicast udp port. The problem with tcp/udp sockets is that we're limited to 64k unique event types for one machine. Which is not too much.

Can it be done easier?
Well. Reading from named fifo file (mkfifo) is definetelly blocking. The point is that multiple processes can read from one fifo. When some process will write to this fifo, every listener would be noticed. One reader will get the written message, all others will get information that they can't listen any more becaouse writer closed pipe. But that is good enough, every waiting process got an event.

Good points of that approach:

  • you could have millions of fifo's
  • as many processes could 'bind' to one fifo as you would like to
  • writer knows if someone is reading from the fifo - you could tell if the event was delivered to at least one reader
  • very easy to implement
  • very simple to understand

Bad points:
  • doesn't scale across multiple machines (but a daemon that would forward events to other machines is quite simple to implement)
  • it's based on specific fifo implementation, probably won't work on windows