2008-02-27

NSE loop bug

The Nmap Scripting Engine (NSE) is an engine for running Lua scripts inside Nmap. It is built on NSock, Nmap's asynchronous I/O library.

While working on an NSE script I found that Lua threads were sometimes frozen for too long. For example, socket_object:receive() should take a few milliseconds, but it took more than a hundred. In normal usage I wouldn't even have noticed that events were delayed, but this time I needed exact timings.

Like every asynchronous library, NSock has an event loop. The main loop in NSE looks roughly like this:

int process_mainloop(){
    ...
    while(unfinished_lua_threads > 0){
        ...
        nsock_loop(50ms); // block for at most 50ms, or until the first event
        ...
        if(running_scripts.begin() == running_scripts.end())
            continue;

        current = *(running_scripts.begin());
        // resume the thread; it should either yield back to waiting_scripts or finish
        lua_resume(current);
        ...
    }
}

The bug is not easy to spot. The problem is that we resume only one Lua thread per loop iteration, even though many events may have occurred during a single nsock_loop call. Every other ready thread has to wait for another pass through the loop, and each pass can block in nsock_loop for up to 50ms.

This kind of bug is quite common. For example, PgBouncer recently had (still has?) a similar bug when accepting connections: it allowed only one new connection per event loop iteration.

The fixed loop looks like this:

int process_mainloop(){
    ...
    while(unfinished_lua_threads > 0){
        ...
        nsock_loop(50ms); // block for at most 50ms, or until the first event
        ...
        while(running_scripts.begin() != running_scripts.end()){
            ...
            current = *(running_scripts.begin());
            // resume the thread; it should either yield back to waiting_scripts or finish
            lua_resume(current);
            ...
        }
    }
}


UPDATE #1: The fix is committed. It is in Nmap SVN revisions newer than r6857.

