Presentation: Nmap and asynchronous programming

In this presentation I would like to talk about my adventure with extending Nmap and what I learned from it regarding asynchronous programming.

Nmap is a port scanner with many other features.

On this slide you can see an example of Nmap output.

It shows open tcp ports on some target machine. It also shows service type (if it is ssh or http or something else) and version of a daemon running behind discovered port.

Nmap can execute user defined scripts for various services. Above we can see an example result of a script that retrieves HTML title from the HTTP server.

Nmap also tries to guess the operating system on the target machine. In the example output Nmap OS detection wasn't very successful, the OS guesses aren't accurate.

Nmap isn't just a basic port scanner, it's rather fully-featured network discovery tool. It sends a lot of data to the target hosts and opens a lot of tcp/ip or udp connections to the target. To do this efficiently Nmap must be written in a specific way - in asynchronous manner, using nonblocking sockets where possible.

Some parts of Nmap, like OS detection engine, are extremely sensitive for timings.

Nmap sends a lot of custom built packets. The system API for sending custom packets is called raw sockets. Unfortunately this sockets can't be used to listen for raw packets on the wire. To read packets Nmap uses libpcap (packet capture) library. Libpcap is a part of tcpdump (graphical interface is known as Wireshark).

When a program is opening pcap descriptor to sniff the network, it passes a BPF filter to receive only the packets that match the filter. The BPF filter is something like regular expression for packets. The filter is compiled and matched against all the packets that come through the wire. This works extremely fast for a single pcap socket, but it's not really created to handle thousands of open pcap descriptors. Every packet that will come to your machine will be matched against every open pcap descriptor, so opening too many pcap sockets may slow down the whole system.

Before going on, please make sure you know what a three-way handshake on tcp/ip connections is.

Every Nmap scan is created from various stages.

First Nmap must know if the target hosts are really online. This is especially useful when you're scanning a wide range of ip addresses. We really don't want to waste time waiting for black-hole ip's. The detection if a host is up can be done using simple icmp ping packet, or using more complicated techniques.

After we know which hosts are up, Nmap does the real scanning. It sends a lot o SYN packets to discover which network ports are open or closed.

After we know which ports are open, we can start service and version detection phase. Nmap opens a lot of tcp/ip connections to detect what services and exactly which daemons are running there.

After this, Nmap can execute operating system detection, and try to guess which OS is running on a target host.

And finally Nmap can execute Nmap Scripting Engine, to grab even more detailed information about running services on open ports. In the first slide we've seen the results of a script that printed a title from html site grabbed from open http server.

Most of people don't understand how Nmap's basic -sS SYN scanning works. This is really a port scanning kindergarden, so if you know how it works, you can skip this slide.

After the host discovery phase we know that a target host is online. Then Nmap tries to determine which tcp ports are open. To do this, Nmap sends a lot of SYN packets to target host via raw socket bypassing Operating System's networking stack.

If the remote port is open, scanning host receives SYN+ACK packet. But the tcp/ip connection wasn't actually established from scanning host OS – Nmap sent the SYN packet, not OS. OS doesn't recognize the SYN+ACK response packet, so it sends RST packet which tells a target to drop a connection.

Why to do scanning this way and not just open a lot of normal operating system tcp/ip connections? Normal tcp/ip connection would need to be closed gently (FIN flag), using even more packets. This SYN scan is faster, with only 3 packets needed to know if a port is opened or closed.

I prefer this diagram over the previous slide. I would like to repeat that nmap injects first SYN packet on the wire. Than nmap sniffs for SYN+ACK packet on the wire.

Of course Nmap can scan many ports at the same time.

We can say that Nmap first sends a lot of SYN packets and then it waits for the SYN+ACK or RST packets. If packet is received from the wire, than Nmap does something with it. For example updates data structures with received information. Then Nmap goes back to the main event loop and waits for yet another packet from the wire.

The point is that every Nmap stage is waiting for something different.

Host discovery phase can wait for icmp ping response packets or many other types.

Port scanning, as I mentioned in previous slides, is waiting for SYN+ACK or RST packets.

Service detection on the other hand is using a lot of normal tcp/ip or udp connections.

OS detection engine is waiting for a lot of magic packets.

Scripting Engine is also waiting for normal tcp/ip or udp connections.

Let's now talk about Scripting Engine in details.

Nmap Scripting Engine (NSE) is a new feature in Nmap. NSE was created because Nmap Service and Version detection wasn't accurate enough. Service detection is really very basic technology. It just grabs a banner from an open port and matches it against thousands of regular expressions.

This is enough for detecting most daemons, but it's not working for more complicated things. For example Skype client opens tcp port. This port returns valid answers for http queries, so it was detected as a http server. But Skype is not an http server. We weren't able to distinguish that before NSE.

The technology behind scripts is not complicated. A script is written sequentially in LUA – convenient simple higher level scripting language.

After normal scanning Nmap discovers which scripts should be executed and runs them against open ports. A script usually is opening a network socket and it writes/receives data from it. If a script is waiting for something on a network socket, it's froze and suspended. When we receive something from a socket, we continue the script until it's suspended again (or exits).

This is called cooperative multitasking and it allows us to write scripts sequentially and execute a lot of them concurrently. Because all the scripts are mostly waiting for some data on a socket.

The scripting engine is based on Nmap asynchronous library NSOCK. This library is capable of waiting for many network descriptors at the same time.

Like every asynchronous library NSock has a main loop in which it waits for descriptors. When something happens on a descriptor Nsock executes a specific callback for the descriptor.

Connecting this with LUA engine is pretty straightforward. When something happens on a descriptor, we just continue the LUA script that is waiting for data on this descriptor.

I thought that NSE is great, but writing scripts that can only wait for normal tcp ip/udp connections is boring. I thought that it would be great to do something on lower level, like injecting custom packets into the tcp/ip connections, or listening to raw packets from the target host.

That's why I created a raw sockets/libpcap support for NSE. It turned out to be rather complicated.

Adding pcap support for Nsock was only part of success. The idea of the changes is rather clear, in one event loop we need to wait for both network descriptors and in the same moment for pcap descriptors. Internally it's rather complicated because of libpcap limitations.

As the result when a packet is received from libpcap descriptor, a callback is called. Just like with normal network sockets.

Connecting this to the Lua engine wasn't easy. The main problem is that, as I mentioned before, pcap descriptors are rather heavy. So one pcap descriptor is opened from a script and shared between all the instances of the same script. This means that when a packet is received from a network we don't actually know which Lua thread should be woken.

I needed to create a custom dispatcher of packets, to know which thread to wake up.

Finally I was able to open pcap descriptors and send raw packets from Lua scripts. I think that the possibilities of such scripts are huge. Here are some examples of my scripts.

The first scripts is able to detect which hosts are using tcpdump in the local network. The script tries to determine if a host has a network card in promiscuous mode. I'd be happy to explain how it works, but that's not in the scope of this presentation. We can say that the script sends some custom arp-request packets and waits for arp-reply answers.

The next script is an implementation of Lcamtuf Otrace tool. It's a traceroute-like tool that can go through firewalls. It injects custom packets into valid tcp/ip connection with small TTL values, so that it can record the ip addresses of the routers on the way.

Next script is an implementation of "ping -R" record route option. The record route option is a flag on ip packet that tells the routers on the way to record their ip addresses. The results are more accurate than normal traceroute but there are only 9 slots for ip addresses on an ip packet.

My favorite script is an implementation of another Lcamtuf tool called p0f. P0f is a passive fingerprinting project. Basically p0f creates an operating system fingerprint from a single SYN packet. The fingerprint can tell which os the target host is running without sending any custom packets. It's stealth and passive.

I wanted to implement this tool, but Nmap is an active scanner. The idea was to open normal tcp/ip connection to the target and create a fingerprint based on SYN+ACK response. The results aren't very granular, but they can show many interesting things. In this slide you can see the results from some host. It seems obvious that port 22 is forwarded to another machine than ports 80 and 443.

My latest work is an implementation of an active OS fingerprinting.

I think that nowadays, in the time of limited ip space, more and more ip addresses use port forwarding. It's rather common to see that one open port is forwarded from one physical machine and other ports are from different host.

I thought that it would be great to do an active OS detection for every port, rather than only for host. Normal Nmap OS detection can be only executed per host.

I implemented a bit limited Nmap OS detection algorithm in LUA, so that you can try to detect operating system for every open tcp/ip port.

The results aren't perfect yet. This is because of a timing issues inside NSE engine. But they are more granular than basic p0f results. On this slide you can see the results from the same host as on previous slide. Now it's obvious that port 22 is from forwarded from Linux box, and port 80 and 443 are from Windows machine.

Let's sum things up.

I created a support for raw sockets/libpcap for NSE. This was intended as an additional feature, but it was possible to implement one of the most complicated parts of Nmap using this feature - an OS detection.

One could ask if it would be possible to implement all Nmap in Lua?

Well. First let's talk about some other issues.

Nmap has different event loop for every scanning stage. This is because of historical reasons - programmers didn't know how to wait for both network sockets and many pcap descriptors in one event loop. Libpcap API is rather complicated, and it's not easy to mix it with normal sockets.

The problem I described can be generalized. Not only Nmap faces this issue.

The point is that asynchronous programming fails if anything else than event loop is blocking. In Nmap the problem is with specific libpcap limitations, but other projects also face some blocking elements.

Web servers have problem with blocking stat(2) syscall. They use this call to check if a file was modified on the disk and needs to be reloaded. For example lighttpd uses a lot of complicated code to address this issue. Basically lighty opens many threads, one for every physical disk, in which it does blocking stat calls.

Web browsers share similar problem. If you open a file from a slow media, like cdrom, the whole browser stops for a while. This is because open(2) syscall blocks until the metadata is read from the media. I don't like when my browser halts because one tab is slower than the others.

There are also a lot of asynchronous programs where dns queries are done using normal operating system api, which is blocking. In the test environment the software runs perfect, because dns queries are fast, but in production the program fails to scale.

In my opinion asynchronous programming should be easy and fun. Async libraries should hide low level problems from the programmer.

There are a lot of async libraries, but in my opinion there's no complete solution yet.

That’s why my dream is to create yet-another-but-better asynchronous library.

Given such, perfect, async library Nmap could be rewritten in one event loop and possibly in higher level language.

Fyodor, the author of Nmap created a book. It's still in beta, but I hope it will be in shops in few months. I highly recommend it for people interested in low level networking and Nmap.

Thanks! I haven't described how promiscuous node detection works and how my libpcap dispatcher for Lua works. I'd love to write about it some day.

Thanks to Andrzej Teleżyński for proof reading this post.


Presentation: hacking Industrial Robots

I'll talk about my work on industrial robots. The robots belong to my university - PJIIT.

Here you can see our two Motoman SK6 robots. They were bought by the Japanese government from a fund to support developing countries. The robots were produced in Japan, assembled in Sweden, serviced by Germany and located in Poland.

They are created to do welding and painting in the industry. They have no feedback - they will just do everything to move to the destination point. No matter if there's someone's head in their way. So watch out, they really could kill.

This is how our lab looks like today. Notice the big grey box on the left (and a part of another one on the right). It's the robots controller which is responsible for moving the robot. All software for the robots run on these controllers.

On this slide you can see how the robot welds something. The robot is moving to specific point in the space, it welds, then moves to another point and so on. It's doing the same thing all over again. It's similar with painting robots. They were once programmed to paint a thing and they are just repeating the movement for years. On the right you can see an example of a program for a robot. It's called a job. It's basically just a series of points to which robot shall move and some simple commands to enable or disable the welder or brush.

This is an example job on our robot. It's programmed on the controller, no computer is involved in controlling the robot. We use it to show the visitors how the robot moves.

This is the same job as on the previous slide but in full speed. When I'm watching this job I'm always impressed by the robot possibilities. But maybe you need to be in our lab to have this feeling.

This device is used to write jobs for robot. As you can see there are a lot of buttons on this console. Most of the buttons don't do anything useful. On the other hand there are a lot of cryptic shortcuts that are used quite often (like: press star button with X key).

On the right you can see an example job again. The bold part is what you can see on the console. All the other information is magic and it's hidden from the programmer.

The console has the hardest interface I've ever used. The point is that the console was never intended to be easy to use. As I mentioned before - the job is created once and it runs for years.

The problem is I don't need a repeatable job in my lab. I want the robot to do something dynamic. Grab something from the floor, catch a ball, give me a hand and so on. I don't want a robot to do the same, moves all over again. It's useless for me.

So the question is whether an industrial robot in the lab is useful at all. Most of the robots are useless and can't be controlled in dynamic way nor connected to the computer.

Fortunately our robots have an additional Turbo Function board.

The board is programmable and can be connected to the computer using serial link. The board has unusual Intel i960 processor.

The serial link isn't perfect too. It's very slow - 19.2 kbps, like '93 modem, and it's only half duplex. The link could be used to send some data from computer to robot.

The docs say that the board has only the basic interface to move the robot. Like go to the point and stop there. But we can't do any kind of dynamic control.

I've mentioned that the Turbo board is programmable. To write a software for it you needed to:
  • create a program at home, in C
  • using a floppy move the source code to our old Windows box, called Smalltalk (the PC on the left side of the picture)
  • compile the code using commercial compiler (tied to the box because of a hardware key)
  • open FC1 software to move the compiled binary to robot
  • turn on the robot, press a lot of keys
  • wait a few minutes to upload the binary to the robot
  • restart the robot, press a lot of keys to execute the binary
It took a lot of time to execute your software on the robot. This slowed down a process of writing software. Like always, software is written with bugs, so for each, even the smallest, change you had to repeat the process.

I thought that this is so time consuming, that I need to make process faster and move the compiler stack to some newer OS, like Linux.

First thing I did, was to create a clone of the FC1 software, so that you could upload your binaries from a Linux box. The communication between FC1 and robot went through a serial link. I opened up the wire, connected some wires to another computer (on the right side of the picture) and sniffed the communication. Then after some reverse-engineering I created simple replacement of FC1 software for the Linux. This was my first achievement.

The next thing I needed to do, was to move the compiler stack to Linux. I decompiled the commercial libraries on Smalltalk, tested various open libc libraries, found an old GCC version that supported this special processor and wrote my own linker scripts. Finally I was able to compile a software on Linux.

On the left side of the slide you can see a page from the documentation. It's in Japanese and yes, I don't speak Japanese either. Most of the docs are in English, but unfortunately the most interesting pars aren't.

Like always the documentation is not perfect. We were able to compile a software for a Turbo board, but we didn't know what was possible on it.

Using a Turbo board we were able to control the robot only in a basic way, move to the point and stop there.

After a lot of work, searching a lot of sources, we discovered something with a magic name "realtime mode". This mode allowed us to order a robot to do large number of smaller moves, but without stopping between them.

This was the thing we were looking for!

On the right you can see our first experiment. The red line shows a position where we ordered a robot to go, the green line shows where the robot really was. As you can see the reaction time (latency) is not small, but it works!

After this discovery we wanted to try how if it works, so we created a software that allowed us to control the robot using a joystick. It was really fun to play with a joystick, but it isn't the simplest interface. (sorry for destroying the target in the video).

This wasn't end of our low level problems. To reduce the latency we needed a faster serial link. 19.2 kbps was too slow for us.

I downloaded the firmware from the Motoman, disassembled it and discovered that the robot could use faster serial speed. The speed was unusual and using an oscilloscope we were able to detect that it was 46.6 kbps. But this is not a generic serial speed, a normal computer can't use it.

This is how our communication looks like today. The robot emits 46.6kbps, than we convert the signal to industrial rs422 format and finally we use custom-built FTDI usb-to-serial converter.

This converter can be set to arbitrary serial speeds, so it can achieve 46.6kbps. In the end we can use it as a normal serial device on the Linux.

After more than 2 years of work we were able to compile software from Linux. We disassembled a lot of robot internals. I wasn't able to fix the half-duplex problem, but I created a software library that is a workaround for this issue.

The biggest achievement was to discover the realtime mode and move the robot dynamically.

It's time to say what we were able to build on the top of this low level software!

This project is one of the oldest. The idea is that to create a cube that a robot can grab from the floor and do something with it.

For example handle it to the second robot. So that one robot can grab cubes and other can build something from them (for example a tower). When a tower crashes, the roles are swapped.

The cubes are located based on the information from the camera. This slide shows my work on the markers for the cubes. Red outline is computer generated.

This is how the marker recognition worked in practice. The numbers 777 and 778 are coded in the red squares. There's also green computer generated number which shows an angle of the cube.

In this video robot moves to the start position. Then, based on the information from camera, the robot is doing smaller and smaller moves toward the cube and finally grabs it.

This is my favorite project. Based on the information from the camera robot is trying to put recognized face in the centre of the screen. The face recognition algorithm is taken from opencv library.

After the experiment with joystick I wondered if it's possible to create better interface for controlling the robot. I discovered the Haptic device, which is basically a 3d pointing device. It's used by multimedia people to edit 3d graphic scenes. I thought it should be perfect to control the robot.

This video shows how the results look like. Playing with haptic connected to robot was a lot of fun.

Have you seen the Wiimote experiments by Johnny Chung Lee? He created 2d pointing device based on Wiimote infrared camera and an infrared diode.

I thought that it would be great to use two Wiimotes to get 3d position of diode and connect it to the robot. I wanted to achieve similar experience as with the Haptic, but using less expensive hardware.

On the right side you can see the two Wiimotes and two experimental infrared pens. Every pen has two infrared diodes, so that we don't only know the position in 3d, but also the angle of the pen.

It's not so easy to create a 3d information from two 2d views. I needed a lot of time to understand and code the mathematics behind it.

Here you can see how it worked, a cheap and quite accurate 3d pointing device.

Recently my friends, Aleksander Górski and Łukasz Hrynakowski, are working on a project, in which they put a SICK laser on the robot. The laser is basically a single infrared beam and a rotating mirror. Laser returns a distance from walls or other things in a plane. The idea is to move a laser from above to the ground and create a semi 3d view of a target thing, like a person.

Some time ago some Google Street view cars were seen in Europe. It seems that Google is using similar SICK lasers.

This is how the scan looks like. You have to hold still for about 30 seconds.

It took few months of work to move a robot from above to the ground on a straight line with constant speed. Robots aren't really created to move on straight lines.

And here you can see the results of scan. It's technically called 2.5d scan, because it's not full 3d, only 3d from one side.

I find the results very interesting. I can't wait to see final results of this project.

Finally a scan of faces of the two authors of the project and mine.

That's it! I hope you learned something from this speech.

One more slide. This is my dream, but our robots are to weak for this kind of things.


Learning Erlang

I mentioned before that I'd love to learn Erlang. Just before my holidays in mountains the Erlang book has arrived. I had to chose between the holidays and learning Erlang. I chose both.

It was very nice to read it on the meadows at the top of the hills. But the book is rather heavy when you need to carry it uphill.

I think it's one of the best written computer books I've ever read. I've learned a lot, even without writing a single line of code.