2007-07-11

Pcap support for Nmap Script Engine

Some time ago I wrote an nse-pcap patch for Nmap that adds some libpcap features to Diman's NSE. Today I ported the changes to the current Nmap. It's definitely time to find out whether the code was worth the time I spent on it!

I think pcap support is one of the most promising features in NSE. It offers a "new experience" to advanced programmers, and it introduces an easy-to-use and powerful way of distributing received packets to registered Lua NSE threads. I find it a very nice way of programming pcap applications: it's based on callbacks from pcap rather than iterating over a sequence of pcap results.

A good example of nse-pcap's power is my implementation of Lcamtuf's p0f SYN+ACK scan.

To try nse-pcap out, install it like this:

$ svn co --username=guest --password= svn://svn.insecure.org/nmap-exp/soc07/nmap nmap
$ cd nmap
$ wget http://ai.pjwstk.edu.pl/~majek/private/nmap/nse-pcap/nmap-soc07-5184-ncap-try2B2-with-whitespace.diff
$ patch -p1 < nmap-soc07-5184-ncap-try2B2-with-whitespace.diff
$ ./configure --without-nmapfe && make
And play with the power of the p0f.nse script:
$ sudo ./nmap -n -sT -PS80 -p21,22,53,80,443 --script=p0f.nse www.cisco.com

Starting Nmap 4.22SOC1 ( http://insecure.org ) at 2007-07-12 00:04 CEST
Interesting ports on 198.133.219.25:
PORT STATE SERVICE
21/tcp open ftp
|_ p0f signature: UNKNOWN [65500:61:1:64:M1436,N,N,S,N,W0,N,N,T:AT:?:?] (link: IPSec/GRE, up: 1446 hrs, ipid:54702)
22/tcp filtered ssh
53/tcp filtered domain
80/tcp open http
|_ p0f signature: UNKNOWN [8192:238:0:44:M1460:A:?:?] (link: ethernet/modem, up: disabled, fill:4008, ipid:1)
443/tcp open https
|_ p0f signature: UNKNOWN [8192:238:0:44:M1460:A:?:?] (link: ethernet/modem, up: disabled, fill:f8ef, ipid:10)
From this information I can see at least a source NAT. And another try:
$ sudo ./nmap -n -sT -p21,22,53,80,443 --script=p0f.nse www.orkut.com

Starting Nmap 4.22SOC1 ( http://insecure.org ) at 2007-07-12 00:04 CEST
Interesting ports on 209.85.141.85:
PORT STATE SERVICE
21/tcp open ftp
|_ p0f signature: UNKNOWN [65500:61:1:64:M1436,N,N,S,N,W0,N,N,T:AT:?:?] (link: IPSec/GRE, up: 7282 hrs, ipid:30587)
22/tcp filtered ssh
53/tcp filtered domain
80/tcp open http
|_ p0f signature: UNKNOWN [8190:235:0:44:M1400:A:?:?] (link: sometimes DSL (2), up: disabled, fill:0f2d, ipid:28096)
443/tcp closed https


I'm looking forward to feedback and bug reports!

UPDATE #1:
Okay, I'm this enthusiastic about the project because I wrote it. But there are also some other promising features in Nmap, like Swen's Web application detection.

UPDATE #2:
Some useful links:


2007-07-10

Filesystems in the 21st century

At university I was taught that there are three levels of theoretical filesystems:

  1. The file name is tightly bound to the volume where the file is stored (e.g. on Microsoft Windows the path contains the drive name, like c:\windows).
  2. The file name doesn't say anything about physical location, but a specific directory can be mounted as a disk (as in Unix systems: you don't know which disk a file lies on, but you do know that every file in a given directory is on one disk).
  3. You can't infer the physical location from the directory at all. For example, every file in a given directory could be on a different disk. It would be quite nice to store small files on fast SCSI storage and big ones on a cheap ATA drive, wouldn't it? (As far as I know, no vendor implements this.)
But I wasn't told anything about distributed filesystems. Well, they have some levels of development too.

Let's imagine:
You have everything on one disk. You need more space. You add disks and join them in some kind of RAID or LVM. But what if you can't add any more disks? You have to buy a new computer and somehow split the files between both machines.

The first level is:
  1. You can tell which machine a file lies on just by looking at its name.
    • The simplest method is to compute hash(filename) % number_of_computers. The problem is that if you want to add a new computer, you have to recompute the hashes for _all_ files, because some files must be moved to the new node.
    • The second method is to hash the filename but take the remainder modulo some arbitrarily chosen number, like 100 (i.e. hash(filename) % 100). Each host serves a range of hashes: for example, the first host serves all files with hashes from 0 to 50, the second from 51 to 99. If you need to add another computer to the pool, you split one hash range. To do that you still recompute hashes, but only for the files on a single node.
  2. At some point recomputing hashes is just too expensive. Nowadays copying a disk can take tens of hours. A simpler way to maintain the namespace is to use a database that contains all the metadata.

    The database knows which node a given file lies on. This adds another level of complexity, because to access a file you first have to ask the database. On the other hand, once a file is written it is never moved anywhere.
    Adding new disks is costless.

  3. The next problem occurs when the single metadata database gets too big. Consider having 60 million files. Every file's metadata occupies a row in the database. Roughly: 60 million rows * say 64 bytes per row ≈ 4GB of metadata. It would be nice to avoid disk seeks in the database and keep the data in RAM. And what happens if such a huge database goes down?
    Well, I'm not one of those who believe in database replication.

    So what's the solution? Spread the metadata across the nodes.

    Google's solution is to add another abstraction layer. They use a single namespace database in their filesystem, but with the hard requirement that the amount of metadata stays relatively small. With such a requirement it's possible to provide a really fast failover in case the master nameserver fails.

    A huge number of files can still be served, but from the next abstraction layer. They split the data and metadata into relatively small contiguous sets and distribute them across nodes. The master knows which range of keys is served by which server.

    Let's assume we want to access the file 'abc.txt'. First we must ask the master server which server the file lies on. The master stores the key ranges for every 'tablet' server.

    Next, we connect to the 'tablet' server and ask for the file. To resolve the file's position, the tablet server uses an in-memory B-tree.

    With this design we have to ask the master server, then a tablet server. In the background there may be some requests to the underlying distributed filesystem, but we have a guarantee that all of those requests are answered directly from memory. The only disk seek happens on the tablet server that serves the requested file.
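A minimal Python sketch of that two-step lookup, under my own assumptions: the class and variable names (Master, TabletServer) are made up, and a sorted list searched with bisect stands in for the tablet server's in-memory B-tree.

```python
import bisect

class TabletServer:
    """Holds file metadata for one contiguous key range, entirely in RAM."""
    def __init__(self):
        self.index = {}          # filename -> (disk, offset); B-tree stand-in

    def lookup(self, name):
        return self.index.get(name)

class Master:
    """Knows which key range is served by which tablet server."""
    def __init__(self):
        self.split_keys = []     # sorted upper bounds of the key ranges
        self.tablets = []        # tablets[i] serves keys <= split_keys[i]

    def add_tablet(self, upper_bound, server):
        self.split_keys.append(upper_bound)
        self.tablets.append(server)

    def find_tablet(self, name):
        # one in-memory search tells us which tablet server to ask
        i = bisect.bisect_left(self.split_keys, name)
        return self.tablets[i]

master = Master()
t1, t2 = TabletServer(), TabletServer()
master.add_tablet('m', t1)       # keys up to 'm'
master.add_tablet('z', t2)       # keys up to 'z'
t1.index['abc.txt'] = ('disk0', 4096)

# two RAM lookups: master first, then the tablet's own index
server = master.find_tablet('abc.txt')
print(server.lookup('abc.txt'))  # -> ('disk0', 4096)
```

The point of the sketch is that both lookups are answered from memory; the only disk seek would be the actual data read on the tablet server.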

I must admit I like Google's solution. The more I think about it, and the more I read about it, the more I believe that's the way it should be.
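Going back to the two hash-based placement schemes from point 1, here is a minimal sketch of both; md5 is just an arbitrary stable hash for illustration, and the host names are made up.

```python
import hashlib

def h(name, mod):
    # stable hash of a filename, reduced modulo `mod`
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % mod

# Method 1: node = hash(filename) % number_of_computers.
# Adding a computer changes the modulus, so nearly every
# file's placement has to be recomputed.
def node_direct(name, n_nodes):
    return h(name, n_nodes)

# Method 2: a fixed modulus (here 100) plus a table of hash ranges.
# Adding a computer only splits one range, so only the files on
# that single node have to be rehashed and moved.
RANGES = [(0, 50, 'host-a'), (51, 99, 'host-b')]

def node_ranged(name):
    x = h(name, 100)
    for lo, hi, host in RANGES:
        if lo <= x <= hi:
            return host

print(node_direct('abc.txt', 2), node_ranged('abc.txt'))
```

Splitting a range in method 2 is just replacing one RANGES entry with two, e.g. (51, 99, 'host-b') with (51, 75, 'host-b') and (76, 99, 'host-c').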


2007-07-03

LSI Logic site SQL procedure

Anyone interested in the SQL procedure used on the LSI Logic main website?