2007-08-30

Nmap NSE descriptor overflow patch

I just released a quick patch for Brandon's bug. It's interesting that this flaw wasn't found earlier.

My patch is a bit hackish, but it should work. I wonder whether anyone will find a better way of fixing this issue. The descriptor overflow patch lies here.

Update 1: Hurray! The patch is included in the SVN version of Nmap (r5739).



2007-08-28

Well, we need the Google architecture, but how should we implement such a huge project?

Every fast-growing company will have to face these issues sooner or later:

  • storage disks are full
  • it's hard to add more disks to existing systems
  • files are served too slowly because disk seeks are becoming the limiting factor
  • a simple database grows to millions of rows, and the cost and complexity of partitioning it approaches the limit of cost-effectiveness

It seems that the Google architecture solves these issues. By their architecture I mean their applications: the Google File System, the Chubby lock service and, most importantly, the Big Table distributed database (not really a database, just a distributed multidimensional map).

It:
  • scales linearly (add a new machine to the cluster and it just works as it should; no specific configuration or manual partitioning of existing data is needed)
  • partitions automatically (when one machine is overloaded, its data is split and moved to another one)
  • has a low cost of administration (you only care about hardware failures)
  • is designed to be highly available (it's not exactly HA: rather than trying to eliminate every single point of failure, Google focused on restoring failed applications fast, so in case of a failure an application is down for a few seconds at most)

Well, the Google architecture also has some limitations; for example, sometimes you can't afford those few seconds of downtime. But for most users the Google architecture solves many important problems.

Are you going to implement Big Table?

The main problem with this architecture is that it's built from tightly coupled projects. You can't build a reliable clone of Big Table without the Google File System and the Chubby lock manager.

The bigger the project, the greater the chance of failure, and nobody wants to make a high-risk investment. We hear "well, it's easier to hack the current architecture than to build a new one from scratch". That's obviously true up to a point. Past that point you need a final solution to your problems right now, and moving a dozen or so terabytes of data from the old storage system to a new one only gets harder.

I think it's possible to build something like the Google architecture in small steps, though it's going to have some limitations in the first stages.

First step: "Administrators' nightmare"
Assume that we need only auto-partitioning and ignore every other feature for now. We're going to create a basic Big Table; by that I mean a standalone application that answers a simple mapping from key to value. The internals should be a simplified version of a Tablet server. The one feature we truly need is auto-partitioning: when a Tablet's size reaches some threshold, it splits into two instances, each with a reduced row range (see the sketch after the list below).

What we get:
  • administrators must move instances from one machine to another manually
  • row ranges are configured manually (hardcoded) in the clients
  • no distributed features
  • fewer disk seeks compared to a filesystem-based key-value mapping
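
To make the first step concrete, here is a minimal sketch in Python (all names and the tiny threshold are mine, not BigTable's): a standalone key-to-value Tablet that splits itself in two when it grows past a threshold, each half covering a smaller row range.

    SPLIT_THRESHOLD = 4  # tiny, for demonstration; a real Tablet splits at hundreds of MB

    class Tablet:
        def __init__(self, start_row, end_row):
            # this Tablet serves rows in the half-open range [start_row, end_row)
            self.start_row, self.end_row = start_row, end_row
            self.rows = {}  # in-memory stand-in for an on-disk sorted map

        def get(self, key):
            return self.rows.get(key)

        def put(self, key, value):
            assert self.start_row <= key < self.end_row
            self.rows[key] = value
            # auto-partitioning: once past the threshold, split in two
            return self.split() if len(self.rows) > SPLIT_THRESHOLD else None

        def split(self):
            """Keep the lower half of the row range, hand the upper half back."""
            keys = sorted(self.rows)
            mid = keys[len(keys) // 2]
            upper = Tablet(mid, self.end_row)
            for k in keys[len(keys) // 2:]:
                upper.rows[k] = self.rows.pop(k)
            self.end_row = mid
            return upper

    tablet = Tablet("", "\xff")
    upper = None
    for key in ["apple", "pear", "plum", "fig", "kiwi"]:
        upper = tablet.put(key, key.upper()) or upper

    # after the fifth insert the Tablet has split itself in two:
    print(tablet.start_row, tablet.end_row, sorted(tablet.rows))  # '' 'kiwi' ['apple', 'fig']
    print(upper.start_row, upper.end_row, sorted(upper.rows))     # 'kiwi' '\xff' ['kiwi', 'pear', 'plum']

At this stage, placing the split-off Tablet on another machine and telling the clients about it is manual work; hence the nightmare.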

Second step: Multi-dimensional mapping
The idea of column families, timestamps and an unlimited number of columns is just great.
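
A rough sketch of what this multi-dimensional mapping looks like, as a toy Python encoding of my own (not Google's actual format): the map goes from (row, "family:qualifier", timestamp) to an uninterpreted string, and reads return the newest versions first.

    from collections import defaultdict

    class MultiDimMap:
        """(row, 'family:qualifier', timestamp) -> uninterpreted string."""

        def __init__(self):
            # row -> column -> list of (timestamp, value), newest first
            self.data = defaultdict(lambda: defaultdict(list))

        def put(self, row, column, value, ts):
            cell = self.data[row][column]
            cell.append((ts, value))
            cell.sort(key=lambda tv: tv[0], reverse=True)

        def get(self, row, column, n_versions=1):
            return self.data[row][column][:n_versions]

    m = MultiDimMap()
    m.put("com.example.www", "contents:", "<html>old</html>", ts=1)
    m.put("com.example.www", "contents:", "<html>new</html>", ts=2)
    m.put("com.example.www", "anchor:other.site", "link text", ts=2)
    print(m.get("com.example.www", "contents:"))      # [(2, '<html>new</html>')]
    print(m.get("com.example.www", "contents:", 2))   # both versions, newest first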

Third step: Chubby lock manager
We need some automated way of informing the clients about Tablet changes; clients shouldn't have any hardcoded configuration. At this step we should create some kind of METATABLE that tracks the Tablets' row ranges, with Chubby as the main source of basic metadata. We can assume a single instance of the Chubby server (no replication), but we definitely need Chubby to serve files, locking and asynchronous events.
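
A hedged sketch of the lookup path (the CHUBBY and METATABLE stand-ins below are my placeholders, not real APIs): the client asks Chubby where the METATABLE lives, then binary-searches the row ranges to find the server holding the right Tablet, with no hardcoded configuration.

    import bisect

    # Toy stand-ins: Chubby holds one tiny file pointing at the METATABLE,
    # and the METATABLE maps each Tablet's end row to its current server.
    CHUBBY = {"/cell/metatable-location": "meta-server-1"}
    METATABLE = {
        "meta-server-1": [("g", "server-a"), ("p", "server-b"), ("\xff", "server-c")],
    }

    def locate_tablet(row):
        """Which server holds the Tablet covering `row`?"""
        meta_server = CHUBBY["/cell/metatable-location"]   # 1. ask Chubby
        ranges = METATABLE[meta_server]                    # 2. read the METATABLE
        i = bisect.bisect_right([end for end, _ in ranges], row)
        return ranges[i][1]                                # 3. binary-search the row ranges

    print(locate_tablet("kernel"))  # rows in ["g", "p") -> 'server-b'

Chubby's asynchronous events would then be the natural way to invalidate this answer in client caches after a split.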

Fourth step: The File System
We need a file system with a shared namespace across all nodes. If a Tablet is partitioned, it must be possible to open the new Tablet's data on any other server.
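
In miniature (assuming an ordinary shared mount like NFS stands in for the real distributed file system, and a path of my own invention), the point is just that Tablet data lives somewhere every node can open:

    import json, os

    SHARED_ROOT = "/mnt/shared/tablets"   # assumed shared namespace (NFS here, GFS in the real thing)

    def write_tablet(name, rows):
        """Persist a Tablet's rows where every node can see them."""
        with open(os.path.join(SHARED_ROOT, name), "w") as f:
            json.dump(rows, f)

    def load_tablet(name):
        """After a split or a crash, any other server can pick the Tablet up."""
        with open(os.path.join(SHARED_ROOT, name)) as f:
            return json.load(f)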

Fifth step: Competing with the Giant :)
At some point Chubby should be replicated, with the Chubby master chosen using the famous Paxos algorithm, to become fully HA. Some other Big Table features would also be nice, like Bloom filters or compression. Secondary indices on Big Table would be very nice too (yippee, Google doesn't have that feature).


Pencils Up! I wish you happy coding.

BTW, there is the Hadoop/HBase project you might be interested in.


2007-08-26

Some thoughts on hashing

I was looking for fast hash algorithms and found Paul Hsieh's SuperFastHash implementation. I also took a look at the Bloom filter algorithm, which uses several hash functions to test whether an element is a member of a set.

I found the interesting fact that it's possible to compute any number of independent hashes using only two hash functions, with this formula (f1 and f2 are hash functions, i is the index of the derived hash; the value of f2 must be relatively prime to m, which is automatic when m is prime):

    hash_i(x) = (f1(x) + i * f2(x)) mod m

Even more surprising is the fact that if f1 is something like this:

    f1(x) = x mod m

then f2 can be:

    f2(x) = 1 + (x mod (m-2))

Which means that to compute any number of hash functions for a given element, you only need to compute one "real" hash(x), plus a few extra additions and modulos.

Knuth suggests using an m such that both m and m-2 are primes, like 1021 and 1019.
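
A minimal sketch of the trick in Python (my own toy code; I use CRC32 as the one "real" hash, though something like SuperFastHash would serve the same purpose):

    from zlib import crc32

    M = 1021  # m and m-2 (1019) are both prime, as Knuth suggests

    def k_hashes(element, k):
        """Derive k hash values from a single "real" hash computation."""
        x = crc32(element.encode())   # the one real hash
        f1 = x % M
        f2 = 1 + (x % (M - 2))        # in [1, m-2], so relatively prime to the prime m
        return [(f1 + i * f2) % M for i in range(k)]

    # e.g. the k bit positions a Bloom filter would set/test for this element:
    print(k_hashes("example.com", 5))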



2007-08-13

How a bad password generator can ruin security


This is a true story about a really broken password generator that was used at my college. A good idea and a bad implementation.

On the 30th of December 2005 all students at PJIIT received an email which read like this:

From today (30.12.2005) new password policy is going to be used:

      Password must contain eight or more characters
      Password must not contain username or any part of it
      Password should contain characters from three of four specified categories:
      1. Small letters [a-z]
      2. Capital letters [A-Z]
      3. Digits [0-9]
      4. Special characters: [!#$%^&*()_+{}:";'<>,.?]

This policy seems reasonable. Each of the at least eight characters in a password can be one of 84 possible characters ([a-zA-Z0-9!#$%^&*()_+{}:";'<>,.?] = 26+26+10+22 = 84).
The number of possible passwords is huge: 84^8 ≈ 2479*10^12.

But creating a complicated password that fits this policy was too hard for some students. Some of them just appended a few characters to the end of their previous password (the password bart78 became bart78##). Others used the password generator supplied by our college network administrators.


At first glance the generator seems to create reasonable passwords. Let's concentrate on passwords generated at the default "complexity 3".
    character family                  [a-zA-Z]   [0-9]   [!"#$%&'()*+,-./]
    characters in generated password  4-6        1-2     1-2

The number of characters from each character family is selected randomly from the specified range. Since the generated password is eight characters long, there are only four possible character patterns: 4-2-2, 5-2-1, 5-1-2 and 6-1-1. I haven't written this password generator myself, but I can bet that every pattern has the same probability of being selected. Let's count the number of possible passwords in each pattern.
    letters  digits  special  probability of   number of unique
                              being selected   possible passwords
    4        2       2        25%              52^4 * 10^2 * 15^2 = 0.165*10^12
    5        2       1        25%              52^5 * 10^2 * 15^1 = 0.570*10^12
    5        1       2        25%              52^5 * 10^1 * 15^2 = 0.855*10^12
    6        1       1        25%              52^6 * 10^1 * 15^1 = 2.966*10^12
                                               sum: 4.556*10^12

That's not good. Of the 2479*10^12 possible passwords, the generator can produce only about 4.6*10^12. That's less than two per mille of the possible passwords!
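
The numbers above can be reproduced with a few lines of Python:

    letters, digits, special = 52, 10, 15
    patterns = [(4, 2, 2), (5, 2, 1), (5, 1, 2), (6, 1, 1)]
    total = sum(letters**l * digits**d * special**s for l, d, s in patterns)
    print(total)           # 4555867929600, about 4.6*10^12
    print(total / 84**8)   # about 0.0018 -- under two per mille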

The generator is even worse than that, because of Microsoft's programming framework and its approach to random numbers.
From Microsoft's documentation of the Randomize function:
"[...] This example uses the Randomize statement to initialize the random-number generator. Because the number argument has been omitted, Randomize uses the return value from the Timer function as the new seed value."

And the documentation of the magic Timer function:
"Timer Function returns the number of seconds that have elapsed since 12:00 AM (midnight)"

Seconds since midnight means 24*60*60 = 86400 possible initial seed values for the random number generator. That means only 86400 unique possible generated passwords.
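
To see why this is fatal, here is a sketch in Python (gen_password is my stand-in for the college's VB generator; the only point is that a Timer-seeded generator is fully enumerable):

    import random, string

    ALPHABET = string.ascii_letters + string.digits + "!\"#$%&'()*+,-./"

    def gen_password(seed):
        """Stand-in for the college generator; what matters is the seed, not the logic."""
        rng = random.Random(seed)   # VB's Randomize seeds from seconds since midnight
        return "".join(rng.choice(ALPHABET) for _ in range(8))

    # an attacker can precompute every password the generator will ever produce:
    dictionary = {gen_password(s) for s in range(24 * 60 * 60)}
    print(len(dictionary))  # at most 86400 candidate passwords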

Take a look at the previous table: most of the unique passwords are in the fourth category, but the probability of selecting that category is only 25%.
If we omit this category, we still have a 75% probability of guessing the generated password. In our situation there are only about 32000 unique passwords in the first three categories.

In conclusion: out of 2479*10^12 possible passwords, the bogus password generator created a situation where 75% of the generated passwords fall into a group of only 32*10^3 unique ones.
In this case a brute-force attack has a great chance of success.

UPDATE #1: The Visual Basic random number generator is even worse. Mark Hutchinson writes that in fact there are only 64000 possible random seed values! (Not the 86400 I mentioned before.)