Wednesday, September 30, 2009

Multihash support

I really didn't think I'd care to actually write multihash brute forcers. I'd need to do a lot more comparing, sorting, a lot more hash reversals, etc... But I did it!

It's still work in progress, but EmDebr currently finds hashes quite fast on huge hash lists. Let's tell you some about my progress, the things I ran into and the ideas I borrowed from others :)

So I started out with some functions to compare calculated hashes to all the input hashes. Every round EmDebr generates 12 hashes, which then need to be compared to, for example, 1000 hashes. You don't want to compare all these one by one as that would really slow down the cracker for every additional hash you are searching for. The cracker should still be decently fast cracking an input list of over 100k hashes.

A very common way to search a list for a possible match is to do a binary search on a sorted list. This works fairly nice, but still gets fairly slow on large hash lists.

An idea I borrowed from the author of 'CUDA-Multiforcer' from (probably used by others as well) is to make a shared bitmap. The idea is to map a certain part of all the input hashes to a specific location in a bitmap. For example you take the first 32 bits of the hash and map that to a certain bit and set it to '1'. You do that with every input hash and end up with a bunch of memory filled with 0's and 1's. When you calculated a hash during the cracking process you can do the same mapping and see if there is a '1' on that location. If so, there must be a hash with the same first 32 bits in the hash list. This strategy gets you to only do one lookup for every calculated hash and only start doing a binary search on the full hash list if you have a partial match.

You can specify the size of the bitmap. The larger the bitmap, the more unique bits you'll have in your bitmap, so the less matches you have and thus the less binary searches you'll do. I did some benchmarking on my own system and came up with 1MB as a very reasonable size for both small and large hash lists. I actually tried some variations as well and ended up using 2 bitmaps. The first bitmap is created from the first 32 bits (not using all these bits), the second bitmap is created from the second 32 bits. This was faster then using a single bitmap with double the size.

The next problem was 'reversing', as with my single hash EmDebr a fair amount of hash reversing was used. This reversal takes the input hash and reversely calculates the last round of MD5 (and a few steps more) using a specific part of the current plaintext keyspace. With multihash support we'd have to do this process for every hash, quite often. That's like cracking salted hashes! And it's getting worse, every time after re-reversing we have to sort the full list again, and create the bitmaps!

So that sucks a lot in terms of speed, say 1 Mhashes/s. So I started out with just doing a full MD5 calculation again, not using any reversal at all. So when I got things to work and actually find plaintexts for the input hash list, I started playing with reversal again. The least you can do is do a one time reversal for the input hashes, up to the step where you need some plaintext input. This saves you from doing a single MD5 step and 4 additions. The next step to reverse already needs a part of the plaintext input. Luckily it's just 'W2', the part that contains characters 9-12 of the plaintext. This part doesn't change that often with a regular sized character set. The next (or previous ;)) steps to reverse don't take input on our (<16 sized) plaintexts, so we can just continue up to around half of the last round of MD5. There we run into 'W1' (characters 5-8), we'll have to stop there. In the end this actually allows us to still reverse 8 MD5 steps! And with a normal charset, this still pays off, gaining some extra Mhashes/s. For a small charset it doesn't work very well, but that's just 'numeric' in EmDebr. 'loweralpha' with 26 characters is running fine.

For the sorting I started out with 'default' quicksort, but ended up borrowing the Radix sort implementation from vampyr's Mysql/Sha1/Mssql GPU cracker.

My last improvement was to speed up the binary searches. Especially with larger hash lists, you'll still have to do these binary searches quite often. I now use a small index to limit the range of the binary search. I use the first 8 bits to cut the hash list in 256 parts. This even improved search speed on smaller hash lists!

To finish off, a nice little overview of my progress in terms of speed (using a 'normal' sized hash lists of like 30-40 hashes):

* 52 Mhashes/s using a 512MB bitmap
* 99 Mhashes/s using a 1MB bitmap (seemed bigger ain't always better)
* 107 Mhashes/s using two 1MB bitmaps
* 115 Mhashes/s using partial reversal
* 125 Mhashes/s using an index

With a hash list of 200 000 hashes EmDebr still keeps up, just falling back to around 119 Mhashes/s!

To compare, single hash EmDebr did around 175 Mhashes/s. I'm actually quite satisfied with current results on multihash EmDebr, using less reversing and doing all the comparing stuff! Just have to clean things up, make it display output in a nice way and let it write out results to a file. Stay tuned!


  1. Can i download it anywhere?

  2. Are you using C++? If so, ypu might try the new std::tr1::unordered_set it's been in boost now for many years as boost::unordered_set. It averages O(1) and I've used it to test millions of hashes at once in 16crack. It's more than twice as fast as a sorted binary_search. Here's a screenshot: