Saturday, January 17, 2009

SHAbr update, I passed 60 Mhashes/s

I couldn't help optimizing SHAbr just a little more, I really wanted to get the 60 Mhashes/s on my system :)

What did I do?
A plaintext first consists of an array of characters (unsigned char[16]), then I pass it as an __m128i type to the SHA-1 function. With SSE2 I do 4 plaintexts at the same time, so actually 4 of these variables are passed to this function.

The variables get split into 4 32-bit integers (I called these types UINT4). The SHA-1 function then puts these variables in an array of 80 UINT4's, this array is called W. This array is used to add a value in every one of the 80 steps of the actual hashing part. The first 16 UINT4's in this array W are just the parts of the plaintext, and as I only support plaintexts with length < 16, only the first 4 UINT4's are filled. Then W[4]...W[14] contain zero's in my case, W[15] contains the length of the plaintext in bits.

So what happens with W[16]...W[79]? The previous values (the actual plaintext) are 'expanded' into these values, so every step has additional input and is dependent on the plaintext. This expanding isn't that special:

W[t] = W[t-3] XOR W[t-8] XOR W[t-14] XOR W[t-16]
And then ROTATE this result by one.

As W[4]...W[14] contains nothing but zero's, I didn't actually want to set them. But because W[18]...W[30] depend on these zero's, they should be set. Unless we change the EXPAND function for W[18]...W[30]. But as that needs more unrolling of the loops, my code gets bigger and maybe slower (had that before).

So now I unrolled certain parts in a strange way, but somehow it works :)
I now have some strange code like:

for(t = 20; t < 21; t++){
SSE_EXPAND_3(t); ROTATE(t)
SSE_EXPAND_3(t+1); ROTATE(t+1)
SSE_EXPAND_3(t+2); ROTATE(t+2)
SSE_EXPAND_3_8(t+3); ROTATE(t+3)
SSE_EXPAND_3_8(t+4); ROTATE(t+4)
}

This code block within the for loop gets executed only once. So you'd say that I could just write out SSE_EXPAND3(20) - (24) and such, but somehow that makes things slower...

Anyway, instead of 58.5 Mhashes/s I now get:
Length 5 - 55% in 8.39 s (60.29 Mhashes/s)

Here are the downloads:
shabr_0.2_win_32bit_binary.zip
shabr_0.2_src.zip

10 comments:

Anonymous said...

Cool, I stumbled upon this because.. well.. I fckd up...
Anyway, my password consists of certain sequences that I'm sure of, just not the order of them (and if all of them are available) also there is some *glue* of normal characters between the sequences. What I want to do is simply add a couple of new "characters" (that actually consists of x characters).
So that "after z" comes a new character which really is "abcdefg" or soemthing.

With this I think I can get the password down to about 8-9 "charactes" which would be very doable.

Now I should be able to alter the source to do this myself (really appretiate you sharing the code) however my C is a bit rusty/nonexistent so before banging my head in the wall I just wonder if this would be as easy to add as it sounds in my head or perhaps if someone knows about an application that has this functionality from the start.

Daniel Niggebrugge said...

hi there, you might want to take a look at www.cryptohaze.com, this guy has just release a new beta version of his cuda brute forcer, with sha-1 support, and per position charsets:

http://www.cryptohaze.com/forum/viewtopic.php?f=4&t=57&start=0

haven't tested it myself yet, but it looks like this might suit your needs. He will release source code later on as well, so that might be even easier to alter if you still need it a little different.

Anonymous said...

I need anything else to run this? When I execute it it gives and error.

Daniel Niggebrugge said...

You might need to install:

http://www.microsoft.com/downloads/details.aspx?familyid=D5692CE4-ADAD-4000-ABFE-64628A267EF0&displaylang=en

Anonymous said...

That was it! Thanks.

Anonymous said...

I made some changes to the code to make it work in OS X and (probably) linux/freebsd/etc.

http://s3.amazonaws.com:80/burkelibbey.baconfile.com/shabr-unix.tgz

This might require SSE4.1. I'm not quite sure. I had to use __builtin_ia32_vec_ext_v4si.

Daniel Niggebrugge said...

hi tnx for that, i might have a look at it one day :)

i kind of fixed a linux version once already, main problem there was something with the timers to calculate the speed when using multiple threads. was a while ago, i might probably fix that fairly fast by now :p

if i'm not mistaken i just used _mm_store_si128 or something to get data from sse registers to memory. not sure anymore :) shabr would actually need a fair bit of adjustment, implementing many optimizations from my other brute forcers. I guess it should be doable to get it passed 100 Mhashes/s on my system.

Anonymous said...

This is a great implementation. Great work! However, I need to modify this a little bit, and I am not able to do that. Let me explain what the problem is.

If I encrypt string "12345678" (8 bytes, in hexa 31 32 33 34 35 36 37 38) you get a hash 7c222f... (etc.). Your algorithm finds the original text in a few seconds. However, I need to crack the same string, but encrypted a little bit different. Instead of 8 bytes, it was 16, with the last 8 as 0 (null terminator). In other words it was (in hexa) 31 32 33 34 35 36 37 38 00 00 00 00 00 00 00 00. Of course, that produced another hash value and your algoritm does not find 12345678 in this case.

I tried modifying different things but nothing worked. But then my knowledge of the SHA1 algoritm are very limited and don't think I can modify it. For instance I didn't understand the meaning for this:

// padding bytes
pMuteMe1[length] = 0x80;
pMuteMe2[length] = 0x80;
pMuteMe3[length] = 0x80;
pMuteMe4[length] = 0x80;

Could you suggest the changes required for modifying this brute force algorithm for my purposes?

Thank you.

Daniel Niggebrugge said...

hi there, seems like you just want to always set 'length' to 16. Is that correct? Or is it always 2*length ?

0x80 is used to set a bit to '1' at the end of the string, it is then padded with 0's.

Anonymous said...

Hi Daniel,
thanks for this wonderfull tool. i wanted to modify it to use it for my needs. I have an SHA1 hash and also the data which is been hashed but i miss the salt. I wanted to use your source to try to find the salt of this hash. the data which is hashed is known so this could make bruteforce easyier. The data is 6 digits long but the salt is unknown. The salt could also be other chars non standard ASCII. As i am not into CPP programming what tool have you used to compile SHABR? and if you could help me to set up the program so proper to meet my needs.
wbr and good luck

Post a Comment