Saturday, January 17, 2009

SHAbr update, I passed 60 Mhashes/s

I couldn't help optimizing SHAbr just a little more, I really wanted to get the 60 Mhashes/s on my system :)

What did I do?
A plaintext first consists of an array of characters (unsigned char[16]), then I pass it as an __m128i type to the SHA-1 function. With SSE2 I do 4 plaintexts at the same time, so actually 4 of these variables are passed to this function.

The variables get split into 4 32-bit integers (I called these types UINT4). The SHA-1 function then puts these variables in an array of 80 UINT4's, this array is called W. This array is used to add a value in every one of the 80 steps of the actual hashing part. The first 16 UINT4's in this array W are just the parts of the plaintext, and as I only support plaintexts with length < 16, only the first 4 UINT4's are filled. Then W[4]...W[14] contain zero's in my case, W[15] contains the length of the plaintext in bits.

So what happens with W[16]...W[79]? The previous values (the actual plaintext) are 'expanded' into these values, so every step has additional input and is dependent on the plaintext. This expanding isn't that special:

W[t] = W[t-3] XOR W[t-8] XOR W[t-14] XOR W[t-16]
And then ROTATE this result by one.

As W[4]...W[14] contains nothing but zero's, I didn't actually want to set them. But because W[18]...W[30] depend on these zero's, they should be set. Unless we change the EXPAND function for W[18]...W[30]. But as that needs more unrolling of the loops, my code gets bigger and maybe slower (had that before).

So now I unrolled certain parts in a strange way, but somehow it works :)
I now have some strange code like:

for(t = 20; t < 21; t++){
SSE_EXPAND_3(t); ROTATE(t)
SSE_EXPAND_3(t+1); ROTATE(t+1)
SSE_EXPAND_3(t+2); ROTATE(t+2)
SSE_EXPAND_3_8(t+3); ROTATE(t+3)
SSE_EXPAND_3_8(t+4); ROTATE(t+4)
}

This code block within the for loop gets executed only once. So you'd say that I could just write out SSE_EXPAND3(20) - (24) and such, but somehow that makes things slower...

Anyway, instead of 58.5 Mhashes/s I now get:
Length 5 - 55% in 8.39 s (60.29 Mhashes/s)

Here are the downloads:
shabr_0.2_win_32bit_binary.zip
shabr_0.2_src.zip

19 comments:

  1. Cool, I stumbled upon this because.. well.. I fckd up...
    Anyway, my password consists of certain sequences that I'm sure of, just not the order of them (and if all of them are available) also there is some *glue* of normal characters between the sequences. What I want to do is simply add a couple of new "characters" (that actually consists of x characters).
    So that "after z" comes a new character which really is "abcdefg" or soemthing.

    With this I think I can get the password down to about 8-9 "charactes" which would be very doable.

    Now I should be able to alter the source to do this myself (really appretiate you sharing the code) however my C is a bit rusty/nonexistent so before banging my head in the wall I just wonder if this would be as easy to add as it sounds in my head or perhaps if someone knows about an application that has this functionality from the start.

    ReplyDelete
  2. hi there, you might want to take a look at www.cryptohaze.com, this guy has just release a new beta version of his cuda brute forcer, with sha-1 support, and per position charsets:

    http://www.cryptohaze.com/forum/viewtopic.php?f=4&t=57&start=0

    haven't tested it myself yet, but it looks like this might suit your needs. He will release source code later on as well, so that might be even easier to alter if you still need it a little different.

    ReplyDelete
  3. I need anything else to run this? When I execute it it gives and error.

    ReplyDelete
  4. You might need to install:

    http://www.microsoft.com/downloads/details.aspx?familyid=D5692CE4-ADAD-4000-ABFE-64628A267EF0&displaylang=en

    ReplyDelete
  5. That was it! Thanks.

    ReplyDelete
  6. I made some changes to the code to make it work in OS X and (probably) linux/freebsd/etc.

    http://s3.amazonaws.com:80/burkelibbey.baconfile.com/shabr-unix.tgz

    This might require SSE4.1. I'm not quite sure. I had to use __builtin_ia32_vec_ext_v4si.

    ReplyDelete
  7. hi tnx for that, i might have a look at it one day :)

    i kind of fixed a linux version once already, main problem there was something with the timers to calculate the speed when using multiple threads. was a while ago, i might probably fix that fairly fast by now :p

    if i'm not mistaken i just used _mm_store_si128 or something to get data from sse registers to memory. not sure anymore :) shabr would actually need a fair bit of adjustment, implementing many optimizations from my other brute forcers. I guess it should be doable to get it passed 100 Mhashes/s on my system.

    ReplyDelete
  8. This is a great implementation. Great work! However, I need to modify this a little bit, and I am not able to do that. Let me explain what the problem is.

    If I encrypt string "12345678" (8 bytes, in hexa 31 32 33 34 35 36 37 38) you get a hash 7c222f... (etc.). Your algorithm finds the original text in a few seconds. However, I need to crack the same string, but encrypted a little bit different. Instead of 8 bytes, it was 16, with the last 8 as 0 (null terminator). In other words it was (in hexa) 31 32 33 34 35 36 37 38 00 00 00 00 00 00 00 00. Of course, that produced another hash value and your algoritm does not find 12345678 in this case.

    I tried modifying different things but nothing worked. But then my knowledge of the SHA1 algoritm are very limited and don't think I can modify it. For instance I didn't understand the meaning for this:

    // padding bytes
    pMuteMe1[length] = 0x80;
    pMuteMe2[length] = 0x80;
    pMuteMe3[length] = 0x80;
    pMuteMe4[length] = 0x80;

    Could you suggest the changes required for modifying this brute force algorithm for my purposes?

    Thank you.

    ReplyDelete
  9. hi there, seems like you just want to always set 'length' to 16. Is that correct? Or is it always 2*length ?

    0x80 is used to set a bit to '1' at the end of the string, it is then padded with 0's.

    ReplyDelete
  10. Hi Daniel,
    thanks for this wonderfull tool. i wanted to modify it to use it for my needs. I have an SHA1 hash and also the data which is been hashed but i miss the salt. I wanted to use your source to try to find the salt of this hash. the data which is hashed is known so this could make bruteforce easyier. The data is 6 digits long but the salt is unknown. The salt could also be other chars non standard ASCII. As i am not into CPP programming what tool have you used to compile SHABR? and if you could help me to set up the program so proper to meet my needs.
    wbr and good luck

    ReplyDelete
  11. Daniel,
    There is a small syntax error at sha1.cpp:19

    - #ifdef WIN32
    + #ifdef _WIN32

    Just as your comment says, windows is stupid. :)

    Still Rampant,
    Durandal

    ReplyDelete
  12. hello, i search precomputed raibowtable for sha1, in your website you say, all is mooved at garr, but the folder for sha1 is empty :(

    ReplyDelete
  13. I took the code and fed it a 16 byte key with all 'a' characters (the other 3 keys I left as zeroes) but the resulting sha1 didn't match the sha1 generated by a known to be working but normal slow speed sha1 routine. Should it match or have I misunderstood something about how the password cracker works? Thanks, Simon

    ReplyDelete
  14. "hello, i search precomputed raibowtable for sha1, in your website you say, all is mooved at garr, but the folder for sha1 is empty :("

    tables are currently being synchronized to garr, ntlm tables are being transferred at the moment

    ReplyDelete
  15. "I took the code and fed it a 16 byte key with all 'a' characters (the other 3 keys I left as zeroes) but the resulting sha1 didn't match the sha1 generated by a known to be working but normal slow speed sha1 routine. Should it match or have I misunderstood something about how the password cracker works? Thanks, Simon"

    if i remember correctly, shabr is optimized for passwords with a length of max 15, so 16 results in a wrong hash yes

    ReplyDelete
  16. I neglected to mention that I first tried with 1 and 4 'a' characters too. The results were never as expected. I was also checking for results which were in the wrong byte order etc but no luck. Maybe I'm doing something wrong but I need help to figure out what! :-)

    ReplyDelete
  17. Hi Daniel,
    this is great tool,but maybe you can help me with compiler error
    1>.\sha1.cpp(35) : error C3820: 'm': initializers must be managed
    1> Aligned data types not supported in managed code
    I use vs 2008 and couldn't find solution for this
    Thanks

    ReplyDelete
  18. Here's a faster SSE_Endian_Reverse32. It should compile to 6 instructions including moves vs 12 instructions including moves. With AVX it's 5 vs 9 and AVX with XOP it's 4 vs 5.

    #define ROT16(n) _mm_shufflelo_epi16(_mm_shufflehi_epi16(n, 0xb1), 0xb1)
    #define SSE_Endian_Reverse32(a) _mm_or_si128(_mm_srli_epi16(ROT16(n), 8), _mm_slli_epi16(ROT16(n), 8))

    ReplyDelete
  19. tnx for that, i might give it a try some day... if i ever get back to some coding fun...
    (even too lazy to log on :P)

    ReplyDelete