What did I do?

A plaintext first consists of an array of characters (unsigned char[16]), then I pass it as an __m128i type to the SHA-1 function. With SSE2 I do 4 plaintexts at the same time, so actually 4 of these variables are passed to this function.

The variables get split into 4 32-bit integers (I called these types UINT4). The SHA-1 function then puts these variables in an array of 80 UINT4's, this array is called W. This array is used to add a value in every one of the 80 steps of the actual hashing part. The first 16 UINT4's in this array W are just the parts of the plaintext, and as I only support plaintexts with length < 16, only the first 4 UINT4's are filled. Then W[4]...W[14] contain zero's in my case, W[15] contains the length of the plaintext in bits.

So what happens with W[16]...W[79]? The previous values (the actual plaintext) are 'expanded' into these values, so every step has additional input and is dependent on the plaintext. This expanding isn't that special:

W[t] = W[t-3] XOR W[t-8] XOR W[t-14] XOR W[t-16]

And then ROTATE this result by one.

As W[4]...W[14] contains nothing but zero's, I didn't actually want to set them. But because W[18]...W[30] depend on these zero's, they should be set. Unless we change the EXPAND function for W[18]...W[30]. But as that needs more unrolling of the loops, my code gets bigger and maybe slower (had that before).

So now I unrolled certain parts in a strange way, but somehow it works :)

I now have some strange code like:

for(t = 20; t < 21; t++){

SSE_EXPAND_3(t); ROTATE(t)

SSE_EXPAND_3(t+1); ROTATE(t+1)

SSE_EXPAND_3(t+2); ROTATE(t+2)

SSE_EXPAND_3_8(t+3); ROTATE(t+3)

SSE_EXPAND_3_8(t+4); ROTATE(t+4)

}

This code block within the for loop gets executed only once. So you'd say that I could just write out SSE_EXPAND3(20) - (24) and such, but somehow that makes things slower...

Anyway, instead of 58.5 Mhashes/s I now get:

*Length 5 - 55% in 8.39 s (60.29 Mhashes/s)*

Here are the downloads:

shabr_0.2_win_32bit_binary.zip

shabr_0.2_src.zip