As promised, I hereby present my MD5 password brute forcer, called EmDebr. I used reversing to speed things up; see my previous blog post for more information about reversing.
I also gave interlacing SSE2 a try (I did that after I posted my previous blog ;)) and came up with a speed improvement of about 40%. On my own system (3.2 GHz quad-core) I got about 77 Mhashes/s before interlacing; with the current version I get around 100 Mhashes/s. Interlacing SSE2 is not my own idea and can be done (a lot) better than what I came up with. So EmDebr is still far from the fastest SSE2-optimized MD5 password brute forcer around; as far as I know that's still BarsWF. As far as I know EmDebr is now the fastest open source one though :)
I hope this helps others to understand reversing and maybe you can use my code to write a better/faster open source cracker. Feel free to leave a comment or to post some improvements!
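To make the reversing idea a bit more concrete: a single MD5 step can be inverted as long as you know the message word it used. That is what lets a brute forcer undo the last steps on the target hash once, provided those steps only use message words that stay the same between candidates. Below is a minimal sketch in C of one round-4 step and its inverse, following the step definition in RFC 1321. The function names (`step_forward`, `step_reverse`, `rotl32`, `rotr32`, `md5_I`) are made up for this illustration; this is not the actual EmDebr code.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by s bits (0 < s < 32). */
static inline uint32_t rotl32(uint32_t x, unsigned s)
{
    assert(s > 0 && s < 32);
    return (x << s) | (x >> (32 - s));
}

/* Rotate a 32-bit word right by s bits (0 < s < 32). */
static inline uint32_t rotr32(uint32_t x, unsigned s)
{
    assert(s > 0 && s < 32);
    return (x >> s) | (x << (32 - s));
}

/* MD5 round-4 function I, as defined in RFC 1321. */
static inline uint32_t md5_I(uint32_t b, uint32_t c, uint32_t d)
{
    return c ^ (b | ~d);
}

/* Forward step: a_new = b + rotl(a + I(b,c,d) + m + k, s). */
static inline uint32_t step_forward(uint32_t a, uint32_t b, uint32_t c,
                                    uint32_t d, uint32_t m, uint32_t k,
                                    unsigned s)
{
    return b + rotl32(a + md5_I(b, c, d) + m + k, s);
}

/* Inverse step: recover the old a from a_new when b, c, d, m, k, s are known.
 * All arithmetic is modulo 2^32, so the subtractions undo the additions. */
static inline uint32_t step_reverse(uint32_t a_new, uint32_t b, uint32_t c,
                                    uint32_t d, uint32_t m, uint32_t k,
                                    unsigned s)
{
    return rotr32(a_new - b, s) - md5_I(b, c, d) - m - k;
}
```

Calling `step_reverse` on the result of `step_forward` with the same `b`, `c`, `d`, `m`, `k` and `s` gives back the original `a`.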
Oh right, the download links:
EmDebr_0.1_win32.zip
EmDebr_0.1_src.zip
You might also need to install the Microsoft Visual C++ 2008 Redistributable Package.
*Again, credits to Sc00bz for the public explanation about reversing*
Sunday, May 3, 2009
Hello Daniel,
I am also programming an MD5 brute forcer.
It's a project at our university. The goal is to show the "big" differences between the CPU and the GPU versions.
After I analysed your source code I found out that you can speed up your version by a factor of 2.
In your SSE2 interlaced part you only calculate 2 hashes at once, but the processor is capable of
doing 4 operations at the same time (when the operations do NOT depend on each other).
Here is what I did:
#define FOUR2_mFunc_FF(a1,a2,a3,a4,b1,b2,b3,b4,c1,c2,c3,c4,d1,d2,d3,d4,mess1,mess2,mess3,mess4,sin_, left_,right_, t1,t2,t3,t4)\
mFunc_F(b1,c1,d1,t1);\
mFunc_F(b2,c2,d2,t2);\
mFunc_F(b3,c3,d3,t3);\
mFunc_F(b4,c4,d4,t4);\
M_Add(t1,mess1);\
M_Add(t2,mess2);\
M_Add(t3,mess3);\
M_Add(t4,mess4);\
M_Add(t1,sin_);\
M_Add(t2,sin_);\
M_Add(t3,sin_);\
M_Add(t4,sin_);\
M_Add(a1,t1);\
M_Add(a2,t2);\
M_Add(a3,t3);\
M_Add(a4,t4);\
M_Rotate_L(a1,left_,right_,t1);\
M_Rotate_L(a2,left_,right_,t2);\
M_Rotate_L(a3,left_,right_,t3);\
M_Rotate_L(a4,left_,right_,t4);\
M_Add(a1,b1);\
M_Add(a2,b2);\
M_Add(a3,b3);\
M_Add(a4,b4);
By doing so, I got over 210 million hashes per second (Q6600 at 3.2 GHz).
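For readers who want to see the interlacing pattern without the SSE2 macros: the sketch below performs the same MD5 round-1 (FF) step for four independent lanes, issuing each sub-operation for all lanes before moving on to the next one, so that neighbouring operations do not depend on each other. It is plain scalar C rather than SSE2 intrinsics so that it runs anywhere. The definitions of `mFunc_F`, `M_Add` and `M_Rotate_L` are not shown in the comment, so the helpers here (`md5_F`, `rotl_lr`, `ff_single`, `ff_interleaved`) are assumptions made for illustration. The separate left and right shift counts presumably exist because SSE2 has no rotate instruction, so a rotate has to be built from two shifts and an OR.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4 /* number of independent hash computations interleaved */

/* MD5 round-1 function F, as defined in RFC 1321. */
static inline uint32_t md5_F(uint32_t b, uint32_t c, uint32_t d)
{
    return (b & c) | (~b & d);
}

/* Rotate left built from two shifts and an OR; right must be 32 - left. */
static inline uint32_t rotl_lr(uint32_t x, unsigned left, unsigned right)
{
    assert(left > 0 && left < 32 && left + right == 32);
    return (x << left) | (x >> right);
}

/* Reference: one complete FF step for a single lane. */
static inline uint32_t ff_single(uint32_t a, uint32_t b, uint32_t c,
                                 uint32_t d, uint32_t m, uint32_t k,
                                 unsigned left)
{
    return b + rotl_lr(a + md5_F(b, c, d) + m + k, left, 32 - left);
}

/* Interleaved FF step: every sub-operation is done for all lanes
 * before the next sub-operation starts. a[] is updated in place. */
static inline void ff_interleaved(uint32_t a[LANES], const uint32_t b[LANES],
                                  const uint32_t c[LANES],
                                  const uint32_t d[LANES],
                                  const uint32_t m[LANES], uint32_t k,
                                  unsigned left)
{
    uint32_t t[LANES];
    int i;

    for (i = 0; i < LANES; i++) t[i] = md5_F(b[i], c[i], d[i]);
    for (i = 0; i < LANES; i++) t[i] += m[i];  /* add message word */
    for (i = 0; i < LANES; i++) t[i] += k;     /* add step constant */
    for (i = 0; i < LANES; i++) a[i] += t[i];
    for (i = 0; i < LANES; i++) a[i] = rotl_lr(a[i], left, 32 - left);
    for (i = 0; i < LANES; i++) a[i] += b[i];
}
```

Each lane of `ff_interleaved` produces the same value as `ff_single` on that lane's inputs; only the order in which the operations are issued differs. Whether the reordering pays off in practice depends on the compiler and the CPU.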
Hi Sascha, tnx for your comment. I actually have a version with more interlacing lying around already. I was still playing around with the Intel compiler to see differences in speed.
With my NTLM (MD4) brute forcer, I got the best results by interlacing 4 times; with the MD5 brute forcer it's interlacing 3 times. This struck me as odd, as I thought my CPU could execute 3 SSE2 instructions at the same time. With the Intel compiler, interlacing 3 times was indeed just as fast. Apparently I just tricked Visual C's compiler into arranging the instructions well enough to interlace 3x by actually writing the code as 4x.
I must say that I am really surprised by the speeds you claim to reach, 210 Mhashes/s, as even with Intel compiler I only got like 144 Mhashes/s with my Q9450@3.2 GHz. Or is that the speed you reach with your own brute forcer? If so, great work! Will you be releasing code for it? :)
(anyway, I'll release my better interlaced versions one of these days)
Hello Daniel,
I asked my prof. and he said that I am allowed to publish my code (I was not sure).
But my program is still in beta: it only handles password lengths up to 7 chars, and only 26 different chars are used ('a' - 'z').
(I have still 2 months to finish my work ;-) )
here is the link:
http://rs371.rapidshare.com/files/233288217/MD5.rar
If you have any questions, you can email me at: Sascha-Pfaller@web.de
Hi Sascha, nice work you did there. It's great that you are allowed (and want) to publish source code! If you can keep up that speed, you might not only have the fastest CPU MD5 password cracker that is open source, but it might also be faster than the fastest closed source cracker (I guess for now that's still BarsWF).
I browsed a little through your code, and at least it made me realize that I forgot to interlace the ROTATE_LEFT function. So at least for my MD5 brute forcer, the speed of the VC-compiled version went from 117 Mhashes/s to 135 Mhashes/s. Haven't tried the Intel compiler yet. So tnx for that.
Good luck on finishing your brute forcer, 2 months should be enough time :)
Are you also building a GPU version yourself? Care to share some information and statistics?
Feel free to keep me (and readers of this blog) up2date about your progress.
http://rs371.rapidshare.com/files/233288217/MD5.rar is now invalid. Could someone please provide a valid link to this?
Also, could you please provide compile instructions for Linux for EmDebr?
Thanks for the great work! :)