I knew my plaintext generation was probably not very efficient anymore (generating 12 plaintexts per round). A nice mail exchange with Sascha Pfaller, who made an even faster MD5 brute forcer, pointed me at this piece of code again. Some other issues also became clear, which I should have seen myself before (but didn't, so thanks :P).
I actually didn't want to bother with implementing a new plaintext generation and was satisfied with the current speeds. But then I decided to do something completely different... I decided to go and play around with my PlayStation 3!
Previous research was done by Nick Breese and presented at Black Hat. He released some benchmarking code that was supposed to be superseded by new code. I couldn't find that new code, though, and I'm really curious about his new results.
Anyway, back to my story. In his presentation slides, Nick Breese wrote that he easily ported his PS3 code to SSE2 code, so I thought "then it must be easy to port it the other way around as well". So I started out and indeed, I was able to produce some working code in about 2-3 days of work (not full-time). I was, however, fairly disappointed by the results in terms of speed. I wasn't expecting very high numbers, as I already knew about Nick's setback, but it was still quite low (at first 8 Mhashes/s per core, the next day 13 Mhashes/s). As the working cores of the PS3 (the so-called SPEs, of which 6 are available) are designed to work with vectors (SIMD), my plaintext generation might be hurting performance even more this time.
I didn't really want to do it the way Sascha is doing it, as that would take more thinking and modifications for me. And then suddenly this new idea came to mind. Instead of updating the first 4 bytes of the plaintext 12 times per round, I now do this once, and have the 5th (and beyond) bytes differ from each other. These don't have to change every round, only when updating reversal values.
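To make the idea a bit more concrete, here is a minimal Python sketch of that enumeration order. It is only an illustration under assumptions: the charset, the fixed plaintext length and the function name `batches` are made up for this example, and it does not model the MD5 reversal or any SIMD code. What it shows is that all plaintexts in a batch share the same first 4 bytes, while the bytes from position 5 onward differ per lane and are only replaced in the outer loop.

```python
from itertools import product

CHARSET = "abcdefghijklmnopqrstuvwxyz"  # assumed charset, not from the post
LANES = 12                              # 12 plaintexts per round, as in the post


def batches(length, charset=CHARSET, lanes=LANES):
    """Yield lists of up to `lanes` candidates of the given length.

    Within one batch every candidate has the same first 4 bytes;
    the candidates differ only in byte 5 and beyond.
    """
    if length < 5:
        raise ValueError("this sketch needs at least 5 bytes per plaintext")

    # All possible tails (byte 5 and beyond).
    tails = ["".join(t) for t in product(charset, repeat=length - 4)]

    # Outer loop: pick the per-lane tails. This happens rarely.
    for start in range(0, len(tails), lanes):
        lane_tails = tails[start:start + lanes]
        # Inner loop: update the shared first 4 bytes once per round.
        for head in product(charset, repeat=4):
            prefix = "".join(head)
            yield [prefix + tail for tail in lane_tails]


if __name__ == "__main__":
    # Tiny demo with a 3-character charset and 3 lanes.
    for batch in list(batches(5, "abc", 3))[:2]:
        print(batch)
```

In a real brute forcer each batch would be hashed in one vectorized round; here the batches are just lists of strings, so the ordering is easy to inspect.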
To keep it short, I went from about 13 Mhashes/s to about 28 Mhashes/s with this new plaintext generation... per core! So for 6 cores that is from 78 to 168 Mhashes/s! After some more changes (3x interlacing was faster again than 2x, which strikes me as odd, because the cores only have 2 pipelines...) I hit around 33.3 Mhashes/s per core, which comes to roughly 200 Mhashes/s in total.
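The totals are simply the per-core rate times the 6 available SPEs. A small Python check of that arithmetic, assuming perfectly linear scaling across the cores (the helper name `total_rate` is made up for this example):

```python
SPES = 6  # SPEs available on the PS3, as stated in the post


def total_rate(per_core_mhashes, cores=SPES):
    """Total rate in Mhashes/s, assuming linear scaling across cores."""
    return per_core_mhashes * cores


if __name__ == "__main__":
    for per_core in (13, 28, 33.3):
        print(per_core, "Mhashes/s per core ->", round(total_rate(per_core), 1), "Mhashes/s total")
```

For 33.3 Mhashes/s per core this gives 199.8, which the post rounds to 200.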
This will be sped up some more, as the PS3 also has 2 'generic' PowerPC CPU cores that can be used. Maybe I'm not even close to revealing the full potential of the PS3, but when I'm done with the brute forcer, I'll release code again (as usual) and hope that others go and give it a try.
Now I almost forgot to bring the SSE2 news... I implemented this new plaintext generation in my SSE2 MD5 brute forcer, and this got me from around 138 Mhashes/s to 177 Mhashes/s!! I'll try and implement these new optimizations into my other brute forcers as well and hopefully release them soon.
Wednesday, May 27, 2009
Hi Daniel, great work! Indeed, I was going to approach you about some code for the PS3 based on your SSE2 projects. I've posted some optimized MD4/MD5 macros over at my own blog but never got around to writing any recovery tools.
It might interest you, keep up the good work!
Awesome macros! tnx! (see comment at your own blog ;))