Distracted: May 2009

Friday, May 29, 2009

PS3 speed up!

Check out G924789's blog!

Improved my code from around 200 Mhashes/s to 250 Mhashes/s!!

Tnx for sharing G924789 (do you live there?) :)

Wednesday, May 27, 2009

Again faster brute forcers! And PS3!

I knew my plaintext generation was probably not very efficient anymore (it generates 12 plaintexts per round). A nice mail exchange with Sascha Pfaller, who made an even faster MD5 brute forcer, pointed me at this piece of code again. Some other issues also became clear, which I should have seen myself before (but didn't, so tnx :P).

I actually didn't want to bother with implementing a new plaintext generation and was satisfied with the current speeds. But then I decided to do something completely different... I decided to go and play around with my PlayStation 3!

Previous research was done by Nick Breese and presented at Black Hat. He released some benchmarking code that was supposed to be superseded by new code. I couldn't find that code though, but I'm really curious about his new results.

Anyway, back to my story. In his presentation slides Nick Breese wrote that he easily ported his PS3 code to SSE2, so I thought "then it must be easy to port it the other way around as well". So I started, and indeed I was able to produce some working code in about 2-3 days of work (not full-time). I was, however, fairly disappointed by the speed. I wasn't expecting very high numbers, as I already knew about Nick's setback, but it was still quite low (at first 8 Mhashes/s per core, the next day 13 Mhashes/s). As the working cores of the PS3 (the so-called SPEs, of which 6 are available) are designed to work with vectors (SIMD), my plaintext generation might be hurting even more this time.

I didn't really want to do it the way Sascha is doing it, as that would take more thinking and modifications on my part. And then suddenly a new idea came to mind: instead of updating the first 4 bytes of the plaintext 12 times per round, I now do this once, and let the 5th (and later) bytes differ between the 12 plaintexts. Those bytes don't have to change every round, only when the reversal values are updated.
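A minimal sketch of that layout in C, under assumptions that are mine and not from the post: 12 candidates per round (3x interlacing times 4 32-bit lanes), fixed 5-character plaintexts, a hypothetical 36-character charset, and little-endian packing of message words as MD4/MD5 expect. Word 0 (bytes 1 to 4) is shared by all 12 candidates and rebuilt once per round; word 1 (the 5th byte plus the 0x80 padding byte) differs per lane and is only rebuilt when the group of 5th bytes changes. All names are hypothetical.

```c
#include <stdint.h>

#define LANES 12   /* assumption: 3x interlaced x 4 x 32-bit lanes */

/* Hypothetical charset; 36 = 3 * LANES, so the 5th byte splits into 3 groups. */
static const char charset[] = "abcdefghijklmnopqrstuvwxyz0123456789";
#define CS_LEN ((int)(sizeof(charset) - 1))

/* Pack 4 charset indices into one little-endian 32-bit message word. */
static uint32_t pack_word0(const int idx[4]) {
    uint32_t w = 0;
    for (int i = 0; i < 4; i++)
        w |= (uint32_t)(uint8_t)charset[idx[i]] << (8 * i);
    return w;
}

/* Odometer step over the first 4 characters; returns 0 after wrapping. */
static int advance_word0(int idx[4]) {
    for (int i = 0; i < 4; i++) {
        if (++idx[i] < CS_LEN) return 1;
        idx[i] = 0;
    }
    return 0;
}

/* Per-lane word 1: lane i gets charset[group * LANES + i] as the 5th byte,
   followed by the 0x80 padding byte (5-character plaintexts assumed).
   Called only when the group changes, not every round. */
static void set_word1(uint32_t word1[LANES], int group) {
    for (int i = 0; i < LANES; i++)
        word1[i] = (uint32_t)(uint8_t)charset[group * LANES + i] | (0x80u << 8);
}

/* Unpack one lane of the current round into a C string (e.g. to report a hit). */
static void lane_plaintext(uint32_t word0, const uint32_t word1[LANES], int lane, char out[6]) {
    for (int i = 0; i < 4; i++) out[i] = (char)(word0 >> (8 * i));
    out[4] = (char)(word1[lane] & 0xff);
    out[5] = '\0';
}
```

A driver would call `set_word1` once per group and then loop: build `word0` with `pack_word0`, hash all 12 lanes with that single word, and call `advance_word0`. Per round, only one word is generated instead of 12 plaintexts.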

To keep it short, I went from about 13 Mhashes/s to about 28 Mhashes/s with this new plaintext generation... per core! So for 6 cores that is from 78 to 168 Mhashes/s! After some more changes (3x interlacing turned out to be faster than 2x, which strikes me as odd, because the cores only have 2 pipelines...) I hit around 33.3 Mhashes/s per core, which comes to 200 Mhashes/s in total.

This can be sped up some more, as the PS3 also has a 'generic' PowerPC core (the PPE, which runs 2 hardware threads) that can be used. Maybe I'm not even close to unlocking the full potential of the PS3, but when I'm done with the brute forcer I'll release the code again (as usual) and hope that others give it a try.

Now I almost forgot the SSE2 news... I implemented this new plaintext generation in my SSE2 MD5 brute forcer, and this got me from around 138 Mhashes/s to 177 Mhashes/s!! I'll try to implement these new optimizations in my other brute forcers as well and hopefully release them soon.

Monday, May 25, 2009

XSS in AWStats, no 0day, 2year!

I don't really like releasing information about security bugs this way. But I'd rather have people know about a bug in the software they use if it's not fixed. The information has been publicly available in the SourceForge tracker anyway.

AWStats contains a Cross Site Scripting vulnerability (XSS) in the output parameter:

http://[domain]/awstats/awstats.pl?config=[example.com]&framename=mainright&output="%20style="width:%20expression(alert('XSS'));"

This one doesn't work in FF, but it does in IE7 (CSS expression() is an IE-only feature).

(I tried contacting the author in several ways, no response.)

Sunday, May 17, 2009

EmDebr: Faster Windows binary and Linux version 'working'

I got a response to my blog from Sascha Pfaller, who's writing another MD5 brute forcer. It seems a faster open source alternative is coming up!

After skimming through this new code I realized that I forgot to interlace my ROTATE_LEFT functions. As each one is actually 3 operations, the VC compiler failed to interlace them for me. I now went from 117 Mhashes/s to 135 Mhashes/s on my system with the VC compiler. Apparently the Intel compiler already interlaced my ROTATEs before, as my new code gives no speed increase with the Intel compiler.
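SSE2 has no vector rotate, so ROTATE_LEFT becomes a shift left, a shift right and an OR, where the OR has to wait for both shifts. A minimal sketch of interlacing those operations by hand, in portable scalar C with `uint32_t` standing in for an SSE2 register (with SSE2 the shifts would be `_mm_slli_epi32` / `_mm_srli_epi32` and the OR `_mm_or_si128`). The function names and the fixed count of 3 independent values are illustrative assumptions, not the EmDebr code.

```c
#include <stdint.h>

/* Rotate as three separate operations: shift left, shift right, OR.
   n must be in 1..31 (shifting a 32-bit value by 32 is undefined in C). */
#define ROTATE_LEFT(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

/* Not interlaced: one rotate is finished (shift, shift, OR) before the next
   value is touched, so the OR stalls on the two shifts of the same value. */
static void rotate3_serial(uint32_t v[3], int n) {
    for (int i = 0; i < 3; i++)
        v[i] = ROTATE_LEFT(v[i], n);
}

/* Interlaced by hand: each stage is issued for all three independent values
   before the next stage starts, so neighbouring operations do not depend on
   each other and can execute in parallel. */
static void rotate3_interlaced(uint32_t v[3], int n) {
    uint32_t l0 = v[0] << n, l1 = v[1] << n, l2 = v[2] << n;
    uint32_t r0 = v[0] >> (32 - n), r1 = v[1] >> (32 - n), r2 = v[2] >> (32 - n);
    v[0] = l0 | r0;
    v[1] = l1 | r1;
    v[2] = l2 | r2;
}
```

Both functions compute the same results; the difference is only instruction order, which an optimizing compiler may or may not rearrange on its own.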

I also tried to fix my source code to work on Linux. I still have some stupid problem with pthreads and signaling conditions or something. I'm giving up for now and just releasing a version that at least works. If you plan on using it, mind the README:

Linux support is in a state of 'it works and should find your plaintexts'. There are some bugs: speed isn't always shown correctly, and sometimes not at all. If you specify more threads than you have cores in your system, using -t, you might see high numbers for the speed. Your system is not actually running that fast, sorry :)

New files:

EmDebr:
EmDebr_0.3_win32.zip
EmDebr_0.3_src.zip

New versions of EnTibr, Cacheebr and shabr might be coming up if anyone actually wants to use those on Linux.

Please feel free to leave comments or feedback about how fast you are going with the various versions. Or if you notice any bugs, please let me know!

Friday, May 15, 2009

Cacheebr, the MS Cache password brute forcer

As requested, I built an MS Cache brute forcer. MS Cache hashes are a little harder to optimize: they are salted and need 2x MD4. This is how you build an MS Cache hash:

* Build the NTLM hash for the password: MD4(Unicode(password))
* Append the Unicode, lowercased username to the NTLM hash
* MD4 that

So in short: MD4( MD4(Unicode(password)) + Unicode(tolower(username)) )
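The construction above, written out as a minimal, unoptimized C sketch (no SSE2, no reversing) that could be used to check results: a plain RFC 1320 MD4, an ASCII-only conversion to UTF-16LE (the "Unicode" Windows uses), NTLM, and then the MS Cache hash. It assumes ASCII passwords and usernames of at most 40 characters so everything fits the fixed buffers; the function names are hypothetical and not taken from Cacheebr.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROTL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
#define F(x, y, z) (((x) & (y)) | (~(x) & (z)))
#define G(x, y, z) (((x) & (y)) | ((x) & (z)) | ((y) & (z)))
#define H(x, y, z) ((x) ^ (y) ^ (z))

/* Plain MD4 (RFC 1320) for messages of at most 119 bytes (two blocks). */
static void md4(const uint8_t *msg, size_t len, uint8_t out[16]) {
    static const int r1[4] = {3, 7, 11, 19}, r2[4] = {3, 5, 9, 13}, r3[4] = {3, 9, 11, 15};
    static const int o3[16] = {0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15};
    uint32_t st[4] = {0x67452301u, 0xefcdab89u, 0x98badcfeu, 0x10325476u};
    uint8_t buf[128] = {0};
    size_t blocks = (len + 8) / 64 + 1;   /* room for the 0x80 byte + 64-bit length */
    uint64_t bits = (uint64_t)len * 8;

    memcpy(buf, msg, len);
    buf[len] = 0x80;
    for (int i = 0; i < 8; i++) buf[blocks * 64 - 8 + i] = (uint8_t)(bits >> (8 * i));

    for (size_t blk = 0; blk < blocks; blk++) {
        uint32_t X[16], a = st[0], b = st[1], c = st[2], d = st[3];
        for (int i = 0; i < 16; i++) {   /* little-endian message words */
            const uint8_t *p = buf + blk * 64 + 4 * i;
            X[i] = (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
        }
        for (int i = 0; i < 48; i++) {
            uint32_t f; int k, s;
            if (i < 16)      { f = F(b, c, d);               k = i;                         s = r1[i % 4]; }
            else if (i < 32) { f = G(b, c, d) + 0x5a827999u; k = (i % 4) * 4 + (i - 16) / 4; s = r2[i % 4]; }
            else             { f = H(b, c, d) + 0x6ed9eba1u; k = o3[i - 32];                s = r3[i % 4]; }
            uint32_t t = ROTL(a + f + X[k], s);
            a = d; d = c; c = b; b = t;   /* rotate roles: abcd -> dabc -> cdab -> bcda */
        }
        st[0] += a; st[1] += b; st[2] += c; st[3] += d;
    }
    for (int i = 0; i < 16; i++) out[i] = (uint8_t)(st[i / 4] >> (8 * (i % 4)));
}

/* ASCII -> UTF-16LE ("Unicode" in Windows terms), optionally lowercased. */
static size_t to_utf16le(const char *str, int lower, uint8_t *out) {
    size_t n = strlen(str);
    for (size_t i = 0; i < n; i++) {
        unsigned char ch = (unsigned char)str[i];
        out[2 * i] = (uint8_t)(lower ? tolower(ch) : ch);
        out[2 * i + 1] = 0;
    }
    return 2 * n;
}

/* NTLM = MD4(Unicode(password)). */
static void ntlm(const char *password, uint8_t out[16]) {
    uint8_t u[80];
    size_t n = to_utf16le(password, 0, u);
    md4(u, n, out);
}

/* MS Cache = MD4(NTLM(password) + Unicode(tolower(username))). */
static void mscache(const char *password, const char *username, uint8_t out[16]) {
    uint8_t buf[16 + 80];
    ntlm(password, buf);                                  /* bytes 0..15: NTLM hash */
    size_t n = 16 + to_utf16le(username, 1, buf + 16);   /* lowercased username */
    md4(buf, n, out);
}
```

Because the username is lowercased before hashing, "Administrator" and "administrator" give the same MS Cache hash, while the password keeps its case.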

Because of this, you need to calculate the full MD4 hash for every plaintext. Because the first 16 bytes of the input for the final MD4 (the NTLM hash) are unknown, you cannot really reverse steps. I only partially reversed the last steps.

I've been a little lazy: this version only supports usernames with a maximum length of 19 characters. Longer usernames would need an additional MD4 block.
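Where the 19 comes from: MD4 pads the message with at least one 0x80 byte and an 8-byte length, so a single 64-byte block holds at most 55 message bytes. Minus the 16-byte NTLM hash that leaves 39 bytes, i.e. 19 two-byte Unicode characters. A small C sketch of that arithmetic (the macro and function names are only illustrative):

```c
#include <stddef.h>

#define MD4_BLOCK_BYTES   64
#define MD4_PAD_MIN_BYTES  1   /* the mandatory 0x80 byte */
#define MD4_LEN_BYTES      8   /* 64-bit message length */
#define NTLM_HASH_BYTES   16

/* Message bytes that fit in one MD4 block: 64 - 1 - 8 = 55. */
#define MD4_ONE_BLOCK_MAX (MD4_BLOCK_BYTES - MD4_PAD_MIN_BYTES - MD4_LEN_BYTES)

/* MD4 blocks needed for the second MD4 of an MS Cache hash, given the
   username length in characters (2 bytes each in UTF-16LE). */
static size_t mscache_md4_blocks(size_t username_chars) {
    size_t msg = NTLM_HASH_BYTES + 2 * username_chars;
    return (msg + MD4_PAD_MIN_BYTES + MD4_LEN_BYTES + MD4_BLOCK_BYTES - 1) / MD4_BLOCK_BYTES;
}

/* Longest username that still fits in a single block: (55 - 16) / 2 = 19. */
static size_t mscache_max_single_block_username(void) {
    return (MD4_ONE_BLOCK_MAX - NTLM_HASH_BYTES) / 2;
}
```

A 19-character username gives a 54-byte message (one block); 20 characters give 56 bytes, which spills into a second block.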

I interlaced SSE2 three times, getting to something like 72 Mhashes/s on my system.

The download links:

Cacheebr_0.1_win32.zip
Cacheebr_0.1_src.zip

You might also need to install the Microsoft Visual C++ 2008 Redistributable Package.

Thursday, May 14, 2009

rcracki_mt has a new home, and a new version!

I have created a project at SourceForge for rcracki_mt. For those who don't know what it is:

rcracki_mt can be used to perform a rainbow table attack on password hashes. It is intended for indexed&perfected rainbow tables, mainly generated by the distributed project www.freerainbowtables.com (which has been down for a few days, no idea why).

The new home is at: https://sourceforge.net/projects/rcracki/

At SourceForge you can also find the new version, version 0.6. This version should be the latest beta version and should become 1.0 after some (possible) bug fixes. Check out the README and ChangeLog (at the bottom).

Please use the trackers for bug reports and feature requests.

Monday, May 11, 2009

Faster brute forcers

Hi there, as I realized just after posting EmDebr and EnTibr, I could interlace more SSE2 instructions. From what I've understood so far, my processor (Core2) should be able to execute 3 SSE2 instructions simultaneously. With the 0.1 releases of my brute forcers, I only try to do 2, so going to 3 should make for a nice speed increase :)
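To make "interlacing" concrete, here is a minimal sketch of one MD4 round-1 step, a = (a + F(b,c,d) + x) <<< s, computed for INTERLACE independent groups of candidates at once. Each stage is done for every group before the next stage starts, so neighbouring operations are independent and a core that can issue several SSE2 instructions per cycle has something to issue. Plain `uint32_t` stands in for an `__m128i` register to keep the sketch portable; `INTERLACE` and the function name are assumptions for illustration, and whether the compiler keeps this ordering is up to the compiler.

```c
#include <stdint.h>

#define INTERLACE 3   /* number of independent groups; try 1..4 */
#define ROTL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

/* One MD4 round-1 step for INTERLACE independent states.
   F(b,c,d) = (b & c) | (~b & d) is computed as d ^ (b & (c ^ d)). */
static void md4_r1_step_interlaced(uint32_t a[INTERLACE], const uint32_t b[INTERLACE],
                                   const uint32_t c[INTERLACE], const uint32_t d[INTERLACE],
                                   const uint32_t x[INTERLACE], int s) {
    uint32_t t[INTERLACE];
    for (int g = 0; g < INTERLACE; g++) t[g] = c[g] ^ d[g];   /* stage 1, all groups */
    for (int g = 0; g < INTERLACE; g++) t[g] &= b[g];         /* stage 2 */
    for (int g = 0; g < INTERLACE; g++) t[g] ^= d[g];         /* stage 3: F is done */
    for (int g = 0; g < INTERLACE; g++) a[g] += t[g] + x[g];  /* add F and message word */
    for (int g = 0; g < INTERLACE; g++) a[g] = ROTL(a[g], s); /* rotate */
}
```

With INTERLACE at 1 every stage depends on the previous one; raising it gives the scheduler more independent work, until register pressure (and, as the numbers in this post suggest, code size) starts to cost more than it gains.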

But I noticed something strange when I was playing around with this... my MD4 brute forcer (EnTibr) has the following speeds with the different interlacings:

1x SSE2 : 110 Mhashes/s
2x SSE2 : 150 Mhashes/s
3x SSE2 : 175 Mhashes/s
4x SSE2 : 200 Mhashes/s
(5x gets slower, like 170)

My MD5 brute forcer (EmDebr) has the following speeds with the different interlacings:

1x SSE2 : 77 Mhashes/s
2x SSE2 : 100 Mhashes/s
3x SSE2 : 116 Mhashes/s
4x SSE2 : 105 Mhashes/s

Now for MD5 this seems more logical, but it strikes me as odd that MD4 still gains speed with 4x SSE2. BarsMonster (from BarsWF) suggested trying again with the Intel compiler. So I downloaded evaluation versions of VC and the Intel compiler and tried... the Intel compiler at least gave logical results :)

So 'final' results with Intel compiler:

EnTibr: 3x SSE2 -> 200 Mhashes/s
EmDebr: 3x SSE2 -> 144 Mhashes/s

I probably had some luck with 4x SSE2 and the VC compiler. It probably arranges the instructions well enough to actually perform 3 SSE2 instructions simultaneously. MD4 code is also smaller than MD5, maybe allowing it to just fit in the cache.

Both brute forcers can gain some more speed by tweaking the Intel compiler settings, but I don't really care about that at the moment. I will only release binaries compiled with VC Express, but feel free to compile your own faster version with the Intel compiler :)

I release 2 versions of the NTLM/MD4 code&binary, one with 3x SSE2, the other with 4x SSE2. I also fixed a bug where the brute forcers sometimes kept running after finding the plaintext. Uhm, I might have changed some other pieces of the code as well... don't remember.

New files:

EmDebr:
EmDebr_0.2_win32.zip
EmDebr_0.2_src.zip

EnTibr 3x SSE2:
EnTibr_0.2_3xSSE2_win32.zip
EnTibr_0.2_3xSSE2_src.zip

EnTibr 4x SSE2:
EnTibr_0.2_4xSSE2_win32.zip
EnTibr_0.2_4xSSE2_src.zip

Please feel free to leave comments or feedback about how fast you are going with the various versions. Or if you notice any bugs, please let me know!

Sunday, May 3, 2009

EnTibr, the NTLM password brute forcer

And hereby I also present my open source, SSE2-optimized NTLM password brute forcer, called EnTibr. This is almost the same code as the one used in EmDebr. I also used reversing of the MD4 algorithm, skipping the full 3rd round. On my own system I get around 150 Mhashes/s using 4 cores @ 3.2 GHz.
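A minimal sketch of reversing MD4's third round, written from the published idea rather than from the EnTibr source. Subtracting MD4's initial values from the target hash gives the state after round 3; a round-3 step a' = (a + H(b,c,d) + X[k] + 0x6ED9EBA1) <<< s can then be undone with a = (a' >>> s) - H(b,c,d) - X[k] - 0x6ED9EBA1, which only requires X[k] to be known. In round 3, X[0] (the first 4 plaintext bytes) is used only in the first step, so the other 15 steps can be undone once for every change of the remaining words instead of once per candidate. The function names are hypothetical.

```c
#include <stdint.h>

#define ROTL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
#define ROTR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))
#define H(x, y, z) ((x) ^ (y) ^ (z))
#define K3 0x6ed9eba1u

static const int r3_shift[4] = {3, 9, 11, 15};
static const int r3_word[16] = {0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15};

/* Forward MD4 round 3 on st = {a, b, c, d} (here only to test the inverse). */
static void md4_round3(uint32_t st[4], const uint32_t X[16]) {
    uint32_t a = st[0], b = st[1], c = st[2], d = st[3];
    for (int i = 0; i < 16; i++) {
        uint32_t t = ROTL(a + H(b, c, d) + X[r3_word[i]] + K3, r3_shift[i % 4]);
        a = d; d = c; c = b; b = t;   /* rotate roles for the next step */
    }
    st[0] = a; st[1] = b; st[2] = c; st[3] = d;
}

/* Undo a single round-3 step i (0..15): recover its inputs from its output. */
static void md4_round3_unstep(uint32_t st[4], const uint32_t X[16], int i) {
    uint32_t t = st[1];                         /* output of step i */
    uint32_t b = st[2], c = st[3], d = st[0];   /* undo the role rotation */
    uint32_t a = ROTR(t, r3_shift[i % 4]) - H(b, c, d) - X[r3_word[i]] - K3;
    st[0] = a; st[1] = b; st[2] = c; st[3] = d;
}

/* Undo steps 15 down to `stop`. With stop = 1 the step using X[0] is not
   touched, so the result does not depend on X[0] at all. */
static void md4_round3_reverse(uint32_t st[4], const uint32_t X[16], int stop) {
    for (int i = 15; i >= stop; i--) md4_round3_unstep(st, X, i);
}

/* The state after round 3 is the digest words minus the MD4 initial values. */
static void md4_undo_final_add(uint32_t st[4]) {
    st[0] -= 0x67452301u; st[1] -= 0xefcdab89u; st[2] -= 0x98badcfeu; st[3] -= 0x10325476u;
}
```

In a brute forcer, the reversed state would be compared against the forward computation for each candidate, so the reversed steps never have to be computed forward.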

Again, I hope this helps others to understand reversing and maybe you can use my code to write a better/faster open source cracker. Feel free to leave a comment or to post some improvements!

The download links:

EnTibr_0.1_win32.zip
EnTibr_0.1_src.zip

You might also need to install the Microsoft Visual C++ 2008 Redistributable Package.

EmDebr, the MD5 password brute forcer

As promised, I hereby present my MD5 password brute forcer, called EmDebr. I used reversing to speed things up; see my previous blog post for more information about reversing.
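The same trick for MD5, as a minimal sketch: an MD5 step computes a' = b + ((a + f(b,c,d) + x + T) <<< s), so given the output and the other three words it is undone with a = ((a' - b) >>> s) - f(b,c,d) - x - T. Below, f is round 4's I(b,c,d) = c ^ (b | ~d). The function names are hypothetical, not EmDebr's.

```c
#include <stdint.h>

#define ROTL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
#define ROTR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))
#define MD5_I(x, y, z) ((y) ^ ((x) | ~(z)))   /* round-4 function */

/* Forward MD5 round-4 step: returns the new value of the updated word. */
static uint32_t md5_r4_step(uint32_t a, uint32_t b, uint32_t c, uint32_t d,
                            uint32_t x, uint32_t t, int s) {
    return b + ROTL(a + MD5_I(b, c, d) + x + t, s);
}

/* Inverse: recover the old value from the new one, given b, c, d, x, t, s.
   The step function is a bijection in `a`, so this is exact. */
static uint32_t md5_r4_unstep(uint32_t a_new, uint32_t b, uint32_t c, uint32_t d,
                              uint32_t x, uint32_t t, int s) {
    return ROTR(a_new - b, s) - MD5_I(b, c, d) - x - t;
}
```

For example, MD5's final step (step 64 in RFC 1321) updates b from (c, d, a) with word X[9], s = 21 and T = 0xeb86d391, so in RFC 1321's variable names it would be undone with `md5_r4_unstep(b, c, d, a, X[9], 0xeb86d391u, 21)`.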

I also gave interlacing SSE2 a try (did that after I posted my previous blog ;)) and came up with a speed improvement of about 30%: on my own system (3.2 GHz quad-core) I got about 77 Mhashes/s before interlacing, and with the current version I get around 100 Mhashes/s. Interlacing SSE2 is not my own idea and can be done (a lot) better than what I came up with. EmDebr is therefore still far from the fastest SSE2-optimized MD5 password brute forcer around; as far as I know that's still BarsWF. As far as I know, though, EmDebr is now the fastest open source one :)

I hope this helps others to understand reversing and maybe you can use my code to write a better/faster open source cracker. Feel free to leave a comment or to post some improvements!

Oh right, the download links:

EmDebr_0.1_win32.zip
EmDebr_0.1_src.zip

You might also need to install the Microsoft Visual C++ 2008 Redistributable Package.

*Again, credits to Sc00bz for the public explanation about reversing*
