Thursday, October 29, 2009

Analysis of 10k Hotmail Passwords - Even More Brute Force

A reader asked me through e-mail how much better John the Ripper's Markov models were compared to pure brute force or letter frequency analysis. I knew there was a reason why I put my e-mail address on the side of this blog. That's a great question, since while I'd always had more success with Markov models vs letter frequency analysis, (and certainly brute force), I had never measured the difference before. What type of researcher am I? I better fix that, so let's check it out.

Test 4: Markov Models vs. Letter Frequency Analysis vs. Pure Brute Force

So in this test I reused the data collected previously in Test 1 using JtR's -incremental mode targeting lowercase letters and numbers, (a-z0-9). I then used the popular tool crunch to run both the brute force and letter frequency analysis, (which I'm going to call LFA), attacks since JtR doesn't support pure brute force, (well there is a bit of a hack, but crunch is easier). For the pure brute force attack I ran:

./crunch 6 8 abcdefghijklmnopqrstuvwxyz0123456789

In short, I'm creating brute force guesses between 6 and 8 characters long containing a-z0-9, starting with aaaaaa and theoretically ending with 99999999, (in this run it never finished all of the six character long words). This is also the default character set that the popular password cracker Cain&Able uses.

For the LFA attack, I then ran it using the numbers based on my analysis of the password list. As we'll see, you can do better if you base your LFA on analysis of Spanish/Portuguese passwords, but I don't have another dataset of Spanish/Portuguese passwords so this is the best I can do without cheating. The command I used was:

./crunch 6 8 smp1abctdkjlhrfgnw2ei0ov3q457z968yux

A couple of notes. I used the first character LFA charset since crunch increments it's values from right to left; aka it will try 01, 02, 03, 04 etc. This means using the above character set, it will initially crack all the words starting with 's', then move on to words starting with a 'm' and so on. When there is no password creation policy, (aka users aren't forced to include numbers in their passwords), this is slightly better than password crackers such as Cain&Able who increment their values from left to right, aka 10, 20, 30, 40 ... The reason for this is because people are slightly more predicable in how they start words/passwords, vs how the end them. For example, from the dataset, over 51% of the users choose one of top ten characters to start their passwords. Likewise, 46% of users choose one of top ten characters to end their passwords. It's not a big difference, but every little bit helps. If there is a password creation policy though, this quickly falls apart, and it's vastly more preferable to use a left to right approach since people almost always put numbers/special characters at the end of their passwords. For example, just the number '1' appears 21% of the time at the end of stronger passwords.

Also, the more astute, (and/or skeptical), readers might notice the order I'm using is slightly different from the one I published before. That's because I've cracked more passwords from the dataset, (I'm nearly up to 98% of the total set cracked), so I've updated some of my statistics. That's another thing I need to get around to posting up here...

Results: Note, you can click on the image for a bigger version of it.

Analysis: The first thing that stands out is that my letter frequency analysis percentages are way off. I actually do worse initially then if I had chosen a pure brute force approach. I know it's cheating, but going back to my original analysis of the hotmail list, and comparing it to the phpbb list, you get the following:

Hotmail list: a1mbc2sp0lterdjfgn3hi6k759vo48ywzuqx
Phpbb list : smp1abctdkjlhrfgnw2ei0ov3q457z968yux

Looking at this, the best choice you could have done was to start with the letter 'a' which is coincidentally what the pure brute force approach did. The reason for the huge bump in the LFA attack around the 220 million guess mark is because it hit the '1' as the first character, followed by an 'a' which was about the equivalent of a perfect storm since people started their passwords with a number, followed by the actual word they were using. Even with my probabilities being off though, LFA still beat out pure brute force, so if you aren't at least using that, well I don't know what more I can say to convince you. That being said, this really shows how superior using Markov models is to both attacks, with the alphanumeric Markov attack cracking more than twice the amount of passwords that the LFA did. I'd also like to point out that if you look at the first several million guesses, a Markov based attack performs so much better than LFA it's ridiculous.

Test 5: LFA and Brute Force Attacks Against Different Length Passwords

The next question is how do LFA and Brute Force attacks perform as the password gets longer. As I said before, the above test only shows them attacking 6 character long passwords. What happens when we bump up the length to 7 or 8 characters minimum. To simulate this, I once again used crunch with the above settings except for increasing the minimum size guesses allowed. I also used the phpbb LFA probabilities, since I hate training and testing on the same dataset, (also I had already run the tests before I graphed out the results and realized how off those probabilities were...)


Analysis: I didn't even bother to put the eight character one up there since it was pretty boring. If I'm generous the pure brute force attack would crack one eight character long password, (it took slightly more than one billion guesses to do so), and the LFA attack would crack two. This shows though that while increasing the character set quickly makes brute force and LFA attacks infeasible, which is what you'll see in all the literature, even longer passwords are vulnerable against Markov model based attacks. One way to increase the effectiveness of LFA style attacks is to not try the full alpha numeric character set, (aka drop qxz etc). That's another test I'll try to run later. Even so, once again this shows that Markov models are the way to go. I'd like to point out too, that JtR's Markov model attack almost certainly would perform better if trained against similar passwords. That's another test I'll need to run ... after Halloween weekend.

A Quick Note on Time Required: I think it's important to state that these test runs are extremely short since I didn't want to run each for an entire day just to get a few more data points. My update schedule is already too slow ;) To give you a better idea how long it would take to bruteforce the entire key space here is some numbers I ran using Cain&Able. Note, Cain&Able is much slower than JtR, but it has a really nice time estimator, so please consider these numbers a worst case scenario.

Computer Specs:
2.4 Ghz Core Duo
3 Gigs of Ram
NVIDEA GeForce 8800 GTS
Windows 7 - 64 bit

-All tests were run against cracking plain MD5 hashes

Alpha-Numeric passwords: a-z0-9
  • 6 character: 14 minutes
  • 7 characters: 8 hours
  • 8 characters: 12 days
The Entire US Keyboard
  • 6 characters: 3 days
  • 7 characters: 272 days
  • 8 characters: 71 years
Oh, and keep sending me ideas, either via e-mail or in the comments of these posts.

No comments: