Analysis of 10k Hotmail Passwords Part 3

As promised, let's see how these Hotmail passwords would fare in a real password cracking attack. This post covers brute force attacks, and a later post will go over the effectiveness of dictionary based attacks. As always, if there are any further attacks or analyses you would like to see run against these passwords, please let me know in the comments.

Test 1: John the Ripper Incremental Modes

The first test I wanted to run was John the Ripper's brute force attack (aka -incremental). As I mentioned in a previous post, JtR's incremental attack is very powerful and probably as close to an "industry" standard as you will find. For this test, I ran JtR's four built-in incremental modes (All, Alpha, Digits, Alnum) against the password set, and let each one run for one billion guesses. In a real password cracking attack, the attacker would probably run their attacks much longer (trillions of guesses), but I figured one billion guesses was enough to give people a general idea of the effectiveness of these attacks. Since Hotmail required passwords to be at least six characters long, I modified my JtR config file so it would only try guesses of six characters or more. Also, since JtR's built-in brute force attack won't try passwords longer than eight characters, I ran the Digits attack through Middle Child to add two digits to the end of each guess. Otherwise JtR would only make about one hundred million guesses (all digit strings up to 99,999,999). I know I can patch JtR to try longer guesses, but I'm lazy.
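To make the "add two digits to the end of each guess" step concrete, here is a minimal sketch of that kind of post-processing. It is not Middle Child's actual implementation, and the function name `append_two_digits` is made up for illustration. Whether the original run also kept the unextended guesses is not stated, so this sketch only emits the extended ones.

```python
from typing import Iterable, Iterator


def append_two_digits(guesses: Iterable[str]) -> Iterator[str]:
    """Yield every base guess followed by each two-digit suffix, 00 through 99.

    Hypothetical helper: it only illustrates the idea of stretching an
    eight-character guess out to ten characters.
    """
    for guess in guesses:
        for suffix in range(100):
            # :02d zero-pads the suffix so 7 becomes "07"
            yield f"{guess}{suffix:02d}"


if __name__ == "__main__":
    # Each base guess expands into 100 candidates.
    for candidate in list(append_two_digits(["12345678"]))[:3]:
        print(candidate)
```

Each base guess turns into 100 candidates, which is how a generator capped at eight characters can still cover ten-digit passwords.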

Character Set of Each Attack Type:

Alpha: a-z, all lowercase
Alnum: a-z 0-9, all lowercase
Digits: 0-9
All: a-z A-Z 0-9 and keyboard characters such as !@#$%
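To get a feel for how much the character set changes the size of the search space, the sketch below counts every possible string of six to eight characters for each set. The sizes for Digits, Alpha, and Alnum follow directly from the list above. The size of 95 for "All" is an assumption (the printable ASCII characters, including space), so treat that row as approximate.

```python
# Character set sizes. The value for "All" is an assumption (printable ASCII).
CHARSET_SIZES = {"Digits": 10, "Alpha": 26, "Alnum": 36, "All": 95}


def keyspace(charset_size: int, min_len: int = 6, max_len: int = 8) -> int:
    """Count all strings with lengths min_len..max_len over the character set."""
    return sum(charset_size ** n for n in range(min_len, max_len + 1))


if __name__ == "__main__":
    for name, size in CHARSET_SIZES.items():
        print(f"{name:6s} {keyspace(size):,}")
```

For Digits this gives 111,000,000 candidates, which is why a one billion guess budget exhausts that attack unless the guesses are lengthened. An exhaustive count like this ignores the order in which JtR tries guesses, so it only bounds the work and says nothing about how quickly the first passwords fall.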

Results:

Analysis: As you can see, brute forcing lowercase characters and numbers was the most effective strategy, cracking a hair over 30% of the passwords. It shouldn't be surprising that it did better than the "All" attack, since fewer than 10% of the passwords contained an uppercase character or a special character. By including those in your attack you drastically increase the search space you are attacking. It's only because JtR's Markov models are so good that the "All" attack remains competitive.

It may appear that the Digits attack stopped cracking passwords after around 300 million guesses, but the truth is that it was running out of passwords to crack. After one billion guesses, only 36 passwords made up entirely of digits remained uncracked. This also shows how weak numeric passwords can be, since there are only ten digits vs. 26 letters.

Test 2: Running a longer cracking session

Next I wanted to try the "All" and "Alnum" attacks again, but let them run much longer. In this case I let them run for 25 billion guesses. This took a while even without any hashing, because JtR had to spend time generating the guesses, I had to check and record those guesses, and I was running it all on my laptop.
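Since the passwords are already in plaintext, "cracking" here just means checking each generated guess against the list and recording progress. A minimal sketch of that bookkeeping follows. The function name `crack_curve` and the checkpoint interval are my own choices for illustration, not the scripts actually used for these tests.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def crack_curve(
    guesses: Iterable[str],
    passwords: Iterable[str],
    checkpoint: int = 1_000_000,
) -> List[Tuple[int, float]]:
    """Return (guesses made, fraction of passwords cracked) at each checkpoint."""
    remaining = Counter(passwords)  # duplicates count as separate accounts
    total = sum(remaining.values())
    if total == 0:
        return []
    cracked = 0
    count = 0
    curve: List[Tuple[int, float]] = []
    for count, guess in enumerate(guesses, start=1):
        # pop() removes the password so a repeated guess is not counted twice
        cracked += remaining.pop(guess, 0)
        if count % checkpoint == 0:
            curve.append((count, cracked / total))
    if count % checkpoint != 0:
        # record the final partial interval
        curve.append((count, cracked / total))
    return curve
```

In practice the guesses could be streamed from a generator's standard output, so nothing needs to be held in memory except the password list itself.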

Results:

Analysis: I think this shows fairly dramatically how front-loaded a password cracking session can be. An attacker will initially see very good results, but after the easiest passwords are cracked it requires many more guesses to crack successive passwords. It almost looks like a logarithmic function. Near the end it was taking over fifty million guesses to crack each additional password. Even if that rate remains constant, (in real life it almost certainly will get worse), you're looking at having to make over 56 billion guesses total to crack 50% of the Hotmail passwords using the Alpha Numeric character set.
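The extrapolation above is simple arithmetic: take the guesses already made, then add the number of passwords still needed to hit the target multiplied by the current cost per crack. The sketch below shows the calculation. The inputs in the example are made-up round numbers chosen for illustration, not the measured values from this run, and the function name is hypothetical.

```python
import math


def guesses_to_reach(
    target_fraction: float,
    total_passwords: int,
    cracked_so_far: int,
    guesses_so_far: int,
    guesses_per_crack: int,
) -> int:
    """Estimate total guesses to crack target_fraction of the passwords,
    assuming the cost per additional crack stays constant (optimistic)."""
    needed = math.ceil(target_fraction * total_passwords) - cracked_so_far
    if needed <= 0:
        return guesses_so_far
    return guesses_so_far + needed * guesses_per_crack


if __name__ == "__main__":
    # Illustrative inputs only: 1,000 passwords, 400 cracked after
    # 25 billion guesses, 50 million guesses per additional crack.
    estimate = guesses_to_reach(0.5, 1_000, 400, 25_000_000_000, 50_000_000)
    print(f"{estimate:,}")
```

Because the cost per crack keeps rising as the easy passwords run out, an estimate like this is a lower bound.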

Test 3: Effectiveness of password length against brute force attacks

One person wondered in the comments about how password length affects dictionary based password cracking attacks. I'll get to that question in a later post, but I figured I should run some tests to demonstrate how minimum password length requirements can drastically increase the cost of brute force attacks. For this test, I ran John the Ripper's -incremental mode several times, using the Alphanumeric character set. Each session I modified the config file so it would only attempt to crack passwords of a specified length, from 6 to 8 characters long. The results of this, (measuring percentage of passwords of that length cracked), can be seen below.
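The metric used here, the percentage of passwords of a given length that were cracked, can be computed as in the sketch below. The function name and the tiny sample data are hypothetical and only show the grouping.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def percent_cracked_by_length(
    passwords: Iterable[str], cracked: Set[str]
) -> Dict[int, float]:
    """Map each password length to the percentage of passwords of that length cracked."""
    totals: Dict[int, int] = defaultdict(int)
    hits: Dict[int, int] = defaultdict(int)
    for pw in passwords:
        totals[len(pw)] += 1
        if pw in cracked:
            hits[len(pw)] += 1
    return {n: 100.0 * hits[n] / totals[n] for n in sorted(totals)}


if __name__ == "__main__":
    sample = ["abcdef", "123456", "abcdefg", "Abcdefgh"]  # made-up examples
    print(percent_cracked_by_length(sample, {"abcdef", "abcdefg"}))
```

Normalizing by the number of passwords of each length matters because the lengths are not equally common, so raw crack counts would not be comparable.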

Results:

Analysis: Quite honestly, if your password is six characters or less, chances are it's going to be cracked in under a billion guesses. Please note, none of the attacks will ever reach 100% of the passwords cracked, since I'm only brute forcing lowercase letters and numbers and there are a few passwords that used uppercase letters and symbols. Once again, we see that cracking 30% of the passwords is fairly easy even when the password length is longer, but as the attacker tries to crack 40-50+% of the passwords the effort required really starts to ramp up. This also shows that the minimum password length should be at least seven characters to defend against any sort of offline password cracking attack.

One interesting thing that is hard to see on the graph is that I only needed 31 thousand guesses to crack 5% of the six character long passwords. I needed 60 thousand guesses to crack 5% of the seven character long passwords. Finally I needed 536 thousand guesses to crack 5% of the eight character long passwords. The reason why I'm pointing this out is to look at the password's resistance to an online attack, (aka where the attacker does not have the password hash and is trying to log into the system). Now admittedly in most online attacks the attacker is almost certainly going to use a dictionary attack vs. brute force. That being said, it was surprising that seven character long passwords only took twice as many guesses to hit that 5% mark, and the difficulty level didn't really ramp up until eight character long passwords were required.
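The growth between lengths is easier to see as ratios. The sketch below uses only the three guess counts quoted above. The helper name `growth_factors` is made up for this example.

```python
from typing import Dict, Tuple

# Guesses needed to crack 5% of the passwords of each length (from the text).
GUESSES_FOR_5_PERCENT = {6: 31_000, 7: 60_000, 8: 536_000}


def growth_factors(costs: Dict[int, int]) -> Dict[Tuple[int, int], float]:
    """Return how much the cost multiplies between each pair of adjacent lengths."""
    lengths = sorted(costs)
    return {(a, b): costs[b] / costs[a] for a, b in zip(lengths, lengths[1:])}


if __name__ == "__main__":
    for (a, b), factor in growth_factors(GUESSES_FOR_5_PERCENT).items():
        print(f"{a} -> {b} characters: {factor:.2f}x")
```

For comparison, if passwords were chosen uniformly at random from the 36 lowercase letters and digits, each extra character would multiply the work by 36. The observed factors of roughly 2x and 9x are far smaller, which reflects how predictable the easiest 5% of passwords are at every length.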

Up Next: Dictionary Based Attacks! If there are any dictionaries you would like to see me use, please let me know, either via e-mail or in the comments. (Yes, I'll run the Wiki dictionary). As a quick preview, most of my input dictionaries have been performing horribly since they aren't targeting Spanish/Portuguese/French speakers, and the foreign language dictionaries I've found online have been ... lacking in quality. What I'm going to do is start building a custom dictionary based on these passwords to use in future research, (I've had great success with that when attacking Finnish and Swedish passwords), but that's not really an option here for obvious reasons. "Hey look, I cracked 100% of the passwords using an input dictionary created from those passwords!"

Reusable Security
