Tuesday, December 1, 2009

Google Wave Invite

I've been playing around with Google Wave, and received a couple of extra invites to the free beta. If you are interested, let me know and I'll send one your way.

My short review: It looks like one of those tools where it takes a lot of work to gain any benefit from it. That being said, if you are collaborating with a lot of people on several different projects it has real potential.

Friday, November 27, 2009

Biometrics Are Not Going to Save Us - Or Get Used to Your Password

Abstract:
It generally is taken as common wisdom that one day, everyone is going to switch to biometrics so we can finally get rid of those pesky passwords. This post is an attempt to stave off that future. This isn't because I'm in the password cracking business, (I'm sure horse-buggy salesmen thought that cars were a poor substitute as well). Instead it's because biometrics are a really bad solution. In fact, over the long run, biometrics will make it even easier for an attacker to break your authentication schemes. The rest of this post is an attempt to explain why this is the case, along with some other reasons why you probably shouldn't go out and buy that thumbprint reader quite yet.

If Biometrics Are So Bad, Why Do People Still Like Them:
Let's face it, passwords suck. This is a blog pretty much devoted to password cracking. I'm aware of the fact that they suck. Passwords are a pain to use, a pain to remember, and pain to make secure. Who doesn't want to get rid of them? At the same time, biometrics seem like the perfect solution. Just about everyone has something to scan, (we'll make a special use case for the double amputee). You won't forget them. You won't leave them at home. User's can't share them. Oh, and with all that fancy equipment, how's a hacker supposed to break it? In addition, it's an authentication scheme that everyone understands. Try to explain public key cryptography to someone, and all you'll get are blank stares. Everyone knows about fingerprints and iris scans. Heck, we use biometrics every day to distinguish between people like Bob and Alice, (though please stop staring at Alice's chest and calling it an authentication scheme...) So why shouldn't we use biometrics with computers to authenticate people?

Identification, not Authentication:
If you've ever had to suffer through a CISSP prep course, you'll probably remember the instructor droning on about the IAA triad, (Identification, Authentication, and Authorization). The thing is, keeping those three ideas separate really is important, since they perform different jobs. Identification is a way to keep multiple users distinct, and as a statement of who you think that person might be - "Hey that looks like Alice". Authentication is where you verify someone's identity - "Hey that is Alice because she knows the secret codeword". Biometrics are extremely useful for identification. Yes, you heard me right - I'm a fan of biometrics when it comes to identifying people in certain cases. It's using biometrics as a form of authentication that I have a problem with.

My very first internship back in 1997 was working on the National Automated Fingerprint Identification System. It was one of the best summers of my life. From moments such as, "Wait these fingerprints were taken from someone who died in the tub and laid there for three weeks - Ewww", to having to pass through an almost "Get Smart" level of security when going to work, it was amazingly fun. Beyond that though, I gained respect and a better understanding of how biometrics like fingerprints can be useful, (though I also obtained a better understanding of false positives). It's not just law enforcement where biometrics as a form of identification is used. From all accounts, biometric identification has also proven vital to supporting our troops in Iraq and Afghanistan. So I want to state for the record I'm not against using biometrics. As I'll go into much more details in the later sections though, due to certain inherent weaknesses, biometrics perform very poorly when used for authentication in computer systems. In short, biometrics for authentication = fail.

The Number #1 Reason Why Biometrics Fail - It's all Digital:
It's easy to get carried away talking about the lack of false positives, liveness sensors, or the inability to duplicate the blood vessels at the back of an eye when discussing the security of biometrics. In reality, very rarely does any of that matter since they are all factors of the biometric sensor itself. This is important. If you can bypass the sensor, then all you are dealing with is a digital representation of the data. Probably the best example of this was the excellent Defcon 15 talk by Zac Franken - "Biometric and Token Based Access Control Systems: Are you protected by two screws and a plastic cover? Probably". In it, he demonstrated grabbing biometric "signatures" from devices and then replaying those signatures to allow him access. It's a lot like a pass-the-hash attack but easier. This gets even worse for remote authentication, where the sensor is completely out of the control of the defender. Now, a defender can try to make the sensor harder to bypass, for example by using some form of trusted computing, but this is a tough battle to fight. Let me put it this way: If we had trusted computing down pat, no-one would be able to jailbreak their I-Phones...

The fact that an attacker is dealing with a digital representation of a biometric signature means the type of biometric measurement, (fingerprint, iris, voice, etc), they are attacking generally doesn't make a difference. The one caveat I want to mention though is that the type of signature can impact the ability to perform guessing attacks. As I'll explain later though, that becomes largely irrelevant as biometric authentication becomes more prevalent.

My Mother Always Said, "Don't Use Your Online Banking Password Everywhere":
I think we all agree that password reuse is a real problem. When singles.org was hacked, some of the 4chan'ers tried the disclosed e-mail/passwords on facebook, and then proceeded to create a lot of embarrassing fake posts. Some of those posts are hilarious; until you realize that this happened to real people. I think we can all agree - using the same password for your online bank account and every other site is a bad idea. But what about biometric signatures? I only have ten fingers. What happens when a site that has a copy of my biometric signature gets hacked? Now an attacker has a digital copy of my fingerprint. That can't be good...

The sad fact is that biometric authentication is only somewhat effective as long as it is a nitch market. The second everyone starts using it, you fingerprint/iris-scan/whatever will get disclosed. I wish I was exaggerating, but if that wasn't the case, we wouldn't have any problems with password reuse either. Heck I know my password has been stolen at least once when a gaming website I used to play on was broken into. In reality, my password probably has been stolen more than that, (the Cain&Able forum may have been hacked, the FSU Alumni database was broken into a couple of years ago, etc). Since a biometric signature is digital and can be used in a replay attack, the user needs to have at least the option to select different forms of authentication across different sites.

This is Going to Hurt - Lack of Revocation in Biometric Authentication
This dovetails with the previous point. What happens when a hacker gets your biometric signature? You can't change your fingerprints, (well you can, but it'll probably be fairly painful). How do you stop someone from logging in if they have your password and you can't choose a new one? That's not a rhetorical question. I really don't know the answer to it, and I don't think anyone else does either. In short, even if there is a form of biometric authentication where it is hard/impossible to guess someone's signature, it really doesn't matter because if it achieves widespread deployment your signature is going to get disclosed.

I Was Framed - Lack of Attribution, But Don't Count on the Jury Knowing That:
There's a lot of privacy issues with using biometrics, but I'm going to skip over that. It's not that the privacy issues are unimportant, but that they can lead into conversations/arguments that wildly diverge from the subject at hand. Instead, let me ask you this question. If you were on a jury, would you convict someone if the only evidence against them was fingerprints and/or DNA samples? I hope the answer is no, but I think a vast majority of people would say yes. What happens to Bob when Alice uses his biometric signature to break into the server room and trashes the place? As much as we want them too, biometrics don't provide attribution on their own, and this fact can be used by an attacker to frame someone.

Trusting Fuzzy Hashing to Encrypt Your Harddrive - This Is Going To End Well:
The great thing about passwords is if you don't enter them in exactly, they don't work. Ok, you might not think it's great when you spent ten minutes trying to log in only to find that caps lock was on, but from a cryptographic standpoint it's really nice. You might have some trouble decrypting your AES256 encrypted hard-drive if your password is constantly changing. To combat that, something called fuzzy hashing has been proposed. What fuzzy hashing attempts to do is return the exact same output hash even when the input values are changing slightly from time to time. This fuzzy hash can then be used as the key for your encryption algorithm. The problem is that current fuzzy hashing techniques aren't that good, and have been shown to have multiple problems that make them extremely vulnerable to guessing attacks, (the attacker only needs to get close). In fact, here are two papers on this very subject: 1, 2. Even if that wasn't the case, good luck trying to keep the cops from obtaining your fingerprint. That's actually one of the fundamental flaws with biometrics. Since they are based on something you are, instead of something you know, it becomes very hard to keep them secret. At least with a RSA key, you can hide it under the floorboards.

Finally, fuzzy hashes may not be able to take into account drastic changes in someone's biometric signature. If you cut your finger, do you have to wait for it to heal before you can decrypt your hard-drive? In short, biometrics and encryption don't play very well with each other.

Conclusion:
There were quite a few additional arguments I could have made, (hey I didn't even mention this Mythbusters episode), but I'm probably beating a dead horse by now. It's actually gotten to the point where I'm not even a fan of using biometrics as one-half of a two factor authentication setup. It's not that they don't provide some additional security, but that there are better solutions out there. No form of authentication is perfect, but I'm fairly well convinced that biometrics are not the way to go. I hoped this post proved convincing, and if not, please use the comments section below as I'm always open to new ideas.

Tuesday, November 24, 2009

Analysis of 10k Hotmail Passwords Part 5: Markov Model Showdown

Don't worry; I'm still not done with this data-set. A little over a week ago I received an e-mail from Ilya Sokolov, saying:
If I'm getting the numbers right from your graphs - you've got around 3k hashes bruteforced in about 1G guesses. Assuming you used --incremental mode of John, right? I guess you should try --markov too :)
How right he is. Ilya went on to send some statistics my way, so I truly do appreciate e-mails like this. Before I talk about the results, first let me back up and spend a little time talking about the incremental and markov modes in John the Ripper. Aren't they both Markov based attacks?

Well surprisingly yes they are, though they go about it in different ways. The --incremental option actually models the probability of trigraphs appearing. In this case a trigraph represents a 1-3 character long string, such as "abc". It then uses these trigraphs to model the conditional probability of three letter groupings, (aka their Markov probability). The problem is that JtR's incremental attack doesn't model the conditional probability of trigraphs appearing together. That certainly is not a deal breaker, (as seen from the previous results). The incremental attack takes into account the password length and where a trigraph appears in a password guess. Aka, the probability that a particular trigraph will be assigned depends on if it appears at the beginning, middle or end of a password guess. Unfortunatly it doesn't take into account the previous trigraphs that appeared before it, (except when calculating the overall probability). For example, if the first trigraph is "auq", the next trigraph's probability isn't increased if it starts with a 'u', (even though 'u' almost always follows 'q'). Therefore, the incremental attack doesn't measure the conditional probability between the third and fourth letter, and the sixth and seventh letter of a password guess.

I know; Way to much info - especially since this is just my understanding of how it works. I need to spend more time looking at the code to make sure the above is actually correct. The biggest advantage of using the incremental attack is that it creates highly optimized guesses while still retaining the ability to eventually cover the entire key-space if you give it enough time. Aka it will eventually try guesses like 'bD359sl'. Also it is blazingly fast.

The --markov mode on the other hand takes a different approach. First of all, if you haven't noticed this option in your copy of John the Ripper, that is because it was added as a third party patch, (though it's currently part of the jumbo patch set). Personally I haven't used it that much since it full out segfaults on Intel based macs, which I generally use as my development/testing platform. That being said, after running some comparative tests on my Linux box at the lab, (I use that one for my long term password cracking sessions), I've really become a believer in it. However, using the markov mode is a little bit tricky.

Unlike incremental mode, the markov mode does not use trigraphs. Instead it starts with the highest probability starting letter, (in the default stats file this is a 'c'), and then appends the most probable following character. It repeats this until it hits the probability threshold specified, or it reaches 12 characters long, in which case it outputs a password guess. The attack then goes to the last character of the previous guess and changes it to the second most probable value, and outputs another guess. You can probably see where this is going as it cycles through printing out all the possible guesses that meet the probability threshold specified. There are a lot of advantages to this approach. For example, it can attack passwords up to twelve characters long, and it provides a more accurate model of how people create passwords than the incremental mode. The downsides of this approach is that it doesn't create guesses in probability order which the incremental attack does. Therefore, as we will see, if you set a high threshold the markov mode starts to resemble a letter frequency analysis attack. Finally, because of this, the markov mode won't cover the entire key-space, (aka, it won't crack the password '3Nw!lswp'). The last point isn't as important as it may seem though, since in all likelihood due to time constraints, (aka you only have several months to crack a password if that), you simply don't have enough time to cover the entire key-space even using the incremental mode. Therefore you want to do the best that you can with the limited number of guesses you can make.

So that was a really long introduction before we even talked about any results ;) Let's see how the incremental and markov options fare against some real passwords!

Test 6: Trying the default -incremental and -markov modes against the Hotmail password set

For the first test, I wanted to run both the -incremental and -markov modes against the Hotmail password set using their default settings. The one tricky part was to set the limit for the markov attack. In the readme file for the patch, it actually goes into quite a bit of detail on how to select a proper limit. Really all I did was run the included program 'genmkvpwd', and select a limit that would cause the attack to generate slightly more than one billion guesses. This turned out to be the value '214' for the limit.

Results: So here are the results. As always, you can click on the graph to get a slightly larger version. The Y-axis is the number of passwords cracked. Sorry, I didn't realize I forgot to add that until I was proof-reading this and I didn't feel like re-uploading the graphs.

-ed note: I know; It certainly doesn't look like I proof-read these posts ;)



Analysis: As you can see, the Markov mode started out slow, but finished strong, cracking 786 more passwords than the Incremental attack. As far as I can tell, the reason why it did so well near the end was that the markov mode started trying password guesses that began with numbers, which was very common in the hotmail dataset. Also, since the probability of a guess starting with a number was so low according to the default rule-set, it only tried very probable guesses following a number.

Another thing I'd like to point out: Due to the way it creates guesses, for the most part the Markov mode cracks passwords at a fairly fixed rate; while the Incremental mode is much more front-loaded. I'd like to caution you against extending the Markov mode's line though, as it has finished just about every password guess within the limit that I set. That actually leads us to the next test...

Test 7: Setting Different Limits for a Markov Based Attack

So the next question of course is, "How does the limit affect Markov based attacks?" That's a pretty easy one to solve. All I needed to do was run it with several different limits and graph the results. I also included the Incremental results for a comparison.

Results:


Analysis: See what I mean about not being able to continue a predictive line on the number of passwords that using JtR in Markov mode will crack? When using a limit of 200, it created slightly more than 250 million guesses and then finished. If we run the same attack using a limit of 214, it will create almost four times as many guesses, but it only cracks 148 more passwords. If we increase the limit to 250, it will generate roughly 46 billion guesses; But if you refer back to some of the previous tests, even a pure brute force attack performs significantly better in the first billion guesses. Don't worry though, as the Markov attack will catch up with brute force, (and then exceed it by a large margin), if you let it run long enough.

What this shows is that you really do need to spend the time to tailor your attacks using Markov mode to the resources you expect to spend cracking the password list you are attacking. One good option is you can split your attack up into multiple parts, where you try all the guesses using a very low limit, and then increase the limit and run the attack again. There is an option in the latest build of the patch to exclude guesses made previously. This can save you time since you don't have to re-hash guesses you made already. There is a slight performance hit by using this approach though as it will make a guess, realize that it made the guess before and reject it, before moving on to the next guess. Still, this can be very useful, especially when attacking salted hashes, as it makes a huge performance improvement to crack as many hashes as possible as soon as possible, (since each hash you are cracking greatly adds to the cracking time).

Test 8: Comparing Incremental and Markov Modes When Trained on the Same Password Set

Now, all of the previous tests are fairly nice, but there is the worry that we are comparing apples and oranges. I mean, both the default Markov and Incremental modes were trained on different password sets, and that can have a huge impact on their cracking ability. For example, the Markov mode was trained primarily on French passwords, and the Incremental attack was mostly trained on English passwords. That's one nice thing about being in the password cracking research business though; I certainly have a lot of passwords to use for my own training set ;)

For this test, I trained both the Incremental and Markov modes on the same set of disclosed passwords from the phpbb.com list that I had previously cracked. In technical terms, I created a new .chr and new stats file for both attacks. As I previously mentioned in part 4 of the writeup, the phpbb.com list isn't ideal for this attack since it is primarily composed of English passwords, while this list is mostly Spanish/Portuguese, but that doesn't matter for this test. What does matter is both attacks are being trained on the same list, and they have to deal with the same limitations. In addition, I ran the Markov attack twice. The first time I allowed it to attack passwords up to 12 characters long. The second time I limited it to only attack passwords up to 8 characters long, (and I assigned it a higher limit so it would generate roughly the same number of guesses). This way we can better compare it against JtR's incremental mode which only tries guesses up to 8 characters long. So, let's see how this turned out:

Results:


Analysis: Yup, the Markov mode still cracks more passwords than JtR's Incremental mode. Still, there were a couple of things that surprised me about these results. First of all, the phpbb.com list made a much better training list than using the default .chr and stat files. The Markov attack cracked 540 more passwords using the same number of guesses when it was trained on the phpbb list. The Incremental attack also got off to a much faster start, though in the end it cracked 17 less passwords than the Incremental attack using the default alphanum character set. This isn't exactly a fair comparison though since I didn't bother to filter the training set to only include lowercase letters and numbers.; Aka it actually is using pretty much the entire keyboard. In that case, it performs much better than using JtR's Incremental mode using the 'All' .chr file, and it cracked 333 more passwords overall.

The second thing that surprised me was that limiting the maximum password guess size didn't really have much of an effect on the number of passwords cracked using the Markov mode. This shouldn't have surprised me that much, since it didn't make that many guesses longer than eight characters long, (the probability of those guesses would be very low), but still it helps to see that on a graph.

Future Work: Whew, and here I thought I'd be done with this data-set by now. I still have a few miscellaneous tests I want to run using brute force style methods believe it or not. And then I need to get around to posting my results with dictionary based attacks. Oh ,then there are the topics that I've been meaning to post on for a while. They range from going over other datasets, (such as the ZF0 one), to the vulnerability of World of Warcraft accounts to password disclosure attacks, (and what Blizzard is doing about it). It should be fun ;)

Monday, November 16, 2009

Installing John the Ripper Version 1.7.3.4 Tutorial

I just upgraded to the newest version of John the Ripper so I decided to make a tutorial out of my experience, (with screen-shots), since it was a fairly time consuming ordeal. It's mostly focused on installing John the Ripper on a Mac OSX Snow Leopard, but you should be able to use most of it when installing it to various flavors of Linux as well. Besides going over the base install, I also tried to cover:
  • Which patches to install, where to find them, and how to apply them
  • Picking the right build options
  • Modifying the Makefile so it actually will install on Snow Leopard
  • Modifying the code so you can use incremental attacks against passwords longer than eight characters long
You can find it on my tools page, or by clicking on this link.

Sunday, November 15, 2009

Defcon 17 Videos Posted Online

The title says it all. You can get all of the videos here. Just a warning, I may have used some inappropriate language in my talk on password cracking, so you might not want to watch it in front of small children.

Also, my writeup of a couple of the talks can be found here, and here if you are having trouble deciding what to watch.

Thursday, October 29, 2009

Analysis of 10k Hotmail Passwords - Even More Brute Force

A reader asked me through e-mail how much better John the Ripper's Markov models were compared to pure brute force or letter frequency analysis. I knew there was a reason why I put my e-mail address on the side of this blog. That's a great question, since while I'd always had more success with Markov models vs letter frequency analysis, (and certainly brute force), I had never measured the difference before. What type of researcher am I? I better fix that, so let's check it out.

Test 4: Markov Models vs. Letter Frequency Analysis vs. Pure Brute Force

So in this test I reused the data collected previously in Test 1 using JtR's -incremental mode targeting lowercase letters and numbers, (a-z0-9). I then used the popular tool crunch to run both the brute force and letter frequency analysis, (which I'm going to call LFA), attacks since JtR doesn't support pure brute force, (well there is a bit of a hack, but crunch is easier). For the pure brute force attack I ran:

./crunch 6 8 abcdefghijklmnopqrstuvwxyz0123456789

In short, I'm creating brute force guesses between 6 and 8 characters long containing a-z0-9, starting with aaaaaa and theoretically ending with 99999999, (in this run it never finished all of the six character long words). This is also the default character set that the popular password cracker Cain&Able uses.

For the LFA attack, I then ran it using the numbers based on my analysis of the phpbb.com password list. As we'll see, you can do better if you base your LFA on analysis of Spanish/Portuguese passwords, but I don't have another dataset of Spanish/Portuguese passwords so this is the best I can do without cheating. The command I used was:

./crunch 6 8 smp1abctdkjlhrfgnw2ei0ov3q457z968yux

A couple of notes. I used the first character LFA charset since crunch increments it's values from right to left; aka it will try 01, 02, 03, 04 etc. This means using the above character set, it will initially crack all the words starting with 's', then move on to words starting with a 'm' and so on. When there is no password creation policy, (aka users aren't forced to include numbers in their passwords), this is slightly better than password crackers such as Cain&Able who increment their values from left to right, aka 10, 20, 30, 40 ... The reason for this is because people are slightly more predicable in how they start words/passwords, vs how the end them. For example, from the phpbb.com dataset, over 51% of the users choose one of top ten characters to start their passwords. Likewise, 46% of users choose one of top ten characters to end their passwords. It's not a big difference, but every little bit helps. If there is a password creation policy though, this quickly falls apart, and it's vastly more preferable to use a left to right approach since people almost always put numbers/special characters at the end of their passwords. For example, just the number '1' appears 21% of the time at the end of stronger passwords.

Also, the more astute, (and/or skeptical), readers might notice the order I'm using is slightly different from the one I published before. That's because I've cracked more passwords from the phpbb.com dataset, (I'm nearly up to 98% of the total set cracked), so I've updated some of my statistics. That's another thing I need to get around to posting up here...

Results: Note, you can click on the image for a bigger version of it.


Analysis: The first thing that stands out is that my letter frequency analysis percentages are way off. I actually do worse initially then if I had chosen a pure brute force approach. I know it's cheating, but going back to my original analysis of the hotmail list, and comparing it to the phpbb list, you get the following:

Hotmail list: a1mbc2sp0lterdjfgn3hi6k759vo48ywzuqx
Phpbb list : smp1abctdkjlhrfgnw2ei0ov3q457z968yux

Looking at this, the best choice you could have done was to start with the letter 'a' which is coincidentally what the pure brute force approach did. The reason for the huge bump in the LFA attack around the 220 million guess mark is because it hit the '1' as the first character, followed by an 'a' which was about the equivalent of a perfect storm since people started their passwords with a number, followed by the actual word they were using. Even with my probabilities being off though, LFA still beat out pure brute force, so if you aren't at least using that, well I don't know what more I can say to convince you. That being said, this really shows how superior using Markov models is to both attacks, with the alphanumeric Markov attack cracking more than twice the amount of passwords that the LFA did. I'd also like to point out that if you look at the first several million guesses, a Markov based attack performs so much better than LFA it's ridiculous.

Test 5: LFA and Brute Force Attacks Against Different Length Passwords

The next question is how do LFA and Brute Force attacks perform as the password gets longer. As I said before, the above test only shows them attacking 6 character long passwords. What happens when we bump up the length to 7 or 8 characters minimum. To simulate this, I once again used crunch with the above settings except for increasing the minimum size guesses allowed. I also used the phpbb LFA probabilities, since I hate training and testing on the same dataset, (also I had already run the tests before I graphed out the results and realized how off those probabilities were...)

Results:



Analysis: I didn't even bother to put the eight character one up there since it was pretty boring. If I'm generous the pure brute force attack would crack one eight character long password, (it took slightly more than one billion guesses to do so), and the LFA attack would crack two. This shows though that while increasing the character set quickly makes brute force and LFA attacks infeasible, which is what you'll see in all the literature, even longer passwords are vulnerable against Markov model based attacks. One way to increase the effectiveness of LFA style attacks is to not try the full alpha numeric character set, (aka drop qxz etc). That's another test I'll try to run later. Even so, once again this shows that Markov models are the way to go. I'd like to point out too, that JtR's Markov model attack almost certainly would perform better if trained against similar passwords. That's another test I'll need to run ... after Halloween weekend.

A Quick Note on Time Required: I think it's important to state that these test runs are extremely short since I didn't want to run each for an entire day just to get a few more data points. My update schedule is already too slow ;) To give you a better idea how long it would take to bruteforce the entire key space here is some numbers I ran using Cain&Able. Note, Cain&Able is much slower than JtR, but it has a really nice time estimator, so please consider these numbers a worst case scenario.

Computer Specs:
2.4 Ghz Core Duo
3 Gigs of Ram
NVIDEA GeForce 8800 GTS
Windows 7 - 64 bit

-All tests were run against cracking plain MD5 hashes

Alpha-Numeric passwords: a-z0-9
  • 6 character: 14 minutes
  • 7 characters: 8 hours
  • 8 characters: 12 days
The Entire US Keyboard
  • 6 characters: 3 days
  • 7 characters: 272 days
  • 8 characters: 71 years
Oh, and keep sending me ideas, either via e-mail or in the comments of these posts.

Sunday, October 18, 2009

Analysis of 10k Hotmail Passwords Part 3 - Brute Force

As promised, let's see how these Hotmail passwords would fare in a real password cracking attack. This post will cover brute force attacks, and I'll make a later post going over the effectiveness of dictionary based attacks. As always, if there is any further attacks/analysis you would like to see run against these passwords, please let me know in the comments.

Test 1: John the Ripper Incremental Modes

The first test I wanted to run was to use John the Ripper's brute force attack, (aka -incremental). As I mentioned in a previous post, JtR's incremental attack is very powerful and probably as close to an "industry" standard as you will find. For this test, I ran JtR's four different built-in incremental modes, (All, Alpha, Digits, Alnum), against the password set, and let each one run for one billion guesses. While in a real password cracking attack, the attacker would probably run their attacks much longer, (aka try trillions of guesses), I figured a run of one billion guesses was enough to give people a general idea of the effectiveness of these attacks. Since the minimum password requirement for Hotmail passwords required legitimate passwords to be at least six characters long, I modified my JtR config file so it would only try guesses of six characters or more. Also, since the built in brute force attack in JtR won't try passwords longer than eight characters long, for the Digits only attack I ran it through Middle Child to add two digits to the end of each guess. This is because JtR would only make one-hundred million guesses otherwise, (aka all digits between 0 and 99,999,999.) I know I can patch JtR to try longer guesses, but I'm lazy.

Character Set of Each Attack Type:
  • Alpha: a-z, all lowercase
  • Alnum:a-z 0-9, all lowercase
  • Digits:0-9
  • All:a-z A-Z 0-9 and keyboard characters such as !@#$%
Results:



Analysis: As you can see, bruteforcing lowercase characters and numbers was the most effective strategy, cracking a hair over 30% of the passwords. It shouldn't be surprising that it did better than the "All" attack since less than 10% of the passwords contained an uppercase character or a special character. By including those in your attack you drastically increase the search space you are attacking. It's only the fact that JtR's Markov models are so good that the "All" attack remains competitive.

It may appear that the digits attack stopped cracking passwords after around 300 Million guesses, but the truth is that it was running out of passwords to crack. After 1 billion guesses, there were 36 passwords that contained only numbers that hadn't been cracked. This also shows how weak numeric passwords can be, (since there are only ten digits vs. 26 letters).

Test 2: Running a longer cracking session

Next I wanted to try the "All", and the "Alpha Numeric" attacks again, but let them run much longer. In this case I let them run for 25 billion guesses. This took a while even without having to do any hashing just because JtR had to spend time generating the guesses, I had to check/record those guesses, (and I was running it on my laptop).

Results:
















Analysis: I think this shows fairly dramatically how front-loaded a password cracking session can be. An attacker will initially see very good results, but after the easiest passwords are cracked it requires many more guesses to crack successive passwords. It almost looks like a logarithmic function. Near the end it was taking over fifty million guesses to crack each additional password. Even if that rate remains constant, (in real life it almost certainly will get worse), you're looking at having to make over 56 billion guesses total to crack 50% of the Hotmail passwords using the Alpha Numeric character set.

Test 3: Effectiveness of password length against brute force attacks

One person wondered in the comments about how password length affects dictionary based password cracking attacks. I'll get to that question in a later post, but I figured I should run some tests to demonstrate how minimum password length requirements can drastically increase the cost of brute force attacks. For this test, I ran John the Ripper's -incremental mode several times, using the Alphanumeric character set. Each session I modified the config file so it would only attempt to crack passwords of a specified length, from 6 to 8 characters long. The results of this, (measuring percentage of passwords of that length cracked), can be seen below.

Results:
















Analysis: Quite honestly, if your password is six characters or less, chances are,it's going to be cracked in under a billion guesses. Please note, none of the attacks will ever reach 100% of the passwords cracked since I'm only bruteforcing lowercase letters and numbers, (and there are a few passwords that used uppercase letters and symbols). Once again, we see that cracking 30% of the passwords is fairly easy even when the password length is longer, but as the attacker tries to start cracking 40-50+ of the passwords the effort required really starts to ramp up. This also shows that the minimum password requirement should be at least seven characters long to defend against any sort of offline password cracking attack.

One interesting thing that is hard to see on the graph is that I only needed 31 thousand guesses to crack 5% of the six character long passwords. I needed 60 thousand guesses to crack 5% of the seven character long passwords. Finally I needed 536 thousand guesses to crack 5% of the eight character long passwords. The reason why I'm pointing this out is to look at the password's resistance to an online attack, (aka where the attacker does not have the password hash and is trying to log into the system). Now admittedly in most online attacks the attacker is almost certainly going to use a dictionary attack vs. brute force. That being said, it was surprising that seven character long passwords only took twice as many guesses to hit that 5% mark, and the difficulty level didn't really ramp up until eight character long passwords were required.

Up Next: Dictionary Based Attacks! If there are any dictionaries you would like to see me use, please let me know, either via e-mail or in the comments. (Yes, I'll run the Wiki dictionary). As a quick preview, most of my input dictionaries have been performing horribly since they aren't targeting Spanish/Portuguese/French speakers, and the foreign language dictionaries I've found online have been ... lacking in quality. What I'm going to do is start building a custom dictionary based on these passwords to use in future research, (I've had great success with that when attacking Finnish and Swedish passwords), but that's not really an option here for obvious reasons. "Hey look, I cracked 100% of the passwords using an input dictionary created from those passwords!"