Friday, November 27, 2009

Biometrics Are Not Going to Save Us - Or Get Used to Your Password

It generally is taken as common wisdom that one day, everyone is going to switch to biometrics so we can finally get rid of those pesky passwords. This post is an attempt to stave off that future. This isn't because I'm in the password cracking business, (I'm sure horse-buggy salesmen thought that cars were a poor substitute as well). Instead it's because biometrics are a really bad solution. In fact, over the long run, biometrics will make it even easier for an attacker to break your authentication schemes. The rest of this post is an attempt to explain why this is the case, along with some other reasons why you probably shouldn't go out and buy that thumbprint reader quite yet.

If Biometrics Are So Bad, Why Do People Still Like Them:
Let's face it, passwords suck. This is a blog pretty much devoted to password cracking. I'm aware of the fact that they suck. Passwords are a pain to use, a pain to remember, and pain to make secure. Who doesn't want to get rid of them? At the same time, biometrics seem like the perfect solution. Just about everyone has something to scan, (we'll make a special use case for the double amputee). You won't forget them. You won't leave them at home. User's can't share them. Oh, and with all that fancy equipment, how's a hacker supposed to break it? In addition, it's an authentication scheme that everyone understands. Try to explain public key cryptography to someone, and all you'll get are blank stares. Everyone knows about fingerprints and iris scans. Heck, we use biometrics every day to distinguish between people like Bob and Alice, (though please stop staring at Alice's chest and calling it an authentication scheme...) So why shouldn't we use biometrics with computers to authenticate people?

Identification, not Authentication:
If you've ever had to suffer through a CISSP prep course, you'll probably remember the instructor droning on about the IAA triad, (Identification, Authentication, and Authorization). The thing is, keeping those three ideas separate really is important, since they perform different jobs. Identification is a way to keep multiple users distinct, and as a statement of who you think that person might be - "Hey that looks like Alice". Authentication is where you verify someone's identity - "Hey that is Alice because she knows the secret codeword". Biometrics are extremely useful for identification. Yes, you heard me right - I'm a fan of biometrics when it comes to identifying people in certain cases. It's using biometrics as a form of authentication that I have a problem with.

My very first internship back in 1997 was working on the National Automated Fingerprint Identification System. It was one of the best summers of my life. From moments such as, "Wait these fingerprints were taken from someone who died in the tub and laid there for three weeks - Ewww", to having to pass through an almost "Get Smart" level of security when going to work, it was amazingly fun. Beyond that though, I gained respect and a better understanding of how biometrics like fingerprints can be useful, (though I also obtained a better understanding of false positives). It's not just law enforcement where biometrics as a form of identification is used. From all accounts, biometric identification has also proven vital to supporting our troops in Iraq and Afghanistan. So I want to state for the record I'm not against using biometrics. As I'll go into much more details in the later sections though, due to certain inherent weaknesses, biometrics perform very poorly when used for authentication in computer systems. In short, biometrics for authentication = fail.

The Number #1 Reason Why Biometrics Fail - It's all Digital:
It's easy to get carried away talking about the lack of false positives, liveness sensors, or the inability to duplicate the blood vessels at the back of an eye when discussing the security of biometrics. In reality, very rarely does any of that matter since they are all factors of the biometric sensor itself. This is important. If you can bypass the sensor, then all you are dealing with is a digital representation of the data. Probably the best example of this was the excellent Defcon 15 talk by Zac Franken - "Biometric and Token Based Access Control Systems: Are you protected by two screws and a plastic cover? Probably". In it, he demonstrated grabbing biometric "signatures" from devices and then replaying those signatures to allow him access. It's a lot like a pass-the-hash attack but easier. This gets even worse for remote authentication, where the sensor is completely out of the control of the defender. Now, a defender can try to make the sensor harder to bypass, for example by using some form of trusted computing, but this is a tough battle to fight. Let me put it this way: If we had trusted computing down pat, no-one would be able to jailbreak their I-Phones...

The fact that an attacker is dealing with a digital representation of a biometric signature means the type of biometric measurement, (fingerprint, iris, voice, etc), they are attacking generally doesn't make a difference. The one caveat I want to mention though is that the type of signature can impact the ability to perform guessing attacks. As I'll explain later though, that becomes largely irrelevant as biometric authentication becomes more prevalent.

My Mother Always Said, "Don't Use Your Online Banking Password Everywhere":
I think we all agree that password reuse is a real problem. When was hacked, some of the 4chan'ers tried the disclosed e-mail/passwords on facebook, and then proceeded to create a lot of embarrassing fake posts. Some of those posts are hilarious; until you realize that this happened to real people. I think we can all agree - using the same password for your online bank account and every other site is a bad idea. But what about biometric signatures? I only have ten fingers. What happens when a site that has a copy of my biometric signature gets hacked? Now an attacker has a digital copy of my fingerprint. That can't be good...

The sad fact is that biometric authentication is only somewhat effective as long as it is a nitch market. The second everyone starts using it, you fingerprint/iris-scan/whatever will get disclosed. I wish I was exaggerating, but if that wasn't the case, we wouldn't have any problems with password reuse either. Heck I know my password has been stolen at least once when a gaming website I used to play on was broken into. In reality, my password probably has been stolen more than that, (the Cain&Able forum may have been hacked, the FSU Alumni database was broken into a couple of years ago, etc). Since a biometric signature is digital and can be used in a replay attack, the user needs to have at least the option to select different forms of authentication across different sites.

This is Going to Hurt - Lack of Revocation in Biometric Authentication
This dovetails with the previous point. What happens when a hacker gets your biometric signature? You can't change your fingerprints, (well you can, but it'll probably be fairly painful). How do you stop someone from logging in if they have your password and you can't choose a new one? That's not a rhetorical question. I really don't know the answer to it, and I don't think anyone else does either. In short, even if there is a form of biometric authentication where it is hard/impossible to guess someone's signature, it really doesn't matter because if it achieves widespread deployment your signature is going to get disclosed.

I Was Framed - Lack of Attribution, But Don't Count on the Jury Knowing That:
There's a lot of privacy issues with using biometrics, but I'm going to skip over that. It's not that the privacy issues are unimportant, but that they can lead into conversations/arguments that wildly diverge from the subject at hand. Instead, let me ask you this question. If you were on a jury, would you convict someone if the only evidence against them was fingerprints and/or DNA samples? I hope the answer is no, but I think a vast majority of people would say yes. What happens to Bob when Alice uses his biometric signature to break into the server room and trashes the place? As much as we want them too, biometrics don't provide attribution on their own, and this fact can be used by an attacker to frame someone.

Trusting Fuzzy Hashing to Encrypt Your Harddrive - This Is Going To End Well:
The great thing about passwords is if you don't enter them in exactly, they don't work. Ok, you might not think it's great when you spent ten minutes trying to log in only to find that caps lock was on, but from a cryptographic standpoint it's really nice. You might have some trouble decrypting your AES256 encrypted hard-drive if your password is constantly changing. To combat that, something called fuzzy hashing has been proposed. What fuzzy hashing attempts to do is return the exact same output hash even when the input values are changing slightly from time to time. This fuzzy hash can then be used as the key for your encryption algorithm. The problem is that current fuzzy hashing techniques aren't that good, and have been shown to have multiple problems that make them extremely vulnerable to guessing attacks, (the attacker only needs to get close). In fact, here are two papers on this very subject: 1, 2. Even if that wasn't the case, good luck trying to keep the cops from obtaining your fingerprint. That's actually one of the fundamental flaws with biometrics. Since they are based on something you are, instead of something you know, it becomes very hard to keep them secret. At least with a RSA key, you can hide it under the floorboards.

Finally, fuzzy hashes may not be able to take into account drastic changes in someone's biometric signature. If you cut your finger, do you have to wait for it to heal before you can decrypt your hard-drive? In short, biometrics and encryption don't play very well with each other.

There were quite a few additional arguments I could have made, (hey I didn't even mention this Mythbusters episode), but I'm probably beating a dead horse by now. It's actually gotten to the point where I'm not even a fan of using biometrics as one-half of a two factor authentication setup. It's not that they don't provide some additional security, but that there are better solutions out there. No form of authentication is perfect, but I'm fairly well convinced that biometrics are not the way to go. I hoped this post proved convincing, and if not, please use the comments section below as I'm always open to new ideas.

Tuesday, November 24, 2009

Analysis of 10k Hotmail Passwords Part 5: Markov Model Showdown

Don't worry; I'm still not done with this data-set. A little over a week ago I received an e-mail from Ilya Sokolov, saying:
If I'm getting the numbers right from your graphs - you've got around 3k hashes bruteforced in about 1G guesses. Assuming you used --incremental mode of John, right? I guess you should try --markov too :)
How right he is. Ilya went on to send some statistics my way, so I truly do appreciate e-mails like this. Before I talk about the results, first let me back up and spend a little time talking about the incremental and markov modes in John the Ripper. Aren't they both Markov based attacks?

--Ed Note: The following description of how the incremental attack works has been updated since I was incorrect about how JtR used trigraphs. A copy of the original incorrect description can be found in the comments.

Well surprisingly yes they are, though they go about it in different ways. The --incremental option actually models the probability of trigraphs appearing. In this case a trigraph represents a three character long string, such as "abc". Another way to say this is that JtR uses trigraphs to represent second order Markov probabilities, aka the probability of 'c' given 'ab'. There's special code handling the first two characters, (since they don't have two characters before them), but after that John the Ripper select each successive character based on the previous characters. Therefore, for the guess 'abcdef', it would create the trigraphs 'abc', 'bcd', 'cde', and 'def', and assign a final probability to the guess based on those four different trigraphs. In addition, the probability of a trigraph is also dependent on where it shows up in the password. In this way, the --incremental attack also takes into account factors such as numbers often show up at the end of a password, and uppercase letters usually are at the front. Pretty nice, right? The only downside is that it doesn't generate these guesses in true probability order. I'm still a little fuzzy on some of it, but the incremental mode does make some shortcuts so it can cover the entire keyspace, while still being able to generate a large number of guesses quickly.

I know; Way to much info - especially since this is just my understanding of how it works. I need to spend more time looking at the code to make sure the above is actually correct. The biggest advantage of using the incremental attack is that it creates highly optimized guesses while still retaining the ability to eventually cover the entire key-space if you give it enough time. Aka it will eventually try guesses like 'bD359sl'. Also it is blazingly fast.

The --markov mode on the other hand takes a different approach. First of all, if you haven't noticed this option in your copy of John the Ripper, that is because it was added as a third party patch, (though it's currently part of the jumbo patch set). Personally I haven't used it that much since it full out segfaults on Intel based macs, which I generally use as my development/testing platform. That being said, after running some comparative tests on my Linux box at the lab, (I use that one for my long term password cracking sessions), I've really become a believer in it. However, using the markov mode is a little bit tricky.

Unlike incremental mode, the markov mode does not use trigraphs. Instead it starts with the highest probability starting letter, (in the default stats file this is a 'c'), and then appends the most probable following character. It repeats this until it hits the probability threshold specified, or it reaches 12 characters long, in which case it outputs a password guess. The attack then goes to the last character of the previous guess and changes it to the second most probable value, and outputs another guess. You can probably see where this is going as it cycles through printing out all the possible guesses that meet the probability threshold specified. There are a lot of advantages to this approach. For example, it can attack passwords up to twelve characters long. The downsides of this approach is that it doesn't create guesses in probability order which the incremental attack does. Therefore, as we will see, if you set a high threshold the markov mode starts to resemble a letter frequency analysis attack. Finally, because of this, the markov mode won't cover the entire key-space, (aka, it won't crack the password '3Nw!lswp'). The last point isn't as important as it may seem though, since in all likelihood due to time constraints, (aka you only have several months to crack a password if that), you simply don't have enough time to cover the entire key-space even using incremental mode. In short, you want to do the best that you can with the limited number of guesses you can make.

So that was a really long introduction before we even talked about any results ;) Let's see how the incremental and markov options fare against some real passwords!

Test 6: Trying the default -incremental and -markov modes against the Hotmail password set

For the first test, I wanted to run both the -incremental and -markov modes against the Hotmail password set using their default settings. The one tricky part was to set the limit for the markov attack. In the readme file for the patch, it actually goes into quite a bit of detail on how to select a proper limit. Really all I did was run the included program 'genmkvpwd', and select a limit that would cause the attack to generate slightly more than one billion guesses. This turned out to be the value '214' for the limit.

Results: So here are the results. As always, you can click on the graph to get a slightly larger version. The Y-axis is the number of passwords cracked. Sorry, I didn't realize I forgot to add that until I was proof-reading this and I didn't feel like re-uploading the graphs.

-ed note: I know; It certainly doesn't look like I proof-read these posts ;)

Analysis: As you can see, the Markov mode started out slow, but finished strong, cracking 786 more passwords than the Incremental attack. As far as I can tell, the reason why it did so well near the end was that the markov mode started trying password guesses that began with numbers, which was very common in the hotmail dataset. Also, since the probability of a guess starting with a number was so low according to the default rule-set, it only tried very probable guesses following a number.

Another thing I'd like to point out: Due to the way it creates guesses, for the most part the Markov mode cracks passwords at a fairly fixed rate; while the Incremental mode is much more front-loaded. I'd like to caution you against extending the Markov mode's line though, as it has finished just about every password guess within the limit that I set. That actually leads us to the next test...

Test 7: Setting Different Limits for a Markov Based Attack

So the next question of course is, "How does the limit affect Markov based attacks?" That's a pretty easy one to solve. All I needed to do was run it with several different limits and graph the results. I also included the Incremental results for a comparison.


Analysis: See what I mean about not being able to continue a predictive line on the number of passwords that using JtR in Markov mode will crack? When using a limit of 200, it created slightly more than 250 million guesses and then finished. If we run the same attack using a limit of 214, it will create almost four times as many guesses, but it only cracks 148 more passwords. If we increase the limit to 250, it will generate roughly 46 billion guesses; But if you refer back to some of the previous tests, even a pure brute force attack performs significantly better in the first billion guesses. Don't worry though, as the Markov attack will catch up with brute force, (and then exceed it by a large margin), if you let it run long enough.

What this shows is that you really do need to spend the time to tailor your attacks using Markov mode to the resources you expect to spend cracking the password list you are attacking. One good option is you can split your attack up into multiple parts, where you try all the guesses using a very low limit, and then increase the limit and run the attack again. There is an option in the latest build of the patch to exclude guesses made previously. This can save you time since you don't have to re-hash guesses you made already. There is a slight performance hit by using this approach though as it will make a guess, realize that it made the guess before and reject it, before moving on to the next guess. Still, this can be very useful, especially when attacking salted hashes, as it makes a huge performance improvement to crack as many hashes as possible as soon as possible, (since each hash you are cracking greatly adds to the cracking time).

Test 8: Comparing Incremental and Markov Modes When Trained on the Same Password Set

Now, all of the previous tests are fairly nice, but there is the worry that we are comparing apples and oranges. I mean, both the default Markov and Incremental modes were trained on different password sets, and that can have a huge impact on their cracking ability. For example, the Markov mode was trained primarily on French passwords, and the Incremental attack was mostly trained on English passwords. That's one nice thing about being in the password cracking research business though; I certainly have a lot of passwords to use for my own training set ;)

For this test, I trained both the Incremental and Markov modes on the same set of disclosed passwords from the list that I had previously cracked. In technical terms, I created a new .chr and new stats file for both attacks. As I previously mentioned in part 4 of the writeup, the list isn't ideal for this attack since it is primarily composed of English passwords, while this list is mostly Spanish/Portuguese, but that doesn't matter for this test. What does matter is both attacks are being trained on the same list, and they have to deal with the same limitations. In addition, I ran the Markov attack twice. The first time I allowed it to attack passwords up to 12 characters long. The second time I limited it to only attack passwords up to 8 characters long, (and I assigned it a higher limit so it would generate roughly the same number of guesses). This way we can better compare it against JtR's incremental mode which only tries guesses up to 8 characters long. So, let's see how this turned out:


Analysis: Yup, the Markov mode still cracks more passwords than JtR's Incremental mode. Still, there were a couple of things that surprised me about these results. First of all, the list made a much better training list than using the default .chr and stat files. The Markov attack cracked 540 more passwords using the same number of guesses when it was trained on the phpbb list. The Incremental attack also got off to a much faster start, though in the end it cracked 17 less passwords than the Incremental attack using the default alphanum character set. This isn't exactly a fair comparison though since I didn't bother to filter the training set to only include lowercase letters and numbers.; Aka it actually is using pretty much the entire keyboard. In that case, it performs much better than using JtR's Incremental mode using the 'All' .chr file, and it cracked 333 more passwords overall.

The second thing that surprised me was that limiting the maximum password guess size didn't really have much of an effect on the number of passwords cracked using the Markov mode. This shouldn't have surprised me that much, since it didn't make that many guesses longer than eight characters long, (the probability of those guesses would be very low), but still it helps to see that on a graph.

Future Work: Whew, and here I thought I'd be done with this data-set by now. I still have a few miscellaneous tests I want to run using brute force style methods believe it or not. And then I need to get around to posting my results with dictionary based attacks. Oh ,then there are the topics that I've been meaning to post on for a while. They range from going over other datasets, (such as the ZF0 one), to the vulnerability of World of Warcraft accounts to password disclosure attacks, (and what Blizzard is doing about it). It should be fun ;)

Monday, November 16, 2009

Installing John the Ripper Version Tutorial

I just upgraded to the newest version of John the Ripper so I decided to make a tutorial out of my experience, (with screen-shots), since it was a fairly time consuming ordeal. It's mostly focused on installing John the Ripper on a Mac OSX Snow Leopard, but you should be able to use most of it when installing it to various flavors of Linux as well. Besides going over the base install, I also tried to cover:
  • Which patches to install, where to find them, and how to apply them
  • Picking the right build options
  • Modifying the Makefile so it actually will install on Snow Leopard
  • Modifying the code so you can use incremental attacks against passwords longer than eight characters long
You can find it on my tools page, or by clicking on this link.

Sunday, November 15, 2009

Defcon 17 Videos Posted Online

The title says it all. You can get all of the videos here. Just a warning, I may have used some inappropriate language in my talk on password cracking, so you might not want to watch it in front of small children.

Also, my writeup of a couple of the talks can be found here, and here if you are having trouble deciding what to watch.