Tuesday, December 29, 2009

The RockYou 32 Million Password List Top 100

But first, a quick responses to one of the previous comments, (since it really did merit a front-page post).

Tfcx posted:
The initial vulnerability was posted 29th November on a hacking forum called darkc0de here: http://forum.darkc0de.com/index.php?action=vthread&forum=11&topic=13082
Thanks, as that really helps narrow down the timeframe, (and reading that post and related posts was interesting if a bit depressing). The hack itself appears pretty straightforward once you see it, (like most things once the solution is presented to you it's easy, but finding it in the first place is hard). I'm still interested in the hacker Igigi, and have been tossing about all sorts of theories; but I'll refrain from posting them here since they are all pure WAGs right now.

Now on to the main topic: Per Thorsheim wrote:
I would like to see a comparison of Twitters 370 banned passwords against the top 370 or so passwords stolen from rockyou (http://www.techcrunch.com/2009/12/27/twitter-banned-passwords/)
Happy to oblige! To start things off, here are the top 100 most frequently used passwords from the RockYou list. I then bolded any of the passwords that did not appear in Twitter's blacklist. As a side note: Yes I do realize I need to modify my website code to allow support for expandable post summaries. Wow do I ever miss Livejournal...

Rank | Num of Occurrences | Password
--------------------------------------------------------------
1 290729 123456
2 79076 12345
3 76789 123456789
4 59462 password
5 49952 iloveyou
6 33291 princess
7 21725 1234567
8 20901 rockyou
9 20553 12345678
10 16648 abc123
11 16227 nicole
12 15308 daniel
13 15163 babygirl
14 14726 monkey
15 14331 lovely
16 14103 jessica
17 13984 654321
18 13981 michael
19 13488 ashley
20 13456 qwerty
21 13272 111111
22 13134 iloveu
23 13028 000000
24 12714 michelle
25 11761 tigger
26 11489 sunshine
27 11289 chocolate
28 11112 password1
29 10836 soccer
30 10755 anthony
31 10731 friends
32 10560 butterfly
33 10547 purple
34 10508 angel
35 10167 jordan
36 9764 liverpool
37 9708 justin
38 9704 loveme
39 9610 fuckyou
40 9516 123123
41 9462 football
42 9310 secret
43 9153 andrea
44 9053 carlos
45 8976 jennifer
46 8960 joshua
47 8756 bubbles
48 8676 1234567890
49 8667 superman
50 8631 hannah
51 8537 amanda
52 8499 loveyou
53 8462 pretty
54 8404 basketball
55 8360 andrew
56 8310 angels
57 8285 tweety
58 8269 flower
59 8025 playboy
60 7901 hello
61 7866 elizabeth
62 7792 hottie
63 7766 tinkerbell
64 7735 charlie
65 7717 samantha
66 7654 barbie
67 7645 chelsea
68 7564 lovers
69 7536 teamo
70 7518 jasmine
71 7500 brandon
72 7419 666666
73 7333 shadow
74 7301 melissa
75 7241 eminem
76 7222 matthew
77 7206 robert
78 7148 danielle
79 7116 forever
80 6979 family
81 6775 jonathan
82 6658 987654321
83 6653 computer
84 6647 whatever
85 6598 dragon
86 6570 vanessa
87 6554 cookie
88 6547 naruto
89 6501 summer
90 6420 sweety
91 6390 spongebob
92 6320 joseph
93 6272 junior
94 6215 softball
95 6131 taylor
96 6111 yellow
97 6080 daniela
98 6079 lauren
99 6068 mickey
100 6027 princesa


Analysis: I'm hesitant to post the top 370 passwords due to privacy concerns, (also 100 is such a nice round number), but I figure this should give a good overview of the coverage of the 370 passwords that are blacklisted by Twitter. A grand total of 38 of the top 100 passwords did not appear in the Twitter blacklist. That actually really surprised me, as I expected the Twitter blacklist to perform better. So I guess what I'm trying to say is good question ;)

Just to save everyone from the math, if an attacker tried the top 100 passwords as guesses, they would have been able to crack 1,483,668 passwords from the dataset, or 4.5% of the total passwords. If the Twitter blacklist had been in place, and the attacker still tried the same 100 guesses, they would have only cracked 475,046 passwords, or 1.4% of the total passwords. So a blacklisting approach certainly can help against online password attacks, (where the attacker is severally limited in the number of guesses they can make). That being said, the Twitter list probably shouldn't be considered the gold standard as there are a lot of improvements that can be made to it.

Well, that's one question down. Keep them coming!

Thursday, December 24, 2009

RockYou Hacked: 32 Million, (yes that's Million), Passwords Stolen

As the title implies, the popular Facebook and MySpace game/widget maker RockYou was hacked, with the hack becoming public last week Tuesday, December 15th. What's worse is that RockYou stored all of their passwords in the clear, (no hashing), so 32 million plaintext passwords were stolen. I've been doing some digging into this so I can add something to the conversation, but for a great general overview I highly recommend reading TechCrunch's writeup.

First of all, if you have ever used the following social networking applications, you probably should change your password ... like right now.
  • Slideshow
  • Uploadphoto
  • Photofx
  • Glittertext
  • Funnotes
  • Countdown
  • Superhug
  • Myspace layouts
  • Stickers
  • Superwall
  • Pieces of flair
  • Speedracing
  • Likeness
  • Hugme
  • Birthday cards
Yup, that's why we're talking about 32 million user accounts, (though in all fairness, many of those user accounts are almost certainly duplicates created by the same person).

One day after the attack became public, the hacker gave an interview over at readwriteweb talking about password security in general. He, (I'm going to assume he's a guy since the nickname he gave was Tom), basically said around 30% of all websites still store user credentials in plain text. He also said the most important thing users can do is choose different passwords for different sites since they can't rely on those sites to protect them.

I'll certainly agree with that. I'll also agree that Rockyou should be held responsible for A) Not storing passwords securely, and more importantly B) Trying to cover up the hack after the fact. When Rockyou tried to downplay the impact of the hack, (by claiming it only affected some of their older programs), it was really a sleezeball move. That being said, I can't condone the Hacker, (Tom), either, since his behavior is unacceptable. It's one thing to point out a vulnerability. It's another thing to exploit it.

Probably the most interesting part of the interview was that they referred to the hacker's blog. Apparently he also goes by the handle 'igigi'. I guess that sounds better than Tom ;) As far as I can tell, he choose igigi because that's what ancient Sumerians called angels. Either that or he's really into plus sized clothing. Now is it just me, or is it weird that someone actively hacking into websites is giving interviews and keeping a blog?

The thing is, igigi probably wasn't the first person to hack into RockYou. According to Imperva, an internet security company, they originally discovered the exploit being talked about on a hacking forum about a week earlier, and several of the webmail accounts associated with those logins have since been flagged as being hijacked by spammers. Looking at igigi's blog, his first entry was December 6th 2009. I guess what I'm trying to say is there's a definite possibility that he stumbled upon an exploit that someone else had posted and decided to cash in on some fame. Either that or he isn't as white/grey hat as he claims to be. RockYou is the only English site he's claimed to hack; All of the other sites have been Slovakian or Czechian which gives you an idea of where he's from, (and/or which forums he reads). I did some additional searches and I haven't been able to find any trace of igigi before December, though my ability to do research into the other sites he's hacked has been limited because Google's Czech to English translator leaves a lot to be desired.

All this is just a run-up to the news that on Monday igigi posted the entire password list, minus the usernames and e-mail addresses. The original rapidshare links no longer work, (or I wouldn't have posted a link to his blog), but the list certainly is still out there on the net if you know where to look. Just a warning though, there are also some copies of it that contain nothing but viruses and fail. Here's a hint, if the list claims to be a self-extracting executable, don't click on it...

Dealing with such a large list has had its own challenges, (though I certainly can't complain). Even opening it with vi takes about a minute, and for searches I've resorted to catting it into grep. I've also discovered that I probably should have used some more efficient algorithms in a couple of my analysis tools.

Here are some of the basic statistics:
  • Total Plaintext Values Parsed: 32,603,387
  • Average Password Length of Parsed Passwords: 7.88
  • Average Complexity Level of Parsed Passwords: 1.94
  • Percentage of Passwords that have an uppercase character: 05.94%
  • Percentage of Passwords that have a special character: 03.42%
  • Percentage of Passwords that have a digit: 54.02%
  • Percentage of Passwords that only have lower characters: 41.68%
Overall Letter Frequency Analysis:
ae1ionrls02tm3c98dy54hu6b7kgpjvfwzAxEIOLRNSTMqC.DBYH_!UPKGJ-* @VFWZ#/X$à,¸\&+=Q?)(';%"]Ã~:[^

First Letter Frequency Analysis:
sm1cba0pljdtrk2hgfniew39v45o8y76MzSBACJLquDPTxRKGHNFIEW*VOY#Z@!Q($.UXà-_<~[/+,;="`?&%Ã:^

Last Letter Frequency Analysis:
1ae326sn5794y08roltdihgmukzc!pxwbA.fEj*SNYOvRLqDT@IHM$?KG)U_-ZC+#PXB/,WJ]%;F'~=V`^\&Q"> (:

Quick Analysis:
The first thing that stands out is that many of the passwords don't meet RockYou's password policy. RockYou requires users to create a password between 5 and 15 characters, and passwords cannot contain special characters. With this dataset though, 78,404 passwords were less than five characters long, and 258,835 passwords were longer than fifteen characters long. As already mentioned, 3.42% of the passwords also contained a special character. This seems to imply that a lot of RockYou's applications use a different password creation policy. Since many of the widgets manage Facebook/MySpace info, it's a fair bet that they require the user to enter their Facebook/MySpace password which is what we're seeing in this dataset.

Well, it's Christmas Eve, so I'm going to leave further analysis till later. As a Christmas/Hanukah/Seasonal present I would really appreciate for you guys to let me know in the comments or via e-mail what you would be interested in finding out about this list. That way I can focus on what's useful to you vs. just posting random statistics like I did above ;)

Tuesday, December 1, 2009

Google Wave Invite

I've been playing around with Google Wave, and received a couple of extra invites to the free beta. If you are interested, let me know and I'll send one your way.

My short review: It looks like one of those tools where it takes a lot of work to gain any benefit from it. That being said, if you are collaborating with a lot of people on several different projects it has real potential.

Friday, November 27, 2009

Biometrics Are Not Going to Save Us - Or Get Used to Your Password

Abstract:
It generally is taken as common wisdom that one day, everyone is going to switch to biometrics so we can finally get rid of those pesky passwords. This post is an attempt to stave off that future. This isn't because I'm in the password cracking business, (I'm sure horse-buggy salesmen thought that cars were a poor substitute as well). Instead it's because biometrics are a really bad solution. In fact, over the long run, biometrics will make it even easier for an attacker to break your authentication schemes. The rest of this post is an attempt to explain why this is the case, along with some other reasons why you probably shouldn't go out and buy that thumbprint reader quite yet.

If Biometrics Are So Bad, Why Do People Still Like Them:
Let's face it, passwords suck. This is a blog pretty much devoted to password cracking. I'm aware of the fact that they suck. Passwords are a pain to use, a pain to remember, and pain to make secure. Who doesn't want to get rid of them? At the same time, biometrics seem like the perfect solution. Just about everyone has something to scan, (we'll make a special use case for the double amputee). You won't forget them. You won't leave them at home. User's can't share them. Oh, and with all that fancy equipment, how's a hacker supposed to break it? In addition, it's an authentication scheme that everyone understands. Try to explain public key cryptography to someone, and all you'll get are blank stares. Everyone knows about fingerprints and iris scans. Heck, we use biometrics every day to distinguish between people like Bob and Alice, (though please stop staring at Alice's chest and calling it an authentication scheme...) So why shouldn't we use biometrics with computers to authenticate people?

Identification, not Authentication:
If you've ever had to suffer through a CISSP prep course, you'll probably remember the instructor droning on about the IAA triad, (Identification, Authentication, and Authorization). The thing is, keeping those three ideas separate really is important, since they perform different jobs. Identification is a way to keep multiple users distinct, and as a statement of who you think that person might be - "Hey that looks like Alice". Authentication is where you verify someone's identity - "Hey that is Alice because she knows the secret codeword". Biometrics are extremely useful for identification. Yes, you heard me right - I'm a fan of biometrics when it comes to identifying people in certain cases. It's using biometrics as a form of authentication that I have a problem with.

My very first internship back in 1997 was working on the National Automated Fingerprint Identification System. It was one of the best summers of my life. From moments such as, "Wait these fingerprints were taken from someone who died in the tub and laid there for three weeks - Ewww", to having to pass through an almost "Get Smart" level of security when going to work, it was amazingly fun. Beyond that though, I gained respect and a better understanding of how biometrics like fingerprints can be useful, (though I also obtained a better understanding of false positives). It's not just law enforcement where biometrics as a form of identification is used. From all accounts, biometric identification has also proven vital to supporting our troops in Iraq and Afghanistan. So I want to state for the record I'm not against using biometrics. As I'll go into much more details in the later sections though, due to certain inherent weaknesses, biometrics perform very poorly when used for authentication in computer systems. In short, biometrics for authentication = fail.

The Number #1 Reason Why Biometrics Fail - It's all Digital:
It's easy to get carried away talking about the lack of false positives, liveness sensors, or the inability to duplicate the blood vessels at the back of an eye when discussing the security of biometrics. In reality, very rarely does any of that matter since they are all factors of the biometric sensor itself. This is important. If you can bypass the sensor, then all you are dealing with is a digital representation of the data. Probably the best example of this was the excellent Defcon 15 talk by Zac Franken - "Biometric and Token Based Access Control Systems: Are you protected by two screws and a plastic cover? Probably". In it, he demonstrated grabbing biometric "signatures" from devices and then replaying those signatures to allow him access. It's a lot like a pass-the-hash attack but easier. This gets even worse for remote authentication, where the sensor is completely out of the control of the defender. Now, a defender can try to make the sensor harder to bypass, for example by using some form of trusted computing, but this is a tough battle to fight. Let me put it this way: If we had trusted computing down pat, no-one would be able to jailbreak their I-Phones...

The fact that an attacker is dealing with a digital representation of a biometric signature means the type of biometric measurement, (fingerprint, iris, voice, etc), they are attacking generally doesn't make a difference. The one caveat I want to mention though is that the type of signature can impact the ability to perform guessing attacks. As I'll explain later though, that becomes largely irrelevant as biometric authentication becomes more prevalent.

My Mother Always Said, "Don't Use Your Online Banking Password Everywhere":
I think we all agree that password reuse is a real problem. When singles.org was hacked, some of the 4chan'ers tried the disclosed e-mail/passwords on facebook, and then proceeded to create a lot of embarrassing fake posts. Some of those posts are hilarious; until you realize that this happened to real people. I think we can all agree - using the same password for your online bank account and every other site is a bad idea. But what about biometric signatures? I only have ten fingers. What happens when a site that has a copy of my biometric signature gets hacked? Now an attacker has a digital copy of my fingerprint. That can't be good...

The sad fact is that biometric authentication is only somewhat effective as long as it is a nitch market. The second everyone starts using it, you fingerprint/iris-scan/whatever will get disclosed. I wish I was exaggerating, but if that wasn't the case, we wouldn't have any problems with password reuse either. Heck I know my password has been stolen at least once when a gaming website I used to play on was broken into. In reality, my password probably has been stolen more than that, (the Cain&Able forum may have been hacked, the FSU Alumni database was broken into a couple of years ago, etc). Since a biometric signature is digital and can be used in a replay attack, the user needs to have at least the option to select different forms of authentication across different sites.

This is Going to Hurt - Lack of Revocation in Biometric Authentication
This dovetails with the previous point. What happens when a hacker gets your biometric signature? You can't change your fingerprints, (well you can, but it'll probably be fairly painful). How do you stop someone from logging in if they have your password and you can't choose a new one? That's not a rhetorical question. I really don't know the answer to it, and I don't think anyone else does either. In short, even if there is a form of biometric authentication where it is hard/impossible to guess someone's signature, it really doesn't matter because if it achieves widespread deployment your signature is going to get disclosed.

I Was Framed - Lack of Attribution, But Don't Count on the Jury Knowing That:
There's a lot of privacy issues with using biometrics, but I'm going to skip over that. It's not that the privacy issues are unimportant, but that they can lead into conversations/arguments that wildly diverge from the subject at hand. Instead, let me ask you this question. If you were on a jury, would you convict someone if the only evidence against them was fingerprints and/or DNA samples? I hope the answer is no, but I think a vast majority of people would say yes. What happens to Bob when Alice uses his biometric signature to break into the server room and trashes the place? As much as we want them too, biometrics don't provide attribution on their own, and this fact can be used by an attacker to frame someone.

Trusting Fuzzy Hashing to Encrypt Your Harddrive - This Is Going To End Well:
The great thing about passwords is if you don't enter them in exactly, they don't work. Ok, you might not think it's great when you spent ten minutes trying to log in only to find that caps lock was on, but from a cryptographic standpoint it's really nice. You might have some trouble decrypting your AES256 encrypted hard-drive if your password is constantly changing. To combat that, something called fuzzy hashing has been proposed. What fuzzy hashing attempts to do is return the exact same output hash even when the input values are changing slightly from time to time. This fuzzy hash can then be used as the key for your encryption algorithm. The problem is that current fuzzy hashing techniques aren't that good, and have been shown to have multiple problems that make them extremely vulnerable to guessing attacks, (the attacker only needs to get close). In fact, here are two papers on this very subject: 1, 2. Even if that wasn't the case, good luck trying to keep the cops from obtaining your fingerprint. That's actually one of the fundamental flaws with biometrics. Since they are based on something you are, instead of something you know, it becomes very hard to keep them secret. At least with a RSA key, you can hide it under the floorboards.

Finally, fuzzy hashes may not be able to take into account drastic changes in someone's biometric signature. If you cut your finger, do you have to wait for it to heal before you can decrypt your hard-drive? In short, biometrics and encryption don't play very well with each other.

Conclusion:
There were quite a few additional arguments I could have made, (hey I didn't even mention this Mythbusters episode), but I'm probably beating a dead horse by now. It's actually gotten to the point where I'm not even a fan of using biometrics as one-half of a two factor authentication setup. It's not that they don't provide some additional security, but that there are better solutions out there. No form of authentication is perfect, but I'm fairly well convinced that biometrics are not the way to go. I hoped this post proved convincing, and if not, please use the comments section below as I'm always open to new ideas.

Tuesday, November 24, 2009

Analysis of 10k Hotmail Passwords Part 5: Markov Model Showdown

Don't worry; I'm still not done with this data-set. A little over a week ago I received an e-mail from Ilya Sokolov, saying:
If I'm getting the numbers right from your graphs - you've got around 3k hashes bruteforced in about 1G guesses. Assuming you used --incremental mode of John, right? I guess you should try --markov too :)
How right he is. Ilya went on to send some statistics my way, so I truly do appreciate e-mails like this. Before I talk about the results, first let me back up and spend a little time talking about the incremental and markov modes in John the Ripper. Aren't they both Markov based attacks?

--Ed Note: The following description of how the incremental attack works has been updated since I was incorrect about how JtR used trigraphs. A copy of the original incorrect description can be found in the comments.

Well surprisingly yes they are, though they go about it in different ways. The --incremental option actually models the probability of trigraphs appearing. In this case a trigraph represents a three character long string, such as "abc". Another way to say this is that JtR uses trigraphs to represent second order Markov probabilities, aka the probability of 'c' given 'ab'. There's special code handling the first two characters, (since they don't have two characters before them), but after that John the Ripper select each successive character based on the previous characters. Therefore, for the guess 'abcdef', it would create the trigraphs 'abc', 'bcd', 'cde', and 'def', and assign a final probability to the guess based on those four different trigraphs. In addition, the probability of a trigraph is also dependent on where it shows up in the password. In this way, the --incremental attack also takes into account factors such as numbers often show up at the end of a password, and uppercase letters usually are at the front. Pretty nice, right? The only downside is that it doesn't generate these guesses in true probability order. I'm still a little fuzzy on some of it, but the incremental mode does make some shortcuts so it can cover the entire keyspace, while still being able to generate a large number of guesses quickly.

I know; Way to much info - especially since this is just my understanding of how it works. I need to spend more time looking at the code to make sure the above is actually correct. The biggest advantage of using the incremental attack is that it creates highly optimized guesses while still retaining the ability to eventually cover the entire key-space if you give it enough time. Aka it will eventually try guesses like 'bD359sl'. Also it is blazingly fast.

The --markov mode on the other hand takes a different approach. First of all, if you haven't noticed this option in your copy of John the Ripper, that is because it was added as a third party patch, (though it's currently part of the jumbo patch set). Personally I haven't used it that much since it full out segfaults on Intel based macs, which I generally use as my development/testing platform. That being said, after running some comparative tests on my Linux box at the lab, (I use that one for my long term password cracking sessions), I've really become a believer in it. However, using the markov mode is a little bit tricky.

Unlike incremental mode, the markov mode does not use trigraphs. Instead it starts with the highest probability starting letter, (in the default stats file this is a 'c'), and then appends the most probable following character. It repeats this until it hits the probability threshold specified, or it reaches 12 characters long, in which case it outputs a password guess. The attack then goes to the last character of the previous guess and changes it to the second most probable value, and outputs another guess. You can probably see where this is going as it cycles through printing out all the possible guesses that meet the probability threshold specified. There are a lot of advantages to this approach. For example, it can attack passwords up to twelve characters long. The downsides of this approach is that it doesn't create guesses in probability order which the incremental attack does. Therefore, as we will see, if you set a high threshold the markov mode starts to resemble a letter frequency analysis attack. Finally, because of this, the markov mode won't cover the entire key-space, (aka, it won't crack the password '3Nw!lswp'). The last point isn't as important as it may seem though, since in all likelihood due to time constraints, (aka you only have several months to crack a password if that), you simply don't have enough time to cover the entire key-space even using incremental mode. In short, you want to do the best that you can with the limited number of guesses you can make.

So that was a really long introduction before we even talked about any results ;) Let's see how the incremental and markov options fare against some real passwords!

Test 6: Trying the default -incremental and -markov modes against the Hotmail password set

For the first test, I wanted to run both the -incremental and -markov modes against the Hotmail password set using their default settings. The one tricky part was to set the limit for the markov attack. In the readme file for the patch, it actually goes into quite a bit of detail on how to select a proper limit. Really all I did was run the included program 'genmkvpwd', and select a limit that would cause the attack to generate slightly more than one billion guesses. This turned out to be the value '214' for the limit.

Results: So here are the results. As always, you can click on the graph to get a slightly larger version. The Y-axis is the number of passwords cracked. Sorry, I didn't realize I forgot to add that until I was proof-reading this and I didn't feel like re-uploading the graphs.

-ed note: I know; It certainly doesn't look like I proof-read these posts ;)



Analysis: As you can see, the Markov mode started out slow, but finished strong, cracking 786 more passwords than the Incremental attack. As far as I can tell, the reason why it did so well near the end was that the markov mode started trying password guesses that began with numbers, which was very common in the hotmail dataset. Also, since the probability of a guess starting with a number was so low according to the default rule-set, it only tried very probable guesses following a number.

Another thing I'd like to point out: Due to the way it creates guesses, for the most part the Markov mode cracks passwords at a fairly fixed rate; while the Incremental mode is much more front-loaded. I'd like to caution you against extending the Markov mode's line though, as it has finished just about every password guess within the limit that I set. That actually leads us to the next test...

Test 7: Setting Different Limits for a Markov Based Attack

So the next question of course is, "How does the limit affect Markov based attacks?" That's a pretty easy one to solve. All I needed to do was run it with several different limits and graph the results. I also included the Incremental results for a comparison.

Results:


Analysis: See what I mean about not being able to continue a predictive line on the number of passwords that using JtR in Markov mode will crack? When using a limit of 200, it created slightly more than 250 million guesses and then finished. If we run the same attack using a limit of 214, it will create almost four times as many guesses, but it only cracks 148 more passwords. If we increase the limit to 250, it will generate roughly 46 billion guesses; But if you refer back to some of the previous tests, even a pure brute force attack performs significantly better in the first billion guesses. Don't worry though, as the Markov attack will catch up with brute force, (and then exceed it by a large margin), if you let it run long enough.

What this shows is that you really do need to spend the time to tailor your attacks using Markov mode to the resources you expect to spend cracking the password list you are attacking. One good option is you can split your attack up into multiple parts, where you try all the guesses using a very low limit, and then increase the limit and run the attack again. There is an option in the latest build of the patch to exclude guesses made previously. This can save you time since you don't have to re-hash guesses you made already. There is a slight performance hit by using this approach though as it will make a guess, realize that it made the guess before and reject it, before moving on to the next guess. Still, this can be very useful, especially when attacking salted hashes, as it makes a huge performance improvement to crack as many hashes as possible as soon as possible, (since each hash you are cracking greatly adds to the cracking time).

Test 8: Comparing Incremental and Markov Modes When Trained on the Same Password Set

Now, all of the previous tests are fairly nice, but there is the worry that we are comparing apples and oranges. I mean, both the default Markov and Incremental modes were trained on different password sets, and that can have a huge impact on their cracking ability. For example, the Markov mode was trained primarily on French passwords, and the Incremental attack was mostly trained on English passwords. That's one nice thing about being in the password cracking research business though; I certainly have a lot of passwords to use for my own training set ;)

For this test, I trained both the Incremental and Markov modes on the same set of disclosed passwords from the phpbb.com list that I had previously cracked. In technical terms, I created a new .chr and new stats file for both attacks. As I previously mentioned in part 4 of the writeup, the phpbb.com list isn't ideal for this attack since it is primarily composed of English passwords, while this list is mostly Spanish/Portuguese, but that doesn't matter for this test. What does matter is both attacks are being trained on the same list, and they have to deal with the same limitations. In addition, I ran the Markov attack twice. The first time I allowed it to attack passwords up to 12 characters long. The second time I limited it to only attack passwords up to 8 characters long, (and I assigned it a higher limit so it would generate roughly the same number of guesses). This way we can better compare it against JtR's incremental mode which only tries guesses up to 8 characters long. So, let's see how this turned out:

Results:


Analysis: Yup, the Markov mode still cracks more passwords than JtR's Incremental mode. Still, there were a couple of things that surprised me about these results. First of all, the phpbb.com list made a much better training list than using the default .chr and stat files. The Markov attack cracked 540 more passwords using the same number of guesses when it was trained on the phpbb list. The Incremental attack also got off to a much faster start, though in the end it cracked 17 less passwords than the Incremental attack using the default alphanum character set. This isn't exactly a fair comparison though since I didn't bother to filter the training set to only include lowercase letters and numbers.; Aka it actually is using pretty much the entire keyboard. In that case, it performs much better than using JtR's Incremental mode using the 'All' .chr file, and it cracked 333 more passwords overall.

The second thing that surprised me was that limiting the maximum password guess size didn't really have much of an effect on the number of passwords cracked using the Markov mode. This shouldn't have surprised me that much, since it didn't make that many guesses longer than eight characters long, (the probability of those guesses would be very low), but still it helps to see that on a graph.

Future Work: Whew, and here I thought I'd be done with this data-set by now. I still have a few miscellaneous tests I want to run using brute force style methods believe it or not. And then I need to get around to posting my results with dictionary based attacks. Oh ,then there are the topics that I've been meaning to post on for a while. They range from going over other datasets, (such as the ZF0 one), to the vulnerability of World of Warcraft accounts to password disclosure attacks, (and what Blizzard is doing about it). It should be fun ;)

Monday, November 16, 2009

Installing John the Ripper Version 1.7.3.4 Tutorial

I just upgraded to the newest version of John the Ripper so I decided to make a tutorial out of my experience, (with screen-shots), since it was a fairly time consuming ordeal. It's mostly focused on installing John the Ripper on a Mac OSX Snow Leopard, but you should be able to use most of it when installing it to various flavors of Linux as well. Besides going over the base install, I also tried to cover:
  • Which patches to install, where to find them, and how to apply them
  • Picking the right build options
  • Modifying the Makefile so it actually will install on Snow Leopard
  • Modifying the code so you can use incremental attacks against passwords longer than eight characters long
You can find it on my tools page, or by clicking on this link.

Sunday, November 15, 2009

Defcon 17 Videos Posted Online

The title says it all. You can get all of the videos here. Just a warning, I may have used some inappropriate language in my talk on password cracking, so you might not want to watch it in front of small children.

Also, my writeup of a couple of the talks can be found here, and here if you are having trouble deciding what to watch.

Thursday, October 29, 2009

Analysis of 10k Hotmail Passwords - Even More Brute Force

A reader asked me through e-mail how much better John the Ripper's Markov models were compared to pure brute force or letter frequency analysis. I knew there was a reason why I put my e-mail address on the side of this blog. That's a great question, since while I'd always had more success with Markov models vs letter frequency analysis, (and certainly brute force), I had never measured the difference before. What type of researcher am I? I better fix that, so let's check it out.

Test 4: Markov Models vs. Letter Frequency Analysis vs. Pure Brute Force

So in this test I reused the data collected previously in Test 1 using JtR's -incremental mode targeting lowercase letters and numbers, (a-z0-9). I then used the popular tool crunch to run both the brute force and letter frequency analysis, (which I'm going to call LFA), attacks since JtR doesn't support pure brute force, (well there is a bit of a hack, but crunch is easier). For the pure brute force attack I ran:

./crunch 6 8 abcdefghijklmnopqrstuvwxyz0123456789

In short, I'm creating brute force guesses between 6 and 8 characters long containing a-z0-9, starting with aaaaaa and theoretically ending with 99999999, (in this run it never finished all of the six character long words). This is also the default character set that the popular password cracker Cain&Able uses.

For the LFA attack, I then ran it using the numbers based on my analysis of the phpbb.com password list. As we'll see, you can do better if you base your LFA on analysis of Spanish/Portuguese passwords, but I don't have another dataset of Spanish/Portuguese passwords so this is the best I can do without cheating. The command I used was:

./crunch 6 8 smp1abctdkjlhrfgnw2ei0ov3q457z968yux

A couple of notes. I used the first character LFA charset since crunch increments it's values from right to left; aka it will try 01, 02, 03, 04 etc. This means using the above character set, it will initially crack all the words starting with 's', then move on to words starting with a 'm' and so on. When there is no password creation policy, (aka users aren't forced to include numbers in their passwords), this is slightly better than password crackers such as Cain&Able who increment their values from left to right, aka 10, 20, 30, 40 ... The reason for this is because people are slightly more predicable in how they start words/passwords, vs how the end them. For example, from the phpbb.com dataset, over 51% of the users choose one of top ten characters to start their passwords. Likewise, 46% of users choose one of top ten characters to end their passwords. It's not a big difference, but every little bit helps. If there is a password creation policy though, this quickly falls apart, and it's vastly more preferable to use a left to right approach since people almost always put numbers/special characters at the end of their passwords. For example, just the number '1' appears 21% of the time at the end of stronger passwords.

Also, the more astute, (and/or skeptical), readers might notice the order I'm using is slightly different from the one I published before. That's because I've cracked more passwords from the phpbb.com dataset, (I'm nearly up to 98% of the total set cracked), so I've updated some of my statistics. That's another thing I need to get around to posting up here...

Results: Note, you can click on the image for a bigger version of it.


Analysis: The first thing that stands out is that my letter frequency analysis percentages are way off. I actually do worse initially then if I had chosen a pure brute force approach. I know it's cheating, but going back to my original analysis of the hotmail list, and comparing it to the phpbb list, you get the following:

Hotmail list: a1mbc2sp0lterdjfgn3hi6k759vo48ywzuqx
Phpbb list : smp1abctdkjlhrfgnw2ei0ov3q457z968yux

Looking at this, the best choice you could have done was to start with the letter 'a' which is coincidentally what the pure brute force approach did. The reason for the huge bump in the LFA attack around the 220 million guess mark is because it hit the '1' as the first character, followed by an 'a' which was about the equivalent of a perfect storm since people started their passwords with a number, followed by the actual word they were using. Even with my probabilities being off though, LFA still beat out pure brute force, so if you aren't at least using that, well I don't know what more I can say to convince you. That being said, this really shows how superior using Markov models is to both attacks, with the alphanumeric Markov attack cracking more than twice the amount of passwords that the LFA did. I'd also like to point out that if you look at the first several million guesses, a Markov based attack performs so much better than LFA it's ridiculous.

Test 5: LFA and Brute Force Attacks Against Different Length Passwords

The next question is how do LFA and Brute Force attacks perform as the password gets longer. As I said before, the above test only shows them attacking 6 character long passwords. What happens when we bump up the length to 7 or 8 characters minimum. To simulate this, I once again used crunch with the above settings except for increasing the minimum size guesses allowed. I also used the phpbb LFA probabilities, since I hate training and testing on the same dataset, (also I had already run the tests before I graphed out the results and realized how off those probabilities were...)

Results:



Analysis: I didn't even bother to put the eight character one up there since it was pretty boring. If I'm generous the pure brute force attack would crack one eight character long password, (it took slightly more than one billion guesses to do so), and the LFA attack would crack two. This shows though that while increasing the character set quickly makes brute force and LFA attacks infeasible, which is what you'll see in all the literature, even longer passwords are vulnerable against Markov model based attacks. One way to increase the effectiveness of LFA style attacks is to not try the full alpha numeric character set, (aka drop qxz etc). That's another test I'll try to run later. Even so, once again this shows that Markov models are the way to go. I'd like to point out too, that JtR's Markov model attack almost certainly would perform better if trained against similar passwords. That's another test I'll need to run ... after Halloween weekend.

A Quick Note on Time Required: I think it's important to state that these test runs are extremely short since I didn't want to run each for an entire day just to get a few more data points. My update schedule is already too slow ;) To give you a better idea how long it would take to bruteforce the entire key space here is some numbers I ran using Cain&Able. Note, Cain&Able is much slower than JtR, but it has a really nice time estimator, so please consider these numbers a worst case scenario.

Computer Specs:
2.4 Ghz Core Duo
3 Gigs of Ram
NVIDEA GeForce 8800 GTS
Windows 7 - 64 bit

-All tests were run against cracking plain MD5 hashes

Alpha-Numeric passwords: a-z0-9
  • 6 character: 14 minutes
  • 7 characters: 8 hours
  • 8 characters: 12 days
The Entire US Keyboard
  • 6 characters: 3 days
  • 7 characters: 272 days
  • 8 characters: 71 years
Oh, and keep sending me ideas, either via e-mail or in the comments of these posts.