For a little over a month I've been cracking passwords from two different lists.
- Phpbb.com: You may have heard of it from this posting on darkreading. Here is some background. The site got hacked via a 0-day attack (by that I mean there was no patch available) against their forum software. The hacker weaponized a proof of concept exploit posted on Millw0rm and then used various other escalation attacks to gain full control over the site. I guess what I'm trying to say is that the attacker wasn't your regular script kiddie. Here is where things get a little convoluted. Phpbb.com had close to 400,000 user accounts on it, but they were in the process of switching users over to a more secure password hash. The problem was, they were doing it in such a way that the user would have to log in again before their password hash was changed. So about 100,000 users were protected by the stronger hash, and the remaining 259,000 users only had their passwords protected by an unsalted md5 hash. Got all that? Don't worry, it gets more complicated. You see, the attacker decided to try and crack a subsection (about 117,000) of the md5 password hashes. He didn't go after the entire list because, as I found out, cracking large password lists is a pain. Of that subsection, the attacker managed to crack 24% of the passwords (around 28,000), and that's what you've seen everyone talking about (like the above mentioned darkreading article). Of course, analyzing those passwords would be like analyzing the athletic ability of school children by only judging the first kids knocked out of a dodgeball game. Yes, people still choose 'password123', we know that. What's interesting, though, is the other 76% of the passwords, which is why I decided to try my hand at it. Currently I have cracked 88.9% of the unsalted md5 hashes. I'm still cracking several hundred passwords a day, so I'm hoping to get to 90% by the end of this weekend.
- The Finnish password list: This one is a little older, but it had close to 30,000 md5 password hashes so I decided to throw it into my cracking session as well. It contains the passwords from multiple sites (most of them based in Finland) that were broken into via an SQL attack, including the website for batmud. That actually was how I found out about this list, since I had an account there (BTW, it's way better than World of Warcraft. You should really check it out). So yes, I've had my password stolen. I would also like to state that one of the first things I did with this list was to strip out all the usernames and delete them, since I really don't want that knowledge. I actually tried to crack a chunk of them before (that time I cracked 87% of them) and gave a talk on it. You can see my slides here. Currently I have cracked a grand total of 94.5% of them, which is really nice since it shows that I am making improvements in my cracking techniques.
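To make the "unsalted md5" point above a bit more concrete, here is a minimal sketch of why it matters. With no salt, each candidate word only has to be hashed once, and that single hash can be checked against every remaining target at the same time. This is not my actual cracker; the function names (`md5_hex`, `crack`) and the tiny sample data are made up purely for illustration.

```python
import hashlib


def md5_hex(password: str) -> str:
    """Return the unsalted md5 hex digest of a password."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def crack(target_hashes, candidates):
    """Map each cracked hash to its plaintext.

    Because the hashes are unsalted, one md5 computation per candidate
    is enough to test that candidate against every uncracked target.
    """
    remaining = set(target_hashes)
    cracked = {}
    for word in candidates:
        digest = md5_hex(word)
        if digest in remaining:
            cracked[digest] = word
            remaining.discard(digest)
    return cracked


# Hypothetical sample data, not taken from either list.
targets = {md5_hex("password123"), md5_hex("tex@s2k"), md5_hex("correct horse")}
wordlist = ["letmein", "password123", "tex@s2k"]
found = crack(targets, wordlist)
```

With a per-user salt, the same wordlist would have to be rehashed for every single account, which is a big part of why the users who had already been moved to the stronger hash were better off.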
So my question to everyone reading this is, "What do you want to know about these lists?" Here is a quick breakdown of some of the more common stats, and if you want to know anything in particular just leave a comment on this post (or e-mail me, I'm flexible).
Please note, unless otherwise specified, this info is only for the passwords I have cracked. In other words, it underestimates the security of the password set as a whole, since the passwords I haven't cracked are almost certainly stronger than the ones I have. Also, these numbers will change slightly as I continue to crack more passwords.
Average Length of Password:
- Phpbb.com: 7.06 characters
- Finnish: 7.09 characters
Percentage of passwords that contained an uppercase letter:
- Phpbb.com: 4.38%
- Finnish: 6.63%
Percentage of passwords that contained a special character:
- Phpbb.com: 0.75% <-Yes that is less than 1%
- Finnish: 1.03%
Percentage of passwords that contained a digit:
- Phpbb.com: 44.18%
- Finnish: 44.07%
Percentage of passwords that ONLY had lowercase letters:
- Phpbb.com: 53.00%
- Finnish: 52.13%
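For anyone who wants to run the same kind of breakdown on their own list, here is a rough sketch of how stats like the ones above could be computed. The function name `password_stats` and the sample passwords are hypothetical, and the definitions are my assumptions for this sketch: a "special character" is anything that is neither a letter nor a digit, and "only lowercase" means every character is a lowercase letter.

```python
def password_stats(passwords):
    """Compute simple composition stats over a list of plaintext passwords."""
    total = len(passwords)
    if total == 0:
        raise ValueError("need at least one password")

    def pct(predicate):
        # Percentage of passwords for which the predicate holds.
        return 100.0 * sum(1 for p in passwords if predicate(p)) / total

    return {
        "avg_length": sum(len(p) for p in passwords) / total,
        "pct_upper": pct(lambda p: any(c.isupper() for c in p)),
        # Assumption: "special" means not a letter and not a digit.
        "pct_special": pct(lambda p: any(not c.isalnum() for c in p)),
        "pct_digit": pct(lambda p: any(c.isdigit() for c in p)),
        # Assumption: every character is a lowercase letter.
        "pct_lower_only": pct(lambda p: p.isalpha() and p.islower()),
    }


# Hypothetical sample, not taken from either list.
stats = password_stats(["password", "Summer09", "tex@s2k", "abc123"])
```

Note that Python's string methods are Unicode-aware, so a letter like 'ä' counts as a lowercase letter here rather than as a special character. A tool that only looks at ASCII would give different numbers on a list like the Finnish one.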
I'll post more info in my following posts, such as common words, how often 'password' was used, letter frequency analysis info, etc. My initial reactions, though, are:
- Boy do I love numbered lists ;)
- It's shocking how few people used special characters. I had tended to think of that as a European thing (I've seen it in a lot of the other disclosed Scandinavian lists), but I guess it's a more universal trait.
- Most input dictionaries leave a lot to be desired. A majority of the words were not in a lot of the input dictionaries out there. On that note, I'd like to point everyone over to Sebastien Raveau's blog, Tricks of the Trade. He's put together a wordlist containing every word from Wikipedia and its sister projects (such as Wiktionary). The actual list itself is too large to do many word mangling rules on, but it was wonderful for catching "easy" passwords that I didn't have in my dictionary, which I could then run stronger word mangling rules on. Also, he just sent me a list containing only words from the English Wikipedia sites, so I'm just now starting to play around with that. He has some other really good posts as well, so I highly recommend checking his site out.
- I was also surprised by the large number of passwords that did not contain numbers. I was seeing around 80% of passwords containing numbers in a lot of the other lists I've looked at.
- Just cracked the password 'tex@s2k' about a second ago. Even though it's fairly easy to see how it was created, it was a fairly strong one, since cracking it meant applying exactly one l33tsp33k change and then adding an abnormal date, '2k', at the end.
- The percentage of people who used uppercase letters matches what I have seen before. People really hate hitting that shift button.
- My probabilistic password cracker does work, which is a relief. I would never have gotten this many passwords cracked without it. It was also instrumental early on, before I wrote my own password hasher/cracker to deal with the huge password list. I'm presenting a paper on it in a couple of weeks here, and after I do I'll publish it on my blog as well. The short description, though, is that it provides a much better way of applying word mangling rules to dictionaries. Eventually I want to expand it to automatically switch between a dictionary based attack and targeted brute force, which will be sweet. You can also see a brief description and a prototype of it in my Defcon 16 talk.
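Since word mangling rules came up a few times above, here is a bare-bones sketch of the kind of rule that would produce a guess like 'tex@s2k' from the dictionary word 'texas'. To be clear, this is NOT the probabilistic cracker, which decides which rules to try and in what order. This is just plain rule expansion, and the `LEET` table, the `SUFFIXES` list, and the function names are all made up for illustration.

```python
# Hypothetical substitution table and suffix list, for illustration only.
LEET = {"a": "@", "e": "3", "i": "1", "o": "0", "s": "$"}
SUFFIXES = ["", "1", "123", "2k", "2009"]


def single_leet_variants(word):
    """Yield copies of word with exactly one character l33t-substituted."""
    for i, ch in enumerate(word):
        if ch in LEET:
            yield word[:i] + LEET[ch] + word[i + 1:]


def mangle(word):
    """Yield the word and its single-substitution variants with each suffix."""
    bases = [word] + list(single_leet_variants(word))
    for base in bases:
        for suffix in SUFFIXES:
            yield base + suffix


candidates = list(mangle("texas"))
```

Even this tiny rule set turns one dictionary word into 20 guesses, which gives you an idea of why a wordlist the size of the Wikipedia one is too big to run many mangling rules against.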
Whew, well that's about enough typing for today ;)