Tuesday, January 26, 2010

More Analysis of the Rockyou Password List - Strong Passwords

So it's been an interesting last couple of days. First of all, it's a bit amazing how popular the RockYou list has become since it was mentioned in the New York Times article. I'm not going to provide a link, but let's be honest: if you can't find it, you aren't looking. The thing that keeps going through my head is that we may have just narrowly missed a black swan event. (OK, Mr. O'Conner just posted about those on his blog, so the term is stuck in my head, even though I'm using it wrong.) Can you imagine what would have happened if the public RockYou list had contained e-mail addresses? While lists of this size have been distributed before (and black hats have been able to obtain the whole list plus e-mail addresses), I really don't know what a public disclosure of a list this size containing both e-mail addresses and passwords would look like. It probably won't lead to an Internet apocalypse, but we would have people looking up friends and co-workers. The 4chan crowd and associated griefers would, of course, get involved. How would sites deal with locking accounts and resetting passwords? When the Hotmail and various other passwords were leaked back in October, eBay, webmail providers, etc. had a hard time dealing with even 30k leaked passwords. Will secret questions be enough to re-verify people? What if the secret-question answers from the hacked site are disclosed as well?

As I was joking with someone else, "Does the security of the Internet really depend on Facebook not getting completely 0wned?"

OK, the above is overly pessimistic, but an event like this is going to happen, and I feel we as a security community need to start planning how we're going to respond instead of just making stuff up as we go along. I certainly hope Microsoft, Google, banking websites, etc. have a coherent plan for what to do, anyway.

So enough doom and gloom; on with the analysis. Today I was shooting e-mails back and forth with Per Thorsheim about the RockYou dataset, and we both pretty much agreed that, as great as 32 million passwords sounds, it's very hard to draw conclusions from the set since we know so little about it. In fact, he has posted a good writeup of the problem, plus an analysis of Imperva's white paper, on his blog.

Basically, the list gives us 32 million values. Are these values real passwords? Well, there are numerous URLs included in the list (several of which are over 100 characters long and contain search strings for RockYou applications). In fact, I just did a search using awk and came up with 684 passwords that were longer than 100 characters. Should we count those as legitimate passwords? How about the 484 passwords that were only 1 character long? How many of the passwords are from the same person? What password policies were in place at the different sites? Why am I asking so many questions in this post?
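For anyone who wants to repeat this kind of sanity check without awk, here is a minimal Python sketch. It assumes the list is a plain text file with one password per line; the function names and the 100-character cutoff parameter are illustrative, and this is not the awk command I actually ran.

```python
from typing import Dict, Iterable, Iterator


def read_wordlist(path: str) -> Iterator[str]:
    """Yield one password per line (hypothetical helper).

    Only the trailing newline is stripped, so leading/trailing spaces that
    are part of a password are preserved. Undecodable bytes are replaced
    with U+FFFD, which then counts as a single character.
    """
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            yield line.rstrip("\r\n")


def length_buckets(passwords: Iterable[str], long_cutoff: int = 100) -> Dict[str, int]:
    """Count suspiciously long and suspiciously short entries."""
    too_long = 0
    single_char = 0
    for pw in passwords:
        if len(pw) > long_cutoff:
            too_long += 1
        elif len(pw) == 1:
            single_char += 1
    return {"longer_than_cutoff": too_long, "single_char": single_char}


if __name__ == "__main__":
    sample = ["123456", "a", "x" * 150, "password"]
    print(length_buckets(sample))  # {'longer_than_cutoff': 1, 'single_char': 1}
```

Against the real file you would call something like `length_buckets(read_wordlist("rockyou.txt"))`, where the filename is just a placeholder. Streaming line by line keeps memory flat even with 32 million entries.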

So yeah, I'm struggling to make sense of it all. Now on to some answers instead of more questions. Per wrote:

Regarding your own analysis of the RockYou password list as well as the analysis done by others, what strikes me is the "negativity" of the results. I had a long chat with a colleague/friend of mine who is also assisting me in my various analysis, and we agreed that we wanted to know a little more about the positive parts of the RockYou list.

He then went on to ask some specific questions. Once again I have to agree with him. The interesting part of this list isn't that thousands of people used '123456' as their password. We already knew that. The tougher passwords, now that's interesting.

Q) What's the longest password found? (# characters)

A) As I mentioned, that's hard to say, since there are a lot of really long values in the list that probably aren't passwords in the traditional sense. Excluding passwords with non-ASCII characters (they gave awk a bit of a problem, since it counted them as two or more characters), I found 27,337 passwords that were longer than 21 characters. Glancing through the results, most of them appeared real. I'll get more into their composition in the next question.
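The awk trouble with non-ASCII passwords comes down to bytes versus characters: in UTF-8 a character like "ñ" takes two bytes, so a byte-counting length function overstates the password length. Whether awk counts bytes or characters depends on the implementation and the locale it runs in. Here is a small Python sketch, assuming the passwords have already been decoded to strings, that shows the difference and applies the same "ASCII only, longer than 21 characters" filter. The helper names are hypothetical.

```python
from typing import Iterable, Tuple


def char_and_byte_length(pw: str) -> Tuple[int, int]:
    """Return (number of characters, number of UTF-8 bytes) for a password."""
    return len(pw), len(pw.encode("utf-8"))


def count_long_ascii(passwords: Iterable[str], threshold: int = 21) -> int:
    """Count ASCII-only passwords strictly longer than `threshold` characters."""
    # str.isascii() requires Python 3.7 or later.
    return sum(1 for pw in passwords if pw.isascii() and len(pw) > threshold)


if __name__ == "__main__":
    print(char_and_byte_length("contraseña"))  # (10, 11)
    print(count_long_ascii(["correcthorsebatterystaple", "short", "contraseña" * 3]))  # 1
```

Dropping non-ASCII entries, as I did, sidesteps the problem; counting characters after decoding would let you keep them.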

Q) What's the most complex password found? (all character groups, randomness, length etc)

A) The vast majority of what would be considered complex passwords turn out to be e-mail addresses. Some of them are even mangled, such as alice@example.com123. In other news, people apparently still hate using spaces in passphrases (or, more likely, they don't realize they can use spaces). That all being said, very few of the passwords would meet a typical corporate password requirement, i.e., more than 8 characters and containing uppercase, lowercase, special, and digit characters. That's to be expected, since I doubt any of the sites that RockYou collected the passwords for enforced such a requirement.
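One way to sort "complex" passwords into these buckets is sketched below in Python. It is a rough heuristic under stated assumptions, not how I produced the observations above: the e-mail pattern is a crude, hypothetical regex (not a real address validator) that also accepts trailing digits to catch the mangled form, and the policy check reads ">8 characters" literally and treats any non-alphanumeric character, including a space, as "special".

```python
import re
from typing import Dict, Iterable

# Hypothetical heuristic: something@domain.tld, optionally followed by digits
# (e.g. the mangled "alice@example.com123" form). Not an RFC-compliant validator.
EMAIL_LIKE = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}\d*")


def looks_like_email(pw: str) -> bool:
    return EMAIL_LIKE.fullmatch(pw) is not None


def meets_corporate_policy(pw: str) -> bool:
    """More than 8 characters with at least one uppercase, lowercase,
    digit, and special (non-alphanumeric) character."""
    return (
        len(pw) > 8
        and any(c.isupper() for c in pw)
        and any(c.islower() for c in pw)
        and any(c.isdigit() for c in pw)
        and any(not c.isalnum() for c in pw)
    )


def summarize(passwords: Iterable[str]) -> Dict[str, int]:
    stats = {"total": 0, "email_like": 0, "contains_space": 0, "policy_compliant": 0}
    for pw in passwords:
        stats["total"] += 1
        # Booleans add as 0 or 1.
        stats["email_like"] += looks_like_email(pw)
        stats["contains_space"] += " " in pw
        stats["policy_compliant"] += meets_corporate_policy(pw)
    return stats


if __name__ == "__main__":
    print(summarize(["alice@example.com123", "123456", "my pass phrase", "Tr0ub4dor&3x"]))
    # {'total': 4, 'email_like': 1, 'contains_space': 1, 'policy_compliant': 1}
```

Note that the mangled e-mail address fails the policy check only because it lacks an uppercase letter, which is a good illustration of why character-class rules say little about real strength.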

Q) Percentage of passwords longer than 8?

A) A hair over 30% of the passwords were longer than eight characters. This is actually worse than the Hotmail dataset, where close to 40% of the passwords were longer than eight characters. That can probably be explained by the high number of RockYou-only accounts in the list (heck, 'rockyou' was the #8 ranked password). I don't know about you, but I certainly wouldn't use my A-game password there.
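Both numbers in that answer, the share of passwords over eight characters and the frequency rank of 'rockyou', are simple to compute. Here is a hedged Python sketch with hypothetical helper names; it does not reproduce my figures, it just shows the arithmetic. Note the cutoff is strict ("longer than eight"), and switching to eight-or-more would change the percentage.

```python
from collections import Counter
from typing import Iterable, Optional


def percent_longer_than(passwords: Iterable[str], n: int = 8) -> float:
    """Percentage of entries strictly longer than n characters."""
    total = longer = 0
    for pw in passwords:
        total += 1
        if len(pw) > n:
            longer += 1
    return 100.0 * longer / total if total else 0.0


def rank_of(passwords: Iterable[str], target: str) -> Optional[int]:
    """1-based frequency rank of `target`, or None if it never appears.

    Counter.most_common() orders equal counts by first occurrence,
    so ties are broken by where a password first shows up in the list.
    """
    counts = Counter(passwords)
    for rank, (pw, _count) in enumerate(counts.most_common(), start=1):
        if pw == target:
            return rank
    return None


if __name__ == "__main__":
    sample = ["123456", "password1", "rockyou", "iloveyou!!"]
    print(percent_longer_than(sample))  # 50.0
    print(rank_of(["123456"] * 3 + ["rockyou"] * 2 + ["abc"], "rockyou"))  # 2
```

`rank_of` builds a full frequency table, which is fine for one lookup; if you need many ranks, compute `most_common()` once and index into it.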

There are still a couple of other questions I haven't had a chance to answer, but they will have to wait for another blog post.


1 comment:

Dmitry Evteev said...

to an analysis of passwords: http://ptresearch.blogspot.com/2009/12/over-32-million-accounts-have-been.html