10k Hotmail Passwords
Ed note: Ok, I probably should update the title to "30k E-mail Passwords". That teaches me to take my time writing these posts ;) An updated article talking about the additional passwords floating around can be found here. All of the below only deals with the initial 10K passwords that were posted originally, since I haven't been able to find the second 20k list yet. According to Google, there's a third list as well. It's still unknown at this point if all the lists are related or not.
So as some of you may have heard, a little over 10,000 Hotmail e-mail account/password combinations were publicly posted online around October 1st, with the first news report about it surfacing around October 5th. First off, I'd like to give special thanks to Steve Gadd and Ilya Sokolov for alerting me about this dataset. I'm always open to any help I can get.
Luckily I managed to snag a copy of the list before it was deleted from Google cache, though I've seen some other copies floating around. The site where it was posted, pastebin.com, has since been taken down by the owner due to the large amount of attention, (and traffic), it has received over the incident. It's also been mentioned that pastebin is putting filters in place to prevent this from happening again. If they look through their old archives though, I think they will be surprised to find that pastebin has been one of the primary sites for distributing passwords for a while. On just about every password cracking forum I've gone to, I've seen people posting their password lists, (both the raw hashes, and the cracked passwords), on pastebin. That's because most forum software won't allow you to add several thousand lines of hashes into a single post. Also pastebin's original goal, of providing a way for developers to easily distribute and modify code, makes it real easy for several people to update a huge list of password hashes with the ones they have cracked. I'm not blaming pastebin for this. It's just interesting to me how many technologies can have a dual use.
The passwords posted only cover a small range of users. Specifically the list only covers users whose e-mail addresses fell into the range ara*** to bla***. It should go without saying then that this is only a small sample of the entire list that the attacker collected. While I don't know how the list was first discovered, my guess is that this list was posted online by the attacker who was trying to sell the entire list, and used this snippet to prove they actually had the goods. Looking at some of the other e-mail lists I've collected through my research, that range, (ara-bla), constitutes around 4% of all e-mail addresses. This means that the 10k sample probably represents around a 250k password set, (10,000 / 0.04). Note, I based this on primarily English e-mail addresses, and the list mostly contains Spanish/Portuguese/French users so this number may be wildly off. That number sounds about right though, since 250k would make a nice round number to sell off in one chunk, (the attacker probably has collected many more passwords and is saving the extra for a different sale).
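The extrapolation above is simple enough to redo with a different percentage, which matters since the 4% figure comes from mostly English lists. A minimal sketch, where the function name `estimate_total` is my own and not part of any tool used for this post:

```python
def estimate_total(sample_size: int, sample_fraction: float) -> int:
    """Scale a sample up to the size of the full list it was drawn from.

    sample_fraction is the share of all addresses expected to fall in the
    sampled range (0.04 for the ara-bla range, per the estimate above).
    """
    if not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be in (0, 1]")
    return round(sample_size / sample_fraction)


# 10k sample covering roughly 4% of the alphabetical range
print(estimate_total(10_000, 0.04))
```

If the ara-bla range actually held 8% of addresses for this population, the same sample would point to a list half that size, so the result is only as good as the assumed fraction.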
It's important to realize how valuable your webmail account is. Once someone's e-mail account is compromised, an attacker can take over every other web account they own. Banks, Paypal, Facebook, Twitter, amazon.com, the list goes on and on. That's because every other site relies on your e-mail account to send a reset password to if you forget your current one. An attacker doesn't even need to know which online bank someone uses. All they need to do is just go to all of the online banks, enter in the compromised e-mail address and click "I forgot my password". A couple of minutes later, the bank that the user has an account with will e-mail a new password to the compromised account. I wish I could say I'm exaggerating the problem, but your e-mail address is the key to your whole online existence.
So the next question is: how did the attacker get the e-mail addresses/passwords in the first place? It wasn't from Microsoft, that's for sure. I'm very confident about this for numerous reasons, (invalid e-mail accounts, several gmail accounts being in the list, blank passwords, and passwords that didn't meet the minimum password requirements, etc). That means that this list was either collected through a phishing attack, or through malicious software, (keystroke loggers), installed on users' computers. Luckily, it's looking more and more like it was from a phishing attack, which means that Microsoft's quick efforts to disable the accounts and alert the users will pay off. If it was a keystroke logger, (and by keystroke logger I mean software set to harvest passwords), most of the users would reset their passwords only to have them stolen again. I originally suspected that this was due to infected computers, simply because there weren't any vigilante posts that you normally see due to phishing attacks. Aka passwords such as "F**K YOU, YOU F**KING HACKERS!!!" As Mr. Gadd pointed out to me though, one of the usernames was
The lack of vigilante posts, though, seems to suggest that if this was a phishing scheme, it was a very convincing one. Upon doing some further research, it looks like this list may have been collected from an MSN Instant Messenger scam. I'm still a little weak on the details, but the short answer is that the scam site would send out e-mails saying that they could discover who had blocked you on your instant messenger client. The user would then go to the website and log in using their Microsoft account, (supposedly so the site could run its tests to see who was ignoring you). The scammers would then send out the same advertisement to everyone in the user's address book, saying that the e-mail was from them. There are some other details as well that I'm fairly foggy on, (they would try to sell the user some free software, and have the user send them an SMS message so they could split the profit of the SMS message with the phone company).
As a clarification, the only reasons I suspect that the above phishing attack and this password list might be related are that the first public comment about this list that I can find appeared in a discussion about that scam on October 2nd, and that the list mainly contains Microsoft e-mail addresses. There's a very high chance these various scams might not be related.
So on to the analysis:
- Total Passwords: 9,845 - This number excludes all the e-mail addresses that had blank passwords
- Average Password Length: 8.7 characters long
- Percentage that contained an UPPERCASE letter: 7.2%
- Percentage that contained a special, (aka !@#$), character: 5.2%
- Percentage that contained a digit: 51.7%
- Percentage that only contained lowercase letters: 43.3%
- Percentage that only contained digits: 17.6%
- Percentage that started with a digit, (aka '1password'): 25.0%
- Percentage that ended with a digit, (aka 'password1'): 44.1%
- Percentage that started with a special character: 0.5%
- Percentage that ended with a special character: 2.2%
- Percentage that started with an uppercase letter: 6.1%
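For anyone who wants to run the same kind of breakdown on their own list, here is a minimal sketch of how the statistics above could be computed. It is not the tool used for this post. The names (`password_stats`, `pct`) are made up for illustration, and it assumes that "special" means ASCII punctuation and that letters and digits are ASCII only, which may not match how the numbers above were produced for the non-ASCII passwords in the list.

```python
import string

# Assumption: character classes are ASCII only; "special" = ASCII punctuation.
LOWER = set(string.ascii_lowercase)
UPPER = set(string.ascii_uppercase)
DIGITS = set(string.digits)
SPECIAL = set(string.punctuation)


def pct(count, total):
    """Percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)


def password_stats(passwords):
    """Compute summary statistics for an iterable of plaintext passwords."""
    pws = [p for p in passwords if p]  # blank passwords are excluded
    total = len(pws)
    if total == 0:
        raise ValueError("no non-blank passwords to analyze")

    def share(predicate):
        # Percentage of passwords for which predicate(password) is true
        return pct(sum(1 for p in pws if predicate(p)), total)

    return {
        "total": total,
        "avg_length": round(sum(len(p) for p in pws) / total, 1),
        "has_upper": share(lambda p: any(c in UPPER for c in p)),
        "has_special": share(lambda p: any(c in SPECIAL for c in p)),
        "has_digit": share(lambda p: any(c in DIGITS for c in p)),
        "only_lower": share(lambda p: all(c in LOWER for c in p)),
        "only_digits": share(lambda p: all(c in DIGITS for c in p)),
        "starts_digit": share(lambda p: p[0] in DIGITS),
        "ends_digit": share(lambda p: p[-1] in DIGITS),
        "starts_special": share(lambda p: p[0] in SPECIAL),
        "ends_special": share(lambda p: p[-1] in SPECIAL),
        "starts_upper": share(lambda p: p[0] in UPPER),
    }
```

The function takes any iterable of strings, so a list read one password per line, (with the trailing newline stripped), works as input.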
Letter Frequency Analysis:
Note, for some reason I can't get a couple of the characters to display correctly on my Mac so I'm cutting off several of the non-ascii ones that are only used once or twice:
Overall letter frequency analysis:
aeoi1r0ln2st9mc83765u4dbpghyvfkjAzEIOxRLwSNq.MTC_DB-UP*G@H/ZYF+VJK,\$&X!Q=W?'#")(%^][}< {`>
First character, letter frequency analysis:
a1mbc2sp0lterdjfgn3hi6k759vo48yAwMzBSCuqPLExJRTFDGNV*HOZYKI\W@/-+(.$U&?Q^[,#
Last character, letter frequency analysis:
aos01326e57849nrilydzmtuAhbO.gck*SxpfE@+LvjNRw_-I?/$q!ZX)YKH"UPMDCB#GF'&%}T,]\VJ(
Short Analysis:
Overall, the passwords in this list were fairly strong considering Microsoft only had a weak password creation policy in place, (the password had to be at least six characters long). What was also surprising was the number of passwords that only contained numbers. -See, I told you it would be a short analysis. I'll try to post a more detailed analysis, (such as nationality/language breakdown, effectiveness of input dictionaries, etc), later.
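The letter frequency strings in the section above are just characters sorted from most to least common. A minimal sketch of how such a string could be produced, assuming a plain count per character. The function name `frequency_order` and the tie-breaking rule, (equal counts sorted by character code), are my own choices for illustration and not necessarily how the strings above were generated:

```python
from collections import Counter


def frequency_order(passwords, position=None):
    """Return characters ordered from most to least frequent.

    position=None counts every character in every password,
    position=0 counts only first characters, position=-1 only last ones.
    """
    counts = Counter()
    for p in passwords:
        if not p:
            continue  # skip blank passwords
        if position is None:
            counts.update(p)
        else:
            counts[p[position]] += 1
    # Sort by descending count; break ties by character for a stable result
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return "".join(ch for ch, _ in ordered)
```

Calling it three times, with `position=None`, `position=0` and `position=-1`, gives the overall, first character and last character orderings respectively.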
Comments
As for representation of statistical results, it would be of practical interest to calculate medians of distributions in addition to average values.
Thanks. I'm working on a new update and I'll include a graphical representation of the password length to better show the length distribution.
Simon,
Thanks too. A good chunk of the next update will be on how resistant these passwords are to actual password cracking attacks. In addition to the overall results, I'll try and break it down into how different length passwords fare