Saturday, October 30, 2010

CCS Paper Part #2: Password Entropy

Round Peg, Square Hole
This is part #2 in a (mumble, cough, mumble) part serious of posts discussing the results published in the paper I co-authored on the effectiveness of passwords security metrics. Part #1 can be found here.

I received a lot of insightful comments on the paper since my last post, (one of the benefits of having a slow update schedule), and one thing that stands out is people really like the idea of password entropy. Here’s a good example:
“As to entropy, I think it would actually be a good measure of password complexity, but unfortunately there's no way to compute it directly. We would need a password database comparable in size (or preferably much larger than) the entire password space in order to be able to do that. Since we can't possibly have that (there are not that many passwords in the world), we can't compute the entropy - we can only try to estimate it in various ways (likely poor)”
First of all I want to thank everyone for their input and support as I really appreciate it. This is one of the few cases though where I’m going to have to disagree with most of you. In fact, as conceited as it sounds, my main takeaway has been that I've done a poor job of making my argument, (or I’m wrong which is always a possibility). So the end result is another post on the exciting topic of password entropy ;)

When I first started writing this post, I began with a long description on the history of Shannon Entropy, how it’s used, and what it measures. I then proceeded to delete what I had written since it was really long, boring, and quite honestly not that helpful. All you need to know is:
  1. Claude Shannon was a smart dude.
  2. No seriously, he was amazing; He literally wrote the first book on modern code-breaking techniques.
  3. Shannon entropy is a very powerful tool used to measure information entropy/ information leakage.
  4. Another way of describing Shannon entropy is that it attempts to quantify how much information is unknown about a random variable.
  5. It’s been effectively used for many different tasks; from proving one time pads secure, to estimating the limits of data compression.
  6. Despite the similar sounding names, information entropy and guessing entropy are not the same thing.
  7. Yes, I’m actually saying that knowing how random a variable is doesn’t tell you how likely it is for someone to guess it in N number of guesses, (with the exception of the boundary cases where the variable is always known – aka the coin is always heads- or when the variable has an even distribution – aka a perfectly fair coin flip).
Ok, I’ll add one more completely unnecessary side note about Shannon Entropy. Ask a crypto guy, (or gal), if the Shannon entropy of a message encrypted with a truly random and properly applied one time pad is equal to the size of the key. If they say “yes”, point and laugh at them. The entropy is equal to that of original message silly!

Hey, do you know how hard it is to make an entropy related joke? I’m trying here…

Anyways, to calculate the entropy of a variable you need to have a fairly accurate estimate of the underlying probabilities of each possible outcome. For example a trick coin may land heads 70% of the time, and tails the other 30%. The resulting Shannon entropy is just a summation of the probability of each event multiplied by the log2 of its probability, (and then multiplied by -1 to make it a positive value). Aka:

So the Shannon entropy of the above trick coin would be -(.7 x log2(.7) + .3 x log2(.3)) which is equal to 0.8812 bits. A completely fair coin flip’s entropy would be equal to 1.0. In addition, the total entropy of different independent variables is additive. This means the entropy of flipping the trick coin and then the fair coin would be .8812 + 1.0 = 1.8812 bits worth of entropy.

I probably should have put a disclaimer above to say that you can live a perfectly happy life without understanding how entropy is calculated.

The problem is that while the Shannon entropy of a system is determined using the probability of the different outcomes, the final entropy measurement does not tell you about the underlying probability distribution. People try to pretend it does though, which is where they get into trouble. Here is a picture, (and a gratuitous South Park reference), that I used in my CCS presentation to describe NIST’s approach to using Shannon entropy in the SP800-63 document:

Basically they take a Shannon entropy value, assume the underlying probability distribution is even, and go from there. Why this is an issue is that when it comes to human generated passwords, the underlying probability distribution is most assuredly not evenly distributed. People really like picking “password1”, but there is always that one joker out there that picks a password like “WN%)vA0pnwe**”. That’s what I’m trying to say when I show this graph:

The problem is not that the Shannon value is wrong. It’s that an even probability distribution is assumed. To put it another way, unless you can figure out a method to model the success of a realistic password cracking session using just a straight line, you’re in trouble.

Let me make this point in another way. A lot of people get hung up on the fact that calculating the underlying probability distribution of a password set is a hard problem. So I want to take a step back and show you this holds true even if that is not the case.

For an experiment, I went ahead and designed a variable that has 100 possible values that occur at various probabilities, (thanks Excel). This means I know exactly what the underlying probability distribution is. This also means I’m able to calculate the exact Shannon entropy as well. The below graph shows the expected guessing success rate against one such variable compared to the expected guessing success generated by assuming the underlying Shannon entropy had an even distribution.

Now tell me again what useful information the Shannon entropy value tells the defender about the resistance of this variable to a guessing attack? What’s worse is the graph below that shows 3 different probability distributions that have approximately the same entropy, (I didn’t feel like playing around with Excel for a couple of extra hours to generate the EXACT same entropy; This is a blog and not a research paper after-all).

These three variables have very different resistance to cracking attacks, even though their entropy values are essentially the same. If I want to get really fancy, I can even design the variables in such a way that the variable with a higher Shannon entropy value is actually MORE vulnerable to a shorter cracking session.

This all comes back to my original point that the Shannon entropy doesn’t provide “actionable” information to a defender when it comes to selecting a password policy. Even if you were able to perfectly calculate the Shannon entropy of a password set, the resulting value still wouldn’t tell you how secure you were against a password cracking session. What you really want to know as a defender is the underlying probably distribution of those passwords instead. That's something I've been working on, but I’ll leave my group’s attempts to calculate that for another post, (hint, most password cracking rule-sets attempt to model the underlying probability distribution because they want to crack passwords as quickly as possible).

Thursday, October 7, 2010

New Paper on Password Security Metrics

I'm in Chicago at the ACM CCS conference, and the paper I presented there: "Testing Metrics for Password Creation Policies by Attacking Large Sets of Revealed Passwords", is now available online.
Since I had the paper and presentation approved through my company's public release office I was given permission to blog about this subject while the larger issue of my blog is still going through the proper channels. Because of that I'm going to limit my next couple of posts to this subject rather than talking about the CCS conference as a whole, but let me quickly point you to the amazing paper "The Security of Modern Password Expiration: An Algorithmic Framework and Empirical Analysis", written by Yinqian Zhang, Fabian Monrose and Michael Reiter. In short, they managed to obtain a great dataset, their techniques were innovative and sound, and there's some really good analysis on how effective password expiration policies really are, (spoiler: forcing users to change their password every six months isn't very useful).

I'd like to first start by acknowledging the other authors who contributed to the "Testing Password Creation Metrics..." paper.
  • Dr. Sudhir Aggarwal - Florida State University: My major professor, who spent I don't know how many hours helping walk me through the subtle intricacies of information entropy.
  • Michael Collins - Redjack LLC: Another data driven researcher, and much cooler than me since he uses GNUPlot instead of Excel ;)
  • Henry Stern - Cisco IronPort: He was the driving force behind getting this paper written. It was over lunch at the Microsoft Digital Crime Consortium, (it's a conference to combat cybercrime, and not a group of people from Microsoft looking to commit digital crime like the name implies...), that the framework for this paper was laid out.
As for the contents of the paper, I'm planning on breaking the discussion about it down into several different posts, with this post here being more of an overview.

When writing this paper, we really had two main goals:
  1. How does the NIST model of password entropy as defined in SP800-63 hold up when exposed to real password datasets and realistic attacks?
  2. How much security is actually provided by typical password creation policies, (aka minimum length, character requirements, blacklists)?
Based on our results, we then looked at the direction we would like password creation policies move to in the future. This ended up with us suggesting how to turn our probabilistic password cracker around, and instead use it as part of a password creation strategy that allows people to create passwords however they like, as long as the probability of the resulting password remains low.

Of all that, I feel our analysis of the NIST password entropy model is actually the most important part of the paper. I know it sounds like an esoteric inside baseball subject, but the use of NIST's password entropy model has a widespread impact on all of us. This is because it provides the theoretical underpinning for most password creation policies out there. Don't take my word for how widespread the use of it is. Check out the Wikipedia article on password strength, (or better yet, read the discussion page) for yourself.

Our findings were that the NIST model of password entropy does not match up with real world password usage or password cracking attacks. If that wasn't controversial enough, we then made the even more substantial claim that the current use of Shannon Entropy to model the security provided by human generated passwords at best provides no actionable information to the defender. At worse, it leads to a defender having an overly optimistic view of the security provided by their password creation policies while at the same time resulting in overly burdensome requirements for the end users.

Getting in front of a room full of crypto experts and telling them that Shannon Entropy wasn't useful to evaluate the security of password creation policies and "We REALLY need to STOP using it", was a bit of a gut clenching moment. That's because the idea of information entropy is fairly central to the evaluation of most cryptographic algorithms. I would have never done it except for the fact that we have a lot of data backing this assertion up. The reason we are making the broader point is because it's tempting to dismiss the flaws in the NIST model by saying that NIST just estimated the entropy of human generated passwords wrong. For example, if you juggle the constants around or perhaps look at word entropy vs character entropy, things will work out. Our point though is not that you can't come up with a fairly accurate Shannon entropy model of human generated passwords. You most assuredly can. It's just that it's not apparent how such a model can provide "actionable information". In addition, the way we currently use Shannon Entropy in evaluating password security policies is fundamentally flawed.

This subject really does require another blog post, but before I head back to Boston I wanted to leave you with one of the graphs from our paper that demonstrates what I'm talking about:

The above graph shows cracking sessions run against passwords that met different minimum length password creation requirements, (aka must be at least seven characters long). The NIST estimated cracking speed is based on the calculated NIST entropy of passwords created under a seven character minimum password creation policy. You may notice that it overestimates the security of the creation policy over shorter cracking sessions, but at the same time doesn't model longer cracking sessions either. This is what I keep on saying that it doesn't provide "actionable intelligence", (third time and counting). When we say "password entropy" what we really want to know is the Guessing Entropy of a policy. Unfortunately, as a community, we keep using Shannon entropy instead. Guessing entropy and Shannon entropy are two very different concepts, but unfortunately there doesn't exist a very good way of calculating the guessing entropy, while calculating the Shannon entropy of a set of text is well documented. This is part of the reason why people keep trying to use Shannon entropy instead.

So I guess I should end this post by saying, if any of this sounds interesting please read the paper ;)

Friday, September 10, 2010

Quick Status Update

This is just a quick post to let you know that I for once have a valid excuse for not updating this blog in a timely manner. I actually found a job! Thanks to everyone who offered help, recommendations and encouragements. The only catch is that right now it's being decided if I have to run my posts through our public release office or not. Don't worry, this blog is not going away regardless of the decision.. It might just gain a few unwilling readers ;)

As to my new company, I'm going to keep that a bit of an open secret. This blog reflects my personal views. I certainly don't speak for them, and I plan on avoiding any topics that have to do with my day job, (Don't worry, I'm not doing any password cracking there).

Once again thanks, and I'll resume posting once I get the OK and can update this blog while complying with company policies. I just want to make sure I handle this situation the right way.

Wednesday, July 28, 2010

Defcon Crack Me if You Can Competition

I'd be remiss if I didn't spend a little time talking about the "Crack Me if you Can" competition at Defcon. It's really been amazing the amount of interest that this contest is drumming up. People are excited; it seems like everyone is refining their mangling rules, putting together new wordlists, and finishing up various password cracking tools. The impact that this is having on the password cracking community as a whole is hard to overstate. Needless to say, I'm a fan of that, and I have a ton of respect for Minga and the folks at KoreLogic for putting this together.

I'll be participating, though I certainly don't plan on winning. What I'm really looking forward to though is the chance to meet with everyone else and learn what other people are doing. I'm hoping this turns into an event like the lockpicking village with the contest being almost besides the point. Of course I might be saying that because I'm going to get creamed as well...

Since I've had a few people ask me about the competition itself, here's my two cents. My biggest concern is that the passwords we will be cracking aren't real. This isn't a criticism. There's no way you could run this competition with real corporate passwords, (well, legally that is...). It's just something to keep in mind. What will be interesting though is applying the techniques learned from the winner, (part of the rules are that you have to disclose your cracking techniques), to other datasets as they become available. That's why I have this blog. I might not be the best password cracker out there, but I can certainly run other people's attacks and plot the results on Excel ;)

If I had to hazard a guess, here's some predictions of mine about the contest:

1) Most passwords will be based on relatively common dictionary words. Way more so than you would find normally.

2) Most of the cracking will center around applying the correct mangling rules. Yes there will be the 'Dictionary123' words, but I expect most 'high score' passwords will have less common rules such as 'xD1ct1onaryx'.

3)There will probably be some LANMAN passwords, so bring your rainbow tables.

4) I expect there to be so many NTLM passwords that rainbow tables for them won't be cost effective.

5) I'll be interested to see if they have any 'exotic' password hashes. WinRAR, TrueCrypt, etc.

6) It'll be a ton of fun ;)

I'll see you guys there.

Saturday, July 3, 2010

Protecting Physical Documents

The above picture is of my Subaru Baja. Sometime last night, someone broke my back window, and stole almost everything from my car, (they apparently did not like my music; btw Punk is not dead). Normally this would be a costly annoyance, but in this case I'm in the process of moving and finding a new job. While most of my stuff is sitting safely in a storage locker, I still had several boxes of various items stored in my back seat, including unfortunately my "to-go" bag.

My to-go bag contained all of my important possessions that I planned on grabbing if my house was in the process of burning down. For example, it contained two thumb drives with backups of all of my work plus assorted other documents. I'm not worried about them though since I used TrueCrypt. Good luck cracking those, which BTW, is the reason I love TrueCrypt. What I am concerned about though is that my social security card, my passport, my birth certificate, my extra banking checks, and a whole lot of other important paper documents were also taken. Visualize everything you don't want to be stolen and you have a pretty good idea of what was in that bag.

Of course the next question is "Why the Hell did you leave such important documents in your car?" My response is, laziness and a poor threat model. I needed several of those documents when getting a job. A drivers license by itself doesn't cut it, as you need at least two other forms of ID. Rather than grab those two documents out of my bag and leave the rest in the storage locker I grabbed the whole thing in case there was anything else I needed. Normally I also wouldn't leave it in my truck if I was planning on going somewhere sketchy, but I was parking in a well lit hotel parking lot, and quite simply I had my hands full hauling my bag of clothes, my suit and my computer bag into my room. Darned if I wanted to make a second trip back down to my car. What's worse though is I rationalized it away by saying that at least my car was locked, unlike my hotel room where any of the housekeeping staff could access it during my stay; Plus my car's never been broken into before... The reason that's worse is because I didn't simply forget my bag; I realized it might be a problem and then actively convinced myself that, "No, everything really is ok".

I have to confess that this is a difficult post for me to write. I was counseled by several friends never to mention this incident to anyone, (besides the cops and other officials of course). Their comments were along the lines of "You're looking for a security job, and you just had your life stolen because you left it in the back of a car. Even I wouldn't hire you now!" Of course that was said in jest, but the concern is real.

I'm willing to take that risk though for several reasons. First of all I'm really angry, both at the person who did this and myself. Actually mostly at myself which is seriously messed up. By talking about this incident hopefully I can gain a bit more control over the situation. Second, I'm a big fan of disclosure. If the above description doesn't sound like a typical computer attack, let me rewrite it for you:
The attacker performed an SQL injection attack against After gaining access to the database they downloaded the user's social security number, banking information, and other personally identifiable information. Afterwards the attacker performed a 'drop table users' destroying the local copy of the data. When the site administrator was asked about this, he responded that knew SQL injection attacks were common, but he never expected to be targeted by one. As for the reason why the user data was accessible, the administrator admitted the site was in the process of transitioning to a new forum software, and that if the attack happened a week later when the new forum software was in place, this wouldn't have been a problem.
I've always felt that security incidents can happen to anyone, and what's important is to focus on the remediation, and use them as a learning tool to make sure the same attack doesn't happen again. That's one nice thing about having a blog, I'm on record on saying much the same thing when talking about the ZF0 attack, so at least this isn't a new found belief I came to after finding myself completely 0wned ;)

So in the spirit of full disclosure I wanted to talk about this attack in a public forum where hopefully it will benefit other people, and if someone doesn't want to hire me because I'm not perfect, well at least they found out now. So on to a more detailed analysis of the attack:
  1. Only items in the back seat were stolen. Since there was so much stuff it looks like the attacker grabbed what they could, and then left without doing a full search of the car. I just was really unlucky that everything I cared about was in the back seat.
  2. When this happened, my car was parked at the far end of the parking lot, since I'd rather walk than squeeze into a small spot. This was a serious mistake considering all of the stuff I had in my car.
  3. While there was a night watchman, he did not notice the attack. Likewise there were no sensors collecting forensically useful data, (aka cameras, and the thief did not leave any useable fingerprints).
  4. I'm much too focused on digital issues. The fact that I bothered to encrypt my electronic documents and the store them with paper documents that are way more valuable to an attacker shows a serious lack of priorities and/or threat modeling. I'm not saying don't encrypt your files, but simply that I should have taken the same care with my other documents and locked them up in my hotel room safe.
  5. The mindset, "Just because something bad hasn't happened to me in the past means it won't happen to me in the future" is very hard to avoid.
  6. I really hope all that is happening right now is some 16 year old kid is trying to use my passport to buy booze. That being said, I need to plan for much more serious scenarios, hence closing my old bank account rather than just canceling my checks, signing up to a credit check service, etc.
  7. Dealing with issues like this on a holiday weekend is extremely difficult. Canceling my old bank account and moving it to a new one was particularly stressful since I had a couple of hours to do it before the bank closed for the long weekend. Likewise, I will have to wait till Tuesday to have my car window replaced. Of course, cyberattacks never happen during a holiday...
  8. Security policies are important, but what's more important is enforcement of those policies. You really do need some force from on high telling people, "Yes, it is a pain to take a second trip down to your car, but you are going to do it anyway".
  9. Storing all of your valuables in one place has advantages and disadvantages. I still don't know if the idea of a to-go bag was a fundamentally bad idea, but I certainly should have "checked out" my two forms of id from it rather than taking the whole thing.
  10. What makes a fiasco is a cascade of multiple smaller mistakes/failures occurring together. Whether we are talking about the BP oil spill, or a horribly hilarious Peter Pan play, serious problems often are not the result of just one thing going wrong, but several poor decisions.
  11. Combined with my comments in the previous paragraphs, it's really easy to analyze all of the stuff that I did wrong after the fact. The problem is it all of my decisions seemed so reasonable at the time. On the other hand, you can't live a perfectly secure life. What I'm wrestling with right now is how to re-evaluate my personal threat models and learn from this incident without letting it ruin my life.
Well, that about does it for now. Hopefully I can get back to the focus of this blog, the academic study of password cracking techniques, soon. This whole real life security thing can be pretty annoying...

Monday, May 31, 2010 - General Observations and Updates - Part 3

Digging into this data is like watching an episode of Lost. Whenever it seems like one question gets answered, about ten other questions pop up.

Before I get into details, I want to start with a comment Per Thorsheim sent me as to what other password cracking programs support salted sha1 hashes:

The sha1(lowercase_username.password_guess) is at least supported by these:

Extreme GPU Bruteforcer ( hashcat and oclhashcat (cpu/gpu respectively)

I'm kicking myself for not thinking about hashcat, since it's a extremely powerful password cracker; plus it's free. Unfortunately the GPU version doesn't support the salted sha1 hash type, but even the non-gpu version is quite nice.

As for InsidePro, it also is very good, though it does cost some money. I've had a license-free version of questionable origin offered to me before, but I turned that down. Legality aside, installing pirated software given to you by shady people at a hacker conference would just be stupid...

Also, I've been talking to Ron Bowes over at the excellent skullsecurity blog, and he and some other people are hard at work cracking the passwords. It sounds like they have some serious hardware behind the effort, so expect to see something posted on his site about that in the near future.

OK, now onto the analysis:

First of all, I'm downgrading my opinion of the skill showed by the hackers in their password cracking attack. What I didn't realize before was the extent that the users of had been compromised previously. Just about all of the non-trivial passwords that were cracked appear in publicly available input dictionaries which are based on passwords cracked from user submitted hashes - aka hashkiller, insidepro, etc. Please note, I'm not saying they were script kiddies. The attackers were able to target the salted sha1 hash, and they knew where to get some good input dictionaries. It's just that they are not some uber-l33t password crackers, and anyone else using those input dictionaries could crack the same number of passwords in a couple of hours.

What this also means is we might be able to figure out which input dictionaries the attackers were using by looking at the hashes they cracked, vs. what the input dictionary would crack. To demonstrate this, below is a Venn diagram of what an input dictionary the attackers used would look like:

For example, there is a good chance the attackers used the InsidePro Big dictionary, since it cracks the same 598 passwords from the list that the attackers cracked. Add in a couple of the other publicly available input dictionaries, and you get real close to the 920 total passwords they managed to crack. To put this in perspective, I've so far managed to crack 62%, (compared to the 53% that the hackers cracked), of the salted Sha1 passwords on my laptop using basic dictionaries with almost no mangling rules. I fully expect other people to blow past that mark.

Next up, initial thoughts about the database:

What I really need to do is load all of the tables into my own database so I can do quick SQL queries, vs. manually going through the data by hand. That being said there are a few things that stick out:
  1. There's a lot of userids/password hashes in the carders_smf_members table that did not appear in the write-up.
  2. The MD5 hash is salted, but at least the salt is also available in the carders_smf_members table. The hash itself is a vBulletin3 hash type, MD5(MD5(Password).Salt). Both John the Ripper and Hashcat support this hash type.
  3. I'm not sure if the IP addresses stored in the table are accurate or not, since it looks like the site admins tried to obscure it in the webserver logs, but if they are, the database stores the last two ip addresses used.
  4. Other interesting fields include date joined, number of posts, karma level, last login date, etc.
  5. The above doesn't even begin to get into all of the data contained in the actual posts themselves...
That about it for this update. Let me leave you with one last fact that Ron found out:

Another interesting factoid:

Last MD5 password: 2010-01-10 21:54:01

First SHA1 password: 2010-01-10 22:40:16

So it was on January 10, 2010, later in the evening (in CDT) that they upgraded from vBulletin 3 to SMF.

*shrug* the more you know!

Wednesday, May 26, 2010 - Analysis of Password Cracking Techniques - Part 2

So I figure I probably should get around to looking at the passwords in this list, since password cracking techniques are the focus of this blog...

First though, a real quick definition. I needed to decide what to call the various parties involved in this whole shenanigans. For example, when I'm talking about the 'hackers', am I referring to the people collecting stolen credit card data who belonged to the board, or the people who hacked Likewise, if I use the term criminals, that could refer to both groups as well. Therefore, in my blog posts I'm going to use the following terms to refer to the two groups:
  • Carders/Users: The people who belonged to the board. Normally I would also use the term 'victims', but I don't want to honor them with that title.
  • Hackers/Attackers: The people who broke into the forum and posted the data online.
Ok, now that we have that out of the way, the rest of this post is going to be broken up into four parts:
  1. Executive Summary
  2. Brief Description of the Password List
  3. General Observations
  4. Setting Up a Password Cracker to Target Salted Sha1 Hashes
More in-depth analysis will have to wait till later since I need more time to go through the data. Also, I'm moving up to the D.C. area in a week to look for a job/start interviewing, so large scale password cracking sessions are not really feasible for me to perform right now.

-BTW, if you have a job opening please feel free to contact me.

Finally, I apologize but this post is going to be a bit more haphazard then the above outline implies, simply because it's hard to talk about one aspect of this dataset without discussing other aspects as well.

Executive Summary:, on online forum for the buying and selling of stolen bank and credit card data, was broken into sometime before May 5th by vigilante hackers. On May 18th, the hackers then posted data about the users to various file sharing sites, (pastebin, rapidshare), which included IP addresses, usernames, e-mail addresses, and all of the saved forum posts. More significant to this report, the attackers also posted the users' password hashes along with the plaintext passwords that the attackers had managed to crack. These passwords and password hashes provide significant data for researchers. From this data, we now know that cyber-criminals often select weak passwords. Furthermore, by analyzing these passwords we can develop better password cracking techniques that target the specific type of users, (criminals), whom law enforcement is the most interested in. This is an improvement from some of the other disclosed password datasets which were gathered from the general public. In addition, by looking at the passwords the attackers managed to crack, along with the passwords they failed to crack, we can gain a better understanding of the capabilities and techniques employed by hacker groups in real life attacks. This hacker group in particular appears to be quite proficient in password cracking techniques, possessing both the tools to attack salted SHA1 password hashes, and the ability to crack close to 53% of the passwords in the dataset.

Brief Description of the Password List(s):

The first thing that stands out is there are actually two lists of passwords present in the attacker's write-up, though all of the password hashes are mixed together.

List 1)
Contains 1737 salted Sha1 password hashes using the following format:


This is the list that the attackers talked about and the list they attempted to crack passwords from. Of all the salted Sha1 password hashes, the attackers cracked 920 of them, (around 53%), during the course of an approximately two week period.

List 2)
Contains 2036 "one hundred and twenty eight bit" password hashes of unknown origin. Due to their length, the base password hash is most likely MD4 or MD5. It is unknown if the attackers knew of the existence of these separate hashes since they did not mention the second hashing function in their writeup or crack any of the passwords from this list. I've run some quick tests and these hashes do not appear to be a single round of MD4 or MD5, but probably instead are salted in some manner, and/or use multiple rounds of MD4/5. If a salt that was not the username was used, then it probably will be infeasible to crack very many of the passwords due to the salt not being included in the writeup. A few of the passwords might still be vulnerable though since the salt may be short enough to be brute-forced along with the guesses.
So the next question is, why were there two separate types of password hashes? The most likely explanation is that at some point in the past, the site administrators switched over to a new forum software, or upgraded their security settings. The old users were left with the original password hash, and users who joined up afterwards were assigned the newer password hash. The exact same thing happened when was hacked, leaving users who hadn't logged into the site recently, exposed with only a simple MD5 hash protecting their password vs. the tougher phpbb3 hash.

The interesting thing is that in this case, a user's hash probably was not changed to the new format when they logged into the site. This can be seen by comparing the user's hash type to the IP addresses I talked about earlier. Of the 960 username/IP address combinations, 47 of the users had their password stored as the unknown hash, and 726 had their passwords stored as a salted Sha1 hash. You might notice that these two numbers did not add up to 960. There were 187 usernames that appeared in the IP logs but did not appear in either list of password hashes. I need to look into this more, but many of them might represent incorrect login attempts. A second option is the attackers may not have released all of the user password hashes, instead choosing to keep several for themselves to aid in future hacking attacks. I should probably analyze the forum posts to confirm if the hash type correlates to when the users became active, and if any of the users with missing password hashes actually posted previously on the forum.

Why yes, this is my "brief" description...

General Observations:

Observation 1) The attackers are very proficient in password cracking techniques
Level of Confidence: High
Discussion: The fact that the attacker could even target the salted Sha1 password hashes speaks to their competence. Most major password crackers, (Cain&Able, John the Ripper, L0phtcrack, etc), do not natively support that salt/hash combination. While John the Ripper has extensive support for a generic MD5 hashing function, where it is very easy to modify it to support all sorts of weird hash/salt combinations, that functionality has not yet been ported over to support Sha1 as well. So either they developed their own password cracker/workaround, (which isn't as hard as you might think, as I'll discuss later), or they had access to a less well known password cracker that supported that hash.

Question: Does anyone know of any password cracking program that does support Sha(username.password)?

In addition, since the passwords were salted with a unique salt per hash, this negates many common attacks like rainbow tables. Also, I don't know of any online password cracking site that supports this hash format, since they can't use hash lookup tables either. This means the attackers had to crack all of the password hashes themselves instead of relying on community resources.

Next, the fact they cracked 53% of the password hashes in a two week period speaks highly of the attacker's skills. To put this in perspective, in the attack, the attacker had submitted unsalted MD5 password hashes to an online cracker and only managed to crack 24% of the passwords. In that case, while the hacker might have been very skilled in webpage attacks, they were pretty much at the script kiddie level when it came to cracking the passwords. While 53% might not sound that much more impressive, there is the fact that in the case, the hashes were salted with a unique salt. The 'unique' part is important since it means an attacker has to hash each guess independently against each target hash they are trying to crack.

For example, imagine a typical cracking session taking one hour to complete. If an attacker was attempting to crack three thousand unsalted password hashes, then it would take them one hour to run all of their guesses against the entire list. On the other hand, if those password hashes used a unique salt, the same attack would then take three thousand hours, or roughly four months. That's a big difference. Assuming the attackers spent two weeks attempting to crack all 1737 salted Sha1 password hashes, that would mean the guesses they could make in their attack would be equivalent to a 11 minute cracking session against unsalted Sha1 passwords. Even assuming they cracked 50% of the password almost immediately, their cracking session would still be only equivalent to around 20 minutes of attacking unsalted hashes. In conclusion: cracking 53% of close to two thousand salted password hashes in a two week period is fairly impressive.

Observation 2) The Attackers did not user John the Ripper's Incremental mode to brute force guesses.
Level of Confidence: High
Discussion: When I ran JtR's incremental mode, using the default charset 'All', against the list, I almost immediately cracked a new password. Over a longer running session, I managed to crack even more.
Observation 3) The Attackers brute forced digits
Level of Confidence: Low/Medium Very Low
Edit March 28th 2001: I've since managed to crack the password '37721010'. It appears that most of the longer digit based passwords that the attackers had cracked were publicly available in lists of previously cracked password. I'm leaving my original discussion post up for posterity.
Discussion: I have not yet been able to crack any all-digit passwords that the attackers did not crack. Likewise, the attackers cracked several passwords such as '19930720', where unless the attacker has outside information, or had run across that password earlier, it is unlikely to be cracked by any form of attack besides brute force.
Observation 4) The Attackers did not brute force alpha passwords longer than five characters long
Level of Confidence: High
Discussion: I have since managed to crack several passwords that were four lowercase characters long followed by two digits. Likewise I managed to crack several passwords that were six characters long all lowercase.
Observation 5) The attackers may have used knowledge from previous attacks to aid their password cracking sessions
Level of Confidence: Medium
Discussion: Let's be honest, this isn't their first time at the rodeo. It's doubtful that is the first blackhat site these attackers have compromised. As we've seen from the ZF0 attacks, just because a website supports a tech-savy audience, doesn't mean it stores their passwords securely. And as we all know, people re-use passwords ... a lot. It's quite possible that the attackers used passwords cracked from previous sites to in turn crack some of the passwords in the set. For example, one of the cracked passwords was 'Nadia2312'. An unsalted MD5 hash of that same password was cracked via the Hashkiller site's project Opencrack, on February 1st, 2010. While it could be a coincidence, I'd also like to point out that Hashkiller is a German based website. Other plaintext passwords in the list such as '123456r12', and '01724776692' also show up as being cracked by HashKiller and InsidePro in 2009. Note, this may just imply that the attackers used custom input dictionaries from these sites, and did not actually submit the original password hashes to them.

Setting Up a Password Cracker to Target Salted Sha1 Hashes

As I mentioned previously: I'm sure there's already a password cracking program out there that can target salted Sha1 hashes, but I don't know of it. Not to be deterred, I quickly modified an old password cracking program I had written a while ago to support the hash, (note I could have modified John the Ripper instead, but I'll get to that in a bit). The hardest part was I didn't realize that when you added the username as the salt, you had to lowercase the entire thing. That's a couple of hours of my life I'd like back...

So testing this on my Mac laptop I expected to be able to make around 1000 guesses a second against the entire set of 1737 password hashes. I based this guess on benchmarking unsalted Sha1 hashes using John the Ripper, which gave me around 7,000,000 guesses a second. This would give me a maximum of 4029 guesses a second against the salted hashes. Throw in the overhead required to apply the salt plus my general programing sloppiness, and 1000 guesses a second sounded like a good number. So you can imagine my surprise when i was only ended up making around 80 guesses a second. Something's not right here...

Well it turned out, the problem was I had used the OpenSSL implementation of the Sha1 hashing algorithm which is really slow compared to the version in John the Ripper. Rather than trying to jury rig it into my old password cracker, I then decided to work with John the Ripper instead, which is what I should have done in the first place.

Since I'm a bit lazy goal oriented, I didn't bother to modify John the Ripper's source code though. I was able to get away with this since the hashing algorithm only uses one round of Sha1. Let's look at the hashing algorithm again:
The simplest way to do this would be to create a rule in JtR that would insert the username first. Such a rule would look like:
Which would insert Alice in front of all of your password guesses. Since we have close to two thousand usernames we are talking about though, that could get annoying having to create a rule for each one. This calls for the use of one of my favorite programs, Awk.

Awk, (and its sister Sed), is one of the best programs out there for quickly parsing and modifying files. It's also very useful for creating word mangling rules on the fly for use in a password cracking session in conjunction with JtR's stdin option. In that way, I sometimes use it much like MiddleChild, and in retrospect, I could easily have written MiddleChild in awk instead. For example I can create guesses using John the Ripper, pipe them into an awk script, and then pipe them back into John the Ripper where the guess will actually be hashed and compared against the target hashes. The above JtR rule could then be replaced with:
./john -wordlist=passwords.lst -session=cracking1 -rules -stdout | awk '{print "alice" $0}' | ./john -stdin -format=raw-sha1
The advantage of this approach is that 'alice' will be automatically inserted in front of all the guesses/rules that the first instance of John the Ripper is generating. But wait, don't we still have close to two thousand usernames we have to do this for? That would be one nasty command line. Luckily an Awk script can be run from a file with the -f option. And how do we create that file with all 1737 usernames? Well with Awk of course! The original format that the hashes were saved to in the writeup was:
Simply cat that file into Awk using the following command will create your Awk script for you.
cat carders_hashes.txt | awk -F : '{print"{print \"" tolower($1) "\"$0}"}' > awk_add_usernames.txt
The -F tells it to use the ':' as a seperator and the tolower() option lowercases the usernames. To run this whole thing just type:
./john -wordlist=passwords.lst -session=cracking1 -rules -stdout | awk -f awk_add_usernames.txt | ./john -stdin -format=raw-sha1
Using this setup, now I'm getting around 2,300 guesses a second which is much more respectable. There certainly is room for improvement though. First of all, John compares each hashed guess against all of the target hashes, instead of only the hash which the salt belongs too. Also, you have to clean up the cracked hashes file afterwards to remove the username from the plaintext password. But did I mention it works? As Miles said in the Lost finale, "I don't believe in a lot of things, but I do believe in duct tape!"

Well, that about does it for now. Next up, analysis of the cracked passwords.

Tuesday, May 25, 2010 - Analysis of E-mail Addresses

I just wanted to point everyone over to Cedric Pernet's bog where he did an amazing job analyzing the e-mail addresses that the carders had used. You can view his work at the following link:

It shouldn't come as a surprise, but just because someone is a cybercriminal doesn't mean they are smart.

Also, if you or anyone you know is doing research into this, feel free to forward me the links. I only found Cedric's blog on a reference in another post on page 8 of a Google search I did, (aka I stumbled on it by pure luck). Thanks!

Sunday, May 23, 2010 Hacked - Initial Analysis of IP addresses

As the title says,, a German forum for the buying and selling of stolen credit cards was hacked and a ton of information was posted publicly online. For a more detailed description, I highly recommend reading the always excellent Brian Krebs writeup on the incident.

I'm going to skip right past my feelings on the subject. The short version is, while part of me is laughing inside, I tend to think such vigilante justice is often counter-productive. I just wish people like that could work with the system because by doing so you can sometimes achieve spectacular results.

Instead I'm going to focus on the data itself and what it can tell us from a research perspective. So far I've managed to download the writeup of the attack, which also includes IP addresses, usernames, e-mail addresses, and password hashes. I'm also currently in the process of downloading what I think is the listing of all the private messages, though it may just turn out to be viruses and fail. BTW, that's why I love VMWare. Edit: It's legit, and wow. It's going to take me a while to sort through/make sense of that data. Expect a post on that later. Google translate, don't fail me now!

As a word of warning, just about everything in this post falls into the category of WAG. While I try to create these guesses with the data I have available, please take everything I say here with a grain of salt. The uncertainty level is HIGH.

First of all, here is a picture of the writeup so you can all enjoy the ASCII art:

So what do we know? From the file timestamps in the writeup, the site was probably compromised sometime around May 5th, (though of course the initial attack could have occurred earlier.) The pastebin copy I downloaded was uploaded on May 18th, so this gives us a general timeframe which will be useful when we start talking about the password cracking that went on.

Question: Does anyone know where/how the attack was initially publicized? Was it a site defacement, posting in a rival forum, etc?

Before I get sidetracked talking about password cracking the hackers did, let's look at those IP addresses collected from the various users of the forum. These IP addresses were only collected for the users who logged on to between May 11th and 12th. Starting out, the simplest thing to do was to run a basic GOIP lookup on the countries associated with them. The results can be viewed in the following graph:

As you can see, it really was a mostly German based forum. Not a big surprise I know, but it's nice to see where everyone is coming from. The next thing I wanted to look into was how many duplicate IP addresses there were. This would imply a forum goer was using multiple usernames, or a common proxy that several of the carders were connecting through.

Of the 960 published IP addresses:
  • 819 of them were unique
  • 25 of them were associated with two user accounts
  • 10 of them were associated with three user accounts
  • 3 of them were associated with four user accounts
  • 2 of them were associated with five user accounts
  • 1 of them was associated with seven user accounts
  • 1 was associated with eight user accounts
  • 1 was associated with eleven user accounts
  • 1 was associated with thirteen user accounts
Thirteen user accounts to one IP address? As the Germans would say: "That's a Bingo". The IP address in question,, has been previously detected by Project Honeypot as a prolific Russian based Mail, and Blog spam sever. It's on most IP blacklists by now, so I guess they also use it as an anonymous proxy sever to keep their revenue stream up. The other heavily used IP addresses showed similar characteristics. What I found particularly interesting was the following pastebin link I stumbled across when Google searching those IP addresses. It was posted on February 28th, and contained three of the proxy servers that the carders used.

The IP address I have the most questions about though was, which is a Swedish IP where eight of the users connected from. I couldn't find any past abuse originating from that site. When looking at the user-accounts/e-mail addresses associated with those IP addresses, another interesting thing popped up. The user "Risking", didn't have an associated password hash/user account listed in the writeup. My guess is that the hackers dumped the hashes fairly early on, (probably around May 5th), but didn't figure out how to grab the IP logs until May 11th. In that time the user Risking probably created a new account.

So we know some of the users were smart enough to use proxy servers. How about Tor? Downloading a list of all of the known Tor routers was fairly easy, so all that remained was to compare them to the IP addresses in the writeup. All in all, there were 20 unique IP addresses that matched known Tor IP addresses. That's actually a lot lower than I expected. Of course the question then is, how were a majority of users connecting to the site? Smaller proxy servers? Public cyber-cafes? Starbucks? Their neighbors wireless? Were they really dumb enough to connect from their home accounts? I don't know the answer to that, but hopefully someone with more experience investigating blackhat forums will post their analysis/experiences on the subject.

So that about does it for IP addresses today. The database containing all of the downloaded messages looks like it also contains IP addresses too so I might revisit this subject soon. The next topic: An analysis of the hacker's password cracking attacks against

Wednesday, May 12, 2010

They'll Let Anyone Graduate: My Password Cracking Dissertation

You've all heard me complain/stress out about writing my dissertation, so now that it's done of course I'm going to post it online. My PhD. dissertation, "Using Probabilistic Techniques to Aid in Password Cracking Attacks" is available for download from my tools page here.

A lot of it is going to look fairly familiar if you've seen my talks or been reading this blog, which makes sense since my dissertation is a summary of what I've been up to for the last three years. Here's a quick breakdown of what's in it:

Chapter 1:
  • Overview + background info
  • The need for password cracking
  • General terms and techniques
  • Obtaining the datasets, and basic statistics about the datasets
  • A quick survey of common password hashes and popular password cracking tools
Chapter 2:
  • Brute Force Attacks
  • 95% of it I've talked about on this blog before
  • The remaining 5%, which I really should post an entry on, is a comparison of a targeted brute force attack against a pure Markov attack. The targeted attack uses the tool I previously released, MiddleChild.
Chapter 3:
  • Dictionary Based Attacks
  • Summary of some of the custom dictionaries I've created plus tools. Most of the tools are available in various places on my tools webpage.
  • Mostly this chapter focuses on the use of a customized edit distance metric for evaluating the effectiveness of input dictionaries, which is something that I've found incredibly useful.
  • Being a total of nine pages long, this is the chapter I feel is the most incomplete. It's also why I'm glad I have a blog so I can rectify that shortcoming over the next couple of months with additional posts ;)
Chapter 4:
  • Dictionary based Rainbow Tables
  • For the complete AV experience, a video of me talking about this at Shmoocon is available here.
  • Quick note: I've since figured out that the higher number of collisions caused by the dictionary tables was due to the fact that I was comparing them to "perfect tables", aka ones where all of the merging chains had been removed. If I remove merging chains from the dictionary tables, they perform just as well, collision wise. The fact that I didn't realize this when I was running the tests is a bit embarrassing. #facepalm
  • I have a bunch of ideas on how to further improve my tables, such as adding targeted brute force support, (with Markov Models!!), and enhancing the basic dictionary attacks with more advanced word mangling rules and multiple dictionaries. Implementing those is very high on my todo list.
Chapter 5:
  • Using probabilistic context free grammars for password cracking
  • Instead of focusing on the word mangling rules, focus on the probabilities instead
  • The result is a password cracker that can be trained on previously disclosed passwords and generates highly targeted guesses
  • This is the heart and soul of my dissertation and the reason why I graduated. If you only read one chapter this should be it.
  • Why should you care: It cracks more passwords with fewer guesses. What's not to like?
Chapter 6:
  • A critique of NIST's use of entropy as a password strength measurement
  • Essentially a very rough draft of the paper I submitted to the ACM CCS conference on the effectiveness of different password creation policies. I started running the tests/writing this section about two weeks before I defended, so I've collected a lot of data since this chapter was finished.
Sot that's about it for three years worth of work. It's a bit humbling I have to admit, since there's still a ton of stuff I still want to look into/implement. Luckily, just because I graduated doesn't mean I have to stop my research. Here's looking forward to the future.

E-mail Address Change

Since I'm graduating, I was informed that I might not be able to keep my e-mail address. I'm trying to see what I can do to hold onto it, but for the time being I'd recommend e-mailing me at

Saturday, April 24, 2010

Optimizing JtR's Single Mode Follow Up

Over at the John the Ripper mailing list, (I'm sure you already belong to it right?!), SolarDesigner, the creator of JtR, raised the following question about the re-ordered Single Mode rule-set I released last night.

It is not clear whether you have full (or any) separation between your training and test sets when you re-order the rules. (You do say that you have such separation for your "UnLock" test, but that's another one.) In other words, the improvement from "Original Single Rules" to "Edited Single Version 2" that you've demonstrated might be partially attributable to you training (re-ordering) the rules on the same set that you later test them on.

It's a valid question and it's something I've worried about myself. Referring back to my original post:

For the target set, the RockYou list seemed like an obvious choice. I actually used a subset of the RockYou list of one million passwords I designated for training purposes, (that way I'm not testing/training against the same passwords).

I should have written more about my methodology. Basically the RockYou set represents a huge number of passwords, (over 32 million of them). One of my concerns though has always been over-training my password cracking techniques. As Solar pointed out, you can easily create a highly optimized set of rules that crack one set of passwords extremely well, but performs poorly against everything else. To avoid that, and to get the most use out of the RockYou list, I decided to follow typical machine learning practices and split the list up into one million chunks of passwords. To ensure the same password doesn't end up in different lists, I first randomized the full list by using the GNU shuf tool, and then divided the list into 32 sub-lists containing one million passwords each. I refer to these as the Rockyou1-32 lists. So far I have designated five the sub-lists as test lists, RockYou_test1-5, and five of the sub-lists as training lists, Rockyou_training28-32. I've been assigning them at the end of the spectrum and moving my way to the middle so I don't get confused which one I used for training vs. testing. The remaining lists I'm saving for future tests, so that way I don't "taint" them with my experiments.

Still, since some users had multiple accounts on RockYou due to the way it was set up, it's highly possible, (if not almost certain in some cases), that different passwords from the same user might appear in both a training and a test set. That's also why I would love to get my hands on the original list that included usernames so I can make sure that this doesn't happen. Since there is bound to be some "cross-pollination" though, it's a very valid concern that tests trained on one of the RockYou training sets, and tested against one of the RockYou test sets contain unfair knowledge and don't accurately represent real password cracking attacks.

All the experiments I've run so far have indicated that the above isn't a major problem, but rather than take my word for it, Fig 4.1 shows another round of tests comparing the original single rule-set vs. the reordered rule-set I released the other night. The input dictionary remains the same, (dic-0294), but this time they are both attacking the list.

Figure 4.1:

In the above test, the rearranged version on the Single rule-set performed better, but this time to a much lesser degree. I then ran the both of the attacks against 33k passwords from the MySpace password testing list, (considering the MySpace list was the first set of real passwords I collected, it also was split into a training and test list each containing around 33K passwords). The results of those cracking sessions can be seen in Fig 4.2.

Figure 4.2:

This time the results are much more dramatic over the first 500 million guesses, with the modified ruleset performing significantly better. So now we've seen against three different sets of English website passwords, (the RockYou_test1 list, the list, and the MySpace test list), that the re-ordering of the single mode mangling rules cracked as many, if not more passwords over the first 500 million guesses compared to the original rule-set.

If you are interested in downloading the modified rule-set, you can still grab it here. To use it, just enter the command line option "rules=Modified_Single".

Thursday, April 22, 2010

Optimizing John the Ripper's "Single" Mode for Dictionary Attacks

While I've been doing a lot of analysis, I figure it's been a while since I actually released anything. That obviously needs to change. As one small step in the right direction, I decided to optimize John the Ripper's "Single" mode word mangling rules for use in normal dictionary based attacks. If you don't want to read through the rest of this post on my methodology, you can grab the new rule-set right here. To make use of it in a cracking session, simply enter the flag:


For a more detailed explanation on what I did, please read on.

The Problem:

First of all, did you know that starting with John the Ripper version 1.7.4 you can have multiple rulesets in the same john.conf config file? Also SolarDesigner added a several new mangling rules, (such as the ability to insert/append whole strings), and increased the speed at which the mangling rules generate guesses. I know the current 1.7.5 branch is still not considered the stable version, but if you are using John the Ripper, I highly recommend upgrading to it.

Perhaps the biggest change from a research perspective was that the single mode ruleset can now be used with normal dictionary based attacks. Let me back up and quickly summarize John the Ripper's three default modes:
  1. Single Mode: Originally designed to only work with the GECOS field information, (mostly just the username + full name of the individuals who the password hashes belong to); Think of it as a dictionary attack with a large number of word mangling rules but a very small dictionary. This is the default first attack that John the Ripper runs.
  2. WordList Mode: Your typical dictionary based attack. The default John the Ripper mangling rules were designed to finish very quickly, so they are highly optimized, but only produce a limited number of guesses, (on average around 40 guesses a word -not a scientific number, but more of a general guesstimate). This is the default second attack that John the Ripper runs.
  3. Incremental Mode: An "intelligent" brute force attack. What John the Ripper falls back on if nothing else works.
One problem I've always had in my research was finding an "honest" comparison when evaluating dictionary based attacks. In the professional password cracking industry, people are very reluctant to disclose their custom mangling rules. That pretty much left me with just John the Ripper's default wordlist mode. The problem was, even with a larger input dictionary, the default rules don't generate very many guesses, (For example, there is no rule that will try all two digit numbers at the end of a password guess). This made comparisons against longer cracking sessions quite difficult since I didn't have anything to compare my methods against. While I could always design a longer set of mangling rules myself, that becomes a little "iffy" from a research perspective. Aka how do people know that I put forth an honest effort to make a fair set of mangling rules vs intentionally creating mangling rules that perform horribly in order to make my own cracking techniques look better. Single mode helps solve this problem since now I can model longer password cracking sessions against an "industry standard attack".

This also helps a majority of the users of John the Ripper who don't want to create their own word mangling rules. Since for simple password hashes such as MD5, the time it takes for the default wordlist mode to finish generally is measured in minutes, (if not seconds), depending on the input dictionary, the ability to use single mode rules allows people to run more extensive dictionary based attacks before they are forced to fall back to incremental mode.

The problem is that the single mode ruleset was designed to work with usernames, not dictionary words. This means the order in which it tries guesses is not tailored to most input dictionaries. As an example of that check out Fig 1.1 of a password cracking session using the single mode rule-set and the dic-0294 input dictionary, (around 860k words), run against a list of one million passwords selected from the RockYou dataset divided by minimum password length. Note: The cracking session was limited to 1 billion guesses.

Figure 1.1:

BTW, you'll probably be seeing this graph in a later post as well...

The thing that really stands out is that the cracking session looks like a series of steps. What this means is that some effective mangling rules are not being tried until later in the password cracking session. While this still isn't a huge deal if fast hashes like MD5 are being attacked, it does become an issue if more computationally expensive hashes are targeted. In short, you want to front-load your cracking session to crack as many passwords as quickly as possible.

The Solution:

Basically, I set out a modest goal for myself. Reorder the Single mode rules to make for a more optimized dictionary based cracking attack.

The first problem was identifying how effective each mangling rule was. To do this I inserted a new mangling rule after each mangling rule in JtR's single ruleset. That rule is as follows:


All it does is delete whatever word that was supposed to be printed from the input dictionary and outputs "THISISABENCHMARK!!" That way I can easily tell when a new rule is started. Admittedly, depending on the size of the dictionary, several million of these debug lines may be printed but that's easy to deal with.

After that is was simple a matter of modifying my verification program to take input from John the Ripper, and keep track of 1) How many passwords each rule cracked, and 2) How many guesses each rule generated. This lead to a passwords cracked per guess metric:
passwords_cracked_per_guess = (#_of_password_cracked) / (#_of_guesses)
In the future I'll probably normalize the equation so the metric is independent of the number of passwords in the training set being attacked. For now though it doesn't really matter. One other minor point: I reset the password list with each new rule. That way a single rule won't mark the same password being cracked twice, but if two different rules would crack a password, they both get credit for it.

After that, the next question was which input dictionaries and password sets should I tailor it to. As you can imagine, both of those choices can make a fairly large impact on which rules are considered the most effective. For the target set, the RockYou list seemed like an obvious choice. I actually used a subset of the RockYou list of one million passwords I designated for training purposes, (that way I'm not testing/training against the same passwords). For the input dictionary I was very tempted to use one of my custom ones, but I actually settled on using dic-0294 instead since that's readily available and most likely represents the type of dictionary a majority of people are going to use.

After that it was pretty straightforward. Run my cracking session against the training set, evaluate how effective each rule was, and then re-order the rules. The results of using the reordered rules when cracking one million test passwords, (different from the training passwords), from the RockYou list can be seen in Fig 2.1.

Figure 2.1:

So as the results show, while the modified version of the single rules starts out slightly better, the original version cracks more passwords in the first one billion guesses .... Wait, What!! That wasn't supposed to happen.

While eventually both rule-sets will crack the exact same number of passwords, I certainly expected the modified version to perform much better over a majority of the cracking session. At first I thought it might be some differences in the training/test password sets, but quickly discarded that idea as one million passwords should be enough for the law of large numbers to come into effect. Looking into it more I quickly discovered that the problem was with my replacing cracked passwords between each different rule when calculating the passwords cracked per guess. The way that the rules and the input dictionary interacted creates a huge number of duplicate guesses. For example, both 'password', and 'Password' are in dic-0294. Therefore the guesses 'Password', 'Password1', 'Password2', etc are created twice, once when trying the words with no capitalization, and once when the first letter is capitalized.

Duplicate guesses are a fact of life when performing most dictionary attacks. While there are ways to minimize them, the benefits of having a pre-mangled input dictionary can sometimes be quite high since it can allow you to crack passwords with context sensitive mangling rules, such as 'passWord', much faster. The reason I'm writing about the above test, (vs. just sweeping it under the rug), is I feel even a failed test can provide a lot, if not more, information than a successful test. That's why I love science.

So the next step was to re-run my calculations, but this time without performing replacement of cracked passwords between different rules. Ideally I should have done this several times, (re-ordering the rules, and recalculating their effectiveness), until a local maximum was reached, but I was lazy and just did it once. The results of running these re-ordered rules andt the original single rule-set against the RockYou test list can be seen in Fig 2.2:

Figure 2.2:

OK, this time the results turned out how I expected it to, with the modified rule-set performing as well as or better than the original single rules. That being said, I think this highlights how much credit should go to SolarDesigner for his work creating the original Single rule-set, since the improvement, while noticeable in the first 500 million guesses, was not overly dramatic. To improve the cracking effectiveness even more, changes would need to be made to the individual rules, vs. just re-ordering them.

As a preview of the probabilistic password cracker I've been working on, currently called UnLock (though I'm desperately looking for a new name since that sounds a bit presumptuous), Fig 3.1 shows it attacking the RockYou test list using the same input dictionary dic-0294. Also, since UnLock allows you create rules by training it on disclosed password lists, I trained this attack on the MySpace password set, which actually wasn't ideal since MySpace required users to include one non-lowercase letter in their password.

Figure 3.1:

Finally, I'll talk about this more later when I write a post on using modified edit distance to evaluate input dictionaries, but the estimated maximum number of passwords possible for Dic-0294 to crack when compared against the RockYou training list was 60.59%. An attacker would achieve that if they tried every combination of case-mangling rules, common letter replacements, combining two words, and appending/pre-pending numbers + special characters. Aka '7123PaS$w0rd$@12' would be included, even though no real life cracking session would likely every create that guess.