Saturday, October 30, 2010

CCS Paper Part #2: Password Entropy

Round Peg, Square Hole
This is part #2 in a (mumble, cough, mumble) part series of posts discussing the results published in the paper I co-authored on the effectiveness of password security metrics. Part #1 can be found here.

I’ve received a lot of insightful comments on the paper since my last post, (one of the benefits of having a slow update schedule), and one thing that stands out is that people really like the idea of password entropy. Here’s a good example:
“As to entropy, I think it would actually be a good measure of password complexity, but unfortunately there's no way to compute it directly. We would need a password database comparable in size (or preferably much larger than) the entire password space in order to be able to do that. Since we can't possibly have that (there are not that many passwords in the world), we can't compute the entropy - we can only try to estimate it in various ways (likely poor)”
First of all I want to thank everyone for their input and support as I really appreciate it. This is one of the few cases though where I’m going to have to disagree with most of you. In fact, as conceited as it sounds, my main takeaway has been that I've done a poor job of making my argument, (or I’m wrong which is always a possibility). So the end result is another post on the exciting topic of password entropy ;)

When I first started writing this post, I began with a long description on the history of Shannon Entropy, how it’s used, and what it measures. I then proceeded to delete what I had written since it was really long, boring, and quite honestly not that helpful. All you need to know is:
  1. Claude Shannon was a smart dude.
  2. No seriously, he was amazing; he literally wrote the first book on modern code-breaking techniques.
  3. Shannon entropy is a very powerful tool used to measure information entropy/information leakage.
  4. Another way of describing Shannon entropy is that it attempts to quantify how much information is unknown about a random variable.
  5. It’s been used effectively for many different tasks, from proving one time pads secure to estimating the limits of data compression.
  6. Despite the similar sounding names, information entropy and guessing entropy are not the same thing.
  7. Yes, I’m actually saying that knowing how random a variable is doesn’t tell you how likely it is for someone to guess it in N guesses, (with the exception of the boundary cases where the variable is always known, aka the coin always lands heads, or where the variable has an even distribution, aka a perfectly fair coin flip).
Ok, I’ll add one more completely unnecessary side note about Shannon Entropy. Ask a crypto guy, (or gal), if the Shannon entropy of a message encrypted with a truly random and properly applied one time pad is equal to the size of the key. If they say “yes”, point and laugh at them. The entropy is equal to that of the original message, silly!

Hey, do you know how hard it is to make an entropy related joke? I’m trying here…

Anyways, to calculate the entropy of a variable you need to have a fairly accurate estimate of the underlying probabilities of each possible outcome. For example, a trick coin may land heads 70% of the time, and tails the other 30%. The resulting Shannon entropy is just the summation of the probability of each event multiplied by the log2 of its probability, (and then multiplied by -1 to make it a positive value). Aka:

H(X) = -Σ p(x) x log2( p(x) ), summed over every possible outcome x
So the Shannon entropy of the above trick coin would be -(.7 x log2(.7) + .3 x log2(.3)) which is equal to 0.8812 bits. A completely fair coin flip’s entropy would be equal to 1.0. In addition, the total entropy of different independent variables is additive. This means the entropy of flipping the trick coin and then the fair coin would be .8812 + 1.0 = 1.8812 bits worth of entropy.
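
If it’s easier to see that worked out in code than in Excel, here’s a quick Python sketch of the same calculation, (the function name and the coin probabilities are just mine):

    import math

    def shannon_entropy(probs):
        # Shannon entropy in bits: sum of p * log2(p) over every outcome, times -1
        return -sum(p * math.log2(p) for p in probs if p > 0)

    trick_coin = [0.7, 0.3]  # lands heads 70% of the time, tails 30%
    fair_coin = [0.5, 0.5]

    print(shannon_entropy(trick_coin))   # ~0.881 bits
    print(shannon_entropy(fair_coin))    # 1.0 bits
    # entropy of independent variables is additive
    print(shannon_entropy(trick_coin) + shannon_entropy(fair_coin))  # ~1.881 bits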

I probably should have put a disclaimer above to say that you can live a perfectly happy life without understanding how entropy is calculated.

The problem is that while the Shannon entropy of a system is determined using the probability of the different outcomes, the final entropy measurement does not tell you about the underlying probability distribution. People try to pretend it does though, which is where they get into trouble. Here is a picture, (and a gratuitous South Park reference), that I used in my CCS presentation to describe NIST’s approach to using Shannon entropy in the SP800-63 document:


Basically they take a Shannon entropy value, assume the underlying probability distribution is even, and go from there. The issue is that when it comes to human generated passwords, the underlying probability distribution is most assuredly not even. People really like picking “password1”, but there is always that one joker out there who picks a password like “WN%)vA0pnwe**”. That’s what I’m trying to say when I show this graph:



The problem is not that the Shannon entropy value is wrong. It’s that an even probability distribution is assumed. To put it another way: unless you can figure out a method to model the success of a realistic password cracking session using just a straight line, you’re in trouble.
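
To spell out what that straight line actually is, here’s the even-distribution assumption written as a quick Python sketch of my own, (this is my framing, not anything taken from SP800-63 itself): if H bits of entropy really described 2^H equally likely passwords, the fraction cracked after g guesses would just be g / 2^H until it hits 100%.

    def even_distribution_success(entropy_bits, guesses):
        # Fraction of passwords cracked after `guesses` attempts, *if* the entropy
        # value really described 2**entropy_bits equally likely passwords
        return min(1.0, guesses / 2 ** entropy_bits)

    # e.g. treating "14 bits of entropy" as an even distribution over 2**14 passwords
    for guesses in (100, 1_000, 10_000, 16_384):
        print(guesses, even_distribution_success(14, guesses))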

Let me make this point another way. A lot of people get hung up on the fact that calculating the underlying probability distribution of a password set is a hard problem. So I want to take a step back and show that the problem remains even when that is not the case.

For an experiment, I went ahead and designed a variable that has 100 possible values occurring at various probabilities, (thanks Excel). This means I know exactly what the underlying probability distribution is, which also means I can calculate the exact Shannon entropy. The graph below shows the expected guessing success rate against one such variable, compared to the expected success rate you would get by assuming that same Shannon entropy described an even distribution.


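I can’t paste the Excel distribution here, but the following Python sketch makes the same comparison with a made-up skewed distribution over 100 values, so treat the numbers as illustrative rather than the exact ones behind the graph:

    import math

    # A made-up skewed distribution over 100 values (Zipf-like, normalized to sum to 1).
    # This is NOT the exact distribution from my Excel sheet, but it makes the same point.
    weights = [1 / rank for rank in range(1, 101)]
    total = sum(weights)
    probs = sorted((w / total for w in weights), reverse=True)

    # The exact Shannon entropy of this variable, since we know every probability
    entropy = -sum(p * math.log2(p) for p in probs)

    for guesses in (1, 5, 25, 100):
        actual = sum(probs[:guesses])               # attacker tries the most likely values first
        assumed = min(1.0, guesses / 2 ** entropy)  # what an even distribution of the same entropy predicts
        print(f"{guesses:>3} guesses: actual {actual:.1%} vs even-distribution estimate {assumed:.1%}")

The even-distribution line badly understates how quickly the popular values fall in a short guessing session.
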
Now tell me again: what useful information does the Shannon entropy value give the defender about the resistance of this variable to a guessing attack? What’s worse is the graph below, which shows three different probability distributions that have approximately the same entropy, (I didn’t feel like playing around with Excel for a couple of extra hours to generate the EXACT same entropy; this is a blog and not a research paper after all).


These three variables have very different resistance to cracking attacks, even though their entropy values are essentially the same. If I want to get really fancy, I can even design the variables in such a way that the variable with a higher Shannon entropy value is actually MORE vulnerable to a shorter cracking session.
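
Here’s one way to construct that fancy case, (toy numbers of my own, not the ones behind the graph): the second distribution carries a full extra bit of Shannon entropy, yet it loses half of its values to a single guess.

    import math

    def shannon_entropy(probs):
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # Distribution A: 16 equally likely values -> 4.0 bits, best single guess wins 6.25% of the time
    dist_a = [1 / 16] * 16

    # Distribution B: one very popular value plus 256 rare ones -> 5.0 bits,
    # yet the best single guess wins 50% of the time
    dist_b = [0.5] + [0.5 / 256] * 256

    for name, dist in (("A", dist_a), ("B", dist_b)):
        print(name, round(shannon_entropy(dist), 2), "bits;",
              f"top guess succeeds {max(dist):.2%} of the time")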


This all comes back to my original point that Shannon entropy doesn’t provide “actionable” information to a defender when it comes to selecting a password policy. Even if you were able to perfectly calculate the Shannon entropy of a password set, the resulting value still wouldn’t tell you how secure you were against a password cracking session. What you really want to know as a defender is the underlying probability distribution of those passwords instead. That’s something I’ve been working on, but I’ll leave my group’s attempts to calculate that for another post, (hint: most password cracking rule-sets attempt to model the underlying probability distribution because they want to crack passwords as quickly as possible).

Thursday, October 7, 2010

New Paper on Password Security Metrics

I'm in Chicago at the ACM CCS conference, and the paper I presented there: "Testing Metrics for Password Creation Policies by Attacking Large Sets of Revealed Passwords", is now available online.
Since I had the paper and presentation approved through my company's public release office, I was given permission to blog about this subject while the larger issue of my blog is still going through the proper channels. Because of that, I'm going to limit my next couple of posts to this subject rather than talking about the CCS conference as a whole. But let me quickly point you to the amazing paper "The Security of Modern Password Expiration: An Algorithmic Framework and Empirical Analysis", written by Yinqian Zhang, Fabian Monrose and Michael Reiter. In short, they managed to obtain a great dataset, their techniques were innovative and sound, and there's some really good analysis of how effective password expiration policies really are, (spoiler: forcing users to change their password every six months isn't very useful).

I'd like to start by acknowledging the other authors who contributed to the "Testing Metrics for Password Creation Policies..." paper.
  • Dr. Sudhir Aggarwal - Florida State University: My major professor, who spent I don't know how many hours helping walk me through the subtle intricacies of information entropy.
  • Michael Collins - Redjack LLC: Another data driven researcher, and much cooler than me since he uses GNUPlot instead of Excel ;)
  • Henry Stern - Cisco IronPort: He was the driving force behind getting this paper written. It was over lunch at the Microsoft Digital Crime Consortium, (it's a conference to combat cybercrime, and not a group of people from Microsoft looking to commit digital crime like the name implies...), that the framework for this paper was laid out.
As for the contents of the paper, I'm planning on breaking the discussion about it down into several different posts, with this post here being more of an overview.

When writing this paper, we really had two main goals:
  1. How does the NIST model of password entropy as defined in SP800-63 hold up when exposed to real password datasets and realistic attacks?
  2. How much security is actually provided by typical password creation policies, (aka minimum length, character requirements, blacklists)?
Based on our results, we then looked at the direction we would like password creation policies to move in the future. This ended up with us suggesting how to turn our probabilistic password cracker around and instead use it as part of a password creation strategy that allows people to create passwords however they like, as long as the probability of the resulting password remains low.
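
To give a rough flavor of that strategy, here's a toy Python sketch of my own, (it is NOT our actual probabilistic cracker, which models the structure of passwords rather than looking whole strings up in a list): estimate how probable a candidate password is under an attacker-style model, and only accept it if that probability falls below a policy threshold.

    # Toy sketch of the general idea: accept a password only if its estimated
    # probability under an attacker-style model is below a policy threshold.

    def estimated_probability(password, model):
        # `model` here is just a dict of common passwords and their frequencies;
        # anything the model has never seen gets a tiny default probability
        return model.get(password.lower(), 1e-9)

    def acceptable(password, model, threshold=1e-6):
        return estimated_probability(password, model) < threshold

    common = {"password1": 0.01, "123456": 0.008, "letmein": 0.002}

    print(acceptable("password1", common))      # False -- too probable, so reject it
    print(acceptable("WN%)vA0pnwe**", common))  # True  -- unusual enough to pass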

Of all that, I feel our analysis of the NIST password entropy model is actually the most important part of the paper. I know it sounds like an esoteric, inside-baseball subject, but the use of NIST's password entropy model has a widespread impact on all of us, because it provides the theoretical underpinning for most password creation policies out there. Don't take my word for how widespread its use is: check out the Wikipedia article on password strength, (or better yet, read the discussion page), for yourself.

Our findings were that the NIST model of password entropy does not match up with real world password usage or password cracking attacks. If that wasn't controversial enough, we then made the even more substantial claim that the current use of Shannon entropy to model the security provided by human generated passwords at best provides no actionable information to the defender. At worst, it leads to a defender having an overly optimistic view of the security provided by their password creation policies, while at the same time resulting in overly burdensome requirements for the end users.

Getting in front of a room full of crypto experts and telling them that Shannon entropy isn't useful for evaluating the security of password creation policies, and that "We REALLY need to STOP using it", was a bit of a gut-clenching moment. That's because the idea of information entropy is fairly central to the evaluation of most cryptographic algorithms. I would never have done it except for the fact that we have a lot of data backing this assertion up. The reason we make the broader point is that it's tempting to dismiss the flaws in the NIST model by saying that NIST just estimated the entropy of human generated passwords wrong; for example, if you juggle the constants around, or perhaps look at word entropy vs character entropy, things will work out. Our point, though, is not that you can't come up with a fairly accurate Shannon entropy model of human generated passwords. You most assuredly can. It's just that it's not apparent how such a model can provide "actionable information". In addition, the way we currently use Shannon entropy in evaluating password security policies is fundamentally flawed.

This subject really does require another blog post, but before I head back to Boston I wanted to leave you with one of the graphs from our paper that demonstrates what I'm talking about:


The above graph shows cracking sessions run against passwords that met different minimum length password creation requirements, (aka must be at least seven characters long). The NIST estimated cracking speed is based on the calculated NIST entropy of passwords created under a seven character minimum password creation policy. You may notice that it overestimates the security of the creation policy over shorter cracking sessions, but it doesn't model longer cracking sessions well either. This is what I mean when I keep saying it doesn't provide "actionable intelligence", (third time and counting). When we say "password entropy", what we really want to know is the guessing entropy of a policy. Unfortunately, as a community, we keep using Shannon entropy instead. Guessing entropy and Shannon entropy are two very different concepts, but there doesn't exist a very good way of calculating guessing entropy, while calculating the Shannon entropy of a set of text is well documented. That's part of the reason why people keep trying to use Shannon entropy instead.
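
Since the distinction keeps coming up, here's a tiny toy illustration of the difference, (my own numbers, with guessing entropy computed as the expected number of guesses when trying values in order of decreasing probability): two distributions can have exactly the same Shannon entropy while one of them gives up its most popular value twice as often on the very first guess.

    import math

    def shannon_entropy(probs):
        return -sum(p * math.log2(p) for p in probs if p > 0)

    def guessing_entropy(probs):
        # expected number of guesses when trying values in order of decreasing probability
        ordered = sorted(probs, reverse=True)
        return sum(i * p for i, p in enumerate(ordered, start=1))

    even = [0.25] * 4                            # four equally likely values
    skewed = [0.5, 0.125, 0.125, 0.125, 0.125]   # one popular value, four rare ones

    for name, dist in (("even", even), ("skewed", skewed)):
        print(name, shannon_entropy(dist), "bits of Shannon entropy;",
              guessing_entropy(dist), "expected guesses;",
              f"first guess succeeds {max(dist):.0%} of the time")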

So I guess I should end this post by saying, if any of this sounds interesting please read the paper ;)