CCS Paper Part #2: Password Entropy
This is part #2 in a (mumble, cough, mumble) part series of posts discussing the results published in the paper I co-authored on the effectiveness of password security metrics. Part #1 can be found here.
I've received a lot of insightful comments on the paper since my last post, (one of the benefits of having a slow update schedule), and one thing that stands out is that people really like the idea of password entropy. Here's a good example:
“As to entropy, I think it would actually be a good measure of password complexity, but unfortunately there's no way to compute it directly. We would need a password database comparable in size (or preferably much larger than) the entire password space in order to be able to do that. Since we can't possibly have that (there are not that many passwords in the world), we can't compute the entropy - we can only try to estimate it in various ways (likely poor)”
First of all I want to thank everyone for their input and support as I really appreciate it. This is one of the few cases though where I’m going to have to disagree with most of you. In fact, as conceited as it sounds, my main takeaway has been that I've done a poor job of making my argument, (or I’m wrong which is always a possibility). So the end result is another post on the exciting topic of password entropy ;)
When I first started writing this post, I began with a long description on the history of Shannon Entropy, how it’s used, and what it measures. I then proceeded to delete what I had written since it was really long, boring, and quite honestly not that helpful. All you need to know is:
- Claude Shannon was a smart dude.
- No seriously, he was amazing; he literally wrote the first book on modern code-breaking techniques.
- Shannon entropy is a very powerful tool used to measure information entropy / information leakage.
- Another way of describing Shannon entropy is that it attempts to quantify how much information is unknown about a random variable.
- It's been effectively used for many different tasks, from proving one time pads secure to estimating the limits of data compression.
- Despite the similar sounding names, information entropy and guessing entropy are not the same thing.
- Yes, I'm actually saying that knowing how random a variable is doesn't tell you how likely it is for someone to guess it in N guesses, (with the exception of the boundary cases where the variable is always known, aka the coin always lands heads, or where the variable has an even distribution, aka a perfectly fair coin flip).
Ok, I'll add one more completely unnecessary side note about Shannon Entropy. Ask a crypto guy, (or gal), if the Shannon entropy of a message encrypted with a truly random and properly applied one time pad is equal to the size of the key. If they say "yes", point and laugh at them. The entropy is equal to that of the original message, silly!
Hey, do you know how hard it is to make an entropy related joke? I’m trying here…
Anyways, to calculate the entropy of a variable you need to have a fairly accurate estimate of the underlying probabilities of each possible outcome. For example, a trick coin may land heads 70% of the time, and tails the other 30%. The resulting Shannon entropy is just a summation of the probability of each event multiplied by the log2 of its probability, (and then multiplied by -1 to make it a positive value). Aka:
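H(X) = -Σ p(x_i) x log2(p(x_i)), where the sum runs over every possible value x_i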
So the Shannon entropy of the above trick coin would be -(.7 x log2(.7) + .3 x log2(.3)) which is equal to 0.8812 bits. A completely fair coin flip’s entropy would be equal to 1.0. In addition, the total entropy of different independent variables is additive. This means the entropy of flipping the trick coin and then the fair coin would be .8812 + 1.0 = 1.8812 bits worth of entropy.
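If you'd rather let a computer do the arithmetic, here's a quick Python sketch, (nothing fancy, just the formula above), that reproduces those numbers:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

trick_coin = [0.7, 0.3]   # lands heads 70% of the time
fair_coin = [0.5, 0.5]

print(shannon_entropy(trick_coin))   # 0.8812... bits
print(shannon_entropy(fair_coin))    # 1.0 bit

# Entropy of independent variables is additive:
print(shannon_entropy(trick_coin) + shannon_entropy(fair_coin))  # 1.8812... bits
```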
I probably should have put a disclaimer above to say that you can live a perfectly happy life without understanding how entropy is calculated.
The problem is that while the Shannon entropy of a system is determined using the probability of the different outcomes, the final entropy measurement does not tell you about the underlying probability distribution. People try to pretend it does though, which is where they get into trouble. Here is a picture, (and a gratuitous South Park reference), that I used in my CCS presentation to describe NIST’s approach to using Shannon entropy in the SP800-63 document:
Basically they take a Shannon entropy value, assume the underlying probability distribution is even, and go from there. This is an issue because when it comes to human-generated passwords, the underlying probability distribution is most assuredly not even. People really like picking "password1", but there is always that one joker out there that picks a password like "WN%)vA0pnwe**". That's what I'm trying to say when I show this graph:
The problem is not that the Shannon value is wrong. It’s that an even probability distribution is assumed. To put it another way, unless you can figure out a method to model the success of a realistic password cracking session using just a straight line, you’re in trouble.
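To put a number on that straight line: under the even-distribution assumption, a password set with H bits of entropy gets cracked at a constant rate, so after G guesses an attacker's expected success is simply G / 2^H until it hits 100%. For example, if you assume 14 bits of entropy, that predicts around 8,192 guesses per account to crack half of the passwords, with the first guess being no better than the eight-thousandth. Real password sets pile the probability onto a few popular choices, so the real curve shoots up after the first handful of guesses and then flattens out.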
Let me make this point in another way. A lot of people get hung up on the fact that calculating the underlying probability distribution of a password set is a hard problem. So I want to take a step back and show that the problem holds true even when that is not the case.
For an experiment, I went ahead and designed a variable that has 100 possible values that occur at various probabilities, (thanks Excel). This means I know exactly what the underlying probability distribution is. This also means I'm able to calculate the exact Shannon entropy as well. The below graph shows the expected guessing success rate against one such variable, compared to the success rate you would predict by assuming that same Shannon entropy came from an even distribution.
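Here's a rough Python version of that experiment. The skewed distribution below is just a made-up stand-in for my Excel one, so the exact numbers will differ, but the shape of the comparison is the same:

```python
import math

# A made-up skewed distribution over 100 values: a few popular choices
# and a long tail. This stands in for the Excel-generated distribution.
weights = [1.0 / (rank + 1) ** 1.5 for rank in range(100)]
total = sum(weights)
probs = sorted((w / total for w in weights), reverse=True)

# Exact Shannon entropy of the skewed distribution, in bits.
entropy = -sum(p * math.log2(p) for p in probs)

# Size of the even distribution that has the same entropy.
uniform_size = 2 ** entropy

def skewed_success(guesses):
    """Expected success rate after guessing the most likely values first."""
    return sum(probs[:guesses])

def uniform_success(guesses):
    """Expected success rate if that entropy came from an even distribution."""
    return min(guesses / uniform_size, 1.0)

print(f"Shannon entropy: {entropy:.2f} bits")
for g in (1, 5, 10, 25, 50, 100):
    print(f"{g:3d} guesses: actual {skewed_success(g):.0%}, "
          f"even-distribution assumption {uniform_success(g):.0%}")
```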
Now tell me again what useful information the Shannon entropy value tells the defender about the resistance of this variable to a guessing attack? What's worse is the graph below, which shows 3 different probability distributions that have approximately the same entropy, (I didn't feel like playing around with Excel for a couple of extra hours to generate the EXACT same entropy; this is a blog and not a research paper, after all).
These three variables have very different resistance to cracking attacks, even though their entropy values are essentially the same. If I want to get really fancy, I can even design the variables in such a way that the variable with a higher Shannon entropy value is actually MORE vulnerable to a shorter cracking session.
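For a concrete, made-up example: take variable A, which is "password1" half the time and one of about a million, (2^20), random junk values the other half, versus variable B, which is picked evenly from 1,024 values. A works out to 0.5 x 1 + 0.5 x 21 = 11 bits of Shannon entropy, while B is only 10 bits. Yet a single guess cracks 50% of A, while you need 512 guesses against B to reach 50%.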
This all comes back to my original point that the Shannon entropy doesn't provide "actionable" information to a defender when it comes to selecting a password policy. Even if you were able to perfectly calculate the Shannon entropy of a password set, the resulting value still wouldn't tell you how secure you were against a password cracking session. What you really want to know as a defender is the underlying probability distribution of those passwords instead. That's something I've been working on, but I'll leave my group's attempts to calculate that for another post, (hint, most password cracking rule-sets attempt to model the underlying probability distribution because they want to crack passwords as quickly as possible).
Comments
I'm confused, because when I think about this I come up with this basic scenario: my message and one time pad are one bit long, and my message is always 1.
My message's Shannon entropy is zero [100% for 1].
My one time pad's Shannon entropy is one [50% for 0, 50% for 1].
My encrypted message's Shannon entropy is one [50% for 1, 50% for 0].
What am I doing wrong?
Let's say you know that 70% of my messages read "Attack at dawn", because I'm an early riser, (cough), but 30% of my messages read "Attack at noon", due to some nights I like to spend at the bar. This means if you intercept one of my messages, you would still be able to guess the contents correctly 70% of the time. The one time pad doesn't add any entropy, but neither does it reduce the entropy of the final message, (aka leak any additional information). This is why it's considered a perfectly secure cipher. Any other crypto algorithm has the potential of leaking additional information.
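In fact, the numbers work out exactly like the trick coin from earlier: -(.7 x log2(.7) + .3 x log2(.3)) = 0.8812 bits of uncertainty about the plaintext, and that stays the same no matter how many bits of perfectly random key you layer on top of it.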
I'm pretty sure that once a message is given, its Shannon entropy would be zero, because there is only one message.
The encrypted message is given:
This would mean the Shannon entropy of the encrypted message is zero.
The key is given:
This would mean the Shannon entropy of the encrypted message is the same as the Shannon entropy of the message.
Given neither:
This would mean the Shannon entropy of the encrypted message is the same as the key because that's how truly random and properly applied one time pads work.
If the message, encrypted_message, and key are all unknown, then the entropy is that of the message. That's because you have a base understanding of what the message might say. Aka, Matt really likes attacking at dawn, so if Matt sends a message it will probably say "attack at dawn" with a certain probability. Even if the attacker isn't able to intercept any messages, they still will know I'm an early riser, and can bet the attack probably won't come at noon ;)
If the encrypted message + key is revealed to the attacker, the entropy is 0. Aka, the attacker can decode everything, so there is no unknown info.
If the message is revealed, the entropy is 0. See the above, the attacker knows all they need to know.
With a one-time pad, if the encrypted message, but NOT the key, is revealed, the entropy is equal to that of the original message. By that I mean the attacker doesn't suddenly get dumber by intercepting the encrypted message. Therefore, the entropy can not be any higher than if the attacker had not intercepted the encrypted message. Therefore, the entropy of the encrypted message can not be more than the entropy of the unknown transmitted message.
Long story short, the entropy of a hidden value can only decrease with the addition of new information. Therefore, even with perfect encryption, the entropy of the encrypted message can never be higher than the entropy of the original message.
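If it helps to see it in code, here's a small Python sketch that maps the 70/30 dawn/noon example onto the one-bit scenario you described. The ciphertext by itself looks like a full bit of randomness, but the attacker's uncertainty about the message never budges:

```python
import math
from itertools import product

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# One-bit message: 1 = "attack at dawn" (70%), 0 = "attack at noon" (30%).
msg_dist = {1: 0.7, 0: 0.3}
key_dist = {0: 0.5, 1: 0.5}  # a truly random one-bit pad

# Joint distribution over (message, ciphertext) pairs, ciphertext = message XOR key.
joint = {}
for (m, pm), (k, pk) in product(msg_dist.items(), key_dist.items()):
    c = m ^ k
    joint[(m, c)] = joint.get((m, c), 0.0) + pm * pk

# Marginal distribution of the ciphertext.
cipher = {}
for (m, c), p in joint.items():
    cipher[c] = cipher.get(c, 0.0) + p

print("H(message)    =", entropy(msg_dist.values()))   # ~0.881 bits
print("H(ciphertext) =", entropy(cipher.values()))     # 1.0 bit, looks perfectly random

# What the attacker still doesn't know after seeing the ciphertext:
h_cond = 0.0
for c, pc in cipher.items():
    conditional = [joint[(m, c)] / pc for m in msg_dist if (m, c) in joint]
    h_cond += pc * entropy(conditional)
print("H(message | ciphertext) =", h_cond)              # ~0.881 bits, unchanged
```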
As I said, information entropy is weird ;)
"Ask a crypto guy, (or gal), if the Shannon entropy of a message encrypted with a truly random and properly applied one time pad is equal to the size of the key."
Is asking "if the entropy of the plain text is the same as the entropy of the key."
I thought it was asking "if the entropy of the cipher text is the same as the entropy of the key."
Thanks for being so patient, and sorry for wasting your time.
C(X, k) = log_2(2 * G(0.5, X, k))
where X is a probability distribution over {x_1, ... x_m} and k is a key (or password), k = x_i, and G(p, X, k) is the number of guesses needed for probability p of finding k in distribution X.
C is for "crack time". And I picked the log_2 of twice the guess count so that C(X, k) will equal H(X) when X is uniformly distributed.
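As a sanity check: for a uniform distribution over 1,024 values it takes 512 guesses to reach a 50% chance of success, and log_2(2 * 512) = 10, which matches H(X) = log_2(1024) = 10 bits.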
Some things in there are arbitrary and not particularly principled. But if nobody has something better then I will use that when talking about this.