Cracking Passwords with Middle Child
Yup, it's been a month since my last post. Believe it or not, I've actually been fairly busy, both in working on my probabilistic password cracker, and writing up several research papers. That doesn't even begin to get into the stacks of disclosed passwords I've managed to accumulate but I still need to do some analysis of. Of course it's fairly hard to complain about having too many passwords to crack/analyze. It's sort of like having too many girls ask you out. It's a good problem to have.
Wow, after rereading the last couple of sentences, I really need to get my priorities in order ;)
On that off-topic note though, I figured I should actually make some more of my tools available to the public rather than just boring everyone by talking about crypto. Middle Child is a tool designed to aid in targeted brute force password cracking attacks.
The short summary is that I love using John the Ripper's -incremental mode which incorporates the most sophisticated Markov modeling I've yet to see in a password cracker. JtR uses 2nd order Markov models, (where it models the probability of three letter triplets appearing), and it actually keeps separate probability information for different length words, (aka the probability matrix it uses for a 5 letter word is different from the one it uses for an 8 letter word). To better illustrate this, below are the first 10 guesses JtR will create when running a brute force attack, (using only a lower alpha key-set):
- bara
- sandy
- shanda
- sandall
- starless
- dog
- bony
- bool
- boon
- stark
As you can see, it looks a lot like a dictionary attack, but it's actually using brute force. The advantage is that if you let it run long enough it will completely cover the keyspace. For example, it will eventually try unlikely guesses such as 'zqzqqqzz' which a normal dictionary attack will never try. The problem is, while Markov models are very efficient at generating guesses that look like human generated words, they still can have trouble cracking any halfway decent password, (due to time constraints). Middle Child is an attempt to apply additional logic to JtR's -incremental option to more efficiently attack stronger passwords.
-Ed note: Yes that is the "short" summary...
So what is the additional logic that Middle Child uses? Mostly just basic modeling of how people create passwords. For example, people often put numbers at the end of a password, and if they bother to hit the shift key, capitalize the first letter. Another optimization is if users incorporate special characters and numbers into their passwords, they generally use a special character followed by a number and not the other way around. Really all Middle Child does is apply word mangling rules to brute force guesses. By applying these rules to the password guess as a whole, it allows an attacker to target what are traditionally considered stronger passwords using a brute force attack. This is important since most input dictionaries will generally only crack around 30-50% of an average password set even in a best case scenario. That number isn't made up, (like 74% of all statistics...). If you are interested, check out my Defcon16 slides for a further explanation of how I came up with that 30-50% range. Note, I haven't tried to run the numbers again with Sebastien Raveau's humongous wikipedia wordlist yet since it was so large it broke my parsing tool.
Ok, back on track: Now admittedly when you start to use multiple input dictionaries you can achieve better results, (especially input dictionaries created from previously cracked passwords), but after a certain point that starts to resemble brute force as well. I guess what I'm trying to say is that brute force attacks still play a very important role in password cracking. It just means you have to be smart about how you go about using brute force style attacks.
Now, I've actually gotten into arguments over whether this can still be called a brute force attack, (since I'm using Markov models and word mangling rules). Can you say, "Nerd fight"? My response is anything that doesn't use an input dictionary technically is a brute force attack. Personally I like the term "Targeted brute force attacks", though I've seen people call it "Indexed attacks" as well. In the end, it is what it is, and I find it useful so I don't care if you call it a brute force attack or not. I've posted before about how troublesome it is that the password cracking field doesn't have standard definitions...
You can get the tool, (along with some more info and a short user's guide), here.
Comments