Wednesday, May 12, 2010

They'll Let Anyone Graduate: My Password Cracking Dissertation

You've all heard me complain/stress out about writing my dissertation, so now that it's done of course I'm going to post it online. My PhD. dissertation, "Using Probabilistic Techniques to Aid in Password Cracking Attacks" is available for download from my tools page here.

A lot of it is going to look fairly familiar if you've seen my talks or been reading this blog, which makes sense since my dissertation is a summary of what I've been up to for the last three years. Here's a quick breakdown of what's in it:

Chapter 1:
  • Overview + background info
  • The need for password cracking
  • General terms and techniques
  • Obtaining the datasets, and basic statistics about the datasets
  • A quick survey of common password hashes and popular password cracking tools
Chapter 2:
  • Brute Force Attacks
  • 95% of it I've talked about on this blog before
  • The remaining 5%, which I really should post an entry on, is a comparison of a targeted brute force attack against a pure Markov attack. The targeted attack uses the tool I previously released, MiddleChild.
Chapter 3:
  • Dictionary Based Attacks
  • Summary of some of the custom dictionaries I've created plus tools. Most of the tools are available in various places on my tools webpage.
  • Mostly this chapter focuses on the use of a customized edit distance metric for evaluating the effectiveness of input dictionaries, which is something that I've found incredibly useful.
  • Being a total of nine pages long, this is the chapter I feel is the most incomplete. It's also why I'm glad I have a blog so I can rectify that shortcoming over the next couple of months with additional posts ;)
Chapter 4:
  • Dictionary based Rainbow Tables
  • For the complete AV experience, a video of me talking about this at Shmoocon is available here.
  • Quick note: I've since figured out that the higher number of collisions caused by the dictionary tables was due to the fact that I was comparing them to "perfect tables", aka ones where all of the merging chains had been removed. If I remove merging chains from the dictionary tables, they perform just as well, collision wise. The fact that I didn't realize this when I was running the tests is a bit embarrassing. #facepalm
  • I have a bunch of ideas on how to further improve my tables, such as adding targeted brute force support, (with Markov Models!!), and enhancing the basic dictionary attacks with more advanced word mangling rules and multiple dictionaries. Implementing those is very high on my todo list.
Chapter 5:
  • Using probabilistic context free grammars for password cracking
  • Instead of focusing on the word mangling rules, focus on the probabilities instead
  • The result is a password cracker that can be trained on previously disclosed passwords and generates highly targeted guesses
  • This is the heart and soul of my dissertation and the reason why I graduated. If you only read one chapter this should be it.
  • Why should you care: It cracks more passwords with fewer guesses. What's not to like?
Chapter 6:
  • A critique of NIST's use of entropy as a password strength measurement
  • Essentially a very rough draft of the paper I submitted to the ACM CCS conference on the effectiveness of different password creation policies. I started running the tests/writing this section about two weeks before I defended, so I've collected a lot of data since this chapter was finished.
Sot that's about it for three years worth of work. It's a bit humbling I have to admit, since there's still a ton of stuff I still want to look into/implement. Luckily, just because I graduated doesn't mean I have to stop my research. Here's looking forward to the future.


Anonymous said...

Hey Matt... long time reader of your blog, but never posted.

Chapter 5 is an interesting read. Thanks for sharing the paper. Good luck in finding a great job too, you deserve one after all of this research.

I was hoping to get your opinion on this year's DEF CON password cracking contest. Have you heard of it? From the description on the website, it sounds like a corporate environment with password complexity rules in place (length, charsets, etc.) in your opinion, what percentage would the winner or a competitive entrant need? 20%, 30%? Unlike many of the public hacks, where we find thousands of common dictionary words and names, these passwords should be much harder to crack because of the complexity. I just wanted a professional opinion ;) on what percentage would be reasonable.

Matt Weir said...

Hey Brad, thanks! As to your question, I really don't know what a respectable cracking percentage will be, but hopefully that number will match whatever I manage to crack ;)

Yup, I'll be participating, though I certainly don't plan on winning. What I'm really looking forward to though is the chance to meet with everyone else and learn what other people are doing. This is the first major password cracking challenge that I'm aware of, and judging from the amount of e-mails I've been getting, a lot of other people are excited as well. What's also nice is that it's really spurring a ton of password cracking tool development, (expect a lot of releases in the next two weeks). In short, I'm hoping this turns into an event like the lockpicking village with the contest being almost besides the point. Of course I might be saying that because I'm going to get creamed as well...

Focusing on the contest, my biggest concern is that the passwords we will be cracking aren't real. This isn't a criticism. There's no way you could run it with real corporate passwords, (well, legally that is...). It's just something to keep in mind. What will be interesting though is applying the techniques learned from the winner, (part of the rules are that you have to disclose your cracking techniques), to other datasets as they become available. If I had to hazard a guess, here's some predictions of mine about the contest:

1) Most passwords will be based on relatively common dictionary words. Way more so than you would find normally.

2) Most of the cracking will center around applying the correct mangling rules. Yes there will be the 'Dictionary123' words, but I expect most 'high score' passwords will have less common rules such as 'xD1ct1on@ryx'.

3)There will probably be some LANMAN passwords, so bring your rainbow tables.

4) I expect there to be so many NTLM passwords that rainbow tables for them won't be cost effective.

5) I'll be interested to see if they have any 'exotic' password hashes. WinRAR, TrueCrypt, etc.

6) It'll be a ton of fun ;)

I hope to see you there Brad!

Anonymous said...

Thanks for replying. I hope you do well in the contest. I plan to try some code I wrote to see how it fairs. I'm into threaded programming and have a new 6 core computer. I'm also researching word patterns as an alternate approach. I won't be attending, but have some friends who will be.

I enjoy these sorts of contests. Last year a friend and I participated in the Engineyard sha1 challenge, We didn't win, but we got respectable scores. Good luck.

pro said...

Thanks for your share! I think this information is helpful for everyone. I'm doing practice GMAT here: . I hope it's useful for GMAT test takers.

Lionel Abbott said...
This comment has been removed by the author.
Lionel Abbott said...

It’s really an amazing feeling once you finally finish writing your dissertation, right Matt? Well, three years of your hard work really paid off after your dissertation writing was completed. Anyway, took a peak at your paper. You really did an amazing job with it.