Sunday, March 11, 2018

Creating Long Term SSL Certificates

"It's constantly fascinating for me that something that feels absolutely right one year, 12 months later feels like the wrong thing to do." --Damian Lewis
Often I find myself having to create my own SSL certificates. Whether it's an internal web server or two scripts that need to communicate with each other, SSL is the easiest way to encrypt network traffic. Unfortunately it's also one of the most dangerous encryption methods. If you make a mistake setting it up, it usually still works ... at least for a little while.

Ignoring the client SSL checks for now, (hint: if your script uses SSL and it works the first time, you probably are not checking SSL correctly), one area of danger is having your SSL certificates expire. As an example of that, recently every Oculus Rift broke because a code signing certificate expired. Admittedly this was a different type of certificate, but the same thing tends to happen with internal SSL deployments. People do not remember to update them, and when they expire things tend to break, (at least if your clients are checking SSL properly). The problem is that when you use the standard OpenSSL tools to create your certificates, there are three places where you need to specify certificate lifetimes. If you forget to specify any of the three, that certificate will only be valid for the default, which is set to 365 days.

These lifetime checks are:
  1. The Certificate Authority has an expiration date
  2. The actual certificate you are using has an expiration date
  3. The CA signature for the certificate has an expiration date
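Since forgotten expirations are the usual failure mode, it's also worth knowing that OpenSSL can do the "is this about to expire" math for you via the '-checkend' flag, which makes it easy to wire up a cron job that yells before things break. A minimal sketch, using a throwaway one-day certificate so the example is self-contained, (all file names here are placeholders):

```shell
# Create a throwaway self-signed cert that expires in 1 day, non-interactively
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=demo" \
    -days 1 -keyout demo_key.pem -out demo_cert.pem

# -checkend N exits non-zero if the cert expires within the next N seconds
openssl x509 -checkend 172800 -noout -in demo_cert.pem \
    || echo "WARNING: certificate expires within 2 days"
```

Swap demo_cert.pem for your real certificate and drop the check into cron, and you get a warning before the expiration instead of an outage after it.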
Since most Stack Overflow posts don't cover this, and Linux man pages are not helpful unless you already know what you are doing, I wanted to share my cheat sheet for creating long term, (valid for one thousand years), SSL Certificate Authorities and signing certs. This script was born from many previous failed efforts, and to be honest I'm still not sure I have it perfectly right. If you notice any improvements that could be made, please let me know!

  • These instructions were written for CentOS. They should work on most other Linux flavors without any changes. If you are using Windows, good luck!
  • OpenSSL needs to be installed, (it is by default on nearly every Linux distribution).
  • Whenever you see 365000 in a command, that's the expiration time in days. I'm using 365*1000 as shorthand for one thousand years. Yes, I realize that isn't exactly accurate. Feel free to change this to the time period you want to use.

Creating the Certificate Authority: (If you already have a CA ignore this, but you might want to check the valid lifetime for that CA)
  • Generate the key for the CA using 4096-bit RSA. Note the private key will be saved as cakey.pem, so protect it!
openssl req -new -newkey rsa:4096 -nodes -out ca.csr -keyout cakey.pem
  • Create the CA's public certificate which will be called "cacert.pem". Note the '-days' field:
openssl x509 -trustout -signkey cakey.pem -days 365000 -req -in ca.csr -out cacert.pem
Important: When you run the previous command, you'll be asked a series of questions. Note, for many SSL deployments you *must* have the Country, City, State, and Organization match between your CA and the certificates you are signing. Does this make sense? Of course not! The domain can be pretty important as well depending on what you are doing.
  • Next you need to copy the CA info and create the required files where OpenSSL expects them. Yes, if you know what you are doing you can override the defaults, but if not, here's what to do:
    1. If the /etc/pki/CA directory does not exist, create it
    2. mv cakey.pem /etc/pki/CA/private/cakey.pem
    3. touch /etc/pki/CA/index.txt
    4. create or edit /etc/pki/CA/serial using the text editor of your choice
    5. In this file put the next serial number you want to assign, for example:
      • 01
    OpenSSL will increment this number itself each time it signs a certificate, so a single starting value is all you need.
    6. It is *highly* recommended that you set permissions on the /etc/pki/CA directory so only the user you want to sign certificates has access to it.
  • Note, cacert.pem is not used for signing SSL certificates, but you'll need to push it to clients that are verifying the certificates
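Condensed into a script, the setup above looks like the following. Since /etc/pki/CA requires root to modify, this sketch uses a stand-in directory; swap in the real path, (and move your real cakey.pem into the private/ subdirectory), when doing it for real:

```shell
# Stand-in for /etc/pki/CA (the real path needs root)
CA_DIR="demo_pki_CA"

mkdir -p "$CA_DIR/private"      # directory layout; cakey.pem goes in private/
touch "$CA_DIR/index.txt"       # empty database of issued certificates
echo 01 > "$CA_DIR/serial"      # first serial number to assign
chmod -R go-rwx "$CA_DIR"       # lock it down to the signing user
```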

Creating and Signing a SSL Certificate:

  • Create the certificate private key using RSA 4096. It is named client.key in this example. Make sure you protect this!
openssl genrsa -out client.key 4096
  • Create the certificate request. Note the "days" field.
openssl req -new -key client.key -out client.csr -days 365000  
Important: Remember for the questions it asks you, the Country, City, State, and Organization *must* match between your CA and the certificates you are signing. In addition, the domain can be pretty important depending on whether your client checks it or not.
  • Create the actual client certificate. Once again, note the '-days' field:
openssl ca -in client.csr -out client.pem -days 365000 -notext

Resulting Files: 
  1. Public Client Certificate: client.pem
  2. Client private key: client.key (Only deploy on the server that owns this key)
  3. Public CA certificate: cacert.pem
  4. Private CA key: cakey.pem (Protect this one!!)
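To sanity-check the whole chain end to end, you can run a throwaway version and confirm both the signature and the lifetimes. Note this sketch deliberately takes shortcuts: 'req -x509' collapses the CA's CSR and self-sign steps, and 'openssl x509 -req -CA ...' signs the client cert without needing the /etc/pki/CA setup, so it isn't the exact procedure above. All file names and subject values are placeholders:

```shell
# Throwaway CA (req -x509 combines the CSR and self-signing steps)
openssl req -x509 -newkey rsa:2048 -nodes -days 365000 \
    -subj "/C=US/ST=FL/L=Demo/O=Example/CN=Demo CA" \
    -keyout demo_cakey.pem -out demo_cacert.pem

# Throwaway client key and CSR; the C/ST/L/O fields match the CA per the note above
openssl genrsa -out demo_client.key 2048
openssl req -new -key demo_client.key \
    -subj "/C=US/ST=FL/L=Demo/O=Example/CN=client.example" \
    -out demo_client.csr

# Sign the CSR with the CA; note the '-days' field here as well
openssl x509 -req -in demo_client.csr -CA demo_cacert.pem -CAkey demo_cakey.pem \
    -CAcreateserial -days 365000 -out demo_client.pem

# Both checks should pass: the signature chains to the CA, and the
# expiration date is roughly a thousand years out
openssl verify -CAfile demo_cacert.pem demo_client.pem
openssl x509 -enddate -noout -in demo_client.pem
```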

Wednesday, August 16, 2017

Solving Problems with Unknown Constraints

"Software constraints are only confining if you use them for what they're intended to be used for" 
-- David Byrne (Of the Talking Heads)
I recently had an ongoing conversation that spanned several days about the subject of solving mazes. A friend casually mentioned the "Same Wall Rule", (also known as the "Right Hand Rule"), for solving a maze. This is where if you want to find the exit of a maze you should pick a wall and follow it, with the assumption that you will eventually find the exit this way.
Same Wall Rule for Solving a Maze

I pointed out that while this rule generally works, you can't count on it as it can fail spectacularly. For example, what if you start out next to a free-standing wall?
Same Wall Rule Failing Horribly
After that our conversation turned to other things but the next day my friend came back and said "I found the problem! The Same Wall Rule will work, but you have to start at the beginning of the maze! Then you can be guaranteed that you won't hit a free-standing wall".

Which is true in most cases, but what if the exit you are looking for is in a free-standing section of the maze? For example, what if the treasure is in the middle, or you are dealing with a 3-dimensional maze?
Same Wall Rule Failing to Find Treasure
This reminded me of a paper that Cormac Herley recently wrote titled: Justifying Security Measures. I highly recommend reading it. It points out that in the security community we often say:
Security(X) > Security(~X)
When we really mean:
Outcome(X|ABCD) > Outcome(~X|ABCD).
Which is a fancy way of showing that when we say doing X is more secure than not doing X, there usually is a large number of assumptions, (ABCD....), that we're leaving out. Where this directly relates to the main topic of this blog, (password security), is that Herley specifically calls out the password field for the practice of ignoring constraints in our security advice. Or, to quote his paper:
"Passwords offers a target-rich environment for those seeking tautologies and unfalsifiable claims."
Now back to the issue of maze solving, the same problem often arises. When we make a maze solving algorithm, we're making certain assumptions about the rules of the game. For example, the next iteration of a mapping algorithm might involve marking rooms that you have been in before to detect loops. Well there is a certain fairy-tale where that approach failed due to the marks being destroyed by a 3rd party actor:

Hansel and Gretel showing that marks aren't always permanent
Even assuming you can safeguard your marks in the maze, that approach may still not be effective if the maze moves while you are traversing it.

I've never seen such an amazing premise turned into such a boring book
Note, these assumptions go both ways. For example if you are designing a super hard maze, a snarky player can often do something completely unexpected.

Seriously, why would you want to go through the maze?
I'd argue that coming up with a perfect maze solver that works for all mazes with no constraints is a near impossible problem. If you can design an algorithm, chances are someone else can come up with a situation where it will fail. On the plus side, the same goes for maze designers. If you come up with a maze with constraints, someone probably can solve it even if it's not how you expected the maze to be solved.

This is a point that I'm actually optimistic about. We deal with imperfect knowledge of the rules we're operating under every day. That's part of the human condition! Tying this back in with Herley's paper, I think there are some things to keep in mind.
  1. When giving advice to end users, I think it's fair to leave implied constraints out as long as the person giving the advice keeps them in mind. Aka telling your kids to follow the right hand wall to get through a corn maze is perfectly reasonable. Telling your kids this assumes there are no minotaurs or evil clowns waiting in the maze to eat them probably will not result in the end state you are aiming for.
  2. Unfortunately following the above can lead to those constraints being forgotten over time and that advice being applied to situations where it is no longer helpful.
  3. Therefore you need to be willing to question previously held beliefs and come up with new approaches when reality doesn't match your expected experiences.
The question then is, how do you discover/rediscover unknown constraints when you start experiencing issues?

One way to deal with this is through experimental design, along with making hypotheses about what the results of those experiments will be before you run them. That's something I'm trying to get better at doing, as seen in my previous blog post.

As an example: Herley raises the question "Are lower-case pass-phrases better or worse than passwords with a mix of characters?" If I construct an experiment, I have to specify a set of constraints that experiment will run under. Do those constraints match up with real world use-cases? Of course not! But the fact that there are constraints can help me and other people interpret how to use those results. Likewise, before running an experiment it's important to have a theory and make a hypothesis about what the results will be. Once that's done, running the experiment can validate or falsify the hypothesis. I can then update the theory as needed and the process continues.

To put it another way, I think there are a lot of areas where the academic side of computer security can help improve the practical impact that computer security choices impose on the end user ;p

Sunday, August 14, 2016

Evaluating the Value of the (@)Purge Rule

“Only sometimes when we pick and choose among the rules we discover later that we have set aside something precious in the process.”  
― Helen Simonson, Major Pettigrew's Last Stand

Background and Problem Statement:

I was recently asked the following question: "Is there any value in supporting the character purge rule in Hashcat?" The purge rule '@x' will remove all characters of a specific type from a password guess. So for example the rule '@s' would turn 'password' into 'paword'. The full thread can be found on the Hashcat forum here. The reason behind this inquiry was that while the old version of Hashcat implemented the character purge rule, the GPU versions of Hashcat and Hashcat 3.0 dropped support for it. Since then, At0m has added support for the rule back into the newest build of Hashcat, which makes this question much less pressing. That being said, similar questions pop up all the time, and I felt it was worth looking into if only to talk about the process of investigating problems like this.

Side note, as evidence that any change will break someone's workflow, when researching this topic I did find one user who stored passphrase dictionaries with spaces left intact. They would then use the purge rule to remove the spaces during a cracking session, so that way they wouldn't have to save a second copy of their passphrase wordlist without spaces. For that reason alone I think there is some value in the purge rule.

The Purge Rule Explained:

Hashcat Rule Syntax: @X where (X) is the character you want to purge from the password guess
Example Rule: @s
Example Input: password
Example Output: paword
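If you want to play with the transformation outside of Hashcat, the standard tr utility does the same thing:

```shell
# Equivalent of the '@s' rule: delete every 's' from the guess
echo 'password' | tr -d 's'     # prints: paword
```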


My gut feeling is that the purge rule will have limited impact on a cracking session. I base that on a rule of thumb that mangling rules work best if they mimic the thought process people use when creating passwords. For example, people often start with a base word and then append digits to it, replace letters with L33t replacements, etc. Therefore rules that mimic these behaviors tend to be more successful. I just don't see many people removing character classes from their password.

Now if you are a Linux fan, you'll realize Linux developers *love* removing characters from commands. Do you want to change your password? Well "passwd" is the command for you! Maybe Linux developers use the same strategy for their passwords? So I certainly could be wrong. That being said, the whole idea of a hypothesis is to go out on a limb and make a prediction on how an existing model will react so here I go:

My hypothesis is that the purge rule will crack less than 1 thousand passwords of a 1 million password dataset, (0.1%). Of those passwords cracked, a vast majority (95%), will be cracked due to weaknesses of the input dictionary vs. modeling how the user created the password. For example, 'paword' might be a new Pokemon type that didn't show up in the input dictionary vs being created by a user taking the word 'password' and then removing the S's.

Short Summary of Results:

The purge ruleset cracked 164 passwords, (0.0164% of the test set). This was better than just using random rules, which in a test run cracked 23 passwords, but in the grand scheme of things not by much. Supporting this rule is unlikely to help your cracking sessions to any noticeable degree.

Experimental Setup:

Test Dataset: 1 million passwords from the newest MySpace leak. These were randomly selected from the full set using the 'gshuf -n 1000000' command.

Reason: Truth be told, the main reason I used the MySpace passwords was I'm getting tired of using the RockYou dataset for everything. That being said, it's useful for this experiment that all of the passwords in that dataset have been converted to lowercase since I don't have to worry about combining case mangling rules with the purge rules.
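For anyone who wants to replicate the sampling step, gshuf is just the Homebrew name for coreutils shuf on a Mac. A toy version of the same idea, scaled down so it runs instantly:

```shell
# Build a 100-line stand-in for the full password list, then randomly
# sample 10 lines, (the real run was 'gshuf -n 1000000' on the MySpace set)
seq 100 > demo_pool.txt
shuf -n 10 demo_pool.txt > demo_sample.txt
wc -l < demo_sample.txt     # prints: 10
```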

Tools Used: Hashcat for the cracking, and John the Ripper for the --status option

Rulesets Used: Hashcat's D3ad0ne mangling ruleset. I broke it up into two different rulesets, with one containing the purge rules, (along with a few append/prepend '@' rules that snuck in), and the other containing all the remaining mangling rules.

Reason: D3ad0ne's mangling rules contains about 34 thousand individual mangling rules. Due to its size and the fact that it is included with Hashcat it should make a good example of a ruleset that many Hashcat users are likely to incorporate in their cracking sessions. I initially split the base ruleset into two different subsets, with all rules including the '@' into one ruleset called d3ad0ne_purge, and all the other rules into another one called d3ad0ne_base. I then started manually going through d3ad0ne_purge and placing rules such as "append a @" into the d3ad0ne_base, but with over 1k rules in d3ad0ne_purge I quickly decided to remove the results of the append/prepend '@' after the fact instead of trying to fully isolate only purge rules in their own ruleset.
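The initial split itself is just a pair of grep commands. A toy version with a stand-in rule file, (a real run would point at Hashcat's rules/d3ad0ne.rule instead):

```shell
# Five stand-in rules: two purge rules, one append-@ rule, and two others
printf '@s\n@a\n$@\nl\nc\n' > demo.rule

grep '@' demo.rule > demo_purge.rule      # anything using '@' (3 rules)
grep -v '@' demo.rule > demo_base.rule    # everything else (2 rules)
wc -l demo_purge.rule demo_base.rule
```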

Dictionary Used: I used dic-0294 as my wordlist. Yes, there are better input dictionaries out there, but this is a common one and strikes a good balance between size and coverage. Plus it is public, vs. other dictionaries I have that are based on cracked passwords.

Experimental Results:

Step 1) Run a normal cracking session on the 1 million myspace passwords using dic-0294 and D3ad0ne_base. This is important since the purge rule will likely crack many passwords that would be cracked normally with other rules. Running a normal cracking session first removes those passwords, so we can focus on passwords that would only be cracked by the purge rules. The command I ran is below, (note, I'm editing some of the path information out of the commands for clarity's sake).
./hashcat -D1 -m 100 -a 0 --remove myspace_rand_1m_hc.txt -r rules/d3ad0ne_base.rule dic-0294.txt
A couple of notes about the above command. I'm using a version of Hashcat that I updated on August 10th, 2016. I ran it on a very old MacBook Pro, so the -D1 is telling it to use CPU only, (since the GPU doesn't have enough memory). The -m 100 is telling it to crack unsalted SHA-1 hashes. The -a 0 is to do a basic dictionary attack. --remove was to remove any cracked hashes so they aren't counted twice in future cracking sessions. myspace_rand_1m_hc.txt is my target set, rules/d3ad0ne_base.rule is my ruleset, and dic-0294.txt is my input dictionary. Below are the results of running this first attack.

With 36% of the passwords cracked by a very vanilla attack on a slow computer, that isn't bad. Next up is running the purge rules.

Step 2) Delete the previous hashcat.pot file. Run a cracking session on the remaining passwords using the purge ruleset. The command I ran was very similar to the one above:
./hashcat -D1 -m 100 -a 0 myspace_rand_1m_hc.txt -r rules/d3ad0ne_purge.rule dic-0294.txt
Note, I took off the --remove option since I didn't care about removing cracked hashes for this. I also deleted the previous .pot file of cracked passwords since I only wanted to store passwords associated with this test. Here is a screenshot I took partway through the cracking session:

As you can see, many of the cracked passwords were due to "insert a @ symbol" rules vs. using the purge rule. Here are the final results:

The session managed to crack 405 unique hashes. I then went into the pot file and deleted any password containing the '@' character, so what was left was due to the purge rule. This left me with a list containing 128 unique passwords. A screenshot is shown below:

Now it's hard to tell what people were thinking when they created these passwords, but glancing through the list, it certainly appeared that most of the cracked passwords were simply due to limitations in my input dictionary vs users purging characters from their passwords. I was actually surprised 'jayden' and 'fatguy' weren't in dic-0294 but after double checking it they were in fact missing from it.
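For reference, the pot file cleanup described above is just a negative grep. A toy version with invented hash:plaintext entries:

```shell
# Keep only the cracks whose plaintext contains no '@', (the hashes are hex,
# so they can never match the '@' pattern themselves)
printf 'aaaa:p@ssword\nbbbb:paword\ncccc:adore\n' > demo.pot
grep -v '@' demo.pot > demo_purge_only.pot
wc -l < demo_purge_only.pot     # prints: 2
```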

Now, input dictionaries are always going to be limited to a certain extent, so these cracks absolutely count. They only represent unique cracked hashes though. For example, if 20 people used the password 'imabear' it would only be counted once. To figure out how many total accounts would have been cracked, I re-ran the cracked passwords as a dictionary through John the Ripper against the myspace_1m_rand list. This was to get them into John's cracked file (pot) format. For example, here is 'imabear' in john.pot:


The reason I did this was because JtR has a really cool feature '-show' that will match up cracked passwords with the accounts in the target set. Running the command:

./john -format=raw-sha1 -show myspace_rand_1m_clean.txt

resulted in the following output:

Therefore the purge rules cracked a total of 164 passwords from the test set, or 0.0164% of the total. That's a really small amount. Admittedly every cracked password is nice, but I was still curious whether the purge rules were better than just running random mangling rules instead. Luckily, Hashcat supports a command to test that out:
./hashcat -D1 -m 100 -a 0 myspace_rand_1m_hc.txt -g 500 dic-0294.txt
The only difference between the above command and the previous Hashcat commands I ran is that instead of a rules file I specified '-g 500'. That tells Hashcat to generate 500 random rules to run on the input dictionary. I chose that number since there were over a thousand rules in my D3ad0ne_purge ruleset and I guesstimated that about half of them were actual purge rules. When I ran the above I ended up cracking 23 more passwords. That's significantly less than the 164 the purge rules cracked, but in the grand scheme of things it was about the same in effectiveness. Considering some of those random rules were likely duplicates of rules in the D3ad0ne_base ruleset as well, I'd argue that running a purge rule is about equivalent to running a random mangling rule. In fact, if you don't already have purge rules in your mangling set, I'd probably recommend not worrying about it and just running a brute force method like Markov mode to stretch your dictionary instead.


For once my gut feeling was right and the value of Hashcat's purge rule '@' was limited in the tests that were run. That's not to say that it's not useful. It may help when targeting certain users or aid in keeping the size of your dictionary files on disk manageable. But at the same time, it's not a major feature that other password crackers should rush to mimic. I hope this blog post was informative in helping show different ways to evaluate the effectiveness of a mangling technique. If you have any questions, comments or suggestions please feel free to leave them in the comments section.

Thursday, July 7, 2016

Cracking the MySpace List - First Impressions

Alt Title: An Embarrassment of Riches


Sometime around 2008, a hacker or disgruntled employee managed to break into MySpace and steal all the usernames, e-mails, and passwords from the social networking site. This included information covering more than 360 million accounts. Who knows what else they stole or did, but for the purposes of this post I'll be focusing only on the account info. For excellent coverage of why the dataset appears to be from 2008 let me refer you to the always superb Troy Hunt's blog post on the subject. Side note, most of my information about this leak also comes from Troy's coverage.

This dataset has been floating around the underground crime markets since then, but didn't gain widespread notoriety until May 2016, when an advertisement offering it for sale was posted to the "Real Deal" dark market website. Then on July 1st, 2016, another researcher managed to obtain a copy and posted a public torrent of the entire leak for anyone to download. That's where things stand at this moment.

Unpacking the Dataset:

The first thing that stands out about the dataset is how big it is. When uncompressed, the full dump is 33 Gigs. Now, I've dealt with database dumps of similar size, but they always included e-mails, forum posts, website code, etc. The biggest password dataset I previously had the chance to handle was the RockYou set, which weighed in at 33 million passwords and took up 275 MB of disk. Admittedly that didn't include user info and the passwords were stored as plaintext, (plaintexts are generally shorter than the hex representation of hashes), but still, that's a huge leap in data to process. Heck, even the full RockYou list is a bit of a pain to process.

Let me put this another way. Here is a simple question, "How many accounts are in the MySpace list?" Normally that's quick and easy. Just run:
wc -l
And then you wait ... and wait ... and wait ... and then Google if there is a faster way to count lines ... and then wait. 16 minutes and 24 seconds later, I found out there were 360,213,049 lines in the file. Does that equal the number of total accounts, or is there junk in that file? Well, I don't want to spend the 30+ minutes to run a more complicated parser, so that sounds about right to me ¯\_(ツ)_/¯. Long story short, doing anything with this file takes time. Eventually I plan on moving over to a computer with an SSD and more hardware, which should help, but it's something to keep in mind.

That being said, the next question is "What does the data look like?" Well here is a screenshot of the first couple of lines.

As you can see, it takes the form of a unique ID that increments, e-mail address, username, and then two hashes. All of the fields except the unique ID can be blank. To answer the next question, "Why two hashes?" well ... ¯\_(ツ)_/¯. That's something I plan on looking at but I haven't gotten around to it yet.

Update: 7/7/16: Just as I was finalizing this post, I ran across CynoSure Prime's analysis where they managed to crack almost every single hash in this dataset. You can find their blog post here. It turns out the second hash is actually the original password, (full length with upper case characters) salted with the user_id. I'm going to leave most of this blog entry unmodified even though how to parse the list can certainly be optimized based on this new info. </Update>

Other random tidbits: The final unique ID is 1005290998. That's significantly higher than the number of accounts in this dataset, so there are large chunks of accounts that were deleted at some point in time. My guess is that when a user deleted their MySpace account it really was deleted, in which case, kudos to MySpace for doing that! That's just a guess though. As you would expect, the first accounts were administrative accounts and system process accounts. I know I blocked out the user e-mails, but I will admit I googled the first name. When I found his LinkedIn profile my first reaction was, "Wow, he needs to brag about his accomplishments more than just saying:"
Developed, and launched the initial Myspace community which currently has over 100 million members and was acquired by Fox Corp. for $580 million.
I mean if it was me I would post that database dump on my resume! Of course further googling led me to the book "Stealing MySpace." I started reading about all the drama that went on, and suddenly there went my evening. Needless to say, the general layout of the dataset looks legit, but one more interesting fact was all those Gmail accounts. MySpace was created in 2003, Gmail opened for invitation access in 2004, and the lead engineer of MySpace left in 2003. So employees were able to update their accounts after they had left the company. Once again, kudos to MySpace, but that was surprising.

Password Hash Format:

I initially learned from Troy Hunt's posts that the hashes were unsalted SHA1 with the plaintext lowercased and then truncated to 10 characters long. Therefore the password:
would be saved as:
I've heard some people say that this means hackers can just brute force the entire key-space. If I was feeling nit-picky I could argue *technically* that's beyond the reach of commercial setups as 70^10 is still a really big number (27 characters + 10 digits + 33 special characters). In reality though, by intelligently searching the key-space, (who uses commas in their password?), a vast majority of unsalted password hashes can be cracked under that format. It's a bit of a moot point though since the real issue is using such a fast unsalted hash. Ah 2008, when it was still acceptable to claim ignorance for using a bad hashing set-up.
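For scale, here's what that keyspace works out to, (bash arithmetic, since 70^10 still fits comfortably in a signed 64-bit integer):

```shell
# 70 possible characters in each of 10 positions
echo $((70**10))    # prints: 2824752490000000000
```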

Long story short, from my experiments so far I can confirm that it appears all the hashes had their plaintexts lowercased and truncated to 10 characters. Also, yes, serious attackers are very likely to crack almost every password in this list.

Cracking MySpace Passwords With John the Ripper (Take 1):

After glancing around the dataset, the next thing I wanted to do was start cracking. To do this, I needed to extract and format the hashes. My first attempt to do this yielded the following script:
cat | awk -F':' '{if (length($2) > 3) {print "myspace_big_hash1:" substr($4,3); if (length($5) > 3) {print "myspace_big_hash2:" substr($5,3)}}}'  > myspace_clean_big.hsh
To point out a couple of features: I was labeling my data-sets so they are correctly identified in my input file, (I maintain different input files for different data sets, but still, having that name there has saved me trouble in the past), and I was removing blank hashes. Also, I was stripping the username and e-mail addresses since I really didn't want to see passwords associated with names. The problem was the resulting file was huge. I didn't save it, but it was bigger than the original list! I couldn't afford the full naming convention. Therefore I switched to the following script:
cat | awk -F':' '{if (length($2) > 3) {print substr($4,3); if (length($5) > 3) {print substr($5,3)}}}'  > myspace_temp.hsh
 And then to remove duplicates I ran:
sort -u myspace_temp.hsh > myspace_big.hsh
The resulting file was a little under 8 gigs, which was better. Problems occurred though when I tried to load the resulting hash file into JtR. More specifically, after letting it run overnight, JtR still hadn't loaded up the password list and started making guesses. That kind of makes sense. That's way more passwords than normal to parse, and my laptop only had 8 gigs of RAM, so even in an ideal case the whole list probably couldn't be stored in memory. That's not an ideal cracking situation. Being curious, I then decided to try to load it up in Hashcat.
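As an aside, to make the awk extraction above concrete, here's what it does to a single record in the id:email:username:hash1:hash2 layout shown earlier, (every value in this record is invented, and the assumption that the stripped two characters are a '0x'-style prefix is mine, inferred from the substr($4,3) calls):

```shell
# substr($4,3) and substr($5,3) strip the first two characters off each hash,
# and the length() tests skip records where a field is blank
printf '123:user@example.com:someuser:0xAAAA1111:0xBBBB2222\n' |
    awk -F':' '{if (length($2) > 3) {print substr($4,3); if (length($5) > 3) {print substr($5,3)}}}'
```

This prints the two bare hashes, AAAA1111 and BBBB2222, ready for a cracker to load.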

Cracking MySpace Passwords With Hashcat:

Loading up the dump in Hashcat was interesting since it gave me warnings about records in the dataset that weren't parsed correctly.

Regardless, once all was said and done, I ended up with the following error:
ERROR: cuMemAlloc() 2

Doing some quick Googling, I found out the cause was that the GPUs ran out of memory trying to load the hashes. Not surprising but it meant I had to take a different approach if I wanted to crack any hashes from this set.

The easiest way to do this was to split the full list up into smaller chunks and then crack each section by itself. One way to do that is with the split command:
split -l 5000000 myspace_big.hsh myspace_split_
This will break the list up into 5 million hash chunks named along the lines of myspace_split_aa, myspace_split_ab, and so on. The downside is that since you have to crack each file individually, the total cracking time is increased by close to a factor of 40. I'd recommend playing with the file size to maximize the number of hashes per file that your GPU supports. On the plus side, after all that I can now finally crack passwords!

Finally cracking passwords

One issue I had was that there were so many hashes cracking all the time that it was hard to see the status of my session. It's not that my attack was particularly effective, but with a list that large it's hard not to crack something. I belatedly realized I could pause Hashcat, print the status, and then resume. Or, as Jeremi Gosney replied on Twitter, I could have used the following switch with Hashcat:
-o /dev/null 

Closing Thoughts:

I'll admit I'm writing this conclusion with CynoSure Prime's analysis fresh in my mind. While the MySpace list is great for giving me a real world challenge to knock my head against, I'm not sure how useful it'll be from a research perspective. The 66 million salted hashes that were created from the original plaintexts will be nice for new training and testing sets, so researchers don't have to keep using RockYou for everything. That being said, MySpace is actually an older list than RockYou. Also, I fully expect there to be a lot of overlap in the passwords between the two datasets. RockYou's entire business model was allowing apps to work across multiple social networking sites in the era before federated logins. RockYou was storing MySpace + LiveJournal + Facebook passwords in the clear so its apps could cross-post across all of them. Statistically, I expect MySpace and RockYou to be very similar.

What worries me though, and what makes the MySpace list special, is that it has user information associated with all those 360 million accounts + password hashes. Just about everyone who did any social networking and is between the ages of 24 and 40 is in this dump. I realize this list has been in the hands of criminals for the last eight years and a lot of the damage has already been done. Still, now that this list is public it enables many more targeted attacks to be carried out by malicious actors from all over the internet. How long before we start seeing the top 100 celebrity passwords posted on sites like Gawker? What about exes using this information against former partners? Previous public password dumps have been much more limited or didn't contain e-mail addresses. I really don't know what will happen with this one. Hopefully I'm being overly paranoid, but it's hard not to think about the downsides associated with this dump being widely distributed. On the plus side, hopefully this is the only mega-breach we'll see with weak password storage. Sites like Google and Facebook are now using very strong hashes, which will limit a lot of damage if their user information is disclosed in the future.

Wednesday, May 11, 2016

Getting Started With Quantum Computing

“More often than not, the only reason we need experiments is that we're not smart enough.” ― Scott Aaronson
IBM is currently offering free time on one of their quantum computers for interested researchers. Yup, you can program a real life quantum computer right now! In fact, I highly recommend signing up which you can do here. Go ahead and check it out. It took me about 24 hours to get my account approved so you can come back here afterwards to finish reading this post.

What got me interested in this opportunity was that while I have tried to keep up on the field of quantum computing, it basically is magic to me.  I've been building up some general rules in my head about quantum systems, but any sort of question about them that did more than scratch the surface left me shrugging my shoulders. Also it was hard to separate fact from fiction.
Quantum Laws (in Matt's head):
  1. Quantum is a system like everything else. 
  2. A quantum state is a configuration of the system.
  3. A quantum state changes; it naturally wants to evolve, but it can always be undone.
  4. Evolution of a closed system is a unitary transformation on its Hilbert space.
  5. Only the Keeper can block quaffle shots thrown by the opposing team
  6. Do not feed your qubits after midnight
That's why IBM's offer interested me so much. Let's be honest, there's always going to be some magic when it comes to quantum systems, but the opportunity to actually get hands on time programming one would at least turn the whole experience into alchemy if not science for me.

Participating in IBM's Quantum Experience:

After your account is approved you immediately have access to a research portal which IBM calls the "Quantum Experience". It's currently in Beta, but beyond a few bugs in the composer, (which I'll talk about in a bit), it's a very well polished site.

Are you ready to experience some quantums!?
The portal is divided into three tabs, "User Guide", "Composer", and "My Scores". The User Guide is fairly self explanatory but actually impressed me more than the quantum computer itself. I'm still making my way through it but the authors deserve a pat on the back since it's some of the best technical writing I've seen in a while. What's more, there are multiple links to the quantum simulator with examples for each section so you can read about a particular operation or theory and then run a simulation of it and check the results. You can then modify the example, re-run it, and in general play around with the concept before going back to where you were in the user's guide.

Don't worry, it starts out with simpler concepts.
The Composer is the programming interface for the quantum computer. It is attached to a quantum simulator as well. In it, you write "Scores" which are basically circuits to run on the quantum computer. IBM calls them scores since with five qubits to work with it looks like sheet music. That's also how the composer got its name.

An example score in the composer. Yes, this is the default example for Grover's algorithm, but I renamed it since it's all about how you frame the problem.
You can simulate a given quantum score as much as you'd like. When doing so, (or creating a new score), you have the option of choosing an ideal or real layout. The difference is that there are physical limitations of the real quantum computer which directly impact how you design your score.

Red pill or blue pill?
Qubit 2 is the gatekeeper
That's one of the neat things about using this service vs a standard quantum simulator. You can see some of the limitations that current implementations have to deal with. For example, Qubit 2 is the only qubit that can talk to other qubits, so if you want to perform operations like controlled NOTs, (CNOT), that has a huge impact.

Running it For Real:

That's all fun but the real reason you are probably using IBM's service is to actually run programs on their quantum computer. I'll admit the "good old days" of punch card mainframes were before my time but the whole setup is somewhat similar. You are given "Units" which are then used up when you run a program vs simulate it. IBM currently is being very generous with giving them out and you can request more for free. The typical program uses around 3 units to run. The results are probabilistic, so each run can be made up of multiple "shots", and then in the end the average of the results is presented to you. Further display options, such as Bloch spheres where the results are plotted as a vector on a 3D sphere, take even more shots to generate.

I feel a bit guilty about not just running this once, but it's the same price!
To further help you save units, as well as get you the results sooner, if your quantum score has previously been run by someone IBM will give you the option to see the saved results vs re-running it yourself.

Well I guess I wasn't that original...
If you do choose to run your program you are added to the queue. So far, most of my results have been available within a couple of minutes.

Someone else is quantuming ahead of me
Looks like the plain-text is '00'. As I said, it's all about framing the problem.
Remember earlier when I said the runs were probabilistic? You can really see that in the results above. The correct answer was '00', but around 24% of the time a different answer was chosen. 
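To get a feel for why multiple shots matter, here's a quick classical sketch, (plain Python, nothing quantum about it), that samples outcomes from an assumed distribution and reports the most frequent one. The 76%/8% split is just my rough approximation of the results above:

```python
import random
from collections import Counter

random.seed(2018)  # fixed seed so the tally is reproducible

def run_shots(outcome_probs, shots=1024):
    """Sample measurement outcomes over many shots and tally the results."""
    outcomes, weights = zip(*outcome_probs.items())
    tally = Counter(random.choices(outcomes, weights=weights, k=shots))
    # The reported answer is the most frequent outcome across all shots
    return tally.most_common(1)[0][0], tally

# Made-up distribution: '00' is the correct answer ~76% of the time
best, counts = run_shots({'00': 0.76, '01': 0.08, '10': 0.08, '11': 0.08})
print(best, counts)
```

With enough shots the noise averages out and the correct answer dominates, which is why a single shot would be close to useless.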

Issues With the Composer:

I need to submit bug reports, (the bug icon is prominently displayed in the lower right corner of the screen on the portal site), but I've been hesitant to since all the issues I've run into have been very minor. Sometimes the composer gets a bit wonky, (gates get stuck or aren't saved when you run your simulation), but the problem goes away when I refresh my screen. Also, it would be nice if the transitions between composer and users guide were quicker or I could have them open side by side, (opening multiple browser windows does not work). All in all though, I haven't run into any major issues considering this program is currently a beta release.


You are not going to be able to hack any Gibsons with IBM's quantum computer. It's very limited, but that is kind of the point. It shows where the field of quantum computing is right now. IBM is providing an amazing free learning opportunity with this service and if you are at all interested in the future of computing I highly recommend checking it out.

Wednesday, August 19, 2015

Challenges with Evaluating Password Cracking Algorithms

"In theory, theory and practice are the same. In practice they are not" -Quote from somebody on the internet. Also attributed to Albert Einstein but I've never been able to find the original source to back that up.


Currently I'm writing a post looking into Hashcat's Markov mode but I found myself starting off by including several paragraphs worth of disclosures and caveats. Or to put it another way:

It was a valid point Jeremi brought up and it's something I'm trying to avoid. After thinking about it for a bit I figured this topic was worth its own post.

Precision vs Recall: 

Part of the challenge I'm dealing with is I'm performing experiments vs writing tutorials. That's not to say I won't write tutorials in the future but designing and running tests to evaluate algorithms is fun and what I'm interested in right now. Why this can be a problem though is that I can get so deep into how an algorithm works that it's easy to lose sight of how it performs in a real life cracking session. I try to be aware of this, but an additional challenge is representing these investigations to everyone else in a way that isn't misleading.

This gets into the larger issue of balancing precision and recall. In a password cracking context, precision is modeling how effective each guess is when it comes to cracking a password. The higher your precision, the fewer guesses on average you need to make to crack a password. As a rule of thumb if you see a graph with number of guesses on the X axis and percentage of passwords cracked on the Y axis, it's probably measuring precision.

An example of measuring the precision of different cracking techniques
Recall on the other hand is the percentage of passwords cracked during a cracking session in total, regardless of how many guesses are made. Usually this isn't represented in a graph format, and if it is, the X axis will be represented by "Time", and not number of guesses.
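As a concrete sketch of what a precision measurement is doing under the hood, here's a toy version in Python. The guess stream and target list are made up for illustration; the key detail is that targets are kept as a non-unique multiset, so a popular password counts once per user who chose it:

```python
from collections import Counter

def precision_curve(guesses, targets):
    """Walk a guess stream in order, recording cumulative cracks per guess made."""
    remaining = Counter(targets)   # '123456' twice means two accounts to crack
    curve, cracked = [], 0
    for guess in guesses:
        cracked += remaining.pop(guess, 0)  # crack every account using this plaintext
        curve.append(cracked)
    return curve

# Toy example: two users share 'password1'
targets = ['password1', 'password1', 'iloveyou', 'dragon']
print(precision_curve(['123456', 'password1', 'iloveyou'], targets))
# -> [0, 2, 3]
```

Plot that curve against the guess number and you get the standard precision graph; recall would just be the final value divided by the total number of targets, regardless of how many guesses it took.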

Courtesy of Korelogic's Crack Me If You Can contest. This represents a Recall based graph
It's tempting to say that "Precision" is a theoretical measurement and "Recall" is the practical results. It's not quite so clear cut though since the "time" factor in password cracking generally boils down to "number of guesses". In an online guessing scenario an attacker may only be able to make 10-20 guesses. With a fast hash, offline attack, and a moderate GPU setup, billions of guesses a second are possible and an attack might run for several weeks. Therefore recall results tend to be highly dependent on the particular situation being modeled.

Now it would be much easier to switch between "Precision" and "Recall" if there was a direct mapping between number of guesses and time. The problem is, not all guesses take the same amount of time. A good example of that is CPU vs GPU based guessing algorithms. Going back to John the Ripper's Incremental mode, I'm not aware of any GPU implementation of it so guesses have to be generated by the CPU and then sent to the GPU for hashing. Meanwhile Hashcat's Markov mode can run in the GPU itself, and in Atom's words "it has to create 16 billions candidates per 10 milliseconds on a single GPU. Yes, billions". This can lead to situations, such as in the case of a very fast hash, where certain attacks might have a higher precision, but worse recall.

Amdahl's law and why I find precision interesting

When trying to increase recall an attacker generally has two different avenues to follow. They can increase the number of guesses they make or they can increase the precision of the guesses they make. These improvements aren't always exclusive; many times you can do both. Often though there is a balancing act as more advanced logic can take time and may be CPU bound. What this means is that you might increase precision only to find your recall has fallen since you are now making fewer guesses. That being said, if the increase in precision is high enough, then even an expensive guessing algorithm might do well enough to overcome the decrease in the total number of guesses it can make.

Often in these optimization situations Amdahl's law pops into my head, though Gustafson's law might be more appropriate for password cracking due to the rate of increase in the number of guesses. Amdahl's law in a nutshell says the maximum speedup you can have is always limited by the part of the program you can't optimize. To put it another way, if you reduce the cost of an action by 99%, but that action only accounts for 1% of the total run-time, then your maximum total speedup no matter how cool your optimization is would be no more than 1%.
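Amdahl's law as a formula, with the 1%/99% example above worked out, (a quick sketch, not tied to any particular cracker):

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the run-time is sped up by a factor s."""
    return 1.0 / ((1.0 - p) + p / s)

# Speed up an action that is only 1% of the run-time by 100x: barely a 1% gain
print(amdahl_speedup(0.01, 100))  # ~1.01
# Speed up an action that is 99% of the run-time by the same 100x: ~50x overall
print(amdahl_speedup(0.99, 100))  # ~50.25
```

Even with an infinite speedup of the 1% slice, the total speedup caps out at 1/0.99, which is the point of the law.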

Where this applies to password cracking is the cost of a guess in an offline cracking attack can be roughly modeled as:
Cost of making the plain-text guess + cost of hashing + general overhead of the cracking tool
Right now the situation in many cases is that the cost of hashing is low thanks to fast unsalted hashing algorithms and GPU based crackers. Therefore it makes sense to focus on reducing the cost of making the plain-text guesses as much as possible since that will have a huge impact on the overall cost of making a guess. Aka, trading precision for speed in your guessing algorithm can have a significant impact on the total number of guesses you can make. If on the other hand a strong hash is used, (or you at least are trying to crack a large number of uniquely salted hashes), the dominant factor in the above equation becomes the hashing itself. Therefore a speedup in the plaintext generation will not have as much impact on the overall cost and therefore precision becomes more important.
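To make the cost equation above concrete, here's a toy model. All the per-guess costs below are numbers I made up purely for illustration:

```python
def guesses_per_second(gen_cost, hash_cost, overhead, budget=1.0):
    """Toy model of the cost equation above: guess rate given per-guess costs in seconds."""
    return budget / (gen_cost + hash_cost + overhead)

# Illustrative (made-up) per-guess costs in seconds
fast_hash, slow_hash = 1e-9, 1e-4
cheap_gen, smart_gen = 1e-9, 1e-7   # a more precise guesser costs more per guess

# Against a fast hash, generator cost dominates: the cheap guesser makes ~50x more guesses
print(guesses_per_second(cheap_gen, fast_hash, 0) / guesses_per_second(smart_gen, fast_hash, 0))
# Against a slow hash the hashing dominates, so the precision penalty nearly vanishes
print(guesses_per_second(cheap_gen, slow_hash, 0) / guesses_per_second(slow_hash + 0 and smart_gen or smart_gen, slow_hash, 0))
```

In the fast-hash case trading precision for generation speed pays off enormously; in the slow-hash case the smarter guesser loses almost nothing in raw guess rate, so precision wins.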

As a researcher, precision is very interesting for me. From a defensive standpoint a good starting place is "use a computationally expensive salted hash". If you aren't at least doing that then the chances are you aren't interested in doing anything more exotic. Also when it comes to contributing to the larger research community, well my coding skills are such that I'm not going to be making many improvements to the actual password cracking tools. Evaluating and improving the precision of different attacks is much more doable.

Carnegie Mellon's Password Guessability Service:

One cool resource for password security researchers is the new Password Guessability service being offered by the CUPS team over at Carnegie Mellon. I'm going to paraphrase their talk, but basically their team got tired of everyone comparing their password attacks to the same default rulesets of John the Ripper so they created a service for researchers to model more realistic password cracking sessions. If you are interested their USENIX paper describing their lab setup can be found here. Likewise if you want to see a video of their Passwords15LV talk you can view it here. More importantly, if you want to go to their actual site you can find it here:

The service itself is free to ethical security researchers and is run by students so don't be a jerk. The actual attacks they run are bound to change with time, but as of right now they are offering to model several different default password cracking attacks consisting of around 10 trillion guesses each. These cracking attacks use TrustWave's public JtR KoreLogic Rulelist, several different HashCat rulesets, an updated Probabilistic Context Free Grammar attack, and another custom attack designed by Korelogic specifically for this service. All in all, if you need to represent an "industry standard" cracking session it's hard to do better. In fact it probably represents a much more advanced attacker than many of the adversaries out there if you assume the target passwords were protected by a hashing algorithm of moderate strength.

I could keep on talking about this service but you really should just read their paper first. I think it's a wonderful resource for the research community and I have lots of respect for them offering this. So the next question of course is what does that mean for this blog? I plan on using this service as it makes sense without hogging Carnegie Mellon's resources. I need to talk to them more about it but I expect that I'll have them run it against a subset of the RockYou list and then use, and reuse, those results to evaluate other cracking techniques as I investigate them. If I attack some other dataset though I may just run a subset of the attacks myself, unless that dataset and the related tests are interesting enough to make using Carnegie Mellon's resources worth it.

Fun with Designing Experiments:

When designing experiments there's usually a couple of common threads I'm always struggling with:
  1. Poor datasets. I know there's a ton of password dumps floating around but often due to the nature of their disclosure there are massive problems or shortcomings with most of them. For example most of the dumps on password cracking forums or pastebin have only unique hashes, so '123456' only shows up once, and there is no attribution. Gawker was a site most people didn't care about and the hashing algorithm cut off the plaintext after 8 characters and replaced non-ASCII text with a '?'. A majority of the passwords in the Stratfor dataset were machine generated. MySpace, well that was a result of a phishing attack so it has many instances of 'F*** You You F***ing Hacker'. Even with RockYou the dataset is complicated as it contained many passwords from the same users for different sites but since there were no usernames connected with the public version of it, it can be hard to sort out. Then there is the fact that most of these datasets were for fairly unimportant sites. I'm not aware of any confirmed public Active Directory dump, (though there are a large number of NT hashes floating about and this whole Ashley Madison hack may change things with the Avid Life Media NT hashes there). Likewise, while there are some banking password lists, the amount of drama surrounding them makes me hesitant to use them.
  2. Short running time. Personally I like keeping the time it takes to run a test to around an hour or so. While I can certainly run longer tests, realistically anything over a couple of days isn't going to happen since I like using my computers for other things and truth be told, it always seems like I end up finding out I need to run additional tests or I messed something up in my original setup and need to re-run it. Shorter tests are very much preferred. Add into that the fact that most of the time I'm modeling precision and running my tests on a CPU system means most of my tests will not be modeling GPU cracking vs fast hashes.
  3. What hypothesis do I want to test, and can I design an experiment to test it? I'll admit, sometimes I'll have no clue what the results of a test will be so I'll pull a YOLO, throw some stuff together and just run it to see what pops out. That's not ideal though as I usually like to try and predict the results. I'm often wrong, but that at least forces me to look deeper into what assumptions I held were wrong, and hey that's why I run tests in the first place.
Furthermore, for at least the next couple of tools I'm investigating I plan on using both Hashcat and John the Ripper as much as possible. While it might not always make sense to use both of them as often there isn't an apples to apples comparison, I do have some ulterior motives. Basically it helps me to use both of these tools in a public setting and I've already gotten a lot of positive feedback from my PRINCE post. It's pretty amazing when I can have a creator of a tool tell me how I can optimize my cracking techniques. My secondary reason for this is to make people more aware of both of these tools. When it comes to the different attack modes I've found there's a lot of misunderstandings of what each tool is capable of.

That being said, I explicitly don't want to get into "Tool A is better than Tool B" type debates. Which tool you use really depends on your situation. Heck, occasionally I'm glad I still have Cain and Abel installed. I'll admit, this is going to get tricky when I'm doing tests such as comparing Hashcat's Markov mode to JtR's Incremental mode, but please keep in mind that I want to make all the tools better.

Enough talk; Give us some code or graphs or GTFO:

Thanks for putting up with all of that text. In the spirit of showing all my research I'm sharing the tool that I wrote to evaluate password cracking sessions which I'll be using in this blog. The code is available here:

The specific tool I'm talking about, (in the hope that I release multiple tools in the future so it isn't obvious ;p), is called checkpass2. It's a significantly faster version of the old checkpass program I had used and released in the past. The options on how it works are detailed in the -h switch, but basically you can pipe whatever password guess generation tool you are using into it and it'll compare your guesses against a plaintext target list and tell you how effective your cracking session would have been. For example if you were using John the Ripper you could use the -stdout option to model a cracking session as follows:
./john -wordlist=passwords.lst -rules=single -stdout | python -t target.pws -o results.txt
It also has some options like limiting the maximum number of guesses or starting a count at a specific number if you want to chain multiple cracking sessions together. There's certainly still a lot of improvements that need to be made to it, but if you like graphs I hope it might be useful to you. Please keep in mind that this isn't a password cracker. Aka, it does not do any hashing of password guesses. So if you want to model a password cracking session against a hashed list you'll need to run two sessions: one to crack the list using the tool of your choice, and a second to model your cracking session against the cracked passwords with this tool. Since both John the Ripper and Hashcat have logging options you might want to consider using them instead to save time. Where checkpass2 is nice, for me anyway, is the fact that I can quickly edit the code depending on what I need so it's easier to do things like format the output for what I'm doing. Long story short, I hope it is helpful but I still strongly recommend looking into the logging options that both John the Ripper and Hashcat offer.
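For anyone curious what such a tool boils down to, here's a minimal sketch of the same idea. This is my simplified illustration, not the actual checkpass2 code:

```python
# Minimal sketch of a checkpass-style evaluator (not the actual checkpass2 code):
# pipe a guess generator into it and count how many target plaintexts get cracked.
import sys
from collections import Counter

def evaluate(guesses, targets, max_guesses=None):
    """Count cumulative cracks as a guess stream is consumed; returns (made, cracked)."""
    remaining = Counter(targets)   # non-unique: shared passwords count once per user
    num = cracked = 0
    for num, guess in enumerate(guesses, start=1):
        cracked += remaining.pop(guess, 0)
        if max_guesses and num >= max_guesses:
            break
    return num, cracked

if __name__ == '__main__' and len(sys.argv) > 1:
    # Usage sketch:  ./john -wordlist=passwords.lst -stdout | python checkpass_sketch.py target.pws
    with open(sys.argv[1]) as f:
        targets = [line.rstrip('\n') for line in f]
    made, cracked = evaluate((line.rstrip('\n') for line in sys.stdin), targets)
    print('%d guesses made, %d of %d passwords cracked' % (made, cracked, len(targets)))
```

The real tool tracks the cumulative curve per guess number for graphing rather than just the final totals, but the core loop is this simple.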

Monday, December 22, 2014

Tool Deep Dive: PRINCE

Tool Name: PRINCE (PRobability INfinite Chained Elements)
Version Reviewed: 0.12
Author: Jens Steube, (Atom from Hashcat)
OS Supported: Linux, Mac, and Windows
Password Crackers Supported: It is a command line tool so it will work with any cracker that accepts input from stdin

Blog Change History:

1/4/2015: Fixed some terminology after talking to Atom
1/4/2015: Removed a part in the Algorithm Design section that talked about a bug that has since been fixed in version 0.13
1/4/2015: Added an additional test with PRINCE and JtR Incremental after a dictionary attack
1/4/2015: Added a section for using PRINCE with oclHashcat

Brief Description: 

PRINCE is a password guess generator and can be thought of as an advanced Combinator attack. Rather than taking as input two different dictionaries and then outputting all the possible two word combinations though, PRINCE only has one input dictionary and builds "chains" of combined words. These chains can have 1 to N words from the input dictionary concatenated together. So for example if it is outputting guesses of length four, it could generate them using combinations from the input dictionary such as:
4 letter word
2 letter word + 2 letter word
1 letter word + 3 letter word
1 letter word + 1 letter word + 2 letter word
1 letter word + 2 letter word + 1 letter word
1 letter word + 1 letter word + 1 letter word + 1 letter word
..... (You get the idea)
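The list above is just the set of ways to write a target length as an ordered sum of element lengths. A quick sketch of enumerating those chain shapes, (my own illustration of the concept, not PRINCE's actual code; the 8-element cap matches the limit discussed below):

```python
def chain_structures(target_len, max_elems=8, min_elem=1):
    """Enumerate ordered splits of a guess length into element lengths,
    e.g. 4 -> (4,), (1,3), (2,2), (1,1,2), ... one tuple per chain shape."""
    if target_len == 0:
        yield ()
        return
    if max_elems == 0:
        return
    for first in range(min_elem, target_len + 1):
        for rest in chain_structures(target_len - first, max_elems - 1, min_elem):
            yield (first,) + rest

print(sorted(chain_structures(4)))
```

For length 4 that's 8 shapes; in general a length-N guess has 2^(N-1) possible shapes, which is why chains over the full input dictionary blow up so quickly.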

Algorithm Design:

As of this time the source-code of PRINCE has not been released. Therefore this description is based solely on Atom's Passwords14 presentation, talking to Atom himself on IRC, and running experiments with various small dictionaries using the tool itself and manually looking at the output.

As stated in the description, PRINCE combines words from the input dictionary to produce password guesses. The first step is processing the input dictionary. Feeding it an input dictionary of:
resulted in it generating the following guesses:
...(output cut to save space)
Therefore, it's pretty obvious that the tool does not perform duplicate detection when loading a file.

Finding #1: Make sure you remove duplicate words from your input dictionary *before* you run PRINCE

After PRINCE reads in the input dictionary it stores each word, (element), in a table consisting of all the words of the same length. PRINCE then constructs chains consisting of 1 to N different elements. Right now it appears that N is equal to eight, (confirmed when using the --elem-cnt-min option). It does this by setting up structures of the different tables and then filling them out. For example with the input dictionary:
It will generate the guesses:
This isn't to say that it won't generate longer guesses since elements can be longer than length 1. For example with the following input dictionary:
It generates the following guesses
...(output cut to save space)
Next up, according to slide 35 of the Passwords14 talk, it appears that PRINCE should be sorting these chains according to keyspace. This way it can output guesses from the chains with the smallest keyspace first. This can be useful so it will do things like append values on the end of dictionary words before it tries a full exhaustive brute force of all eight character passwords. While this appears to happen to a certain extent, something else is going on as well. For example with the input dictionary:
It would output the following results:
.....(Lots of results omitted).....
This is a bit of a mixed bag. While it certainly saved the highest keyspace chains for the end, it didn't output everything in true increasing keyspace order since elements of length 1, (E1), had two items, while elements of length 2, (E2), only had one item, but it outputted E1 first. I have some suspicions that the order it outputs its chains is independent of how many items actually are in each element for that particular run, (aka as long as there is at least one item in each element, it is independent of your input dictionary). I don't have anything hard to back up that suspicion though beyond a couple of sample runs like the one above. Is this a problem? Quite honestly, I'm not really sure, but it is something to keep in mind. When I talked to Atom about this he said that password length compared to the average length of items in the training set also influenced the order at which chains were selected so that may have something to do with it.

Finding #2: PRINCE is not guaranteed to output all chains in increasing keyspace order, though it appears to at least make an attempt to do so
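For reference, the keyspace of a single chain is just the product of the element-table sizes for each position in it. Here's a small sketch of that calculation, (an idealized model, which as noted above doesn't exactly match the ordering the real tool produces):

```python
from collections import Counter
from math import prod  # Python 3.8+

def chain_keyspace(structure, words):
    """Keyspace of one chain shape = product of the element-table sizes,
    e.g. for shape (1, 1) it's (number of length-1 words) squared."""
    table_sizes = Counter(len(w) for w in words)
    return prod(table_sizes[length] for length in structure)

words = ['a', 'b', 'cd']               # two length-1 elements, one length-2 element
print(chain_keyspace((1, 1), words))   # 2 * 2 = 4
print(chain_keyspace((2,), words))     # 1
# A strict smallest-keyspace-first ordering would emit the (2,) chain before (1, 1)
```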

Additional Options:

--elem-cnt-max=NUM:  This limits the number of elements that can be combined to NUM. Aka if you set NUM to 4, then it can combine up to 4 different elements. So if you had the input word 'a' it could generate 'aaaa' but not 'aaaaa'. This may be useful to limit some of the brute forcing it does.

The rest of the options are pretty self explanatory. One request I would have is for PRINCE to save its position automatically, or at least print out the current guess number when it is halted, to make it easier to restart a session by using the "--skip=NUM" option.


PRINCE was written by Atom so of course it is fast. If you are using a CPU cracker it shouldn't have a significant impact on your cracking session even if you are attacking a fast hash. For comparison sake, I ran it along with JtR's incremental mode on my MacBook Pro.

run laki$ ../../../Tools/princeprocessor-0.12/  < ../../../dictionaries/passwords_top10k.txt | ./john --format=raw-sha1-linkedin -stdin one_hash.txt
Loaded 1 password hash (Raw SHA-1 LinkedIn [128/128 SSE2 intrinsics 8x])
guesses: 0  time: 0:00:02:00  c/s: 1895K  trying: asdperkins6666 - bobperkins

JtR Incremental Mode:
run laki$ ./john -incremental=All -stdout | ./john --format=raw-sha1-linkedin -stdin one_hash.txt 
Loaded 1 password hash (Raw SHA-1 LinkedIn [128/128 SSE2 intrinsics 8x])
guesses: 0  time: 0:00:00:14  c/s: 2647K  trying: rbigmmi - rbigm65

Using PRINCE with OCLHashcat:

Below is a sample screen shot of me using PRINCE as input for OCLHashcat on my cracking box, (it has a single HD7970 GPU). Ignore the --force option as I had just installed an updated video card driver and was too lazy to revert back to my old one that OCLHashcat supports. I was also too lazy to boot into Linux since I was using Excel for this post and my cracking box also is my main computer...

What I wanted to point out was that for a fast hash, (such as unsalted SHA1 in this case), since PRINCE is not integrated into OCLHashcat it can't push guesses fast enough to the GPU to take full advantage of the GPU's cracking potential. In this case, the GPU is only at around 50% utilization. That is a longer way of saying that while you can still totally make use of OCLHashcat when using PRINCE, it may be advantageous to also run dictionary based rules on the guesses PRINCE generates. Since those dictionary rules are applied on the GPU itself you can make a lot more guesses per second to take full advantage of your cracking hardware. This is also something Atom recommends and he helpfully included two different rulesets with the PRINCE tool itself.

Side note: PRINCE plows through the LinkedIn list pretty effectively. To get the screenshot above I had to run the cracking session twice since otherwise the screen would have been filled with cracked passwords.

Big Picture Analysis:

The main question of course is how does this tool fit into a cracking session? Atom talked about how he saw PRINCE as a way to automate password cracking. The closest analogy would be John the Ripper's default behavior where it will start with Single Crack Mode, (lots of rules applied to a very targeted wordlist), move on to Wordlist mode, (basic dictionary attack), and then try Incremental mode, (smart bruteforce). Likewise with PRINCE depending on how you structure your input dictionary it can act as a standard dictionary attack, (appending/prepending digits to input words for example), combinator attack, (duh), and pure brute force attack, (trying all eight character combos). It can even do a limited passphrase attack though it gets into "Correct Horse Battery Staple" keyspace issues then. For example, with the input dictionary of:
It will generate all four word combinations such as:


When talking about passphrase attacks then, keep in mind it doesn't have any advanced logic so you are really doing a full keyspace attack of all the possible combinations of words.
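To put some rough numbers on that keyspace issue: chaining k words from an N-word dictionary with no smarter logic is a full N^k search. The dictionary size here is a made-up example:

```python
# Passphrase chains blow up fast: k words from an N-word dictionary is N**k guesses
n_words = 10000
for k in range(1, 5):
    print('%d-word chains: %d possible guesses' % (k, n_words ** k))
```

Four words from even a modest 10k-word dictionary is already 10^16 guesses, which is why a dumb full-keyspace passphrase attack stops being practical so quickly.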

The big question then is how does it compare against other attack modes when cracking passwords? You know what this means? Experiments and graphs!

I decided I would base my first couple of comparisons on the demos Atom had listed in his slides as a starting point. I figure no-one would know how to use PRINCE better than he would. Note: these are super short runs. While I could explain that away by saying this simulates targeting a slow hash like bcrypt, the reality is Atom made some noticeable changes in PRINCE while I was writing this post, (yay slow update schedule). I figured it would be good to make some quick runs with the newer version to get a general idea of how PRINCE performs and then post a more realistic length run at a later time. Also, this way I can get feedback on my experiment design so I don't waste time running a longer cracking session on a flawed approach.

Experiment 1) PRINCE, Hashcat Markov mode, and JtR Incremental mode targeting the MySpace list

Experiment Setup:
The input dictionary for PRINCE was the top 100k most popular passwords from the RockYou list, as this is what Atom used. For Hashcat I generated a stats file on the full RockYou list and used a limit of 16. For JtR I ran the default Incremental mode using the "All" character set. The target list was the old MySpace list. The reason why I picked that vs the Stratfor dataset which Atom used was simply because there are a ton of computer generated passwords, (aka default passwords assigned to users), in the Stratfor dataset so it can be a bit misleading when used to test against.

Cracking Length: 1 billion guesses

Commands used:
laki$ ../../../Tools/princeprocessor-0.12/ < ../../../dictionaries/Rockyou_top_100k.txt | python -t ../../../Passwords/myspace.txt -m 1000000000

laki$ ../../../John/john-1.7.9-jumbo-7/run/john -incremental=All -stdout | python -t ../../../Passwords/myspace.txt -m 1000000000

laki$ ../../../hashcat/statsprocessor-0.10/ --threshold=16 ../../../hashcat/statsprocessor-0.10/hashcat.hcstat | python -t ../../../Passwords/myspace.txt -m 1000000000

Experiment Results:

Click on the graph for a zoomed in picture. As you can see, PRINCE did really well starting out but then quickly became less effective. This is because it used most, (if not all), of the most common words in the RockYou list first so it acted like a normal dictionary attack. At the same time, Incremental Mode was starting to catch up by the end of the run. While I could continue to run this test over a longer cracking session, this actually brings up the next two experiments....

Experiment 2) PRINCE and Dictionary Attacks targeting the MySpace list

Experiment Setup:
This is the same as the previous test targeting the MySpace dataset, but this time using dictionary attacks. For JtR, I stuck with the default ruleset and the more advanced "Single" ruleset. I also ran a test using Hashcat and the ruleset Atom included along with PRINCE, (prince_generated.rule). For all the dictionary attacks, I used the RockYou top 100k dictionary to keep them comparable to the PRINCE attack.

Cracking Length: I gave each session up to 1 billion guesses, but the two JtR attacks were so short that I only displayed the first 100 million guesses on the graph so they wouldn't blend in with the Y-axis. The Hashcat attack used a little over 700 million guesses, so I annotated its final results on the graph. Side note, (and this merits another blog post): Hashcat performs its cracking sessions in word order, vs JtR's rule order. I suspect this is to make Hashcat faster when cracking passwords on GPUs. You can read about the difference between those two modes in one of my very first blog posts back in the day. What this means is that Hashcat's cracking sessions tend to be much less front-loaded unless you take the time to run multiple cracking sessions with smaller mangling rulesets.
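To make the word-order vs. rule-order distinction concrete, here is a toy sketch, (the two example rules are my own, purely illustrative), showing how the same wordlist and rules produce guesses in two different orders:

```python
# Toy illustration of the two iteration orders discussed above.
words = ["password", "monkey"]
rules = [
    lambda w: w,                # pass-through
    lambda w: w.capitalize(),   # capitalize first letter
    lambda w: w + "1",          # append a digit
]

# JtR-style "rule order": apply one rule to the whole wordlist,
# then move on to the next rule.
rule_order = [r(w) for r in rules for w in words]
# -> ['password', 'monkey', 'Password', 'Monkey', 'password1', 'monkey1']

# Hashcat-style "word order": exhaust every rule on one word
# before moving on to the next word.
word_order = [r(w) for w in words for r in rules]
# -> ['password', 'Password', 'password1', 'monkey', 'Monkey', 'monkey1']
```

With rule order, the highest-probability base words get tried under the cheapest rules first, which is why those sessions tend to be front-loaded; word order spreads the common words' mangled variants out across the whole run.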

Commands used:
laki$ ../../../Tools/princeprocessor-0.12/ < ../../../dictionaries/Rockyou_top_100k.txt | python -t ../../../Passwords/myspace.txt -m 1000000000

laki$ ../../../John/john-1.7.9-jumbo-7/run/john -wordlist=../../../dictionaries/Rockyou_top_100k.txt -rules=wordlist -stdout | python -t ../../../Passwords/myspace.txt -m 1000000000

laki$ ../../../John/john-1.7.9-jumbo-7/run/john -wordlist=../../../dictionaries/Rockyou_top_100k.txt -rules=single -stdout | python -t ../../../Passwords/myspace.txt -m 1000000000

laki$ ../../../hashcat/hashcat-0.48/ --stdout -a 0 -r ../../../Tools/princeprocessor-0.12/prince_generated.rule ../../../dictionaries/Rockyou_top_100k.txt | python -t ../../../Passwords/myspace.txt -m 1000000000

Experiment Results:

As you can see, all of the dictionary attacks performed drastically better than PRINCE over the length of their cracking sessions. That's to be expected, since their rulesets were crafted by hand while PRINCE generates its rules automatically on the fly. I'd also like to point out that once the normal dictionary attacks are done, PRINCE keeps on running. That's another way of saying PRINCE still has a role to play in a password cracking session even if standard dictionary attacks initially outperform it. All this test shows is that if you are going to run a shorter cracking session, you would be much better off with a normal dictionary-based attack instead of PRINCE. It does lead to my next question and test though: after you run a normal dictionary attack, how does PRINCE do compared to a Markov-based brute force attack?

Experiment 3) PRINCE and JtR Wordlist + Incremental mode targeting the MySpace list

Experiment Setup:
Based on feedback from Atom I restructured this next test. First, Atom recommended using the full RockYou list as the input dictionary for PRINCE. Since that is a larger input dictionary than just the 100k most frequent passwords, I re-ran JtR's Single mode ruleset against the MySpace list using the full RockYou dictionary as well. I also used the most recent version of JtR, 1.8-jumbo-1, on the recommendation of Solar Designer. This cracked a total of 23,865 passwords from the MySpace list, (slightly more than 64%). I then ran PRINCE, (the newer version 0.13), with the full RockYou dictionary, (ordered), and JtR Incremental=UTF8, (equivalent to "All" in the older version of JtR), against the remaining uncracked passwords. I also increased the cracking time to 10 billion guesses.

Side note: I ran a third test PRINCE using the RockYou top 100k input dictionary as well since the newer results were very surprising. I'll talk about that in a bit...

Cracking Length: 10 billion guesses

Commands used:
laki$ ../../../John/john-1.8.0-jumbo-1/run/john -wordlist=../../../dictionaries/Rockyou_full_ordered.txt -rules=single -stdout | python -t ../../../Passwords/myspace.txt -u uncracked_myspace.txt

laki$ ../../../Tools/princeprocessor-0.13/ < ../../../dictionaries/Rockyou_full_ordered.txt | python -t ../../../Passwords/uncracked_myspace.txt -m 10000000000 -c 23865

laki$ ../../../Tools/princeprocessor-0.13/ < ../../../dictionaries/Rockyou_top_100k.txt | python -t ../../../Passwords/uncracked_myspace.txt -m 10000000000 -c 23865

laki$ ../../../John/john-1.8.0-jumbo-1/run/john -incremental=UTF8 -stdout | python -t ../../../Passwords/uncracked_myspace.txt -m 10000000000 -c 23865

Experiment Results:

I'll guiltily admit that before running this test I hadn't been that impressed with PRINCE. That's because I had been running it with the top 100k RockYou dictionary, and as you can see, with the smaller dictionary it performed horribly. When I ran the new test with the full RockYou dictionary though, PRINCE did significantly better than an Incremental brute force attack. Yes, cracking 1.5% more of the total set might not seem like much, but it will take Incremental mode a *long* time to catch up to that. Long story short, PRINCE's effectiveness is extremely dependent on the input dictionary you use with it.

Like most surprising test results, this opens up more questions than it answers. For example, what exactly is going on with PRINCE to make it so much more effective with the new dictionary? My current hypothesis is that it is emulating a longer dictionary attack, but I need to run some more tests to figure out if that's the case. Regardless, these results show that PRINCE appears to be a very useful tool to have in your toolbox if you use the right input dictionary with it.

Current Open Questions:

  1. What is the optimal input dictionary for PRINCE? Yes, the full RockYou dictionary does well, but my gut feeling is we can do better. That leads to the next open question...
  2. Can we make PRINCE smarter? Right now it transitions between dictionary attacks and brute force automatically, but beyond sorting the chains by keyspace it doesn't have much advanced logic. Perhaps if we can better understand what makes it effective, we can design an algorithm that is even more effective than PRINCE.
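For readers who haven't dug into the algorithm: PRINCE builds guesses by chaining input-dictionary words together so the concatenation hits a target password length, and it orders the chain structures by keyspace so cheaper chains come first. The sketch below is a heavily simplified toy, (real princeprocessor enumerates chain structures by length combinations and sorts them by keyspace rather than brute-forcing every word tuple), just to show the chaining idea:

```python
from itertools import product

def prince_guesses(words, target_len, max_words=3):
    """Toy PRINCE: emit every concatenation of 1..max_words input words
    whose combined length equals target_len. Chains with fewer words
    come first, a crude stand-in for PRINCE's keyspace ordering."""
    guesses = []
    for n in range(1, max_words + 1):          # chain length in words
        for chain in product(words, repeat=n):  # every n-word combination
            if sum(len(w) for w in chain) == target_len:
                guesses.append("".join(chain))
    return guesses

# Example: with words ["ab", "cd", "e"] and a target length of 4,
# the two-word chains "abab", "abcd", "cdab", "cdcd" are produced.
```

Seen this way, it's clear why the input dictionary dominates the results: every guess is built entirely out of input-dictionary words, so a richer dictionary gives both better single-word "dictionary attack" guesses early on and better multi-word chains later.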

Other References: