Sunday, August 14, 2016

Evaluating the Value of the (@)Purge Rule

“Only sometimes when we pick and choose among the rules we discover later that we have set aside something precious in the process.”  
― Helen Simonson, Major Pettigrew's Last Stand

Background and Problem Statement:

I was recently asked the following question: "Is there any value in supporting the character purge rule in Hashcat?" The purge rule '@x' will remove all characters of a specific type from a password guess. So for example the rule '@s' would turn 'password' into 'paword'. The full thread can be found on the Hashcat forum here. The reason behind this inquiry was that while the old version of Hashcat implemented the character purge rule, GPU versions of Hashcat and Hashcat 3.0 dropped support for it. Since then, At0m added support for the rule back in the newest build of Hashcat which makes this question much less pressing. That being said, similar questions pop up all the time and I felt it was worth looking into if only to talk about the process of investigating problems like this.

Side note: as evidence that any change will break someone's workflow, while researching this topic I did find one user who stored passphrase dictionaries with the spaces left intact. They would then use the purge rule to remove the spaces during a cracking session so that they wouldn't have to save a second copy of their passphrase wordlist without spaces. For that reason alone I think there is some value in the purge rule.

The Purge Rule Explained:

Hashcat Rule Syntax: @X where (X) is the character you want to purge from the password guess
Example Rule: @s
Example Input: password
Example Output: paword
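
The purge rule's behavior is simple enough to model outside of Hashcat. Here's a minimal Python sketch of the '@X' semantics (an illustration, not Hashcat's actual implementation):

```python
def purge(word: str, char: str) -> str:
    """Simulate Hashcat's '@X' purge rule: remove every
    occurrence of the character X from the candidate word."""
    return word.replace(char, "")

print(purge("password", "s"))  # paword
```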


Hypothesis:

My gut feeling is that the purge rule will have limited impact on a cracking session. I base that on a rule of thumb that mangling rules work best if they mimic the thought process people use when creating passwords. For example, people often start with a base word and then append digits to it, replace letters with L33t replacements, etc. Therefore rules that mimic these behaviors tend to be more successful. I just don't see many people removing character classes from their password.

Now if you are a Linux fan, you'll realize Linux developers *love* removing characters from commands. Do you want to change your password? Well "passwd" is the command for you! Maybe Linux developers use the same strategy for their passwords? So I certainly could be wrong. That being said, the whole idea of a hypothesis is to go out on a limb and make a prediction on how an existing model will react so here I go:

My hypothesis is that the purge rule will crack fewer than 1 thousand passwords of a 1 million password dataset (0.1%). Of those passwords cracked, the vast majority (95%) will be cracked due to weaknesses in the input dictionary vs. modeling how the user created the password. For example, 'paword' might be a new Pokemon type that didn't show up in the input dictionary vs. being created by a user taking the word 'password' and then removing the S's.

Short Summary of Results:

The purge ruleset cracked 164 passwords (0.0164% of the test set). This was slightly better than just using random rules, which in a test run cracked 23 passwords, but not by much. Supporting this rule is unlikely to help your cracking sessions to any noticeable degree.

Experimental Setup:

Test Dataset: 1 million passwords from the newest MySpace leak. These were randomly selected from the full set using the 'gshuf -n 1000000' command.
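
For anyone without coreutils installed, gshuf (GNU shuf) picks lines uniformly at random without replacement. A rough Python equivalent is sketched below (a simple in-memory version, not the exact tooling used here):

```python
import random

def sample_lines(path, n, seed=None):
    """Rough equivalent of `gshuf -n N file`: pick n lines
    uniformly at random, without replacement."""
    rng = random.Random(seed)
    with open(path, encoding="utf-8", errors="ignore") as f:
        lines = f.readlines()
    return rng.sample(lines, n)
```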

Reason: Truth be told, the main reason I used the MySpace passwords was I'm getting tired of using the RockYou dataset for everything. That being said, it's useful for this experiment that all of the passwords in that dataset have been converted to lowercase since I don't have to worry about combining case mangling rules with the purge rules.

Tools Used: Hashcat for the cracking, and John the Ripper for its -show option

Rulesets Used: Hashcat's D3ad0ne mangling rules. I broke the set up into two different rulesets, one containing the purge rules (along with a few append/prepend '@' rules that snuck in) and the other containing all the other mangling rules.

Reason: D3ad0ne's ruleset contains about 34 thousand individual mangling rules. Due to its size, and the fact that it is included with Hashcat, it should make a good example of a ruleset that many Hashcat users are likely to incorporate into their cracking sessions. I initially split the base ruleset into two subsets, with all rules containing '@' going into one ruleset called d3ad0ne_purge and all the other rules into another called d3ad0ne_base. I then started manually going through d3ad0ne_purge and moving rules such as "append a @" into d3ad0ne_base, but with over 1k rules in d3ad0ne_purge I quickly decided to remove the results of the append/prepend '@' rules after the fact instead of trying to fully isolate the purge rules in their own ruleset.
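
The first-pass split described above can be sketched in a few lines of Python (the rule file format is one rule per line; the comment convention is an assumption):

```python
def split_rules(lines):
    """First-pass split: any rule containing '@' goes into the purge
    candidate set, everything else into the base set. Note this also
    catches append/prepend rules like '$@' and '^@', which is exactly
    why a manual cleanup pass was still needed."""
    purge, base = [], []
    for line in lines:
        rule = line.rstrip("\n")
        if not rule or rule.startswith("#"):
            continue  # skip blanks and comments
        (purge if "@" in rule else base).append(rule)
    return purge, base
```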


Dictionary Used: I used dic-0294 as my wordlist. Yes, there are better input dictionaries out there, but this is a common one and strikes a good balance between size and coverage. Plus, it is public, unlike other dictionaries I have that are based on cracked passwords.


Experimental Results:

Step 1) Run a normal cracking session on the 1 million MySpace passwords using dic-0294 and d3ad0ne_base. This is important since the purge rule will likely crack many passwords that would be cracked normally by other rules. Running a normal cracking session first removes those passwords so we can focus on passwords that would only be cracked by the purge rules. The command I ran is below (note: I'm editing some of the path information out of the commands for clarity's sake).
./hashcat -D1 -m 100 -a 0 --remove myspace_rand_1m_hc.txt -r rules/d3ad0ne_base.rule dic-0294.txt
A couple of notes about the above command. I'm using a version of Hashcat that I updated on August 10th, 2016. I ran it on a very old MacBook Pro, so the -D1 is telling it to use the CPU only (since the GPU doesn't have enough memory). The -m 100 is telling it to crack unsalted SHA-1 hashes. The -a 0 is to do a basic dictionary attack. --remove removes any cracked hashes so they aren't counted twice in future cracking sessions. myspace_rand_1m_hc.txt is my target set, rules/d3ad0ne_base.rule is my ruleset, and dic-0294.txt is my input dictionary. Below are the results of running this first attack.



With 36% of the passwords cracked by a very vanilla attack on a slow computer, that isn't bad. Next up is running the purge rules.

Step 2) Delete the previous hashcat.pot file. Run a cracking session on the remaining passwords using the purge ruleset. The command I ran was very similar to the one above:
./hashcat -D1 -m 100 -a 0 myspace_rand_1m_hc.txt -r rules/d3ad0ne_purge.rule dic-0294.txt
Note, I took off the --remove option since I didn't care about removing cracked hashes for this. I also deleted the previous .pot file of cracked passwords since I only wanted to store passwords associated with this test. Here is a screenshot I took partway through the cracking session:



As you can see, many of the cracked passwords were due to "insert a @ symbol" rules vs. the purge rule. Here are the final results:



The session managed to crack 405 unique hashes. I then went into the pot file and deleted any password containing the '@' character, so what was left was due to the purge rule. This left me a list containing 128 unique passwords. A screenshot is shown below:
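
That cleanup step can be sketched like this, assuming the usual hash:plaintext pot-file layout (the example entries below are made up):

```python
def purge_rule_cracks(pot_lines):
    """Keep only cracked entries whose plaintext contains no '@',
    filtering out cracks caused by the append/prepend '@' rules."""
    kept = []
    for line in pot_lines:
        entry = line.rstrip("\n")
        if not entry:
            continue
        # Split on the first ':' -- everything after it is the plaintext.
        _hash, _, plain = entry.partition(":")
        if "@" not in plain:
            kept.append(entry)
    return kept
```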



Now it's hard to tell what people were thinking when they created these passwords, but glancing through the list, it certainly appeared that most of the cracked passwords were due simply to limitations in my input dictionary vs. users purging characters from their passwords. I was actually surprised 'jayden' and 'fatguy' weren't in dic-0294, but after double-checking, they were in fact missing from it.

Now, input dictionaries are always going to be limited to a certain extent, so these cracks absolutely count. They only represent unique cracked hashes though. For example, if 20 people used the password 'imabear' it would only be counted once. To figure out how many total accounts would have been cracked, I re-ran the above dictionary through John the Ripper against the myspace_1m_rand list. This was to get the cracks into John's pot-file format. For example, here is 'imabear' in john.pot:

{SHA}QiPoQuc4sqqs3J+OulWLt3H09kY=:imabear
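
That '{SHA}' prefix is the LDAP-style SHA-1 format: the base64 encoding of the raw 20-byte digest, rather than the 40-character hex form that Hashcat's -m 100 mode displays. A quick sketch of how such a pot entry is built:

```python
import base64
import hashlib

def ldap_sha_entry(password: str) -> str:
    """Build a JtR-style pot entry in the LDAP {SHA} format:
    base64 of the raw SHA-1 digest, then ':' and the plaintext."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    return "{SHA}" + base64.b64encode(digest).decode("ascii") + ":" + password
```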

The reason I did this is that JtR has a really cool feature, '-show', that will match up cracked passwords with the accounts in the target set. Running the command:

./john -format=raw-sha1 -show myspace_rand_1m_clean.txt

resulted in the following output:


Therefore the purge rules cracked a total of 164 passwords from the test set, or 0.0164% of the total. That's a really small amount. Admittedly, every password cracked is nice, but I was still curious if the purge rules were better than just running random mangling rules instead. Luckily, Hashcat supports an option to test that out:
./hashcat -D1 -m 100 -a 0 myspace_rand_1m_hc.txt -g 500 dic-0294.txt
The only difference between the above command and the previous Hashcat commands I ran is that instead of a rules file I specified '-g 500'. That tells Hashcat to generate 500 random rules to run on the input dictionary. I chose that number since there were over a thousand rules in my d3ad0ne_purge ruleset and I guesstimated that about half of them were actual purge rules. When I ran the above, I ended up cracking 23 more passwords. That's significantly less than the 164 the purge rules cracked, but in the grand scheme of things it was about the same in effectiveness. Considering some of those random rules were likely duplicates of rules in the d3ad0ne_base ruleset as well, I'd argue that running a purge rule is about equivalent to running a random mangling rule. In fact, if you don't already have purge rules in your mangling set, I'd probably recommend not worrying about it and just running a brute-force method like Markov mode to stretch your dictionary instead.
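
To give a feel for what -g does conceptually, here's a toy generator that draws random rules from a small subset of Hashcat's rule functions (real Hashcat draws from its much richer rule language; this is only an illustration):

```python
import random
import string

# A few real Hashcat rule functions: no-argument ones, and ones
# that take a single character argument.
SIMPLE = ["l", "u", "c", "r"]   # lowercase, uppercase, capitalize, reverse
WITH_CHAR = ["$", "^", "@"]     # append X, prepend X, purge X

def random_rule(rng):
    """Generate one random rule from the toy function set above."""
    if rng.random() < 0.5:
        return rng.choice(SIMPLE)
    return rng.choice(WITH_CHAR) + rng.choice(string.ascii_lowercase + string.digits)

rng = random.Random(2016)
rules = [random_rule(rng) for _ in range(500)]  # cf. `-g 500`
```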

Conclusion:

For once my gut feeling was right and the value of Hashcat's purge rule '@' was limited in the tests that were run. That's not to say that it's not useful. It may help when targeting certain users or aid in keeping the size of your dictionary files on disk manageable. But at the same time, it's not a major feature that other password crackers should rush to mimic. I hope this blog post was informative in helping show different ways to evaluate the effectiveness of a mangling technique. If you have any questions, comments or suggestions please feel free to leave them in the comments section.