Tuesday, September 11, 2018

Configuring a Password Cracking Computer

"Be willing to be a beginner every single morning." -Meister Eckhart
Disclaimer: While the reason I'm writing this is because I was lucky enough to win a new cracking rig from Netmux's Hash Crack Challenge, I want to state for the record that he never asked me to blog about it, and all of the good things I say are 100% of my own choosing and not contingent on me receiving any prize.

2nd Disclaimer: I plan on this being a "living" blog entry as I continue to update and use my new computer. Since install procedures change over time, for the record I started to perform my install on September 7th 2018. I'll try to date my entries as I write them to help anyone trying to follow this so they can estimate how useful these instructions are.

ChangeLog:
  • September 12, 2018, (rearranged sections, added MDXFind, updated installing OpenCL instructions)

September 7, 2018 (Computer Arrives):

Wow, I suddenly and unexpectedly found myself in possession of a dedicated password cracking machine! For more background on how that happened, please refer to my post on Netmux's Hash Crack Challenge here. For the record, Netmux was amazing when it came to promptly shipping my portable cracking rig and keeping me in the loop. I'll admit I was a bit hesitant to hand out my home address to a professional pen-tester and password cracker I met on the internet, but I've made a lot worse threat modeling decisions in the past. (There is a story behind the first picture that gets everyone who knows and cares about me legit angry at the stupid trust I've put in absolute strangers before.) Long story short, Netmux was professional in shipping the server, and when it showed up I was super excited! As some background, while I study password cracking, develop and analyze password cracking tools, and participate in password cracking challenges, I've never been willing to personally invest in a dedicated password cracking rig. Mostly I've made do with a 2010 MacBook Pro and a Windows machine with a GTX 970 that, I'll freely admit, spends more time running Excel and playing World of Warcraft than cracking hashes. Which is another way of saying: please take all my advice with a grain of salt, and understand that I'm planning on using this new server for research. I'm not optimizing it as a pure password cracking rig. But it also means I no longer have any excuse for not contributing more to password cracking challenges in the future! This gift has inspired me to start a few new research projects, so I want to give yet another huge thanks to Netmux!!! If you see me post additional blog content in the next few months or update my PCFG cracker, please give credit to him!

A Quick Aside on my New Password Cracking Rig:

Let me first say that it arrived in perfect shape, so of course the first thing I did was crack it open and look at the inside...
My new rig from Netmux
Super excited!!!
The wiring was very well done, the whole rig is water cooled, the case certainly adds hacker creds, and little things were taken care of, such as good filters over the air vents, which is pretty much a make-or-break requirement for this cat owner. I'm *very* happy with it, and would recommend it to anyone else.

As far as the specs go:
CPU: Intel i5-7600k, 1 processor; 4 cores
RAM: 16GB
Storage: 500GB SSD
GPU: GeForce GTX 1070

Installing the OS:

Netmux's cracking rig came pre-installed with Ubuntu, but I figured I might as well re-install everything from scratch. After consulting with several password cracking experts I'm lucky to know, I decided to re-install Ubuntu. The version I used was 18.04.1 LTS. I plan on using this server for research as well, so I went with a full graphical desktop. If you are hardcore and want 100% of your machine devoted to cracking, then by all means go with a server deployment, but this guide probably won't help you too much since I *love* GUIs. Spoiler alert: I recommend installing a GUI git client like GitKraken, so that's where this guide is taking you.

Building the Boot USB (September 7, 2018):
Like anyone has a DVD anymore... The very first step I took was to create a bootable USB.

Steps:
  1. You can download an Ubuntu ISO from here
  2. Since I was already running Ubuntu, I could use Startup Disk Creator to create a bootable USB drive. If you are already running Ubuntu, you can find that application by searching for it (press the Windows key).
  3. Follow the options to create a bootable USB using the ISO that you previously downloaded
Installing Ubuntu from USB (September 7, 2018):
  1. Use multiple swear words and reboot several times until you find the BIOS option to change your boot preference to start with your USB drive. In my case it was hitting F2.
  2. Once you boot from the USB, follow the steps in the Ubuntu installer and configure it how you want.
  3. If you want full hard drive encryption (for example, if this will be a truly portable rig that may sit unattended in your car during a restroom stop, or you are worried about legal issues), now is the time to configure it. Just saying.

Core OS Drivers and Important Tools for Other Capabilities:

Installing OpenCL drivers (Originally installed September 7, 2018, updated September 12):
Special thanks to WinXP5421. The following section was written by him, though I tested it on my system and made minor edits based on my experiences and to format it for this blog.
  1. Download the appropriate OpenCL drivers for your system. We are specifically looking for the “Intel® Xeon™ Processors OR Intel® Core™ Processors OpenCL runtime” drivers.
  2. Extract the archive:
    • tar -xvzf opencl_runtime*.tgz
  3. The OpenCL runtime requires `lsb-core` to be installed on the Ubuntu machine:
    • sudo apt install lsb-core
  4. Now install the drivers:
    • Go to the intel directory that you extracted in step #2
    • sudo ./install.sh
    • Work your way through the installer, answering questions as needed. The install script will complain that your Ubuntu operating system is not supported; this is fine, so continue with the installation anyway.
  5. Let’s verify we have a working OpenCL environment by installing and running `clinfo`:
    • Note: clinfo was already installed on my machine, but one of the other tools I installed later may have installed it -- Matt
    • sudo apt install clinfo
    • clinfo
    • The output of clinfo should display detailed information about each OpenCL platform and device on your system, including your CPU. Simply put, “Lots of output = all good.” If OpenCL did not install properly, you will see short and specific errors after running clinfo.
Installing NVidia Drivers (September 7, 2018):
  1. Run: ubuntu-drivers devices
  2. Select the driver from the list you want to install. In my case it was: 
    1. sudo apt-get install nvidia-driver-396
Install basic GIT (September 7, 2018):
I usually only use a command line git when something goes horribly wrong, but having it ready helps a lot when that happens.
  1. sudo apt-get install git
Install a GUI GIT Client (September 7, 2018):
I've used a lot of git GUIs in the past. The following is purely personal preference, but I would highly recommend a graphical git client if you are doing any development. I've found the ability to easily view changes, manage merge requests, fork, etc. to be invaluable in all my work.

My favorite git GUI of all time has been the official GitHub client from several years ago. Unfortunately, they have since rebuilt everything with a web-based layout, which completely broke my workflow. I've tried to use Atlassian's SourceTree, but after a few horribly failed merges I was told by several co-workers to never use it again. I currently use GitKraken and am very happy with it. GitKraken is not free for commercial use. Several people have told me to use SmartGit, but I don't have experience with it; if you are following this tutorial for commercial use and don't have funding to pay for GitKraken, please check it out. Otherwise, I've found GitKraken to be great for non-profit and personal use.
  1. Install GitKraken from https://www.gitkraken.com/
  2. Run the following command or GitKraken will never actually start: sudo apt install libgnome-keyring0
  3. Once GitKraken is installed, log in to your GitHub account with it
  4. Now add your computer's SSH key to your GitHub account using: File->Preferences->Authentication->Github.com->Add_SSH_Public_Key
Installing Password Cracking Programs:

Install Hashcat (September 7, 2018):
Yes there are pre-built binaries for Hashcat, but I highly recommend using the github based source code to stay up to date with all the latest changes, fixes, and features.
  1. Clone Hashcat using your git tool of choice. If you are using GitKraken, import the following repo: git@github.com:hashcat/hashcat.git
  2. Full instructions for installing Hashcat can be found at: https://github.com/hashcat/hashcat/blob/master/BUILD.md
  3. You'll need to update the OpenCL-Headers submodule. In GitKraken, after importing Hashcat using the above link, look under Submodules in the left-hand panel, right-click on deps/OpenCL-Headers, and select "Create" or "Update". If you are not using GitKraken, follow the instructions linked in step #2 (from the command line, git submodule update --init does the same job).
  4. In a terminal in the hashcat directory, run "make" and then "make install" (the install step typically needs sudo)
  5. By building from source, you can periodically pull from the Hashcat repository and re-build it to add new features before an "official" release is published
Benchmarking Hashcat With New Install, (and gratuitous plug for NetMux's Hashcracking Manual which is awesome)
Install John the Ripper (September 7, 2018):
John the Ripper is my favorite password cracking program. If you are doing any sort of academic research or tool development, I can't recommend it enough. I'll admit, though, that if I'm only concerned with cracking standard hashes I generally use Hashcat instead. Regardless, I'd recommend installing John the Ripper on any password cracking rig you configure. Furthermore, you really need to install magnumripper's bleeding-edge version of John the Ripper, since the base version hasn't been updated in years. New patches, fixes, and features are normally pushed weekly, so building it from source and constantly re-building it is highly recommended.
  1. Clone the following repository of John the Ripper: https://github.com/magnumripper/JohnTheRipper
  2. Install SSL libraries: sudo apt-get install libssl-dev
  3. cd ./JohnTheRipper/src/
  4. ./configure
  5. Note: The following does not have OpenCL support. I'll try to circle back to this later to figure out how to add it.
  6. make -s clean && make -sj4
  7. cd ../run/
  8. ./john --test
Install MDXFind (September 12th 2018):
I've been told I really need to start using MDXFind so since I'm starting a new cracking platform this is certainly the right time to install it. 

A quick aside, most people might question why I need three different password cracking programs on the same computer. I'm sure it's a lot like how chefs view their kitchen knife collection. Yes they all cut, but the right one depends on what you are trying to do.

While certainly not set in stone, as a general rule of thumb I use John the Ripper for research, CPU cracking sessions, cracking file encryption "hashes", and a few other hash types that don't translate well to GPUs, like scrypt/bcrypt. It also has the best support for non-English data-sets.

I use Hashcat for most GPU cracking that I do. Yes, John the Ripper GPU support has been getting more robust, but I've had better luck with Hashcat. For example, if I'm cracking large lists of unsalted MD5, Hashcat is my go-to cracking program.

MDXFind seems tailored to cracking large "messy" data-sets. Think of a lot of the major password dumps that become public. It's fast and can handle data-sets going into the millions of password hashes. It also has support for cracking nested hashes, which have a way of ending up in some of these dumps. Oh, and it seems to be the password cracking tool of choice for CynoSure Prime, and they know a few things...
  1. Obtain the latest copy of MDXFind from https://hashes.org/mdxfind.php
    • MDXFind is only provided as a pre-compiled binary so you don't need to build it. Grab the 64bit Linux variant.
    • Download and copy the file to the directory you want to install MDXFind into
  2. Make MDXFind executable
    • chmod +x mdxfind
  3. Install required dependencies
    • sudo apt install libjudydebian1 libmhash2 librhash0
  4. Test MDXFind
    • ./mdxfind 
Other Quality of Life Installations:

Install Text Editor:
  1. I like Kate. To install it: sudo apt-get install kate
  2. You might also want to install Atom which has more features. I'm hesitant to recommend it with Microsoft buying GitHub, but it is free and has a ton of features: https://atom.io/
Change Login Background (September 7th 2018):
Not really important, but I always do this because it helps my gumption level:
  1. Find a picture you want to see when typing your login password.
  2. sudo cp Pictures/FILENAME_OF_PICTURE_YOU_WANT_TO_USE /usr/share/backgrounds/login.jpg
  3. sudo vim /etc/alternatives/gdm3.css
  4. Find:
    #lockDialogGroup {
      background: #2c001e url(resource:///org/gnome/shell/theme/noise-texture.png);
      background-repeat: repeat; }
  5. Replace it with:
    #lockDialogGroup {
      background: #2c001e url(file:///usr/share/backgrounds/login.jpg);
      background-repeat: no-repeat;
      background-size: cover;
      background-position: center; }

Monday, September 3, 2018

Netmux's Hash Crack Challenge Writeup

"Good luck is when opportunity meets preparation, while bad luck is when lack of preparation meets reality" -Eliyahu Goldratt
This last week I participated in Netmux's Hash Crack Challenge, and this happened:

So I figured the least I could do was write a blog post about it, along with my analysis of Netmux's One Time Grids, which the challenge was based on.

TLDR/Bottom Line(s) Up Front (BLUF): 
I was lucky enough to be checking Twitter right when Netmux posted his final hint, and that was the only reason I won. As to the security of One Time Grids, they share a lot of similarities with other password books, which can be either good or bad depending on your threat model. Compared to other physically written-down password books, the One Time Grid approach pushes users to stronger passwords at the expense of usability. It is *very* secure against your typical online hacker, but shares the weakness of other password books in that it may be weak against people in physical proximity to you (such as ex-boyfriends, nosy parents, nosy children, etc.). I didn't find any weaknesses that could be exploited by an online attacker. Long story short, I wouldn't recommend it due to the usability issues, but if you have fun with it, feel free to use it.

What is a One Time Grid and how does that apply to the contest?
Netmux does a better job explaining it in his blog here, but it's basically a password creation book, available on Amazon here, that provides a bunch of One Time Grids for creating and storing passwords. The contest was to crack two different raw-SHA1 password hashes generated using a One Time Grid. They were:
Hash1: fe0c9f335b35c45e92d5e7d07c5933b6c4c0a522
Hash2: 120c249bc0f301ef3cba7a0fcbff463aaaded486
As to the One Time Grids themselves, they are either a 7x7 grid filled randomly with the following 84 characters:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890-!@#$%^&*=?[](),.;{}:+
One Time Grid used in the contest

Or a 3x26 grid filled with random words:


Example word based One Time Grid. Not used in the contest
The One Time Grid used in the challenges was composed of random letters, so this blog post will focus on that. When it comes to the security of a One Time Grid though, most of the statements I'll make will apply to both unless otherwise specified.

Netmux also suggests three different ways to turn a One Time Grid into a password: a "basic" random grid, a "pattern" random grid, and a "scatter" random grid. Only pattern and scatter were used in the contest, so I'll focus on them, but a "basic" grid is simply a "pattern" with no bends. Aka all walks go in a straight line. Below are examples he gave for pattern and scatter on his site. Note, these examples do not use the contest One Time Grid.


Example "Pattern" Password Creation rules



Scatter One Time Grid password creation, taken from Netmux's site

Contest Start:
The first thing that should be apparent is that without the One Time Grid a password was based on, no attack has a hope of succeeding against passwords longer than 9 characters. Even 8 characters would require significant horsepower: 84^8 is a keyspace of about 2.48 quadrillion, which is quite big, even for GPUs. This assumes that the One Time Grids are generated using a true random number generator, yada yada yada, but for the purposes of this contest, no effective attacks could be started. Which is ok, because it gave me time to prep some tools and do some research.
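To put those numbers in perspective, here is a quick Python sketch of the brute-force keyspace for a password drawn from the 84-character One Time Grid alphabet. It assumes every character is chosen independently and uniformly, which is the best case for the defender; the keyspace function name is just for illustration.

```python
# Brute-force keyspace for a password drawn from the 84-character One Time Grid alphabet,
# assuming every character is chosen independently and uniformly at random.
ALPHABET_SIZE = 84


def keyspace(length: int, alphabet_size: int = ALPHABET_SIZE) -> int:
    """Number of possible passwords of exactly `length` characters."""
    return alphabet_size ** length


for n in range(7, 11):
    print(f"{n} characters: {keyspace(n):.2e} candidates")
```

Each extra character multiplies the work by 84, which is why 8 characters is painful and 10 is hopeless without knowing the grid.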

Side note, I'll give Netmux credit that doing a "search inside" check of his Amazon One Time Grid book didn't accidentally share any of the real grids. Not that I've abused that feature in other contexts before...

First Clue: "Pattern" & "Scatter"
Sometime around this point Netmux released his first clue: "Pattern" & "Scatter". This pretty clearly indicated that the above two methods were used to generate the passwords, so I started to develop some scripts to generate walks of One Time Grids in anticipation of the actual grid being released. I originally started out investigating whether I could use a custom keyboard layout with Hashcat's kwprocessor, which generates keyboard walks, but quickly realized I would have to significantly modify it to target One Time Grids. That's because kwprocessor is built around 4-row keyboards rather than 7x7 grids, and it makes some other optimizations for keyboard quirks that are great for normal cracking but would cause problems for what I wanted it to do. So I wrote my own script, which I posted on GitHub and is available here. It admittedly went through several rounds of improvement throughout the contest, but here is a general overview of how it works and the constraints I added to reduce the keyspace:

  • one_time_grid_walker.py only targets the "Pattern" random grids. "Scatter" random grids need a lot more information to effectively target them. I'll dig into that more later
  • The first constraint I added was that all "walks" had to start and end on the edge of the grid. This was based on my reading of Netmux's examples and how I expected a typical user to interpret his suggestions. Examples of "valid" and "invalid" walks can be seen below.
Valid walk of contest grid
Invalid walk of contest grid
  • The second constraint I added was that a walk could not double back on itself or cross a part of itself. In the above example, a walk could not go "8oyIyo8". This admittedly was a naive assumption on my part, but I made it once again to reduce the keyspace, and based it on my reading of the examples given.
  • The third constraint, which I struggled with but felt I needed when coding up my script, was to limit the maximum length of a walk. As the maximum length increased, so did the keyspace, which would cause problems later when running a combinator/Prince attack: Len8 = 4,081, Len9 = 7,268, Len10 = 12,011, Len11 = 19,131. On its own this would be trivial, but it becomes significant once you start combining multiple walks together. For example, 19,131^2 is about 366 million, and 19,131^3 is about 7 trillion. This admittedly was where I probably made my biggest mistake, by prematurely optimizing.
  • Skipping ahead a bit, I later optimized my approach further by limiting the number of "bends" that a walk could make. If I only allowed one "bend" (or change in direction), there were only 575 possible walks for a given grid. This made combining many different walks practical. I felt that, for a typical user following the advice given, this represented what I would expect them to do.
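The constraints above map naturally onto a depth-first search over the grid. The following is a minimal Python sketch, not the actual one_time_grid_walker.py: it assumes walks move only up/down/left/right (no diagonals), treats "no doubling back or crossing" as "never visit a cell twice", and counts a "bend" as any change of direction. The function names are hypothetical, and under different assumptions the counts will not match the numbers quoted above.

```python
from typing import Iterator, List, Optional, Tuple

Cell = Tuple[int, int]
Direction = Tuple[int, int]

# Assumption: walks move only up, down, left, or right (no diagonals).
DIRECTIONS: List[Direction] = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def on_edge(cell: Cell, rows: int, cols: int) -> bool:
    """True if the cell lies on the outer border of the grid."""
    r, c = cell
    return r == 0 or c == 0 or r == rows - 1 or c == cols - 1


def grid_walks(grid: List[str], min_len: int, max_len: int, max_bends: int) -> Iterator[str]:
    """Yield the characters along every walk that starts and ends on the edge,
    never revisits a cell, is min_len..max_len long, and turns at most max_bends times."""
    rows, cols = len(grid), len(grid[0])

    def extend(path: List[Cell], direction: Optional[Direction], bends: int) -> Iterator[str]:
        if len(path) >= min_len and on_edge(path[-1], rows, cols):
            yield "".join(grid[r][c] for r, c in path)
        if len(path) == max_len:
            return
        r, c = path[-1]
        for d in DIRECTIONS:
            nxt = (r + d[0], c + d[1])
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols) or nxt in path:
                continue  # off the grid, or would cross/double back over the walk
            turned = direction is not None and d != direction
            if bends + turned > max_bends:
                continue  # too many changes of direction
            path.append(nxt)
            yield from extend(path, d, bends + turned)
            path.pop()

    for r in range(rows):
        for c in range(cols):
            if on_edge((r, c), rows, cols):
                yield from extend([(r, c)], None, 0)
```

For a toy 3x3 grid ["abc", "def", "ghi"], grid_walks(grid, 3, 3, 0) produces the 12 straight rows and columns in both directions, and raising max_bends to 1 adds L-shaped walks such as "bad". Writing the output one walk per line gives a wordlist for a combinator attack.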
As far as weaponizing this goes, I was tempted to use the Prince attack, but when talking with Chick3nman, he gave the helpful advice that if you didn't need the optimizations that Prince uses, a straight combinator attack with Hashcat was much faster for easy hashes like raw-sha1.
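A straight combinator attack just concatenates every word in a left list with every word in a right list, so its keyspace is the product of the two list sizes. The Python sketch below mimics that idea against a raw (unsalted) SHA1 target for illustration only; it is enormously slower than Hashcat's combinator mode (-a 1), and the function name and toy word lists are hypothetical.

```python
import hashlib
from itertools import product
from typing import Iterable, Optional


def combinator_crack(target_sha1: str, left: Iterable[str], right: Iterable[str]) -> Optional[str]:
    """Try every left+right concatenation against a raw SHA1 hex digest."""
    target = target_sha1.lower()
    for left_word, right_word in product(left, right):
        candidate = left_word + right_word
        if hashlib.sha1(candidate.encode("utf-8")).hexdigest() == target:
            return candidate
    return None  # keyspace exhausted without a match


target = hashlib.sha1(b"abc").hexdigest()
print(combinator_crack(target, ["a", "ab"], ["c", "bc"]))
```

With two lists of walks, the number of candidates is len(left) * len(right), which is why limiting the walk count (for example to the 575 one-bend walks) matters so much.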

And then I pretty much waited. Well in reality I tried some attacks against the sample One Time Grids to bide my time, but I didn't expect to crack the first hash. I was a bit cocky though, and expected that I'd crack the first hash within minutes of it being released.

Second Clue: One-Time Grid attached below
Yes! The target One Time Grid was finally released. I'll admit I said a few choice words when I saw it was released as a picture, though, which led to some squinting and me questioning whether letters were lower or uppercase. Oh, and I also made one typo when entering it into my code that I nearly missed, but luckily Hops pointed it out to me. In any future contests, it would be really nice if items like this could be released as text that allowed copying/pasting.

Another challenge I ran into was that I wasn't at my cracking computer, so I couldn't run any effective attacks myself. Luckily Chick3nman agreed to run my script and try to crack the first hash for me. Unfortunately he wasn't successful. I want to stress that was my fault, since he was running my scripts and attacks.

There was a lot of head scratching, and variations of walks plus the suggested PIN and random word, but long story short, even when I got back to my computer and ran attacks myself, I was completely ineffective at cracking that first hash. I'll admit it really annoyed me in a good way like any fun problem does. I want to give a huge shout out to Boursier Etienne, who actually managed to crack it first. I'd love to hear what Boursier did.

Third Clue: Birthday Paradox
I may have uttered a few more choice words over this clue. I'm well versed in the birthday problem, but it doesn't seem to be applicable to One Time Grids. Yes, some individual characters appear more often than others, but the heart of the "scatter" problem is a "choose X with no replacement" problem. Aka, the first character has 49 different options, the second character has 48 different options, the third character has 47 different options, and so on. As far as I can see, this is not related to generating collisions between multiple inputs.
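The "choose with no replacement" point is easy to make concrete. Picking k distinct cells from the 49-cell grid gives 49 * 48 * ... * (49 - k + 1) ordered selections, versus 49^k if cells could be reused. A short Python sketch (the helper names are hypothetical; math.perm needs Python 3.8 or later):

```python
import math

GRID_CELLS = 7 * 7  # cells in the 7x7 contest grid


def ordered_selections(k: int, cells: int = GRID_CELLS) -> int:
    """Ordered picks of k distinct cells (no replacement): cells! / (cells - k)!."""
    return math.perm(cells, k)


def with_replacement(k: int, cells: int = GRID_CELLS) -> int:
    """Ordered picks of k cells when the same cell may be reused."""
    return cells ** k


for k in (3, 8, 14):
    print(k, ordered_selections(k), with_replacement(k))
```

For short passwords the two counts are close; the gap only grows slowly with length, which is part of why this observation alone doesn't make scatter passwords crackable.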

Fourth Clue: Are all cell values equally probable?
I see where Netmux was going with this. For a scatter password, if you were modeling it, cells 3/26, 6/25, and 7/23 all contained periods ".". If you selected any of them when generating a password guess, it didn't matter in which order you picked them, which can reduce the effective keyspace. The problem comes when trying to weaponize this info. I did some back-of-the-napkin calculations, and if your guess generator took into account both the "choose with no replacement" aspect and the fact that several characters show up multiple times, you could reduce the keyspace by roughly a factor of 10 for the password lengths I thought the password might be. This sounds great, but one problem I've run into many times before is that more effective guess generators take time to generate guesses. So while a script that I coded might reduce the keyspace by 10x, it would probably take 100x more time per guess against a raw-SHA1 hash than just using a custom mask. Therefore trying to optimize my solution would actually make it worse.
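A brute-force Python sketch shows why repeated characters shrink the keyspace: two different cell orderings that spell the same string only need to be hashed once. It enumerates every ordered selection, so it is only practical for a toy grid; the four-cell grid and the selection_counts name below are made up for illustration.

```python
from itertools import permutations
from typing import Sequence, Tuple


def selection_counts(cells: Sequence[str], k: int) -> Tuple[int, int]:
    """Return (ordered cell selections, distinct strings) for k picks without replacement."""
    ordered = 0
    distinct = set()
    # permutations() treats each cell as distinct by position, even if characters repeat.
    for picked in permutations(cells, k):
        ordered += 1
        distinct.add("".join(picked))
    return ordered, len(distinct)


toy_grid = [".", ".", "a", "b"]  # hypothetical grid with a repeated "." cell
print(selection_counts(toy_grid, 2))
```

Here 12 ordered cell picks collapse to 7 distinct strings. Getting that saving on a real 49-cell grid requires a smarter generator than a simple mask, which is exactly the speed trade-off described above.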

Now admittedly someone could take the time to create a custom solution in Hashcat or John the Ripper that would be fast, but that wasn't going to happen in the time this contest ran. More importantly though, for a 10 character password generated by a "scatter" method, it didn't matter. The keyspace was so large that even a 10x speedup wouldn't be enough to make it practical.

Fifth Clue: str(PIN)[:-1]
This hint was a good clue that the PIN, minus the last character of the PIN, was part of one or both of the passwords. Aka "71997" could be found in the password. This was good info to have when trying to crack the password, but I'll admit I was a little annoyed, since guidance to apply mangling rules like this wasn't in the instructions for using One Time Grids. Don't get me wrong, it's totally within the bounds of what someone might do in real life. In fact, I'd recommend it, as it explodes the keyspace of One Time Grids. But based on the instructions, I wouldn't expect a typical user of One Time Grids to apply a mangling rule like "remove the last character of the PIN". Most of my password cracking techniques are based on targeting "typical users". If everyone was unique I'd be the worst password cracker out there. But people typically follow standard behavior patterns, which makes password cracking possible. I'm biased, but I like to see that reflected in contests. Needless to say, though, this wasn't enough information to crack either of the two password hashes.

Sixth Clue: scatter_cells + str(PIN)[:-1]
This clue said that the PIN minus its last digit would be at the end of the scatter cells password, which was helpful without being useful. The keyspace for likely scatter cells passwords was so large that knowing about any additional mangling didn't make a difference.

Seventh Clue: Use seven of the possible ten "repeats" to mask your way to the other half of the scatter_cells solution.
This provided a lot of useful information without being actionable. It said the "scatter" portion of the password was 14 characters long, with 7 of those characters being a repeat item, and the other 7 being unique characters. This meant 7 characters had 10 possible values, and the other 7 had 29 possible values. What's more, the second set was a pure choose with no replacement, so the 7th character would technically only have 22 possible options. The problem once again was making use of this information. For example, I didn't know which positions would take from which set. For a 14 character password, treating every position as coming from either set multiplies the keyspace by 2^14 = 16,384 (strictly, if exactly seven positions come from each set, there are C(14,7) = 3,432 layouts), which is a problem because the current mask setups for JtR and Hashcat don't support that kind of selection. In retrospect, I realized I could have created a script to generate all of those masks and feed them into Hashcat, but during the contest that didn't occur to me. Long story short, this was the point where, given six months, it's possible someone could have cracked the second hash, but it was unrealistic to do it in a day or two.
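Generating those masks would have been straightforward. The Python sketch below emits one Hashcat-style mask per layout, using ?1 for the "repeat" charset and ?2 for the "unique" charset; both charsets are assumed to be defined with Hashcat's -1 and -2 options, the same way the final attack in this post defines -1. It builds both the loose 2^14 layouts and the C(14,7) layouts with exactly seven of each, and the function names are hypothetical.

```python
from itertools import combinations, product
from typing import List


def all_layouts(length: int, suffix: str = "") -> List[str]:
    """Every way to assign each position to ?1 or ?2: 2**length masks."""
    return ["".join(p) + suffix for p in product(("?1", "?2"), repeat=length)]


def exact_layouts(length: int, n_repeat: int, suffix: str = "") -> List[str]:
    """Masks with exactly n_repeat positions drawn from ?1 and the rest from ?2."""
    masks = []
    for repeat_positions in combinations(range(length), n_repeat):
        chosen = set(repeat_positions)
        masks.append("".join("?1" if i in chosen else "?2" for i in range(length)) + suffix)
    return masks


# 14 scatter characters followed by the PIN fragment "71997" from the later clues.
masks = exact_layouts(14, 7, suffix="71997")
print(len(masks), masks[0])
```

Masks can't express "no replacement", so each mask still overcounts; the script just turns a selection Hashcat can't express directly into a list of masks it can run one after another.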

Eighth Clue: Hash #2 = print(len(scatter_cells + str(PIN)[:-1])) = 19
While this made explicit that there were no other mangling rules or surprises for the second password hash, it didn't make the problem more crackable compared to the previous clue.

Ninth Clue: No cell values have been reused in the composition of scatter_cells.
“q$*????????)wc” + str(PIN)[:-1]
This is where I got really lucky. I managed to check Twitter at the exact right time and saw the following tweet by Netmux:

Therefore I was at my computer and ready to go for the final hint. When he posted it, I quickly created the following mask attack using hashcat:

hashcat64.exe -m100 -O -a 3 ..\contests\netmux\netmux.hsh -1 IA9GV8oyILM.!03WKH+epP{TxJz3hbu\? q$*?1?1?1?1?1?1?1?1)wc71997

By giving me 6 of the scatter characters, Netmux left me only an 8 character password to brute force, and there were only 32 possible characters per position, making this significantly easier than a Lanman password hash. All told, it took me around 5 minutes to crack the password hash, which admittedly was a heart-pounding five minutes since I was sure other people were running the same attack. I was sweating the whole time and my adrenaline was pumping. As proof of the timing, here is me re-running the cracking attack on my system. It took 9 minutes to exhaust the whole keyspace, but I got my crack around five minutes in.
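As a rough sanity check, the numbers in this paragraph imply the following. This Python sketch only uses figures from the post (8 unknown positions, 32 candidate characters each, 9 minutes to exhaust the keyspace); the implied rate is an average derived from them, not a measured benchmark.

```python
UNKNOWN_POSITIONS = 8
CHARS_PER_POSITION = 32
EXHAUST_SECONDS = 9 * 60  # time to exhaust the whole keyspace, as reported above

keyspace = CHARS_PER_POSITION ** UNKNOWN_POSITIONS
implied_rate = keyspace / EXHAUST_SECONDS  # average hashes per second

print(f"keyspace: {keyspace:,}")
print(f"implied rate: {implied_rate / 1e9:.2f} GH/s")
```

That works out to roughly a trillion candidates at about two billion raw-SHA1 guesses per second on a single GTX 970.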

Cracking the 2nd Hash. Path information and the actual hash plaintext redacted.
For comparison, I have a single NVIDIA GTX 970 in my computer. Not even a Ti. Really, what it came down to was that I was very lucky, to the point where I feel a little bit guilty about it. In the future I'd advise contest creators to publish set times for releasing hints, so that everyone is on a level playing field when it comes to making use of this information.

Conclusion:
First of all, I'd like to thank Netmux for putting on this competition. I had a lot of fun and I hope this blog post shows that. There are many "contests" out there, but putting my time into this was way more enjoyable than dealing with the drama of hacking Bitfi. Also, working on a new type of bounded problem like One Time Grids was very interesting.

I'd also like to thank Chick3nman, Hops, and Royce Williams, for lending cracking hardware, giving advice, and all the heckling ;p

As to the security of One Time Grids, let me back up a bit.

When doing any threat analysis or security review, my first step is to categorize the adversary. A good rule of thumb brought up by James Mickens is the "Mossad vs. not-Mossad" categorization. I highly recommend following that link because the write-up is hilarious, but it boils down to this: if you are worried about the Mossad, well, there's nothing you can do because you are going to f***ing die. But if your adversary is someone else, there are effective strategies you can take to protect yourself. Now admittedly there are variations of this, but basically if you are worried about nation-level attackers, then don't use One Time Grids. If you are worried about typical hackers, though, One Time Grids can be extremely effective. I'll freely admit that I'm not the best password cracker out there, but the fact remains that if Netmux hadn't given me the One Time Grid, along with 11 characters of a 19 character password, I'd never have cracked it. Also, One Time Grids are such a niche technique that even after this contest I don't see myself incorporating the lessons learned into any of my normal cracking strategies.

There are two major problems I see with One Time Grids, though. The first is that they don't produce memorable passwords. If you don't want to write the passwords down, you'll need to take your book with you, which is a pain. And if you do write your passwords down, I'd recommend using a traditional password manager instead, most of which have built-in random password generation tools that are just as effective as One Time Grids for creating strong passwords.
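For comparison, a password-manager-style generator is only a few lines. This Python sketch uses the standard library's secrets module to draw characters uniformly from the same 84-character alphabet as the One Time Grids; the function name and the default length of 14 are arbitrary choices for illustration.

```python
import secrets

# The 84 characters used to fill the One Time Grids.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890-!@#$%^&*=?[](),.;{}:+"


def random_password(length: int = 14, alphabet: str = ALPHABET) -> str:
    """Draw each character independently with a cryptographically secure RNG."""
    return "".join(secrets.choice(alphabet) for _ in range(length))


print(random_password())
```

Unlike a walk or scatter on a printed grid, every character here is drawn from the full alphabet with replacement, and there is no physical book for someone nearby to find.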

The second problem is that One Time Grids share the same issue as many other password "books". They have the potential for horrible failure if your adversary is someone you know and/or love who has direct access to the book. Ex-boyfriends/girlfriends/husbands/wives are the big ones, but nosy children or parents also pop up. I'm always very sensitive to this threat vector, since while dealing with an abusive ex is bad, dealing with an abusive ex who has access to your e-mail and Facebook is way worse. Password management programs can help in this regard, but written-down books are problematic. Yes, someone could avoid writing down their "patterns" for One Time Grids, but that doesn't scale, and having unique passwords for each site is more important than having strong passwords, in my opinion. You have no idea how sites are storing their passwords, so the best way to minimize your risk of a site storing your password in plaintext is to use different passwords for different sites.

I guess what I'm trying to say is I'm a big believer in hike your own hike. If you enjoy using One Time Grids, I haven't seen anything to caution against it. You are probably way more secure than most people who don't do anything special. While I'm biased to suggest standard password management programs like 1password, I'll readily admit that programs like 1password have usability problems too. If you really want to have a physical password book, free options include diceware, but if you like the idea of One Time Grids, quite simply, I'm not going to crack those passwords without a whole lot of help.

Bonus Snark

 While doing research on One Time Grids, I came across the following on Amazon and my first thought was, "I bet whoever owned that copy previously was *really* important!!!" /jk

Only $4.67 for shipping though...




Sunday, March 11, 2018

Creating Long Term SSL Certificates

"It's constantly fascinating for me that something that feels absolutely right one year, 12 months later feels like the wrong thing to do." --Damian Lewis
Often I find myself having to create my own SSL certificates. Be it an internal web server or two scripts that need to communicate with each other, SSL is the easiest way to encrypt network traffic. Unfortunately, it's also one of the most dangerous encryption methods: if you make a mistake setting it up, it usually still works ... at least for a little while.

Ignoring client-side SSL checks for now (hint: if your script is using SSL and it works the first time, you probably are not checking SSL correctly), one area of danger is having your SSL certificates expire. As an example, every Oculus Rift recently broke because a code signing certificate expired. Admittedly that was a different type of certificate, but the same thing tends to happen with internal SSL deployments. People do not remember to renew them, and when they expire things tend to break (at least if your clients are checking SSL properly). The problem is that when you use the standard OpenSSL tools to create your certificates, there are three places where you need to specify certificate lifetimes. If you forget to specify any of the three, the certificate will only be valid for the default, which is set to be "365 days".

These lifetime checks are:
  1. The Certificate Authority has an expiration date
  2. The actual certificate you are using has an expiration date
  3. The CA signature for the certificate has an expiration date
Since most Stack Overflow posts don't cover this, and the Linux man pages are not helpful unless you already know what you are doing, I wanted to share my cheat sheet for creating long-term (valid for one thousand years) SSL Certificate Authorities and signing certs. This script was born from many previous failed efforts, and to be honest I'm still not sure I have it perfectly right. If you notice any improvements that could be made, please let me know!

Requirements/Comments:
  • These instructions were written for CentOS. It should work for most other Linux flavors without any changes. If you are using Windows, good luck!
  • OpenSSL
  • Whenever you see 365000 in a command, that's the certificate lifetime in days. I'm using 365*1000 as shorthand for one thousand years. Yes, I realize that isn't exactly accurate. Feel free to change this to the time period you want to use.
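To see how far 365000 days drifts from a true thousand years, here is a quick check with Python's standard datetime module. The start date of 2018-03-11 is just this post's date, used for illustration; any start date behaves similarly.

```python
from datetime import date, timedelta

# Illustrative start date (the date of this post).
start = date(2018, 3, 11)

# What "-days 365000" actually buys you.
end_365000 = start + timedelta(days=365000)

# A true thousand calendar years later, which includes every leap day in between.
end_1000_years = start.replace(year=start.year + 1000)

shortfall = end_1000_years - end_365000
print("365000 days ends on:", end_365000)
print("1000 years ends on: ", end_1000_years)
print("Shortfall in days:  ", shortfall.days)
```

With these dates, 365000 days runs out on 3017-07-12, 242 days (the leap days) short of the full thousand years.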

Creating the Certificate Authority: (If you already have a CA ignore this, but you might want to check the valid lifetime for that CA)
  • Generate a 4096-bit RSA key and a certificate signing request for the CA. The private key will be written to cakey.pem (unencrypted, because of -nodes), so protect that!
openssl req -new -newkey rsa:4096 -nodes -out ca.csr -keyout cakey.pem
  • Create the CA's public certificate which will be called "cacert.pem". Note the '-days' field:
openssl x509 -trustout -signkey cakey.pem -days 365000 -req -in ca.csr -out cacert.pem
Important: When you create the CA request (the openssl req command above), you'll be asked a list of questions. Note that for many SSL deployments you *must* have the Country, City, State, and Organization match between your CA and the certificates you are signing. Does this make sense? Of course not! The domain can be pretty important as well, depending on what you are doing.
  • Next you need to move the CA key and create the required files where OpenSSL expects them. Yes, if you know what you are doing you can override the defaults, but if not, here's what to do:
    1. If the /etc/pki/CA directory does not exist, create it, along with its private and newcerts subdirectories
    2. mv cakey.pem /etc/pki/CA/private/cakey.pem
    3. touch /etc/pki/CA/index.txt
    4. create or edit /etc/pki/CA/serial using the text editor of your choice
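    5. In this file put the serial number, in hex, that the next certificate you sign should get. For example:
      • 01
      OpenSSL only reads this one value and increments it each time it signs a certificate, so there is no need to list future serial numbers.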
    6. It is *highly* recommended that you set permissions on the /etc/pki/CA directory so only the user you want to sign certificates has access to it.
  • Also copy cacert.pem to /etc/pki/CA/cacert.pem, since openssl ca reads the CA certificate from there when signing. You'll also need to push cacert.pem to the clients that are verifying the certificates.
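The directory bookkeeping above can also be scripted. This is a minimal Python sketch, assuming the layout from the stock CentOS /etc/pki/tls/openssl.cnf ([ CA_default ] with dir = /etc/pki/CA, private_key = $dir/private/cakey.pem, new_certs_dir = $dir/newcerts, database = $dir/index.txt, serial = $dir/serial). The function name prepare_ca_dir is hypothetical, and it only creates directories and files; moving the keys is still up to you.

```python
import os
import tempfile


# Layout assumed from the stock CentOS openssl.cnf [ CA_default ] section.
def prepare_ca_dir(ca_dir: str, first_serial: str = "01") -> None:
    """Create the directories and bookkeeping files 'openssl ca' expects."""
    for sub in ("private", "newcerts"):
        os.makedirs(os.path.join(ca_dir, sub), exist_ok=True)
    # Empty certificate database; 'openssl ca' appends a line per signed cert.
    open(os.path.join(ca_dir, "index.txt"), "a").close()
    # Next serial number to hand out, in hex; openssl increments it itself.
    serial_path = os.path.join(ca_dir, "serial")
    if not os.path.exists(serial_path):
        with open(serial_path, "w") as f:
            f.write(first_serial + "\n")
    # Only the owning user can enter the CA directory or its private subdirectory.
    os.chmod(ca_dir, 0o700)
    os.chmod(os.path.join(ca_dir, "private"), 0o700)


if __name__ == "__main__":
    # Dry run in a scratch directory; point it at /etc/pki/CA (as root) for real use.
    with tempfile.TemporaryDirectory() as scratch:
        target = os.path.join(scratch, "CA")
        prepare_ca_dir(target)
        print(sorted(os.listdir(target)))
```

Running it twice is harmless: existing index.txt and serial files are left alone, so an in-use CA keeps its current serial.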

Creating and Signing a SSL Certificate:

  • Create the certificate private key using RSA 4096. It is named client.key in this example. Make sure you protect this!
 openssl genrsa -out client.key 4096
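  • Create the certificate request. Note the "days" field (strictly speaking, openssl req ignores -days unless -x509 is also given, so the lifetime that actually sticks is the one passed to openssl ca below).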
openssl req -new -key client.key -out client.csr -days 365000  
Important: Remember that for the questions it asks you, the Country, City, State, and Organization *must* match between your CA and the certificates you are signing. In addition, the domain can be pretty important depending on whether or not your client checks it.
  • Sign the request with your CA to create the actual client certificate. Once again, note the '-days' field:
openssl ca -in client.csr -out client.pem -days 365000 -notext
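One way to confirm the lifetimes actually took effect is to ask OpenSSL for each certificate's end date, e.g. openssl x509 -noout -enddate -in client.pem (and the same for cacert.pem), which prints a line like notAfter=Jul 12 10:00:00 3017 GMT. Below is a small Python sketch that parses that line and flags anything expiring sooner than intended; the function names, the sample lines, and the 100-year threshold are made up for illustration.

```python
from datetime import datetime, timedelta


def parse_not_after(line: str) -> datetime:
    """Parse a 'notAfter=Jul 12 10:00:00 3017 GMT' line from 'openssl x509 -enddate'."""
    _, _, value = line.strip().partition("=")
    value = value.strip()
    if value.endswith(" GMT"):
        value = value[: -len(" GMT")]
    # OpenSSL space-pads single-digit days ("Mar  1"); strptime accepts the extra space.
    return datetime.strptime(value, "%b %d %H:%M:%S %Y")


def expires_too_soon(line: str, now: datetime, minimum: timedelta) -> bool:
    """True if the certificate expires before now + minimum."""
    return parse_not_after(line) < now + minimum


if __name__ == "__main__":
    now = datetime(2018, 3, 11)
    hundred_years = timedelta(days=365 * 100)
    samples = [
        "notAfter=Jul 12 10:00:00 3017 GMT",  # a cert signed with -days 365000
        "notAfter=Mar 11 10:00:00 2019 GMT",  # a cert that only got one year
    ]
    for sample in samples:
        verdict = "too soon" if expires_too_soon(sample, now, hundred_years) else "ok"
        print(sample, "->", verdict)
```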

Resulting Files: 
  1. Public Client Certificate: client.pem
  2. Client private key: client.key (Only deploy on the server that owns this key)
  3. Public CA certificate: cacert.pem
  4. Private CA key: cakey.pem (Protect this one!!)

Wednesday, August 16, 2017

Solving Problems with Unknown Constraints

"Software constraints are only confining if you use them for what they're intended to be used for" 
-- David Byrne (Of the Talking Heads)
I recently had an ongoing conversation that spanned several days about the subject of solving mazes. A friend casually mentioned the "Same Wall Rule", (also known as the "Right Hand Rule"), for solving a maze. This is where if you want to find the exit of a maze you should pick a wall and follow it, with the assumption that you will eventually find the exit this way.
Same Wall Rule for Solving a Maze

I pointed out that while this rule generally works, you can't count on it as it can fail spectacularly. For example, what if you start out next to a free-standing wall?
Same Wall Rule Failing Horribly
After that our conversation turned to other things but the next day my friend came back and said "I found the problem! The Same Wall Rule will work, but you have to start at the beginning of the maze! Then you can be guaranteed that you won't hit a free-standing wall".
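To make the free-standing wall problem concrete, here is a minimal Python sketch of the Same Wall Rule on a grid maze. The grid encoding ('#' for walls, 'E' for the exit), the ROOM layout, and the function name are my own illustration, not anything from the conversation above. The walker remembers the (cell, heading) states it has already been in, so when it starts beside a free-standing pillar it reports the loop instead of circling forever.

```python
from typing import List, Tuple

# Headings as (row, col) steps: North, East, South, West.
# (heading + 1) % 4 is a right turn, (heading + 3) % 4 a left turn.
DIRS = [(-1, 0), (0, 1), (1, 0), (0, -1)]


def follow_right_wall(maze: List[str], start: Tuple[int, int], heading: int) -> bool:
    """Walk with the right hand on the wall. True if 'E' is reached, False on a loop."""
    rows, cols = len(maze), len(maze[0])

    def is_open(r: int, c: int) -> bool:
        return 0 <= r < rows and 0 <= c < cols and maze[r][c] != "#"

    pos, seen = start, set()
    while True:
        r, c = pos
        if maze[r][c] == "E":
            return True
        if (pos, heading) in seen:   # same cell, same direction: going in circles
            return False
        seen.add((pos, heading))
        for turn in (1, 0, 3, 2):    # prefer right, then straight, then left, then back
            d = (heading + turn) % 4
            nr, nc = r + DIRS[d][0], c + DIRS[d][1]
            if is_open(nr, nc):
                pos, heading = (nr, nc), d
                break
        else:
            return False             # walled in on all four sides


# One open room with a free-standing pillar in the middle and the exit in the bottom wall.
ROOM = [
    "#######",
    "#     #",
    "#  #  #",
    "#     #",
    "###E###",
]

if __name__ == "__main__":
    # Starting against the outer wall (facing west, right hand on the top wall) finds the exit.
    print(follow_right_wall(ROOM, (1, 1), heading=3))
    # Starting beside the pillar just circles it (detected as a loop).
    print(follow_right_wall(ROOM, (2, 2), heading=0))
```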

Which is true in most cases, but what if the exit you are looking for is in a free-standing section of the maze? For example, what if the treasure is in the middle, or you are dealing with a 3-dimensional maze?
Same Wall Rule Failing to Find Treasure
This reminded me of a paper that Cormac Herley recently wrote titled: Justifying Security Measures. I highly recommend reading it. It points out that in the security community we often say:
Security(X) > Security(~X)
When we really mean:
Outcome(X|ABCD) > Outcome(~X|ABCD).
Which is a fancy way of showing that when we say doing X is more secure than not doing X, there are usually a large number of assumptions (ABCD...) that we're leaving out. Where this directly relates to the main topic of this blog (password security) is that Herley specifically calls out the password field for the practice of ignoring constraints in our security advice. Or, to quote his paper:
"Passwords offers a target-rich environment for those seeking tautologies and unfalsifiable claims."
Now back to the issue of maze solving, the same problem often arises. When we make a maze solving algorithm, we're making certain assumptions about the rules of the game. For example, the next iteration of a mapping algorithm might involve marking rooms that you have been in before to detect loops. Well there is a certain fairy-tale where that approach failed due to the marks being destroyed by a 3rd party actor:
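The "mark the rooms you have been in" idea can be sketched as a depth-first search that records every cell it enters. This is a standard technique swapped in for illustration, not a specific algorithm from the conversation, and the grid encoding and names are hypothetical. It finds the exit even when starting beside a free-standing pillar, but only under the assumptions the fairy tale and the moving maze break: the marks persist and the walls don't move.

```python
from typing import List, Set, Tuple


def find_exit_with_marks(maze: List[str], start: Tuple[int, int]) -> bool:
    """Depth-first search that marks every visited cell so loops can't trap it."""
    rows, cols = len(maze), len(maze[0])
    marked: Set[Tuple[int, int]] = set()   # the breadcrumbs; assumed nobody eats them
    stack = [start]
    while stack:
        r, c = stack.pop()
        if (r, c) in marked:
            continue
        marked.add((r, c))
        if maze[r][c] == "E":
            return True
        for dr, dc in ((-1, 0), (0, 1), (1, 0), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and maze[nr][nc] != "#" and (nr, nc) not in marked):
                stack.append((nr, nc))
    return False   # every reachable cell is marked and none of them was the exit


# An open room with a free-standing pillar in the middle and the exit in the bottom wall.
ROOM = [
    "#######",
    "#     #",
    "#  #  #",
    "#     #",
    "###E###",
]

if __name__ == "__main__":
    print(find_exit_with_marks(ROOM, (2, 2)))   # start right next to the pillar
```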

Hansel and Gretel showing that marks aren't always permanent
Even assuming you can safeguard your marks in the maze, that approach may still not be effective if the maze moves while you are traversing it.

I've never seen such an amazing premise turned into such a boring book
Note, these assumptions go both ways. For example if you are designing a super hard maze, a snarky player can often do something completely unexpected.

Seriously, why would you want to go through the maze?
I'd argue that coming up with a perfect maze solver that works for all mazes with no constraints is a near impossible problem. If you can design an algorithm, chances are someone else can come up with a situation where it will fail. On the plus side, the same goes for maze designers. If you come up with a maze with constraints, someone probably can solve it even if it's not how you expected the maze to be solved.

This is a point that I'm actually optimistic about. We deal with imperfect knowledge of the rules we're operating under every day. That's part of the human condition! Tying this back in with Herley's paper, I think there are some things to keep in mind.
  1. When giving advice to end users, I think it's fair to leave implied constraints out as long as the person giving the advice keeps them in mind. For example, telling your kids to follow the right-hand wall to get through a corn maze is perfectly reasonable. Telling your kids that this advice assumes there are no minotaurs or evil clowns waiting in the maze to eat them probably will not result in the end state you are aiming for.
  2. Unfortunately following the above can lead to those constraints being forgotten over time and that advice being applied to situations where it is no longer helpful.
  3. Therefore you need to be willing to question previously held beliefs and come up with new approaches when reality doesn't match your expected experiences.
The question then is, how do you discover (or rediscover) unknown constraints when you start experiencing issues?

One way to deal with this is through experimental design, along with making hypotheses about what the results of those experiments will be before you run them. That's something I'm trying to get better at, as seen in my previous blog post.

As an example, Herley raises the question "Are lower-case pass-phrases better or worse than passwords with a mix of characters?" If I construct an experiment, I have to specify a set of constraints that the experiment will run under. Do those constraints match up with real-world use cases? Of course not! But knowing what the constraints are can help me and other people interpret how to use those results. Likewise, before running an experiment it's important to have a theory and make a hypothesis about what the results will be. Once that's done, running the experiment can validate or falsify the hypothesis. I can then update the theory as needed and the process continues.

To put it another way, I think there are a lot of areas where the academic side of computer security can help improve the practical impact that computer security choices impose on the end user ;p


Sunday, August 14, 2016

Evaluating the Value of the (@)Purge Rule

“Only sometimes when we pick and choose among the rules we discover later that we have set aside something precious in the process.”  
― Helen Simonson, Major Pettigrew's Last Stand

Background and Problem Statement:

I was recently asked the following question: "Is there any value in supporting the character purge rule in Hashcat?" The purge rule '@x' will remove all characters of a specific type from a password guess. So for example the rule '@s' would turn 'password' into 'paword'. The full thread can be found on the Hashcat forum here. The reason behind this inquiry was that while the old version of Hashcat implemented the character purge rule, GPU versions of Hashcat and Hashcat 3.0 dropped support for it. Since then, At0m added support for the rule back in the newest build of Hashcat which makes this question much less pressing. That being said, similar questions pop up all the time and I felt it was worth looking into if only to talk about the process of investigating problems like this.

Side note: as evidence that any change will break someone's workflow, when researching this topic I did find one user who stored passphrase dictionaries with spaces left intact. They would then use the purge rule to remove the spaces during a cracking session so they wouldn't have to save a second copy of their passphrase wordlist without spaces. For that reason alone I think there is some value in the purge rule.

The Purge Rule Explained:

Hashcat Rule Syntax: @X where (X) is the character you want to purge from the password guess
Example Rule: @s
Example Input: password
Example Output: paword
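For anyone who wants to play with the rule outside of Hashcat, its effect is easy to mimic. This is a small Python sketch of what @X does to a single guess, not Hashcat's implementation; purge_candidates is a hypothetical helper that also skips guesses the rule leaves unchanged, which is my own choice to avoid re-trying base words.

```python
from typing import Iterable, Iterator


def purge(word: str, ch: str) -> str:
    """Mimic Hashcat's @X rule: drop every occurrence of character X from the guess."""
    return word.replace(ch, "")


def purge_candidates(words: Iterable[str], chars: str) -> Iterator[str]:
    """Yield every purge-rule guess that actually differs from its input word."""
    for word in words:
        for ch in chars:
            guess = purge(word, ch)
            if guess != word:   # @X on a word without X just repeats the base guess
                yield guess


if __name__ == "__main__":
    print(purge("password", "s"))   # paword
    # "@ " (purge spaces) covers the passphrase-wordlist trick from the side note above.
    print(list(purge_candidates(["password", "my passphrase"], "s ")))
```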


Hypothesis:

My gut feeling is that the purge rule will have limited impact on a cracking session. I base that on a rule of thumb that mangling rules work best if they mimic the thought process people use when creating passwords. For example, people often start with a base word and then append digits to it, replace letters with L33t replacements, etc. Therefore rules that mimic these behaviors tend to be more successful. I just don't see many people removing character classes from their password.

Now if you are a Linux fan, you'll realize Linux developers *love* removing characters from commands. Do you want to change your password? Well "passwd" is the command for you! Maybe Linux developers use the same strategy for their passwords? So I certainly could be wrong. That being said, the whole idea of a hypothesis is to go out on a limb and make a prediction on how an existing model will react so here I go:

My hypothesis is that the purge rule will crack fewer than 1,000 passwords of a 1 million password dataset (0.1%). Of those passwords cracked, a vast majority (95%) will be cracked due to weaknesses of the input dictionary vs. modeling how the user created the password. For example, 'paword' might be a new Pokemon type that didn't show up in the input dictionary vs. being created by a user taking the word 'password' and then removing the S's.

Short Summary of Results:

The purge ruleset cracked 164 passwords (0.0164% of the test set). This was better than just using random rules, which in a test run cracked 23 passwords, but not by much in absolute terms. Supporting this rule is unlikely to help to any noticeable degree with your cracking sessions.

Experimental Setup:

Test Dataset: 1 million passwords from the newest MySpace leak. These were randomly selected from the full set using the 'gshuf -n 1000000' command.
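gshuf (GNU shuf) did the job here. If you ever need the same kind of random subset without relying on shuf, reservoir sampling (Algorithm R) picks k lines uniformly from a stream of unknown length while only keeping k lines in memory. This is a sketch of that standard technique, not what I ran; reservoir_sample is a hypothetical name, and you would pass it an open file object instead of the demo generator.

```python
import random
from typing import Iterable, List


def reservoir_sample(lines: Iterable[str], k: int, rng: random.Random) -> List[str]:
    """Pick k lines uniformly at random from a stream of unknown length (Algorithm R)."""
    sample: List[str] = []
    for i, line in enumerate(lines):
        if i < k:
            sample.append(line)      # fill the reservoir with the first k lines
        else:
            j = rng.randint(0, i)    # line i survives with probability k / (i + 1)
            if j < k:
                sample[j] = line     # ...and evicts a uniformly chosen resident
    return sample


if __name__ == "__main__":
    # Demo on a fake stream; with a real file: reservoir_sample(open(path), 1000000, random.Random())
    stream = (f"hash{i}" for i in range(100))
    print(reservoir_sample(stream, 5, random.Random(1)))
```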

Reason: Truth be told, the main reason I used the MySpace passwords was I'm getting tired of using the RockYou dataset for everything. That being said, it's useful for this experiment that all of the passwords in that dataset have been converted to lowercase since I don't have to worry about combining case mangling rules with the purge rules.

Tools Used: Hashcat for the cracking, and John the Ripper for its --show option

Rulesets Used: Hashcat's D3ad0ne mangling rules. I broke them up into two different rulesets, with one containing the purge rules (along with a few append/prepend '@' rules that snuck in), and the other one containing all the other mangling rules.

Reason: D3ad0ne's ruleset contains about 34 thousand individual mangling rules. Due to its size and the fact that it is included with Hashcat, it should make a good example of a ruleset that many Hashcat users are likely to incorporate into their cracking sessions. I initially split the base ruleset into two different subsets, with all rules containing the '@' character going into one ruleset called d3ad0ne_purge, and all the other rules going into another one called d3ad0ne_base. I then started manually going through d3ad0ne_purge and moving rules such as "append a @" into d3ad0ne_base, but with over 1k rules in d3ad0ne_purge I quickly decided to remove the results of the append/prepend '@' rules after the fact instead of trying to fully isolate only purge rules in their own ruleset.
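Separating true purge rules from rules that merely use '@' as an argument (like $@ or sa@) requires tokenizing each rule rather than grepping for '@'. Here is a rough Python sketch of that split. The ARITY table is my own partial list of Hashcat rule functions and their argument counts, and treating spaces between functions as no-ops is an assumption, so any rule that uses a function outside the table is put in an "unknown" pile for manual review instead of being guessed at.

```python
from typing import Iterable, List, Optional, Tuple

# Argument counts for a partial list of Hashcat rule functions (not the full rule
# language). A rule using anything missing from this table is reported as unknown.
ARITY = {}
ARITY.update({f: 0 for f in ":lucCtrdf{}[]qkK"})
ARITY.update({f: 1 for f in "TpDzZ'$^@<>_!/"})
ARITY.update({f: 2 for f in "sxOio*"})


def uses_purge(rule: str) -> Optional[bool]:
    """True if the rule calls @X, False if '@' only shows up as an argument
    (e.g. $@, ^@, sa@), None if the rule uses a function not in ARITY."""
    i, found = 0, False
    while i < len(rule):
        func = rule[i]
        if func == " ":               # assumption: spaces between functions are no-ops
            i += 1
            continue
        if func not in ARITY:
            return None
        nargs = ARITY[func]
        if i + nargs >= len(rule):    # rule ends before all arguments were given
            return None
        if func == "@":
            found = True
        i += 1 + nargs                # skip the function and its arguments
    return found


def split_rules(lines: Iterable[str]) -> Tuple[List[str], List[str], List[str]]:
    """Sort rule-file lines into (purge, base, unknown) lists."""
    purge: List[str] = []
    base: List[str] = []
    unknown: List[str] = []
    for line in lines:
        rule = line.rstrip("\r\n")    # keep trailing spaces; "$ " appends a space
        if not rule or rule.startswith("#"):
            continue                  # skip blank lines and '#' comment lines
        kind = uses_purge(rule)
        if kind is None:
            unknown.append(rule)
        elif kind:
            purge.append(rule)
        else:
            base.append(rule)
    return purge, base, unknown


if __name__ == "__main__":
    sample = ["@s\n", "$@\n", "^@$1\n", "sa@\n", "c @a\n", "X012\n"]
    print(split_rules(sample))
```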


Dictionary Used: I used dic-0294 as my wordlist. Yes, there are better input dictionaries out there, but this is a common one and strikes a good balance between size and coverage. Plus, it is public, unlike other dictionaries I have that are based on cracked passwords.


Experimental Results:

Step 1) Run a normal cracking session on the 1 million MySpace passwords using dic-0294 and D3ad0ne_base. This is important since the purge rule will likely crack many passwords that would be cracked normally with other rules. Running a normal cracking session first removes those passwords so we can focus on passwords that would only be cracked by the purge rules. The command I ran is below (note, I'm editing some of the path information out of the commands for clarity's sake).
./hashcat -D1 -m 100 -a 0 --remove myspace_rand_1m_hc.txt -r rules/d3ad0ne_base.rule dic-0294.txt
A couple of notes about the above command. I'm using a version of Hashcat that I updated on August 10th 2016. I ran it on a very old MacBook Pro, so the -D1 tells it to use the CPU only (since the GPU doesn't have enough memory). The -m 100 tells it to crack unsalted SHA-1 hashes. The -a 0 selects a basic dictionary attack. --remove removes any cracked hashes from the target file so they aren't counted twice in future cracking sessions. myspace_rand_1m_hc.txt is my target set, rules/d3ad0ne_base.rule is my ruleset, and dic-0294.txt is my input dictionary. Below are the results of running this first attack.



With 36% of the passwords cracked by a very vanilla attack on a slow computer, that isn't bad. Next up is running the purge rules.

Step 2) Delete the previous hashcat.pot file. Run a cracking session on the remaining passwords using the purge ruleset. The command I ran was very similar to the one above:
./hashcat -D1 -m 100 -a 0 myspace_rand_1m_hc.txt -r rules/d3ad0ne_purge.rule dic-0294.txt
Note, I took off the --remove option since I didn't care about removing cracked hashes for this. I also deleted the previous .pot file of cracked passwords since I only wanted to store passwords associated with this test. Here is a screenshot I took partway through the cracking session:



As you can see, many of the cracked passwords were due to "insert a @ symbol" rules vs. the purge rule. Here are the final results:



The session managed to crack 405 unique hashes. I then went into the pot file and deleted any password containing the '@' character so what was left was due to the purge rule.  This left me a list containing 128 unique passwords. A screenshot is shown below:
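The pot-file cleanup described above (dropping every cracked plaintext that contains '@') is easy to script. This is a minimal Python sketch under a few assumptions: pot lines are hash:plaintext, which is safe to split on the first colon for unsalted -m 100 hashes (salted modes would need more care), and Hashcat may store some plaintexts as $HEX[...], which the sketch decodes so an encoded '@' is still caught. The function names are hypothetical.

```python
from typing import Iterable, List


def plaintext_from_pot_line(line: str) -> str:
    """Return the plaintext from a 'hash:plaintext' pot line (split on the first colon)."""
    _, _, plain = line.rstrip("\n").partition(":")
    # Decode $HEX[...] plaintexts so the '@' check sees the real characters.
    if plain.startswith("$HEX[") and plain.endswith("]"):
        plain = bytes.fromhex(plain[5:-1]).decode("latin-1")
    return plain


def purge_only_cracks(pot_lines: Iterable[str]) -> List[str]:
    """Keep plaintexts without '@', i.e. drop cracks made by append/prepend '@' rules."""
    return [p for p in map(plaintext_from_pot_line, pot_lines) if "@" not in p]


if __name__ == "__main__":
    demo = ["aaa:imabear\n", "bbb:p@ssword\n", "ccc:$HEX[7040]\n"]
    print(purge_only_cracks(demo))
```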



Now it's hard to tell what people were thinking when they created these passwords, but glancing through the list, it certainly appeared that most of the cracked passwords were simply due to limitations in my input dictionary vs. users purging characters from their passwords. I was actually surprised 'jayden' and 'fatguy' weren't in dic-0294, but after double-checking, they were in fact missing from it.

Now, input dictionaries are always going to be limited to a certain extent, so these cracks absolutely count. They only represent unique cracked hashes, though. For example, if 20 people used the password 'imabear' it would only be counted once. To figure out how many total accounts would have been cracked, I re-ran the cracked passwords above as a dictionary through John the Ripper against the myspace_1m_rand list. This was to get the results into John's cracked-password (pot) file format. For example, here is 'imabear' in john.pot:

{SHA}QiPoQuc4sqqs3J+OulWLt3H09kY=:imabear
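The {SHA} prefix marks the base64 encoding of the raw 20-byte SHA-1 digest (the LDAP-style format), while Hashcat's -m 100 works with the same digest written as 40 hex characters. The Python sketch below converts between the two and counts every account whose hash was cracked, which is roughly the per-account tally -show produces. The function names and the tiny target list are hypothetical.

```python
import base64
import hashlib
from collections import Counter
from typing import Dict, Iterable


def hex_to_ldap_sha(hex_digest: str) -> str:
    """Convert a hex SHA-1 (Hashcat -m 100 style) to the {SHA}base64 form."""
    return "{SHA}" + base64.b64encode(bytes.fromhex(hex_digest)).decode("ascii")


def accounts_cracked(target_hashes: Iterable[str], cracked: Dict[str, str]) -> int:
    """Count every account (not just every unique hash) whose hash was cracked.
    Keys of `cracked` are expected to be lowercase hex digests."""
    per_hash = Counter(h.lower() for h in target_hashes)
    return sum(count for h, count in per_hash.items() if h in cracked)


if __name__ == "__main__":
    digest = hashlib.sha1(b"imabear").hexdigest()
    print(hex_to_ldap_sha(digest))
    # Hypothetical target list: three accounts, two of which share a password.
    targets = [digest, digest, hashlib.sha1(b"someotherpw").hexdigest()]
    print(accounts_cracked(targets, {digest: "imabear"}))   # 2 accounts, 1 unique hash
```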

The reason I did this was because JtR has a really cool feature '-show' that will match up cracked passwords with the accounts in the target set. Running the command:

./john -format=raw-sha1 -show myspace_rand_1m_clean.txt

resulted in the following output:


Therefore the purge rules cracked a total of 164 passwords from the test set, or 0.0164% of the total. That's a really small amount. Admittedly every password cracked is nice, but I was still curious whether the purge rules were better than just running random mangling rules instead. Luckily, Hashcat supports a command to test that out:
./hashcat -D1 -m 100 -a 0 myspace_rand_1m_hc.txt -g 500 dic-0294.txt
The only difference between the above command and the previous Hashcat commands I ran is that instead of a rules file I specified '-g 500'. That tells Hashcat to generate 500 random rules to run on the input dictionary. I chose that number since there were over a thousand rules in my D3ad0ne_purge ruleset and I guesstimated that about half of them were actual purge rules. When I ran the above, I ended up cracking 23 more passwords. That's significantly less than the 164 the purge rules cracked, but in the grand scheme of things it was about the same in effectiveness. Considering some of those random rules were likely duplicates of rules in the D3ad0ne_base ruleset as well, I'd argue that running a purge rule is about equivalent to running a random mangling rule. In fact, if you don't already have purge rules in your mangling set, I'd probably recommend not worrying about it and just running a brute force method like Markov mode to stretch your dictionary instead.

Conclusion:

For once my gut feeling was right and the value of Hashcat's purge rule '@' was limited in the tests that were run. That's not to say that it's not useful. It may help when targeting certain users or aid in keeping the size of your dictionary files on disk manageable. But at the same time, it's not a major feature that other password crackers should rush to mimic. I hope this blog post was informative in helping show different ways to evaluate the effectiveness of a mangling technique. If you have any questions, comments or suggestions please feel free to leave them in the comments section.

Thursday, July 7, 2016

Cracking the MySpace List - First Impressions

Alt Title: An Embarrassment of Riches

Backstory:

Sometime around 2008, a hacker or disgruntled employee managed to break into MySpace and steal all the usernames, e-mails, and passwords from the social networking site. This included information covering more than 360 million accounts. Who knows what else they stole or did, but for the purposes of this post I'll be focusing only on the account info. For excellent coverage of why the dataset appears to be from 2008 let me refer you to the always superb Troy Hunt's blog post on the subject. Side note, most of my information about this leak also comes from Troy's coverage.

This dataset has been floating around the underground crime markets since then, but didn't gain widespread notoriety until May 2016, when an advertisement offering it for sale was posted to the "Real Deal" dark market website. Then on July 1st, 2016, another researcher managed to obtain a copy and posted a public torrent of the entire leak for anyone to download. That's where things stand at this moment.

Unpacking the Dataset:

The first thing that stands out about the dataset is how big it is. When uncompressed, the full dump is 33 gigs. Now, I've dealt with database dumps of similar size, but they always included e-mails, forum posts, website code, etc. The biggest password dataset I had previously handled was the RockYou set, which weighed in at 33 million passwords and took up 275 MB of disk. Admittedly that didn't include user info, and the passwords were stored as plaintext (plaintexts are generally shorter than the hex representation of hashes), but still, that's a huge leap in data to process. Heck, even the full RockYou list is a bit of a pain to process.

Let me put this another way. Here is a simple question, "How many accounts are in the MySpace list?" Normally that's quick and easy. Just run:
wc -l
And then you wait ... and wait ... and wait ... and then Google whether there is a faster way to count lines ... and then wait. 16 minutes and 24 seconds later, I found out there were 360,213,049 lines in the file. Does that equal the total number of accounts, or is there junk in that file? Well, I don't want to spend the 30+ minutes to run a more complicated parser, so that sounds about right to me ¯\_(ツ)_/¯. Long story short, doing anything with this file takes time. Eventually I plan on moving over to a computer with an SSD and more hardware, which should help, but it's something to keep in mind.
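Counting lines doesn't require parsing anything; you only need to count newline bytes, reading the file in large binary chunks. Here is a minimal Python sketch of that approach (count_lines and the 1 MiB chunk size are my own arbitrary choices). On a 33 GB file the disk is probably the real bottleneck, so don't expect it to beat wc -l by much; like wc -l, it does not count a final line that lacks a trailing newline.

```python
import os
import sys
import tempfile


def count_lines(path: str, chunk_size: int = 1 << 20) -> int:
    """Count newline bytes by reading the file in binary chunks, like 'wc -l'."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += chunk.count(b"\n")
    return total


if __name__ == "__main__":
    if len(sys.argv) > 1:
        print(count_lines(sys.argv[1]))   # e.g. python count_lines.py Myspace.com.txt
    else:
        with tempfile.TemporaryDirectory() as scratch:   # tiny self-check
            sample = os.path.join(scratch, "sample.txt")
            with open(sample, "wb") as f:
                f.write(b"1:a\n2:b\n3:c\n")
            print(count_lines(sample))
```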

That being said, the next question is "What does the data look like?" Well here is a screenshot of the first couple of lines.


As you can see, it takes the form of a unique ID that increments, e-mail address, username, and then two hashes. All of the fields except the unique ID can be blank. To answer the next question, "Why two hashes?" well ... ¯\_(ツ)_/¯. That's something I plan on looking at, but I haven't gotten around to it yet.

Update: 7/7/16: Just as I was finalizing this post, I ran across CynoSure Prime's analysis where they managed to crack almost every single hash in this dataset. You can find their blog post here. It turns out the second hash is actually the original password, (full length with upper case characters) salted with the user_id. I'm going to leave most of this blog entry unmodified even though how to parse the list can certainly be optimized based on this new info. </Update>
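For anyone who wants to parse records rather than pipe them through awk, here is a Python sketch of a per-line parser. The field order (id:email:username:hash1:hash2) comes from the description above; the awk scripts further down drop the first two characters of each hash field, which I am assuming is a 0x prefix. Splitting the hashes off from the right means a username containing ':' still parses. The record type and function names are hypothetical, and malformed lines raise ValueError so you can count or log them.

```python
from typing import NamedTuple, Optional


class MySpaceRecord(NamedTuple):
    user_id: int
    email: str
    username: str
    hash1: Optional[str]   # unsalted SHA-1 of the lowercased password cut to 10 chars
    hash2: Optional[str]   # per CynoSure Prime: full password salted with user_id


def _clean_hash(field: str) -> Optional[str]:
    field = field.strip()
    if field.lower().startswith("0x"):   # assumption: hashes carry a 0x prefix
        field = field[2:]
    return field or None                 # blank fields become None


def parse_record(line: str) -> MySpaceRecord:
    """Parse 'id:email:username:hash1:hash2', allowing ':' inside the username."""
    user_id, email, rest = line.rstrip("\n").split(":", 2)
    username, hash1, hash2 = rest.rsplit(":", 2)
    return MySpaceRecord(int(user_id), email, username,
                         _clean_hash(hash1), _clean_hash(hash2))


if __name__ == "__main__":
    print(parse_record("1:someone@example.com:bob:0xabc123:0xdef456\n"))
```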

Other random tidbits: The final unique ID is 1005290998. That's significantly higher than the number of accounts in this dataset, so large chunks of accounts were deleted at some point in time. My guess is that when a user deleted their MySpace account it really was deleted, in which case, kudos to MySpace for doing that! That's just a guess though. As you would expect, the first accounts were administrative accounts and system process accounts. I know I blocked out the user e-mails, but I will admit I googled the first name. When I found his LinkedIn profile my first reaction was, "Wow, he needs to brag about his accomplishments more than just saying:"
Developed, and launched the initial Myspace community which currently has over 100 million members and was acquired by Fox Corp. for $580 million.
I mean, if it were me I would post that database dump on my resume! Of course, further googling led me to the book "Stealing MySpace." I started reading about all the drama that went on, and suddenly there went my evening. Needless to say, the general layout of the dataset looks legit, but one more interesting fact was all those Gmail accounts. MySpace was created in 2003, Gmail opened for invitation access in 2004, and the lead engineer of MySpace left in 2003. So employees were able to update their accounts after they had left the company. Once again, kudos to MySpace, but that was surprising.

Password Hash Format:

I initially learned from Troy Hunt's posts that the hashes were unsalted SHA1 with the plaintext lowercased and then truncated to 10 characters long. Therefore the password:
123#ThisIsMyPassword
would be saved as:
 123#thisis
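The storage scheme described above is easy to reproduce. This Python sketch (myspace_hash is a hypothetical name, and encoding as UTF-8 is my assumption; it makes no difference for ASCII passwords) shows why any two passwords that agree on their first ten lowercased characters end up with the same stored hash.

```python
import hashlib


def myspace_hash(password: str) -> str:
    """Unsalted SHA-1 of the password lowercased and cut to 10 characters."""
    stored = password.lower()[:10]
    return hashlib.sha1(stored.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print("123#ThisIsMyPassword".lower()[:10])   # 123#thisis
    print(myspace_hash("123#ThisIsMyPassword") == myspace_hash("123#thisis-anything"))   # True
```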
I've heard some people say that this means hackers can just brute force the entire key-space. If I were feeling nit-picky, I could argue that *technically* that's beyond the reach of commercial setups, as 69^10 is still a really big number (26 lowercase letters + 10 digits + 33 special characters). In reality though, by intelligently searching the key-space (who uses commas in their password?), a vast majority of unsalted password hashes can be cracked under that format. It's a bit of a moot point though, since the real issue is using such a fast unsalted hash. Ah 2008, when it was still acceptable to claim ignorance for using a bad hashing set-up.
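To put a rough number on "really big", the arithmetic is below, counting 26 lowercase letters, 10 digits, and the 33 printable ASCII special characters (space included) for a 69-character set. The guess rate RATE is a made-up round number for illustration, not a benchmark of any real rig.

```python
# Keyspace for lowercased passwords of up to 10 characters.
CHARSET = 26 + 10 + 33       # lowercase letters + digits + printable specials (incl. space)
MAX_LEN = 10
RATE = 10_000_000_000        # hypothetical: 10 billion SHA-1 guesses per second

exactly_10 = CHARSET ** MAX_LEN
up_to_10 = sum(CHARSET ** n for n in range(1, MAX_LEN + 1))
years = up_to_10 / RATE / (365.25 * 24 * 3600)

print(f"{exactly_10:.3e} ten-character candidates")
print(f"{up_to_10:.3e} candidates of length 1-10")
print(f"~{years:.1f} years to exhaust at the assumed rate")
```

Under that assumed rate an exhaustive search takes years, which is why real attacks prioritize the likely part of the key-space instead.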

Long story short, from my experiments so far I can confirm that it appears all the hashes had their plaintexts lowercased and truncated to 10 characters. Also, yes, serious attackers are very likely to crack almost every password in this list.

Cracking MySpace Passwords With John the Ripper (Take 1):

After glancing around the dataset, the next thing I wanted to do was start cracking. To do this, I needed to extract and format the hashes. My first attempt to do this yielded the following script:
awk -F':' '{if (length($4) > 3) print "myspace_big_hash1:" substr($4,3); if (length($5) > 3) print "myspace_big_hash2:" substr($5,3)}' Myspace.com.txt > myspace_clean_big.hsh
To point out a couple of features: I was labeling my data-sets so they are correctly identified in my input file (I maintain different input files for different data sets, but still, having that name there has saved me trouble in the past), and I was removing blank hashes. I was also stripping the usernames and e-mail addresses since I really didn't want to see passwords associated with names. The problem was that the resulting file was huge. I didn't save it, but it was bigger than the original list! I couldn't afford the full naming convention. Therefore I switched to the following script:
awk -F':' '{if (length($4) > 3) print substr($4,3); if (length($5) > 3) print substr($5,3)}' Myspace.com.txt > myspace_temp.hsh
And then to remove duplicates I ran:
sort -u myspace_temp.hsh > myspace_big.hsh
The resulting file was a little under 8 gigs, which was better. Problems occurred, though, when I tried to load the resulting hash file into JtR. More specifically, after letting it run overnight, JtR still hadn't loaded the password list and started making guesses. That kind of makes sense. That's way more passwords than normal to parse, and my laptop only had 8 gigs of RAM, so even in an ideal case the whole list probably couldn't be stored in memory. That's not an ideal cracking situation. Being curious, I then decided to try and load it up in Hashcat.

Cracking MySpace Passwords With Hashcat:

Loading up the dump in Hashcat was interesting since it gave me warnings about records in the dataset that weren't parsed correctly.


Regardless, once all was said and done, I ended up with the following error:
ERROR: cuMemAlloc() 2


Doing some quick Googling, I found out the cause was that the GPUs ran out of memory trying to load the hashes. Not surprising but it meant I had to take a different approach if I wanted to crack any hashes from this set.

The easiest way to deal with this was to split the full list up into smaller chunks and then crack each section by itself. One way to do that is with the split command:
split -l 5000000 myspace_big.hsh myspace_split_
This will break up the list into 5 million hash chunks that follow the naming pattern myspace_split_aa, myspace_split_ab .... The downside is that since you have to crack each file individually, the total cracking time increases by close to a factor of 40. I'd recommend playing with the file size to maximize the number of hashes per file that your GPU supports. On the plus side, after all that I can now finally crack passwords!
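Where the factor of 40 comes from: each deduplicated hash is 40 hex characters plus a newline, or 41 bytes, so a little under 8 GB is roughly 190 million hashes, which is about 39 chunks of 5 million. If split isn't available, the same chunking is a short streaming loop in Python. This sketch (split_lines is a hypothetical name) mimics split's aa, ab, ... suffixes, supports at most 676 chunks, and writes line by line so it never holds a whole chunk in memory.

```python
import os
import string
import tempfile
from typing import List


def split_lines(path: str, lines_per_chunk: int, prefix: str) -> List[str]:
    """Copy path into files of lines_per_chunk lines named prefix+'aa', prefix+'ab', ..."""
    letters = string.ascii_lowercase
    written: List[str] = []
    dst = None
    try:
        with open(path, "rb") as src:
            for n, line in enumerate(src):
                if n % lines_per_chunk == 0:      # time to start the next output file
                    if dst is not None:
                        dst.close()
                    index = n // lines_per_chunk
                    if index >= 26 * 26:
                        raise ValueError("more than 676 chunks; use a larger lines_per_chunk")
                    name = prefix + letters[index // 26] + letters[index % 26]
                    dst = open(name, "wb")
                    written.append(name)
                dst.write(line)
    finally:
        if dst is not None:
            dst.close()
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        src = os.path.join(scratch, "hashes.txt")
        with open(src, "w") as f:
            f.writelines(f"{i:040x}\n" for i in range(12))   # 12 fake 40-char hex hashes
        for name in split_lines(src, 5, os.path.join(scratch, "myspace_split_")):
            with open(name) as f:
                print(os.path.basename(name), sum(1 for _ in f))
```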

Finally Cracking Passwords:

One issue I had was that there were so many hashes cracking all the time that it was hard to see the status of my session. It's not that my attack was that effective, but with a list that large it's hard not to crack something. I belatedly realized I could pause Hashcat, print the status, and then resume. Or, as Jeremi Gosney replied on Twitter, I could have used the following switch with Hashcat:
-o /dev/null 

Closing Thoughts:

I'll admit I'm writing this conclusion with CynoSure Prime's analysis fresh in my mind. While the MySpace list is great for giving me a real-world challenge to knock my head against, I'm not sure how useful it'll be from a research perspective. The 66 million salted hashes that were created from the original plaintexts will be nice for new training and testing sets, so researchers don't have to keep using RockYou for everything. That being said, MySpace is actually an older list than RockYou. Also, I fully expect there to be a lot of overlap in the passwords between the two datasets. RockYou's entire business model was allowing apps to work across multiple social networking sites in the era before federated logins. RockYou was storing MySpace + LiveJournal + Facebook passwords in the clear so its apps could cross-post to all of them. Statistically, I expect MySpace and RockYou to be very similar.

What worries me though, and what makes the MySpace list special, is that it has user information associated with all of those 360 million accounts + password hashes. Just about everyone who did any social networking and is between the ages of 24 and 40 is in this dump. I realize this list has been in the hands of criminals for the last eight years and a lot of the damage has already been done. Still, now that this list is public, it enables many more targeted attacks to be carried out by malicious actors from all over the internet. How long before we start seeing the top 100 celebrity passwords posted on sites like Gawker? What about exes using this information against former partners? Previous public password dumps have been much more limited or didn't contain e-mail addresses. I really don't know what will happen with this one. Hopefully I'm being overly paranoid, but it's hard not to think about the downsides of this dump being widely distributed. On the plus side, hopefully this is the only mega-breach we'll see with weak password storage. Sites like Google and Facebook are now using very strong hashes, which will limit a lot of the damage if their user information is disclosed in the future.