Sunday, July 19, 2009

Pass-Phrase Input Dictionary

I could write some elaborate five-thousand-word post about this, but the following is fairly self-explanatory. I created an input dictionary of all the phrases in Wikiquote for use in cracking pass-phrases. You can download it off my tools site here. The final wordlist has around 187k phrases in it. I limited the phrases to a maximum length of 140 characters, since anything longer than a Twitter post probably won't be used as a pass-phrase in real life. On that note, does anyone have any good ideas on how to spider all the Twitter postings?
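For anyone who wants to build a similar list from their own source of quotes, here is a minimal sketch of the length filter described above. It is not the script I used for the published wordlist; the function name `build_wordlist` and the whitespace cleanup and de-duplication steps are assumptions added for illustration.

```python
def build_wordlist(phrases, max_len=140):
    """Return unique, non-empty phrases of at most max_len characters.

    Hypothetical helper: the 140-character limit comes from the post,
    the whitespace cleanup and de-duplication are assumptions.
    """
    seen = set()
    result = []
    for phrase in phrases:
        # Collapse runs of whitespace and trim the ends.
        cleaned = " ".join(phrase.split())
        # Skip empty phrases and anything longer than the limit.
        if not cleaned or len(cleaned) > max_len:
            continue
        # Keep only the first occurrence of each phrase.
        if cleaned in seen:
            continue
        seen.add(cleaned)
        result.append(cleaned)
    return result


if __name__ == "__main__":
    sample = ["To be, or not to be", "  To be,  or not to be ", "x" * 141]
    for line in build_wordlist(sample):
        print(line)
```

Capitalization and punctuation are left untouched here, so two quotes that differ only in case are kept as separate entries.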

A few things: First, I only used the primary quotes, not the derivations, since it's fairly hard to parse them out automatically without bringing in a ton of garbage as well. Second, speaking of garbage, I'm parsing user-generated data, so there are still some "artifacts" in the wordlist. Third, I left capitalization and punctuation in the actual quotes. If anyone wants a list with those removed, please let me know. Also, if you want a list that only contains the first letter of every word, I can do that as well. Enjoy.
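If you would rather generate those two variants yourself, something like the following should be close. This is only a sketch: the function names are made up for illustration, and `string.punctuation` only covers ASCII punctuation, so curly quotes and other Unicode marks in the wordlist would pass through unchanged.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def strip_case_and_punctuation(phrase):
    """Lowercase the phrase and remove ASCII punctuation (hypothetical helper)."""
    words = phrase.translate(_STRIP_PUNCT).lower().split()
    return " ".join(words)


def first_letters(phrase):
    """Join the first letter of every word, keeping its original case."""
    words = phrase.translate(_STRIP_PUNCT).split()
    return "".join(word[0] for word in words)


if __name__ == "__main__":
    quote = "To be, or not to be!"
    print(strip_case_and_punctuation(quote))
    print(first_letters(quote))
```

Note that removing punctuation also removes apostrophes, so "don't" becomes "dont". Whether that matches how people actually type their pass-phrases is a judgment call.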

