Monday, June 1, 2009

Frequency Analysis for Stronger Passwords

As a commenter pointed out on my last post, the previous frequency analysis was based on a set of passwords created with no strong password creation policy in place. What happens when you look at only "strong" passwords? Well, I went through the MySpace list, the Phpbb.com list, and the Finnish list and extracted all the passwords that would meet stronger password creation rules (at least 8 characters long, containing at least 1 lowercase letter, 1 uppercase letter, 1 digit, and 1 special character). This gave me a grand total of 214 passwords (an impressive number, I know...). I belatedly realized that I forgot to copy a couple of other lists (such as those from Millw0rm, singles.org, etc.) from my school computer back in Tallahassee, so I'll try to get someone to send them to me so I can update this post with a larger data set.
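For anyone who wants to run the same kind of extraction on their own list, here is a minimal sketch of the filter in Python. It is not the script used for this post. The names `meets_policy` and `extract_strong` are made up for illustration, and treating "special character" as ASCII punctuation is an assumption.

```python
import string


def meets_policy(password: str, min_length: int = 8) -> bool:
    """Return True if the password satisfies the policy described above."""
    return (
        len(password) >= min_length
        and any(c in string.ascii_lowercase for c in password)
        and any(c in string.ascii_uppercase for c in password)
        and any(c in string.digits for c in password)
        # Assumption: "special character" means ASCII punctuation.
        and any(c in string.punctuation for c in password)
    )


def extract_strong(passwords):
    """Keep only the passwords that pass the policy check."""
    return [p for p in passwords if meets_policy(p)]
```

Calling `extract_strong` on a list of plaintext passwords returns just the ones that would have passed such a policy.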

As you can see below, uppercase characters dominate the first character set, and numbers/special characters dominate the last character set. In fact, the two most common last characters, 1 and !, together account for almost 40% of the sample. Admittedly this is a small sample size. If anyone has a better data set or can point me in the right direction, I'd love to take a look at it. Oh, and keep those good comments up ;)

Here is the data:

Overall Character Frequency Charset:
1ear!i0t2soln3#dbA4mcu5$h89S7y*kgPCD@_w-TG6EB.pRHxvQFMLJqYONfKW%VI/&zZXj^U}]\[)(:,+

First Character Frequency Charset:
SPDFCAGT1MBRLNKJVE*!mdcQIH$tqbO432#}zvusk^YXW960-(

Last Character Frequency Charset:
1!5*327$r94#0e.%kdba]8-wuomlgc^\ZVTSQNKHCA6/

Overall Character Frequency Analysis (character, percent):
1 5.81745
e 5.2658
a 4.81444
r 3.91174
! 3.81143
i 3.36008
0 3.15948
t 2.70812
2 2.65797
s 2.60782
o 2.60782
l 2.25677
n 2.10632
3 2.05617
# 2.00602
d 1.80542
b 1.80542
A 1.65496
4 1.65496
m 1.60481
c 1.60481
u 1.55466
5 1.50451
$ 1.45436
h 1.40421
8 1.40421
9 1.35406
S 1.25376
7 1.20361
y 1.15346
* 1.15346
k 1.10331
g 1.10331
P 1.10331
C 1.10331
D 1.05316
@ 1.05316
_ 1.00301
w 0.952859
- 0.952859
T 0.852558
G 0.802407
6 0.802407
E 0.752257
B 0.752257
. 0.752257
p 0.702106
R 0.651956
H 0.651956
x 0.601805
v 0.601805
Q 0.601805
F 0.601805
M 0.551655
L 0.551655
J 0.551655
q 0.501505
Y 0.501505
O 0.501505
N 0.501505
f 0.451354
K 0.451354
W 0.401204
% 0.401204
V 0.351053
I 0.351053
/ 0.351053
& 0.351053
z 0.300903
Z 0.250752
X 0.200602
j 0.150451
^ 0.150451
U 0.150451
} 0.100301
] 0.100301
\ 0.100301
[ 0.100301
) 0.100301
( 0.100301
: 0.0501505
, 0.0501505
+ 0.0501505

----------------------------------------
First Character Frequency Analysis (character, percent):
S 7.94393
P 7.00935
D 6.07477
F 5.14019
C 5.14019
A 5.14019
G 4.6729
T 4.20561
1 3.73832
M 3.27103
B 3.27103
R 2.80374
L 2.80374
N 2.33645
K 2.33645
J 2.33645
V 1.86916
E 1.86916
* 1.86916
! 1.86916
m 1.40187
d 1.40187
c 1.40187
Q 1.40187
I 1.40187
H 1.40187
$ 1.40187
t 0.934579
q 0.934579
b 0.934579
O 0.934579
4 0.934579
3 0.934579
2 0.934579
# 0.934579
} 0.46729
z 0.46729
v 0.46729
u 0.46729
s 0.46729
k 0.46729
^ 0.46729
Y 0.46729
X 0.46729
W 0.46729
9 0.46729
6 0.46729
0 0.46729
- 0.46729
( 0.46729

----------------------------------------
Last Character Frequency Analysis (character, percent):
1 21.4953
! 18.2243
5 6.07477
* 4.6729
3 4.20561
2 3.73832
7 3.27103
$ 3.27103
r 2.80374
9 2.80374
4 2.80374
# 2.80374
0 2.33645
e 1.86916
. 1.86916
% 1.40187
k 0.934579
d 0.934579
b 0.934579
a 0.934579
] 0.934579
8 0.934579
- 0.934579
w 0.46729
u 0.46729
o 0.46729
m 0.46729
l 0.46729
g 0.46729
c 0.46729
^ 0.46729
\ 0.46729
Z 0.46729
V 0.46729
T 0.46729
S 0.46729
Q 0.46729
N 0.46729
K 0.46729
H 0.46729
C 0.46729
A 0.46729
6 0.46729
/ 0.46729
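
If you want to produce tables like these from your own list, a minimal Python sketch follows. This is not the tool that generated the numbers above. The function names (`frequency_table`, `analyze`, `charset`) are made up for illustration, and the order among equally frequent characters is an arbitrary choice here.

```python
from collections import Counter


def frequency_table(chars):
    """Return [(char, percent), ...] sorted from most to least frequent."""
    counts = Counter(chars)
    total = sum(counts.values())
    # Ties are broken by character value so the output is deterministic
    # (an assumption; any tie order would be equally valid).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(ch, 100.0 * n / total) for ch, n in ranked]


def analyze(passwords):
    """Compute overall, first-character, and last-character tables."""
    passwords = [p for p in passwords if p]  # skip empty entries
    overall = frequency_table(c for p in passwords for c in p)
    first = frequency_table(p[0] for p in passwords)
    last = frequency_table(p[-1] for p in passwords)
    return overall, first, last


def charset(table):
    """Collapse a table into a frequency-ordered charset string."""
    return "".join(ch for ch, _ in table)
```

The string returned by `charset` is in the same most-frequent-first form as the charset lines at the top of the data.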


2 comments:

Brian said...

I know most of your attacks are built around the basis for the password being a word or phrase. What if the basis is a graphical symbol represented by characters? Unfortunately the most "obvious" example is the male genitalia:

8=====>~O~o

Which ironically meets many "strong" password requirements.

Matt Weir said...

Good point, Brian. I'm of the feeling that most art-style passwords can best be handled with a dictionary-based attack. That is, instead of having an input dictionary of words, make one of ASCII art. I know the human brain is inventive when it comes to stuff like this, but really people tend to copy what's out there. Doing a quick search for "ascii penises" (you DO NOT want to know my Google browser history) came up with some other examples such as...

~ ~<====3
8====> )0(
8===D
(^_-) ~ ~ <===3

but for the most part they were variations on the same theme. You could include different shaft lengths and "external graphics" to account for most of the different ways people would type them. You can also set some bounds. For example, it is unlikely someone would use

8=>

or

8============================>

so you can set some reasonable limits to generate your dictionary with. This doesn't just have to apply to genitalia. There are a lot of ASCII art combos people can use. I've seen emoticons used as part of a regular password before. They can also be used standalone. For example:

/><{{{{"> fish

///\oo/\\\ spider

_/\__/\__0> worm

----{,_,"><",_,}---- two mice

...---... S-O-S

»-(¯`·.·´¯)-> heart with an arrow through it

d[ o_0 ]b robot

well you get the idea. And THANK YOU so much for giving me the chance to draw genitalia on my blog ;)
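
The bounded generation idea described above (a repeated middle section kept within sensible length limits, plus optional extras) could be sketched in Python roughly like this. The function name `bounded_variants`, its parameters, and the default bounds of 3 to 10 repeats are all hypothetical and chosen only for illustration.

```python
from itertools import product


def bounded_variants(head, body, tails, min_len=3, max_len=10, suffixes=("",)):
    """Yield head + body*n + tail + suffix for every n in [min_len, max_len].

    The bounds are illustrative assumptions, not measured values.
    """
    lengths = range(min_len, max_len + 1)
    for n, tail, suffix in product(lengths, tails, suffixes):
        yield head + body * n + tail + suffix


# Example: arrow-style art with 3 to 5 repeats, two possible tails,
# and an optional trailing decoration.
words = list(bounded_variants("8", "=", (">", "D"), 3, 5, ("", "~O~o")))
```

With those arguments the generated list covers entries such as 8===D and 8=====>~O~o, while leaving out very short forms such as 8=>. Standalone pieces like the fish or the robot would simply be added to the dictionary as fixed entries.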