Optimizing Password Composition Policies Jeremiah Blocki Saranga Komanduri Ariel Procaccia Or Sheffet To appear at EC 2013
Feb 22, 2016
Optimizing Password Composition Policies
Jeremiah BlockiSaranga Komanduri
Ariel ProcacciaOr Sheffet
To appear at EC 2013
2
3
Password Composition Policy
password
Password Composition Policy
4
How Do Users Respond?
Password1
6
Predictable Responses
1. password2. 1234563. 123456784. abc1235. qwerty6. monkey7. letmein8. dragon9. 111111….25. password1
7
Previous Work
• Initial password composition policies designed without empirical data [BDP, 2006].
• User’s respond to password composition policies in predictable ways [KSKMBCCE, 2011]
• Trivial password choices vary widely across contexts [BX, 2012].
• No theoretical models of password composition policies.
8
Our Contributions
We initiate an algorithmic study of password composition policies.
Theoretical Model
Security Goal
Policy Structure
User Model
9
Outline
• User Model• Policy Structure• Goal• Algorithms and Reductions• Experiments
10
Rankings ModelUser 1 User 2 User 3 User 4 User 5 User 6 User 7
password 123456 letmein password 12345 password password
letmein 12345 password 111111 123456 111111 111111
abc123 12345678 baseball Passw0rd 12345678 letmein Passw0rd
Passw0rd password Passw0rd abc123 baseball iloveyou iloveyou
… … … … … … …
qwerty Passw0rd qwerty1 letmein password baseball #$H%*@T
qwerty1 qwerty1 qwerty qwerty P@ssw0rd #$H%*@T letmein
Each User: Passwords P ordered by preference.n = 7 (number of users).
11
Rankings Model: Example 1User 1 User 2 User 3 User 4 User 5 User 6 User 7
password 123456 letmein password 12345 password password
letmein 12345 password 111111 123456 111111 111111
abc123 12345678 baseball Passw0rd 12345678 letmein Passw0rd
Passw0rd password Passw0rd abc123 baseball iloveyou iloveyou
… … … … … … …
qwerty Passw0rd qwerty1 letmein password baseball #$H%*@T
qwerty1 qwerty1 qwerty qwerty P@ssw0rd #$H%*@T letmein
Allowed Passwords All Passwords
𝑨=𝑷− { 𝑝𝑎𝑠𝑠𝑤𝑜𝑟𝑑 ′ }
12
Rankings Model: Example 1
Pr[111111 | A] = 3/7Pr[letmein | A] = 2/7Pr[123456 | A]=Pr[12345 | A]=1/7
User 1 User 2 User 3 User 4 User 5 User 6 User 7
password 123456 letmein password 12345 password password
letmein 12345 password 111111 123456 111111 111111
abc123 12345678 baseball Passw0rd 12345678 letmein Passw0rd
Passw0rd password Passw0rd abc123 baseball iloveyou iloveyou
… … … … … … …
qwerty Passw0rd qwerty1 letmein password baseball #$H%*@T
qwerty1 qwerty1 qwerty qwerty P@ssw0rd #$H%*@T letmein
13
Rankings Model: Example 2User 1 User 2 User 3 User 4 User 5 User 6 User 7
password 123456 letmein password 12345 password password
letmein 12345 password 111111 123456 111111 111111
abc123 12345678 baseball Passw0rd 12345678 letmein Passw0rd
Passw0rd password Passw0rd abc123 baseball iloveyou iloveyou
… … … … … … …
qwerty Passw0rd qwerty1 letmein password baseball #$H%*@T
qwerty1 qwerty1 qwerty qwerty P@ssw0rd #$H%*@T letmein
)}(|{ wNoNumberswPAAllowed Passwords All Passwords
14
Warm-up
Fact: Let A’ A then for any w A’ Pr[w|A] ≤ Pr[w|A’]
User 1 User 2 User 3 User 4 User 5 User 6 User 7
password 123456 letmein password 12345 password password
letmein 12345 password 111111 123456 111111 111111
abc123 12345678 baseball Passw0rd 12345678 letmein Passw0rd
Passw0rd password Passw0rd abc123 baseball iloveyou iloveyou
… … … … … … …
qwerty Passw0rd qwerty1 letmein password baseball #$H%*@T
qwerty1 qwerty1 qwerty qwerty P@ssw0rd #$H%*@T letmein
Initially one person uses letmein as their password.
letmein
15
Warm-up
Fact: Let A’ A then for any w A’ Pr[w|A] ≤ Pr[w|A’]
User 1 User 2 User 3 User 4 User 5 User 6 User 7
password 123456 letmein password 12345 password password
letmein 12345 password 111111 123456 111111 111111
abc123 12345678 baseball Passw0rd 12345678 letmein Passw0rd
Passw0rd password Passw0rd abc123 baseball iloveyou iloveyou
… … … … … … …
qwerty Passw0rd qwerty1 letmein password baseball #$H%*@T
qwerty1 qwerty1 qwerty qwerty P@ssw0rd #$H%*@T letmein
Every user who used letmein before is still using the same password.
16
Outline
• User Model• Policy Structure– Positive Rules – Negative Rules – Singleton Rules
• Goal• Algorithms and Reductions• Experiments
17
Positive Rules
Rules R1,…,Rm P
R1 = {w | Length(w) 14}.
Active Rules: S {1,…,m}.
.
18
Positive Rules - Example
Rules R1,…,Rm P
R1 = {w | Length(w) 14}.
Active Rules: S {1,…,m}.
A{1}= {w | Length(w) 14}.
19
Negative Rules
Rules R1,…,Rm P
R1 = {w | Length(w) < 8}.
Active Rules: S {1,…,m}.
20
Negative Rules - Example
Rules R1,…,Rm P
R1 = {w | Length(w) < 8}.
Active Rules: S {1,…,m}.
A{1}= P - {w | Length(w) < 8}.
21
Singleton Rules
Rule Rw= {w} for each w P.
Can allow/ban any individual password.
Special Case of Positive Rules/Negative Rules.
22
Outline
• User Model• Policy Structure• Goal• Algorithms and Reductions• Experiments
23
Online Attack
password
Guess Limit: k-strikes policy
12345
12345
p(k, A) – probability of a successful untargeted attack given A.
25
p(k,A) - Example
p(1,A) = Pr[111111] = 3/7p(2,A) = p(1,A) + Pr[letmein] = 5/7p(3,A) = p(2,A) + Pr[123456]= 6/7
User 1 User 2 User 3 User 4 User 5 User 6 User 7
password 123456 letmein password 12345 password password
letmein 12345 password 111111 123456 111111 111111
abc123 12345678 baseball Passw0rd 12345678 letmein Passw0rd
Passw0rd password Passw0rd abc123 baseball iloveyou iloveyou
… … … … … … …
qwerty Passw0rd qwerty1 letmein password baseball #$H%*@T
qwerty1 qwerty1 qwerty qwerty P@ssw0rd #$H%*@T letmein
26
Goal: Optimize p(k,A)
Goal: Find a password composition policy S {1,…,m} which minimizes p(k,AS) for some k.
p(k, A) – Fraction of accounts an adversary can crack with k guesses per account given policy A.
p(1, A): minimum entropy of the password distribution resulting from policy A.
28
Outline
• User Model• Policy Structure• Goal• Algorithms and Reductions• Experiments
29
ResultsRankings Model
Constant k Large k
Singleton Rules P NP-HardAPX-Hard (UGC)
Positive Rules P NP-Hard
Negative Rules n1/3-approx is NP-Hard NP-Hard
This Talk: k=1
n1/3-approx is NP-Hard
Parameters: n, m, |P|
30
Negative Rules are Hard!
Theorem: Unless P = NP no polynomial time algorithm can even approximate p(1,AS) to a factor of n1/3- in the negative rules setting.
31
Reduction
Maximum Independent Set: g vertices e edges
Theorem [Hastad 1996]: NP-Hard to distinguish the following two cases (1) any independent set has size at most K = g or (2) the maximum independent set has size g1-.
32
Reduction (Preference Lists)Preference Lists: Type 1
W1 … W1
W2 … W2
… … …WK … WK
B1 … Bg
… … …
Observation: Unless we ban W1,…,WK we have p(1,AS) ≥ g/n
33
Reduction (Preference Lists)
Preference Lists: Type 2 (for each edge e = {u,v})(u,v,1) … (u,v,g)(v,u,1) … (v,u,g)
X … X… … …
Observation: If for any edge e = {u,v} we ban (u,v,1),…,(u,v,g) and (v,u,1),…,(v,u,g) then p(1,AS) ≥ g/n.
34
Reduction (Preference Lists)
Preference Lists: Type 3 (for each vertex v, i j [K])(v,i,j,1) … (v,i,j,g)(v,j,i,1) … (v,j,i,g)
X … X… … …
Observation: If we ban (v,i,j,1),…,(v,i,j,g) and (v,j,i,1),…,(v,j,i,g) then p(1,AS) ≥ g/n.
35
Reduction (Rules)
Ru,1
Rv,2Rw,3
Rx,4
K=4Preference Lists: Type 1
W1 … W1
W2 … W2
… … …
WK … WK
B1 … Bg
… … …
s
t
36
Reduction (Rules)
Ru,1
Rv,2Rw,3
Rx,4
K=4 Preference Lists: Type 2 (edge e = {u,x})
(u,x,1) … (u,x,g)
(x,u,1) … (x,u,g)
X … X
… … …
s
t
p(1,AS) ≥ g/n
37
Reduction (Rules)
Ru,1
Rv,2Rw,3
Rx,4
K=4 Preference Lists: Type 2 (edge e = {u,s})
(u,s,1) … (u,s,g)
(s,u,1) … (s,u,g)
X … X
… … …
s
t
38
Reduction (Rules)
Ru,1
Rv,2Rw,3
Rx,4
K=4 Preference Lists: Type 2 (edge e = {s,t})
(s,t,1) … (s,t,g)
(t,s,1) … (t,s,g)
X … X
… … …
s
t
39
Reduction (Rules)
Ru,1
Rv,2Rw,3
Rw,4
K=5Preference Lists: Type 1
W1 … W1
W2 … W2
… … …
WK … WK
B1 … Bg
… … …
s
t
p(1,AS) ≥ g/n
40
Reduction (Rules)
Ru,1
Rv,2Rw,3
Rw,4
K=5Preference Lists: Type 1
W1 … W1
W2 … W2
… … …
WK … WK
B1 … Bg
… … …
s
t
Rv,5
41
Reduction (Rules)
Ru,1
Rv,2Rw,3
Rw,4
K=5
s
t
Rv,5
Preference Lists: Type 3 (for each vertex u, i j [K])
(v,2,5,1) … (v,2,5,g)
(v,5,2,1) … (v,5,2,g)
X … X
… … …
p(1,AS) ≥ g/n
42
Reduction (Rules)
Ru,1
Rv,2Rw,3
Rw,4
K=4
s
tPreference Lists: Type 3 (w, i=4, j=2)
(w,4,2,1) … (w,4,2,g)
(w,2,4,1) … (w,2,4,g)
X … X
… … …
44
ReductionIndependent Set of Size K? maxS [m] p(1,AS)Yes 1/n
No g/n where n = O(g3)
45
ResultsRankings Model
Constant k Large k
Singleton Rules P NP-HardAPX-Hard (UGC)
Positive Rules P NP-Hard
Negative Rules n1/3-approx is NP-Hard NP-Hard
This Talk: k=1
P
Parameters: n, m, |P|
46
Key Difference: Positive vs. Negative
Let S w = {i | w Ri} (all rules Ri that contain w).
Negative Rules: Ban w - activate any rule in Sw.
Positive Rules: Ban w - deactivate all rules in Sw.
47
Positive Rules
Fact: Let S* {1,…m} denote the optimal solution, and let S S* then either
(1) p(1,AS) = p(1,AS*), or (S is optimal) (2) S-Sw S*, where Pr[w|AS] = p(1,AS).
All rules Ri that contain the most popular word in AS.
48
Positive Rules
Fact: Let S* {1,…m} denote the optimal solution, and let S S* then either
(1) p(1,AS) = p(1,AS*), or (S is optimal) (2) S-Sw S*, where Pr[w|AS] = p(1,AS).
Proof: Suppose for contradiction that w AS*, and observe that .
Therefore, . Contradiction!S*S AA
Si
iSi
i RR*
S*S*S AA|wA |PrPr,1 wp
49
Positive Rules Algorithm
Iterative Elimination: Initialize: S0 = {1,…,m} Repeat: (Ban w - current most popular password)
Si+1 = Si – Sw
Claim: One of the Si’s must be the optimal solution!
50
ResultsRankings Model
Constant k Large k
Singleton Rules P NP-HardAPX-Hard (UGC)
Positive Rules P NP-Hard
Negative Rules n1/3-approx is NP-Hard NP-Hard
This Talk: k=1
Question: What if we don’t have access to the full preference lists of each user? What if we don’t want to run in time n?
Parameters: n, m
51
ResultsRankings Model
Constant k Large k
Singleton Rules P NP-HardAPX-Hard (UGC)
Positive Rules P NP-Hard
Negative Rules n1/3-approx is NP-Hard NP-Hard
This Talk: k=1
Sampling Algorithm: ε-approximation with probability 1-δ
Parameters: m, 1/ε, 1/δ
52
Sampling Algorithm
Theorem: There is an efficient algorithm that makes O(m log (m/𝛿)/𝜀2) queries and with probability at least 𝛿 outputs positive rules S ⊆ [m] s.t
p(1,AS) ≤ p(1,AS*)+𝜀.
Sample: q(A) returns w with probability P[w|A].
Idea: Run iterative elimination. In each round use sampling to estimate the probability of the most popular word.
53
Sampling Lemma
Lemma: Let s=100 log (m/)/2 denote the number of samples in each round, and let BADi denote the event that in iteration i, there exists a password w s.t.
(e.g., our probability estimate off by /2). ThenPr[i.BADi]≤
2
|Pr
ssw w
iSA# times w sampled
54
Sampling Lemma
Partition P into buckets.
2
Pr
iSAw
4Pr
2
iS
Aw
… …
12Pr
2 iSi iAw
B0 B1 Biw
Contains at mot 2i+1/ such passwords.
55
Sampling Lemma
Partition P into buckets.
s=100 log (m/)/2
… …B0 B1 Bi
2122PrPr
iw
Sms
sAwi
Chernoff Bounds:
Contains at most 2i+1/ such passwords.
w
56
Sampling Lemma
Partition P into buckets.
s=100 log (m/)/2
… …B0 B1 Bi
Contains at most 2i+1/ passwords.
Union Bound: 1
1
21 22
22Pr.Pr
i
i
iw
Si mmssAwBw
i
57
Sampling Lemma
Partition P into buckets.
s=100 log (m/)/2
… …B0 B1 Bi
mm
BADi
ii
012
PrUnion Bound (buckets):
58
Sampling Lemma
Partition P into buckets.
s=100 log (m/)/2
… …B0 B1 Bi
mmBADi i.PrUnion Bound (rounds):
64
Outline
• User Model• Policy Structure• Goal• Algorithms and Reductions• Experiments– RockYou Dataset– Rules– Results
65
RockYou Dataset
• RockYou password leak: 32 million plaintext passwords.
• No Preference Lists: Insufficient for our sampling algorithm.
• We test our algorithm under an additional assumption…
66
0.51
Normalized Probabilities
RockYou: initial distribution over P.
0.5
letmein (0.1)
PA
1letmein (0.2)
A
67
Normalized ProbabilitiesRankings Model
Constant k Large k
Singleton Rules P NP-HardAPX-Hard (UGC)
Positive Rules P NP-Hard
Negative Rules n1/3-approx is NP-Hard NP-Hard
Normalized Probabilities Model
Constant k Large k
Singleton Rules P P
Positive Rules P NP-Hard
Negative Rules NP-Hard NP-Hard
71
72
Base Line Results
73
Results
74
Discussion
• Optimal solution was better under negative rules.
• However, sampled solutions were much better with positive rules.
• Interesting Directions:– Additional Rules?– Is the Normalized Probabilities Model reasonable?– General experiment in preference list model?
75
Open Questions
• Efficient approximation algorithm in negative rules setting with normalized probabilities assumption?
• Adversary with limited background knowledge about the user (e.g., age, gender, birthday).