Universal Hashing
Posted on 20-Mar-2016
Worst-case analysis vs. probabilistic analysis: probabilistic analysis needs knowledge of the distribution of the inputs.
Indicator random variables: Given a sample space S and an event A, the indicator random variable $I\{A\}$ associated with event A is defined as
$$I\{A\} = \begin{cases} 1 & \text{if } A \text{ occurs},\\ 0 & \text{if } A \text{ does not occur}. \end{cases}$$
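E.g.: Consider flipping a fair coin:
• Sample space $S = \{H, T\}$.
• Define a random variable Y with $\Pr\{Y = H\} = \Pr\{Y = T\} = 1/2$.
• We can define an indicator r.v. $X_H$ associated with the coin coming up heads, i.e., $Y = H$:
$$X_H = I\{Y = H\} = \begin{cases} 1 & \text{if } Y = H,\\ 0 & \text{if } Y = T. \end{cases}$$
$$E[X_H] = E[I\{Y = H\}] = 1 \cdot \Pr\{Y = H\} + 0 \cdot \Pr\{Y = T\} = 1 \cdot \tfrac{1}{2} + 0 \cdot \tfrac{1}{2} = \tfrac{1}{2}.$$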
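Lemma: Given a sample space S and an event A in the sample space S, let $X_A = I\{A\}$. Then $E[X_A] = \Pr\{A\}$.
Proof:
$$E[X_A] = E[I\{A\}] = 1 \cdot \Pr\{A\} + 0 \cdot \Pr\{\bar{A}\} = \Pr\{A\},$$
where $\bar{A} = S - A$ is the complement of A.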
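The lemma can be checked mechanically on a finite sample space. The minimal Python sketch below (the helper names `expected_indicator` and `probability` are illustrative, not from the notes) represents S as a dictionary from outcomes to exact probabilities and computes $E[I\{A\}]$ directly from the definition of expectation, which agrees with $\Pr\{A\}$:

```python
from fractions import Fraction

def expected_indicator(prob, event):
    """E[I{A}] for a finite sample space given as {outcome: probability}."""
    # I{A}(s) is 1 for outcomes in A and 0 otherwise; weight each by Pr{s}.
    return sum((1 if event(s) else 0) * p for s, p in prob.items())

def probability(prob, event):
    """Pr{A}: total probability of the outcomes in A."""
    return sum(p for s, p in prob.items() if event(s))

coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
die = {face: Fraction(1, 6) for face in range(1, 7)}

print(expected_indicator(coin, lambda y: y == "H"))    # 1/2
even = lambda f: f % 2 == 0
print(expected_indicator(die, even) == probability(die, even))   # True
```

Using `Fraction` keeps the arithmetic exact, so the equality in the lemma can be tested with `==` rather than a floating-point tolerance.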
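Let $H = \{h : U \to \{0, \ldots, m-1\}\}$ be a finite collection of hash functions. H is called “universal” if for each pair of distinct keys $k, \ell \in U$, the number of hash functions $h \in H$ for which $h(k) = h(\ell)$ is at most $|H|/m$. Equivalently, if h is chosen uniformly at random from H, then $\Pr\{h(k) = h(\ell)\} \le 1/m$.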
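For a small, explicit family the definition can be checked by brute force. In this sketch, a hypothetical helper `is_universal` counts, for every pair of distinct keys, how many functions in the family send both keys to the same slot, and compares that count with $|H|/m$ (multiplying through by m to stay in integer arithmetic). The family of all functions from U to $\{0, \ldots, m-1\}$ is universal, while a family containing a single constant function is not:

```python
from itertools import combinations, product

def is_universal(family, universe, m):
    """Check that each pair of distinct keys collides under at most |H|/m functions."""
    for k, l in combinations(universe, 2):
        collisions = sum(1 for h in family if h(k) == h(l))
        if collisions * m > len(family):   # i.e. collisions > |H| / m
            return False
    return True

def all_functions(universe, m):
    """Every function U -> {0, ..., m-1}, each represented by a lookup table."""
    universe = list(universe)
    return [dict(zip(universe, values)).__getitem__
            for values in product(range(m), repeat=len(universe))]

U = range(3)
print(is_universal(all_functions(U, 2), U, 2))   # True: 4 of the 8 functions collide on each pair
print(is_universal([lambda k: 0], U, 2))         # False: the constant function always collides
```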
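Define $n_i$ = the length of list $T[i]$, and let $\alpha = n/m$ be the load factor of a table T with m slots storing n keys.
Theorem: Suppose h is randomly selected from H and collisions are resolved by chaining. If key k is not in the table, then $E[n_{h(k)}] \le \alpha$. If k is in the table, then $E[n_{h(k)}] \le 1 + \alpha$.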
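Proof: For each pair k and ℓ of distinct keys, define $X_{k\ell} = I\{h(k) = h(\ell)\}$. By the definition of universality, $\Pr_h\{h(k) = h(\ell)\} \le 1/m$, and so, by the lemma, $E[X_{k\ell}] \le 1/m$. Define $Y_k$ to be the number of keys other than k that hash to the same slot as k, so that
$$Y_k = \sum_{\ell \in T,\ \ell \ne k} X_{k\ell}, \qquad E[Y_k] = \sum_{\ell \in T,\ \ell \ne k} E[X_{k\ell}] \le \sum_{\ell \in T,\ \ell \ne k} \frac{1}{m},$$
where the middle equality is linearity of expectation.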
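If $k \notin T$, then $n_{h(k)} = Y_k$ and $|\{\ell : \ell \in T,\ \ell \ne k\}| = n$; thus $E[n_{h(k)}] = E[Y_k] \le n/m = \alpha$.
If $k \in T$, then because k appears in $T[h(k)]$ and the count $Y_k$ does not include k, we have $n_{h(k)} = Y_k + 1$ and $|\{\ell : \ell \in T,\ \ell \ne k\}| = n - 1$. Thus
$$E[n_{h(k)}] = E[Y_k] + 1 \le \frac{n-1}{m} + 1 = 1 + \alpha - \frac{1}{m} < 1 + \alpha.$$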
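The theorem can also be checked exactly for small parameters by averaging over every function in a universal family. The sketch below assumes the family $h_{a,b}(k) = ((ak + b) \bmod p) \bmod m$ that these notes go on to prove universal; the key set, $p = 23$, and $m = 4$ are arbitrary illustrative choices, and `expected_chain_length` is a hypothetical helper. It computes $E[n_{h(k)}]$ as an exact fraction for one key outside the table and one key inside it:

```python
from fractions import Fraction

def expected_chain_length(table_keys, k, p, m):
    """Exact E[n_{h(k)}] for h uniform over {h_{a,b} : 1 <= a < p, 0 <= b < p}."""
    total = 0
    for a in range(1, p):
        for b in range(p):
            slot = ((a * k + b) % p) % m
            # n_{h(k)}: number of stored keys that land in k's slot under h_{a,b}
            total += sum(1 for x in table_keys if ((a * x + b) % p) % m == slot)
    return Fraction(total, p * (p - 1))

keys = [3, 8, 11, 14, 20]          # the n = 5 keys stored in T
p, m = 23, 4                        # p prime, every key in Z_p
alpha = Fraction(len(keys), m)      # load factor n/m

print(expected_chain_length(keys, 5, p, m) <= alpha)       # k = 5 not in T: True
print(expected_chain_length(keys, 8, p, m) <= 1 + alpha)   # k = 8 in T: True
```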
Corollary: Using universal hashing and collision resolution by chaining in an initially empty table with m slots, it takes expected time Θ(n) to handle any sequence of n Insert, Search and Delete operations containing O(m) Insert operations.
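Proof: Since the sequence contains O(m) Insert operations, the table holds O(m) keys at any time, so the load factor is O(1). By the theorem, each Search takes O(1) expected time. Insert takes O(1) worst-case time, and so does Delete when the chains are doubly linked lists and Delete is given a pointer to the element. By linearity of expectation the expected time for the whole sequence is O(n), and since every operation takes Ω(1) time, the expected time is Θ(n).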
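A minimal chained hash table in this spirit might look like the sketch below. It assumes, for illustration, that h is drawn from the family $h_{a,b}(k) = ((ak + b) \bmod p) \bmod m$ defined later in these notes, with the Mersenne prime $p = 2^{31} - 1$ and integer keys in $[0, p)$; the class name `ChainedHashTable` is hypothetical. Python lists stand in for the chains, so `delete` scans its chain rather than running in O(1) as a doubly linked list with a pointer to the element would:

```python
import random

P = 2**31 - 1   # a Mersenne prime; assumes every key lies in [0, P)

class ChainedHashTable:
    """Hash table with chaining; slots[i] plays the role of the list T[i]."""

    def __init__(self, m, rng=random):
        self.m = m
        # Draw h uniformly from the family ((a*k + b) mod P) mod m with a != 0.
        self.a = rng.randrange(1, P)
        self.b = rng.randrange(P)
        self.slots = [[] for _ in range(m)]

    def _h(self, key):
        return ((self.a * key + self.b) % P) % self.m

    def insert(self, key):
        # Assumes key is not already present; append it to its chain.
        self.slots[self._h(key)].append(key)

    def search(self, key):
        # Scans one chain, whose expected length is at most 1 + alpha by the theorem.
        return key in self.slots[self._h(key)]

    def delete(self, key):
        # list.remove scans the chain; a doubly linked list would make this O(1).
        self.slots[self._h(key)].remove(key)

table = ChainedHashTable(m=8, rng=random.Random(42))
for key in [5, 17, 42, 99, 123456]:
    table.insert(key)
table.delete(42)
print(table.search(17), table.search(42))   # True False
```

Passing an explicit `random.Random` instance makes a run reproducible, while the default uses the module-level generator.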
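Designing a universal class of hash functions: choose a prime p large enough that every possible key k lies in the range 0 to p − 1. Let
$$\mathbb{Z}_p = \{0, 1, \ldots, p-1\}, \qquad \mathbb{Z}_p^* = \{1, 2, \ldots, p-1\}.$$
For any $a \in \mathbb{Z}_p^*$ and $b \in \mathbb{Z}_p$, define $h_{a,b} : \mathbb{Z}_p \to \mathbb{Z}_m$ by
$$h_{a,b}(k) = ((ak + b) \bmod p) \bmod m,$$
and let
$$H_{p,m} = \{h_{a,b} : a \in \mathbb{Z}_p^*,\ b \in \mathbb{Z}_p\}.$$
Since there are p − 1 choices for a and p choices for b, $H_{p,m}$ contains p(p − 1) hash functions.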
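A direct Python transcription of $h_{a,b}$, and of drawing a member of $H_{p,m}$ uniformly at random, might look as follows; the function names `make_hash` and `random_hash` are illustrative. Choosing a uniformly from $\mathbb{Z}_p^*$ and b uniformly from $\mathbb{Z}_p$ picks each of the p(p − 1) functions with equal probability:

```python
import random

def make_hash(a, b, p, m):
    """Return h_{a,b}(k) = ((a*k + b) mod p) mod m for a in Z_p^*, b in Z_p."""
    if not (1 <= a < p and 0 <= b < p):
        raise ValueError("need 1 <= a < p and 0 <= b < p")
    return lambda k: ((a * k + b) % p) % m

def random_hash(p, m, rng=random):
    """Draw h_{a,b} uniformly from H_{p,m} (p is assumed prime)."""
    a = rng.randrange(1, p)   # uniform over Z_p^* = {1, ..., p-1}
    b = rng.randrange(p)      # uniform over Z_p   = {0, ..., p-1}
    return make_hash(a, b, p, m)

h = make_hash(3, 4, 17, 6)
print(h(8))   # 5, since (3*8 + 4) mod 17 = 11 and 11 mod 6 = 5
```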
Theorem: $H_{p,m}$ is universal.
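Pf: Let k and ℓ be two distinct keys in $\mathbb{Z}_p$. For a given $h_{a,b}$, let $r = (ak + b) \bmod p$ and $s = (a\ell + b) \bmod p$. Then
$$r - s \equiv a(k - \ell) \pmod p.$$
Because p is prime and both a and $k - \ell$ are nonzero modulo p, their product is also nonzero modulo p, so $r \ne s$. Thus, for any $h_{a,b} \in H_{p,m}$, distinct inputs k and ℓ map to distinct values r and s modulo p.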
Moreover, each of the p(p − 1) possible choices for the pair (a, b) with $a \ne 0$ yields a different resulting pair (r, s) with $r \ne s$, since we can solve for a and b given r and s:
$$a = \left((r - s)\left((k - \ell)^{-1} \bmod p\right)\right) \bmod p, \qquad b = (r - ak) \bmod p.$$
Since there are exactly p(p − 1) possible pairs (r, s) with $r \ne s$, there is a one-to-one correspondence between pairs (a, b) with $a \ne 0$ and pairs (r, s) with $r \ne s$.
Therefore, for any given pair of inputs k and ℓ, if we pick (a, b) uniformly at random from $\mathbb{Z}_p^* \times \mathbb{Z}_p$, the resulting pair (r, s) is equally likely to be any pair of distinct values modulo p.
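It follows that
$$\Pr\{k \text{ and } \ell \text{ collide}\} = \Pr_{r,s}\{r \equiv s \pmod m\}.$$
Given r, the values in $\mathbb{Z}_p$ congruent to r modulo m are $r \bmod m$, $(r \bmod m) + m$, $(r \bmod m) + 2m$, …, up to at most p − 1; there are at most $\lceil p/m \rceil$ of them, one of which is r itself. So the number of s such that $s \ne r$ and $s \equiv r \pmod m$ is at most
$$\lceil p/m \rceil - 1 \le \frac{p + m - 1}{m} - 1 = \frac{p - 1}{m}.$$
Since s is equally likely to be any of the p − 1 values other than r,
$$\Pr_{r,s}\{r \equiv s \pmod m\} \le \frac{(p-1)/m}{p-1} = \frac{1}{m}.$$
Therefore, for any pair of distinct $k, \ell \in \mathbb{Z}_p$, $\Pr\{h_{a,b}(k) = h_{a,b}(\ell)\} \le 1/m$, so $H_{p,m}$ is universal.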
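For small primes the two key steps of the proof can be verified exhaustively: that $(a, b) \mapsto (r, s)$ is a bijection onto the pairs with $r \ne s$, and that at most $p(p-1)/m$ choices of (a, b) make a given pair of keys collide. The helper `check_universality` below is a hypothetical illustration that enumerates every $h_{a,b}$:

```python
from itertools import combinations

def check_universality(p, m):
    """Enumerate all of H_{p,m}; return (worst collision count, |H_{p,m}|)."""
    ab_pairs = [(a, b) for a in range(1, p) for b in range(p)]
    worst = 0
    for k, l in combinations(range(p), 2):
        rs = [((a * k + b) % p, (a * l + b) % p) for a, b in ab_pairs]
        # The proof's bijection: all p(p-1) pairs (r, s) are distinct and have r != s.
        if len(set(rs)) != p * (p - 1) or any(r == s for r, s in rs):
            raise AssertionError(f"bijection fails for keys {k}, {l}")
        collisions = sum(1 for r, s in rs if r % m == s % m)
        worst = max(worst, collisions)
    return worst, len(ab_pairs)

worst, size = check_universality(17, 6)
print(worst, size, worst * 6 <= size)   # 32 272 True
```

Because the map is a bijection, every pair of keys collides under exactly the same number of functions (here 32 of 272, below the bound 272/6).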
Perfect Hashing
Perfect hashing is good when the keys are static, i.e., once stored, the keys never change (e.g., the files on a CD-ROM, or the set of reserved words in a programming language). Perfect hashing uses O(1) memory accesses in the worst case for a search.
Thm: If we store n keys in a hash table of size $m = n^2$ using a hash function h randomly chosen from a universal class of hash functions, then the probability of there being any collisions is < 1/2.
Proof: Let h be chosen from a universal family. Then each pair of keys collides with probability at most 1/m, and there are $\binom{n}{2}$ pairs of keys. Let X be a r.v. that counts the number of collisions. When $m = n^2$,
$$E[X] \le \binom{n}{2} \cdot \frac{1}{n^2} = \frac{n^2 - n}{2} \cdot \frac{1}{n^2} < \frac{1}{2}.$$
By Markov's inequality, $\Pr\{X \ge t\} \le E[X]/t$; taking $t = 1$ gives $\Pr\{X \ge 1\} \le E[X] < 1/2$.
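The theorem suggests a simple Las Vegas procedure: draw h at random and retry until the n keys land in n distinct slots. Since each attempt succeeds with probability greater than 1/2, the expected number of attempts is less than 2. The sketch below assumes the family $H_{p,m}$ from earlier in these notes, distinct keys in $\mathbb{Z}_p$, and an illustrative function name `find_injective_hash`:

```python
import random

def find_injective_hash(keys, p, rng=random):
    """Draw h_{a,b} from H_{p,m} with m = n^2 until no two keys collide.

    Assumes the keys are distinct and lie in Z_p with p prime.
    Returns (a, b, m, attempts).
    """
    n = len(keys)
    m = n * n
    attempts = 0
    while True:
        attempts += 1
        a, b = rng.randrange(1, p), rng.randrange(p)
        slots = {((a * k + b) % p) % m for k in keys}
        if len(slots) == n:      # n distinct slots means no collisions
            return a, b, m, attempts

keys = [10, 22, 37, 40, 52, 60, 70, 72, 75]   # illustrative keys in Z_101
a, b, m, attempts = find_injective_hash(keys, 101, random.Random(1))
print(m)   # 81
```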
Thm: If we store n keys in a hash table of size $m = n$ using a hash function h randomly chosen from a universal class of hash functions, then
$$E\left[\sum_{j=0}^{m-1} n_j^2\right] < 2n,$$
where $n_j$ is the number of keys hashing to slot j.
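Pf: It is clear that for any nonnegative integer a,
$$a^2 = a + 2\binom{a}{2}.$$
Hence
$$E\left[\sum_{j=0}^{m-1} n_j^2\right] = E\left[\sum_{j=0}^{m-1}\left(n_j + 2\binom{n_j}{2}\right)\right] = E\left[\sum_{j=0}^{m-1} n_j\right] + 2E\left[\sum_{j=0}^{m-1}\binom{n_j}{2}\right] = E[n] + 2E\left[\sum_{j=0}^{m-1}\binom{n_j}{2}\right] = n + 2E\left[\sum_{j=0}^{m-1}\binom{n_j}{2}\right].$$
The sum $\sum_{j=0}^{m-1}\binom{n_j}{2}$ is the total number of pairs of keys that collide. By universality, each of the $\binom{n}{2}$ pairs collides with probability at most 1/m, so
$$E\left[\sum_{j=0}^{m-1}\binom{n_j}{2}\right] \le \binom{n}{2}\frac{1}{m} = \frac{n(n-1)}{2m} = \frac{n-1}{2}, \quad \text{since } m = n.$$
Thus
$$E\left[\sum_{j=0}^{m-1} n_j^2\right] \le n + 2 \cdot \frac{n-1}{2} = 2n - 1 < 2n.$$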
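The bound can be checked exactly on a small example by enumerating all of $H_{p,m}$. The sketch assumes that family, $m = n$, and an illustrative key set; `expected_sum_of_squares` is a hypothetical helper. Since every pair of distinct keys collides with probability at most 1/m, the computed value should never exceed $2n - 1$:

```python
from fractions import Fraction

def expected_sum_of_squares(keys, p):
    """Exact E[sum_j n_j^2] over h uniform in H_{p,m}, with m = n = len(keys)."""
    m = len(keys)
    total = 0
    for a in range(1, p):
        for b in range(p):
            counts = [0] * m                    # counts[j] = n_j
            for k in keys:
                counts[((a * k + b) % p) % m] += 1
            total += sum(c * c for c in counts)
    return Fraction(total, p * (p - 1))

keys = [10, 22, 37, 40, 52, 60, 70, 72, 75]   # n = 9 keys in Z_101
e = expected_sum_of_squares(keys, 101)
print(e <= 2 * len(keys) - 1)   # True
```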
Cor: If we store n keys in a hash table of size $m = n$ using a hash function h randomly chosen from a universal class of hash functions, and we set the size of each secondary hash table to $m_j = n_j^2$ for $j = 0, \ldots, m-1$, then the expected amount of storage required for all secondary hash tables in a perfect hashing scheme is < 2n.
Pf:
$$E\left[\sum_{j=0}^{m-1} m_j\right] = E\left[\sum_{j=0}^{m-1} n_j^2\right] < 2n.$$
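Testing a few randomly chosen hash functions will soon find one that uses a small amount of storage.
Cor: $\Pr\{\text{total storage for secondary hash tables} \ge 4n\} < 1/2$.
Pf: By Markov's inequality, $\Pr\{X \ge t\} \le E[X]/t$. Take $X = \sum_{j=0}^{m-1} m_j$ and $t = 4n$:
$$\Pr\left\{\sum_{j=0}^{m-1} m_j \ge 4n\right\} \le \frac{E\left[\sum_{j=0}^{m-1} m_j\right]}{4n} < \frac{2n}{4n} = \frac{1}{2}.$$
Hence each randomly chosen h fails with probability less than 1/2, and the expected number of trials before finding one whose secondary tables need fewer than 4n slots is less than 2.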
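Putting the two corollaries together gives a two-level construction: retry the primary hash function until the secondary tables need fewer than 4n slots in total, then retry each secondary hash function until it is collision-free in its $m_j = n_j^2$ slots. The sketch below is an illustration under stated assumptions (all functions drawn from $H_{p,m}$, distinct keys in $\mathbb{Z}_p$ with p prime, at least one key); the names `build_perfect_table` and `contains` are hypothetical:

```python
import random

def h_ab(a, b, p, m, k):
    """h_{a,b}(k) = ((a*k + b) mod p) mod m."""
    return ((a * k + b) % p) % m

def build_perfect_table(keys, p, rng=random):
    """Two-level static table for distinct keys in Z_p (p prime, keys non-empty)."""
    n = len(keys)
    if n == 0:
        raise ValueError("need at least one key")
    # Level 1: m = n slots; retry until total secondary storage sum n_j^2 < 4n.
    while True:
        a, b = rng.randrange(1, p), rng.randrange(p)
        buckets = [[] for _ in range(n)]
        for k in keys:
            buckets[h_ab(a, b, p, n, k)].append(k)
        if sum(len(bucket) ** 2 for bucket in buckets) < 4 * n:
            break
    # Level 2: bucket j gets m_j = n_j^2 slots; retry until no collisions.
    secondary = []
    for bucket in buckets:
        mj = len(bucket) ** 2
        while True:
            aj, bj = rng.randrange(1, p), rng.randrange(p)
            slots = [None] * mj
            for k in bucket:
                j = h_ab(aj, bj, p, mj, k)
                if slots[j] is not None:
                    break           # collision: try another h_j
                slots[j] = k
            else:
                break               # every key placed without a collision
        secondary.append((aj, bj, mj, slots))
    return (a, b, n), secondary

def contains(table, key, p):
    """Worst-case O(1) lookup: two hash evaluations and one probe."""
    (a, b, m), secondary = table
    aj, bj, mj, slots = secondary[h_ab(a, b, p, m, key)]
    return mj > 0 and slots[h_ab(aj, bj, p, mj, key)] == key

keys = [10, 22, 37, 40, 52, 60, 70, 72, 75]
table = build_perfect_table(keys, 101, random.Random(7))
print(all(contains(table, k, 101) for k in keys))   # True
print(contains(table, 11, 101))                     # False
```

The final comparison with the stored key is what rules out false positives for keys that were never inserted.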