Hash Tables:
Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Mar 31, 2015

Cole Slemmons
Analysis of hashing with chaining:

• Given a hash table with m slots and n keys, define the load factor α = n/m: the average number of keys per slot.

• Assume each key is equally likely to be hashed into any slot: simple uniform hashing (SUH).

• Thm: In a hash table in which collisions are resolved by chaining, an unsuccessful search takes expected time Θ(1+α) under SUH.

Proof:

Under the assumption of SUH, any key not already stored in the table is equally likely to hash to any of the m slots.

The expected time to search unsuccessfully for a key k is the expected time to search to the end of list T[h(k)], whose expected length is exactly α.

Thus, the total time required, including the time to compute h(k), is Θ(1+α). □
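To make the quantities in the theorem concrete, here is a minimal Python sketch of a chained hash table. The class name `ChainedHashTable`, its methods, and the use of Python's built-in `hash` as a stand-in for a function satisfying SUH are all assumptions made for illustration; they do not come from the slides.

```python
class ChainedHashTable:
    """Hash table with collision resolution by chaining (illustrative sketch)."""

    def __init__(self, m):
        self.m = m                              # number of slots
        self.n = 0                              # number of stored keys
        self.slots = [[] for _ in range(m)]     # one chain (list) per slot

    def _h(self, key):
        # Assumption: Python's built-in hash() stands in for a hash
        # function that behaves like simple uniform hashing.
        return hash(key) % self.m

    def insert(self, key):
        chain = self.slots[self._h(key)]
        if key not in chain:
            chain.insert(0, key)                # new elements go at the front
            self.n += 1

    def search(self, key):
        """Return (found, number of chain elements examined)."""
        examined = 0
        for stored in self.slots[self._h(key)]:
            examined += 1
            if stored == key:
                return True, examined
        return False, examined

    def load_factor(self):
        return self.n / self.m                  # alpha = n / m
```

An unsuccessful search walks the whole chain T[h(k)], so `search` examines as many elements as that chain holds, which is α on average under SUH.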

• Thm: In a hash table in which collisions are resolved by chaining, a successful search takes time Θ(1+α) on average under SUH.

Proof: Let the element being searched for be equally likely to be any of the n elements stored in the table.

The number of elements examined during a successful search for an element x is one more than the number of elements that appear before x in x's list.

New elements are placed at the front of the list, so the elements before x in the list were inserted after x was inserted.

We want to find the expected number of elements added to x's list after x was added to the list.

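Let $x_i$ denote the ith element inserted into the table, for i = 1 to n, and let $k_i = key[x_i]$.

Define $X_{ij} = I\{h(k_i) = h(k_j)\}$. Under SUH, we have $\Pr\{h(k_i) = h(k_j)\} = 1/m = E[X_{ij}]$.

The expected number of elements examined in a successful search is

\[
\begin{aligned}
E\Big[\frac{1}{n}\sum_{i=1}^{n}\Big(1+\sum_{j=i+1}^{n}X_{ij}\Big)\Big]
&= \frac{1}{n}\sum_{i=1}^{n}\Big(1+\sum_{j=i+1}^{n}E[X_{ij}]\Big) \\
&= \frac{1}{n}\sum_{i=1}^{n}\Big(1+\sum_{j=i+1}^{n}\frac{1}{m}\Big)
 = 1+\frac{1}{nm}\sum_{i=1}^{n}(n-i) \\
&= 1+\frac{1}{nm}\Big(n^{2}-\frac{n(n+1)}{2}\Big)
 = 1+\frac{n-1}{2m}
 = 1+\frac{\alpha}{2}-\frac{\alpha}{2n}.
\end{aligned}
\]

Thus, the total time required for a successful search is $\Theta(2+\alpha/2-\alpha/(2n)) = \Theta(1+\alpha)$.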
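As a sanity check on this derivation, the short Python sketch below evaluates the double sum exactly with rational arithmetic and compares it with the closed form 1 + α/2 − α/(2n). The function names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

def expected_examined(n, m):
    """(1/n) * sum_{i=1..n} (1 + sum_{j=i+1..n} 1/m), computed exactly."""
    total = Fraction(0)
    for i in range(1, n + 1):
        # The inner sum has n - i terms, each equal to E[X_ij] = 1/m.
        total += 1 + Fraction(n - i, m)
    return total / n

def closed_form(n, m):
    """1 + alpha/2 - alpha/(2n) with alpha = n/m."""
    alpha = Fraction(n, m)
    return 1 + alpha / 2 - alpha / (2 * n)

print(expected_examined(10, 4), closed_form(10, 4))   # both 17/8
```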

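The multiplication method: $h(k) = \lfloor m\,(kA \bmod 1) \rfloor$ for a constant A with 0 < A < 1.

Knuth suggests that $A \approx (\sqrt{5}-1)/2 = 0.6180339887\ldots$

Take k = 123456, p = 14, $m = 2^{14} = 16384$, and w = 32.

Choose $A = s/2^{32}$ with $s = 2654435769$, the fraction of this form closest to $(\sqrt{5}-1)/2$.

\[ k \cdot s = 327706022297664 = (76300 \cdot 2^{32}) + 17612864. \]

The 14 most significant bits of $r_0 = 17612864$ give h(k) = 67.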
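The worked example can be reproduced with integer arithmetic only. The sketch below is one way to do it in Python; the function name `multiplication_hash` and its default arguments are assumptions taken from the numbers in the example (p = 14, w = 32, s = 2654435769), and it assumes the key fits in w bits.

```python
def multiplication_hash(k, p=14, w=32, s=2654435769):
    """Multiplication method with m = 2**p and A = s / 2**w.

    Returns the p most significant bits of r0, where r0 is the
    low-order w bits of the product k * s.
    """
    r0 = (k * s) & ((1 << w) - 1)     # k*s = r1 * 2**w + r0; keep r0
    return r0 >> (w - p)              # top p bits of the w-bit word r0

print(multiplication_hash(123456))    # 67
```

Working with the low-order word and a shift avoids floating-point rounding, which is why A is taken to be a fraction of the form s/2^w.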

Universal Hashing

• H={ h: U→{0,…,m-1} }, which is a finite collection of hash functions.

• H is called “universal” if for each pair of distinct keys k, ℓ ∈ U, the number of hash functions h ∈ H for which h(k) = h(ℓ) is at most |H|/m.

• Define $n_i$ = the length of list T[i].

• Thm: Suppose h is selected at random from H and chaining is used to resolve collisions. If k is not in the table, then $E[n_{h(k)}] \le \alpha$. If k is in the table, then $E[n_{h(k)}] \le 1+\alpha$.

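• Proof:
– For each pair k and ℓ of distinct keys, define $X_{k\ell} = I\{h(k) = h(\ell)\}$.

– By definition, $\Pr_h\{h(k) = h(\ell)\} \le 1/m$, and so $E[X_{k\ell}] \le 1/m$.

– Define $Y_k$ to be the number of keys other than k that hash to the same slot as k, so that

\[
Y_k = \sum_{\substack{\ell \in T \\ \ell \ne k}} X_{k\ell},
\qquad
E[Y_k] = \sum_{\substack{\ell \in T \\ \ell \ne k}} E[X_{k\ell}] \le \sum_{\substack{\ell \in T \\ \ell \ne k}} \frac{1}{m}.
\]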

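– If k ∉ T, then $n_{h(k)} = Y_k$ and $|\{\ell : \ell \in T, \ell \ne k\}| = n$; thus

\[ E[n_{h(k)}] = E[Y_k] \le \frac{n}{m} = \alpha. \]

– If k ∈ T, then because k appears in T[h(k)] and the count $Y_k$ does not include k, we have $n_{h(k)} = Y_k + 1$ and $|\{\ell : \ell \in T, \ell \ne k\}| = n-1$. Thus

\[ E[n_{h(k)}] = E[Y_k] + 1 \le \frac{n-1}{m} + 1 = 1 + \alpha - \frac{1}{m} < 1 + \alpha. \]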

Designing a universal class of hash functions:

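Let p be a prime large enough that every key k lies in $Z_p$, where

\[ Z_p = \{0, 1, \ldots, p-1\}, \qquad Z_p^* = \{1, 2, \ldots, p-1\}. \]

For any $a \in Z_p^*$ and $b \in Z_p$, define $h_{a,b} : Z_p \to Z_m$ by

\[ h_{a,b}(k) = ((ak + b) \bmod p) \bmod m. \]

The family of all such hash functions is

\[ H_{p,m} = \{ h_{a,b} : a \in Z_p^* \text{ and } b \in Z_p \}. \]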
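The family defined above is small enough to write down directly. The following Python sketch draws a random member and, for small parameters, counts by brute force how many functions make a given pair of keys collide. The names `h_ab`, `make_hash` and `colliding_functions` are hypothetical, and p = 17, m = 6 are illustrative values only.

```python
import random

def h_ab(a, b, k, p, m):
    """h_{a,b}(k) = ((a*k + b) mod p) mod m."""
    return ((a * k + b) % p) % m

def make_hash(p, m, rng=random):
    """Draw h_{a,b} uniformly from H_{p,m}; p must be a prime exceeding every key."""
    a = rng.randrange(1, p)   # a in Z_p^* = {1, ..., p-1}
    b = rng.randrange(0, p)   # b in Z_p   = {0, ..., p-1}
    return lambda k: h_ab(a, b, k, p, m)

def colliding_functions(k, l, p, m):
    """Count the pairs (a, b) for which h_{a,b}(k) == h_{a,b}(l)."""
    return sum(
        1
        for a in range(1, p)
        for b in range(p)
        if h_ab(a, b, k, p, m) == h_ab(a, b, l, p, m)
    )

p, m = 17, 6
print(h_ab(3, 4, 8, p, m))                               # 5
print(colliding_functions(3, 11, p, m), p * (p - 1) / m)  # count vs. |H|/m
```

For a universal family the count printed on the last line never exceeds |H|/m = p(p−1)/m, whichever pair of distinct keys is chosen.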

Theorem:

$H_{p,m}$ is universal.

Pf: Let k, ℓ be two distinct keys in $Z_p$.

Given $h_{a,b}$, let r = (ak + b) mod p and s = (aℓ + b) mod p.

Then r − s ≡ a(k − ℓ) (mod p).

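For any $h_{a,b} \in H_{p,m}$, distinct inputs k and ℓ map to distinct r and s modulo p: p is prime and both a and k − ℓ are nonzero modulo p, so a(k − ℓ) is nonzero modulo p.

Each of the p(p−1) possible choices for the pair (a,b) with a ≠ 0 yields a different resulting pair (r,s) with r ≠ s, since we can solve for a and b given r and s:

\[ a = \big((r-s)\,((k-\ell)^{-1} \bmod p)\big) \bmod p, \qquad b = (r - ak) \bmod p. \]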
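The inversion step can be checked mechanically. This Python sketch maps (a, b) to (r, s) and back, using the three-argument `pow` with exponent −1 (available from Python 3.8) for the modular inverse. The function names `to_rs` and `from_rs` and the sample values p = 17, k = 3, ℓ = 11 are assumptions for illustration.

```python
def to_rs(a, b, k, l, p):
    """Forward map: (a, b) -> (r, s) for fixed distinct keys k and l."""
    return (a * k + b) % p, (a * l + b) % p

def from_rs(r, s, k, l, p):
    """Inverse map: recover (a, b) from (r, s)."""
    inv = pow((k - l) % p, -1, p)     # (k - l)^{-1} mod p; exists since p is prime
    a = ((r - s) * inv) % p
    b = (r - a * k) % p
    return a, b

p, k, l = 17, 3, 11
pairs = {to_rs(a, b, k, l, p) for a in range(1, p) for b in range(p)}
print(len(pairs) == p * (p - 1))      # True: distinct (a, b) give distinct (r, s)
```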

• Since there are only p(p−1) possible pairs (r,s) with r ≠ s, there is a one-to-one correspondence between pairs (a,b) with a ≠ 0 and pairs (r,s) with r ≠ s.

• For any given pair of inputs k and ℓ, if we pick (a,b) uniformly at random from $Z_p^* \times Z_p$, the resulting pair (r,s) is equally likely to be any pair of distinct values modulo p.

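• Pr[k and ℓ collide] = $\Pr_{r,s}[r \equiv s \pmod m]$.

• Given r, the number of s such that s ≠ r and s ≡ r (mod m) is at most

\[ \lceil p/m \rceil - 1 \le \frac{p+m-1}{m} - 1 = \frac{p-1}{m}, \]

because the values congruent to r modulo m are spaced m apart (s, s+m, s+2m, …) and all lie below p.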

• Thus,

\[ \Pr_{r,s}[r \equiv s \pmod m] \le \frac{(p-1)/m}{p-1} = \frac{1}{m}. \]

Therefore, for any pair of distinct k, ℓ ∈ $Z_p$,

\[ \Pr[h_{a,b}(k) = h_{a,b}(\ell)] \le 1/m, \]

so that $H_{p,m}$ is universal.

• Open addressing:
– There is no list and no element stored outside the table.
– Advantage: avoids pointers, and potentially yields fewer collisions and faster retrieval.

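– The hash function takes the probe number as a second argument: $h : U \times \{0, 1, \ldots, m-1\} \to \{0, 1, \ldots, m-1\}$.
– For every k, the probe sequence $\langle h(k,0), h(k,1), \ldots, h(k,m-1) \rangle$ is a permutation of $\langle 0, 1, \ldots, m-1 \rangle$.
– Deletion from an open-address hash table is difficult.
– Thus chaining is more common when keys must be deleted.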

• Linear probing:
– $h' : U \to \{0, 1, \ldots, m-1\}$ is an ordinary hash function (auxiliary hash function).
– $h(k,i) = (h'(k) + i) \bmod m$.
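A minimal Python sketch of insertion and search with linear probing follows. The class name `LinearProbingTable` is hypothetical, Python's built-in `hash` is used as a stand-in for the auxiliary function h', `None` marks an empty slot (so `None` itself cannot be stored as a key), and deletion is left out on purpose.

```python
class LinearProbingTable:
    """Open-address hash table using linear probing (illustrative sketch)."""

    def __init__(self, m):
        self.m = m
        self.slots = [None] * m           # None marks an empty slot

    def _probe(self, key, i):
        # h(k, i) = (h'(k) + i) mod m; hash() stands in for h'.
        return (hash(key) + i) % self.m

    def insert(self, key):
        """Store key and return the slot index used; raise if the table is full."""
        for i in range(self.m):
            j = self._probe(key, i)
            if self.slots[j] is None or self.slots[j] == key:
                self.slots[j] = key
                return j
        raise OverflowError("hash table overflow")

    def search(self, key):
        """Return the slot index holding key, or None if key is absent."""
        for i in range(self.m):
            j = self._probe(key, i)
            if self.slots[j] is None:
                return None               # reached an empty slot: key is not stored
            if self.slots[j] == key:
                return j
        return None
```

With m = 5, inserting 3, 8 and 13 (all with h' ≡ 3 mod 5) fills slots 3, 4 and 0 in turn, which shows how colliding keys build up a run of occupied slots.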

• Quadratic probing:
– $h(k,i) = (h'(k) + c_1 i + c_2 i^2) \bmod m$, where h' is an auxiliary hash function and $c_1$ and $c_2 \ne 0$ are constants.

• Double hashing:
– $h(k,i) = (h_1(k) + i\,h_2(k)) \bmod m$, where $h_1$ and $h_2$ are auxiliary hash functions.
– Double hashing uses $\Theta(m^2)$ probe sequences; linear and quadratic probing use only $\Theta(m)$ probe sequences.
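The three probing schemes differ only in how the probe number i enters the formula, which the Python sketch below makes explicit. The functions take already-computed auxiliary hash values rather than keys; their names are hypothetical. One assumption not stated on the slide: for the double-hashing sequence to be a permutation of the slots, h2(k) must be relatively prime to m, which holds here because m = 7 is prime and h2 ranges over 1..6.

```python
def linear_probe(h1, m):
    """Probe sequence (h1 + i) mod m for i = 0..m-1."""
    return [(h1 + i) % m for i in range(m)]

def quadratic_probe(h1, m, c1, c2):
    """Probe sequence (h1 + c1*i + c2*i^2) mod m for i = 0..m-1."""
    return [(h1 + c1 * i + c2 * i * i) % m for i in range(m)]

def double_hash_probe(h1, h2, m):
    """Probe sequence (h1 + i*h2) mod m for i = 0..m-1."""
    return [(h1 + i * h2) % m for i in range(m)]

m = 7
linear = {tuple(linear_probe(h1, m)) for h1 in range(m)}
double = {tuple(double_hash_probe(h1, h2, m))
          for h1 in range(m) for h2 in range(1, m)}
print(len(linear), len(double))   # 7 42
```

The counts show the gap in the number of distinct probe sequences: m for linear probing against m(m−1) for double hashing in this setting.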

• Analysis of open-addressing hashing

$\alpha = n/m$: load factor, with n elements and m slots.

• Thm:

Given an open-address hash table with load factor $\alpha = n/m < 1$, the expected number of probes in an unsuccessful search is at most $1/(1-\alpha)$, assuming uniform hashing.

• Pf:
– Define the r.v. X to be the number of probes made in an unsuccessful search.
– Define $A_i$: the event that there is an ith probe and it is to an occupied slot.
– Event $\{X \ge i\} = A_1 \cap A_2 \cap \cdots \cap A_{i-1}$.

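– The prob. that there is a jth probe and it is to an occupied slot, given that the first j−1 probes were to occupied slots, is (n−j+1)/(m−j+1). Why? Under uniform hashing, the jth probe goes to one of the m−(j−1) slots not yet probed, and n−(j−1) of those are occupied.

\[
\begin{aligned}
\Pr\{X \ge i\} &= \Pr\{A_1 \cap A_2 \cap \cdots \cap A_{i-1}\} \\
&= \Pr\{A_1\} \cdot \Pr\{A_2 \mid A_1\} \cdot \Pr\{A_3 \mid A_1 \cap A_2\} \cdots \Pr\{A_{i-1} \mid A_1 \cap A_2 \cap \cdots \cap A_{i-2}\},
\end{aligned}
\]

where $\Pr\{A_1\} = n/m$.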

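– Since n < m, (n−j)/(m−j) ≤ n/m for all 0 ≤ j < m.
– Therefore

\[
\Pr[X \ge i] = \frac{n}{m}\cdot\frac{n-1}{m-1}\cdot\frac{n-2}{m-2}\cdots\frac{n-i+2}{m-i+2}
\le \Big(\frac{n}{m}\Big)^{i-1} = \alpha^{i-1},
\]

\[
E[X] = \sum_{i=1}^{\infty}\Pr[X \ge i] \le \sum_{i=1}^{\infty}\alpha^{i-1} = \sum_{i=0}^{\infty}\alpha^{i} = \frac{1}{1-\alpha}.
\]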
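The bound can be compared with the exact value of the sum for concrete n and m. The Python sketch below accumulates Pr[X ≥ i] from the product formula using exact rational arithmetic; the function name is hypothetical and the code assumes n < m.

```python
from fractions import Fraction

def expected_unsuccessful_probes(n, m):
    """E[X] = sum_{i>=1} Pr[X >= i] under uniform hashing, for n < m.

    Pr[X >= 1] = 1 and Pr[X >= i+1] = Pr[X >= i] * (n-i+1)/(m-i+1).
    """
    expected = Fraction(0)
    tail = Fraction(1)                 # Pr[X >= 1]
    i = 1
    while tail > 0:
        expected += tail
        tail *= Fraction(n - i + 1, m - i + 1)
        i += 1                         # tail becomes 0 once i exceeds n
    return expected

n, m = 9, 10
alpha = Fraction(n, m)
print(expected_unsuccessful_probes(n, m), 1 / (1 - alpha))   # 11/2 10
```

At α = 0.9 the exact expectation is 5.5 probes while the theorem guarantees at most 10, so the bound is safe but not tight for small tables.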

Cor: Inserting an element into an open-addressing hash table with load factor α requires at most 1/(1- α) probes on average, assuming uniform hashing.

Thm: Given an open-address hash table with load factor α < 1, the expected number of probes in a successful search is at most $\frac{1}{\alpha}\ln\frac{1}{1-\alpha}$, assuming uniform hashing and that each key in the table is equally likely to be searched for.

Pf: Suppose we search for a key k.

If k was the (i+1)st key inserted into the hash table, the expected number of probes made in a search for k is at most

1/(1-i/m)=m/(m-i).

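• Averaging over all n keys in the hash table gives us the average number of probes in a successful search:

\[
\begin{aligned}
\frac{1}{n}\sum_{i=0}^{n-1}\frac{m}{m-i}
&= \frac{m}{n}\sum_{i=0}^{n-1}\frac{1}{m-i}
 = \frac{1}{\alpha}\,(H_m - H_{m-n})
 = \frac{1}{\alpha}\sum_{k=m-n+1}^{m}\frac{1}{k} \\
&\le \frac{1}{\alpha}\int_{m-n}^{m}\frac{dx}{x}
 = \frac{1}{\alpha}\ln\frac{m}{m-n}
 = \frac{1}{\alpha}\ln\frac{1}{1-\alpha},
\end{aligned}
\]

where $H_i = \sum_{j=1}^{i} 1/j$ is the ith harmonic number.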
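A quick numeric comparison shows how close the averaged sum is to the logarithmic bound. This Python sketch evaluates both sides in floating point; the function names are hypothetical and the table sizes are illustrative.

```python
import math

def average_probe_bound(n, m):
    """(1/n) * sum_{i=0}^{n-1} m / (m - i): the averaged per-key bound."""
    return sum(m / (m - i) for i in range(n)) / n

def log_bound(n, m):
    """(1/alpha) * ln(1 / (1 - alpha)) with alpha = n/m < 1."""
    alpha = n / m
    return (1 / alpha) * math.log(1 / (1 - alpha))

for n, m in [(50, 100), (90, 100)]:
    print(n / m, round(average_probe_bound(n, m), 3), round(log_bound(n, m), 3))
```

For a half-full table the logarithmic bound is just under 1.387 probes, and for a table that is 90% full it is just under 2.559.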

Perfect Hashing:

• Perfect hashing: good when the keys are static, i.e., once stored, the keys never change, e.g., the files on a CD-ROM or the set of reserved words in a programming language.

• Thm: If we store n keys in a hash table of size $m = n^2$ using a hash function h randomly chosen from a universal class of hash functions, then the probability of there being any collisions is less than 1/2.

• Proof: Let h be chosen from a universal family. Then each pair of keys collides with probability at most 1/m, and there are $\binom{n}{2}$ pairs of keys. Let X be a r.v. that counts the number of collisions. When $m = n^2$,

\[ E[X] \le \binom{n}{2}\cdot\frac{1}{m} = \frac{n^2-n}{2}\cdot\frac{1}{n^2} < \frac{1}{2}. \]

By Markov's inequality, $\Pr[X \ge t] \le E[X]/t$; taking t = 1 gives $\Pr[X \ge 1] < 1/2$.

• Thm: If we store n keys in a hash table of size m = n using a hash function h randomly chosen from a universal class of hash functions, then

\[ E\Big[\sum_{j=0}^{m-1} n_j^2\Big] < 2n, \]

where $n_j$ is the number of keys hashing to slot j.

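• Pf:
– It is clear that for any nonnegative integer a,

\[ a^2 = a + 2\binom{a}{2}. \]

– Hence

\[
E\Big[\sum_{j=0}^{m-1} n_j^2\Big]
= E\Big[\sum_{j=0}^{m-1}\Big(n_j + 2\binom{n_j}{2}\Big)\Big]
= E\Big[\sum_{j=0}^{m-1} n_j\Big] + 2E\Big[\sum_{j=0}^{m-1}\binom{n_j}{2}\Big].
\]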

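– Since the $n_j$ sum to n,

\[
E\Big[\sum_{j=0}^{m-1} n_j^2\Big] = E[n] + 2E\Big[\sum_{j=0}^{m-1}\binom{n_j}{2}\Big] = n + 2E\Big[\sum_{j=0}^{m-1}\binom{n_j}{2}\Big],
\]

where $\sum_{j=0}^{m-1}\binom{n_j}{2}$ is the total number of collisions.

– By universality, the expected number of collisions is at most

\[ \binom{n}{2}\cdot\frac{1}{m} = \frac{n(n-1)}{2m} = \frac{n-1}{2}, \quad \text{since } m = n, \]

so

\[ E\Big[\sum_{j=0}^{m-1} n_j^2\Big] \le n + 2\cdot\frac{n-1}{2} = 2n - 1 < 2n. \]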

• Cor: If we store n keys in a hash table of size m = n using a hash function h randomly chosen from a universal class of hash functions, and we set the size of each secondary hash table to $m_j = n_j^2$ for j = 0, …, m−1, then the expected amount of storage required for all secondary hash tables in a perfect hashing scheme is less than 2n.

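• Cor: Same as the above, Pr{total storage ≥ 4n} < 1/2.

• Pf:
– By Markov's inequality, Pr{X ≥ t} ≤ E[X]/t.
– Take $X = \sum_{j=0}^{m-1} m_j$ and t = 4n:

\[
\Pr\Big\{\sum_{j=0}^{m-1} m_j \ge 4n\Big\} \le \frac{E\big[\sum_{j=0}^{m-1} m_j\big]}{4n} < \frac{2n}{4n} = \frac{1}{2}.
\]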

jj