Hashing Nelson Padua-Perez Chau-Wen Tseng Department of Computer Science University of Maryland, College Park

Dec 22, 2015


Hashing

Nelson Padua-Perez

Chau-Wen Tseng

Department of Computer Science

University of Maryland, College Park


Hashing

Approach

Transform key into number (hash value)

Use hash value to index object in hash table

Use hash function to convert key to number


Hashing

Hash Table

Array indexed using hash values

Hash Table A with size N

Indices of A range from 0 to N-1

Store in A[ hashValue % N]
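The indexing rule above can be sketched in Java as follows. This is a minimal illustration, not the authors' code: the class name TableIndex and method indexFor are hypothetical. It uses Math.floorMod rather than the plain % operator because Java hash values may be negative, and % would then produce a negative index.

```java
// Minimal sketch: map an arbitrary int hash value onto a table of size N.
// TableIndex and indexFor are illustrative names, not part of the slides.
final class TableIndex {
    // Math.floorMod keeps the result in 0..tableSize-1 even when hashValue
    // is negative, which the plain % operator does not guarantee in Java.
    static int indexFor(int hashValue, int tableSize) {
        return Math.floorMod(hashValue, tableSize);
    }
}
```

For example, TableIndex.indexFor("apple".hashCode(), 10) always yields an index between 0 and 9.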


Hash Function

Goal

Scatter values uniformly across range

Hash( <everything> ) = 0

Satisfies definition of hash function

But not very useful

Multiplicative congruency method

Produces good hash values

Hash value = (a * int(key)) % N

Where

N is table size

a, N are large primes
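A minimal Java sketch of the formula above, under stated assumptions: the constants A = 1,000,003 and N = 10,007 are hypothetical values picked only because both are prime, and the key is assumed to be an int already. The product is computed in a long so it cannot overflow before the modulo is taken.

```java
// Sketch of the multiplicative method: hash = (a * key) % N.
// A and N are hypothetical example constants; both are prime.
final class MultiplicativeHash {
    static final long A = 1_000_003L; // multiplier a
    static final int N = 10_007;      // table size N

    static int hash(int key) {
        // Widen to long so the product cannot overflow, then reduce to 0..N-1.
        // floorMod also handles negative keys.
        return (int) Math.floorMod(A * (long) key, (long) N);
    }
}
```

Every result falls in the range 0 to N-1, so it can be used directly as a table index.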


Hash Function

Example

hashCode("apple") = 5
hashCode("watermelon") = 3
hashCode("grapes") = 8
hashCode("kiwi") = 0
hashCode("strawberry") = 9
hashCode("mango") = 6
hashCode("banana") = 2

Perfect hash function

Unique values for each key

Index  Entry
0      kiwi
1
2      banana
3      watermelon
4
5      apple
6      mango
7
8      grapes
9      strawberry


Hash Function

Suppose now

hashCode("apple") = 5
hashCode("watermelon") = 3
hashCode("grapes") = 8
hashCode("kiwi") = 0
hashCode("strawberry") = 9
hashCode("mango") = 6
hashCode("banana") = 2

hashCode("orange") = 3

Collision

Same hash value for multiple keys

Index  Entry
0      kiwi
1
2      banana
3      watermelon   (orange also hashes to 3)
4
5      apple
6      mango
7
8      grapes
9      strawberry
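The collision above can be reproduced with a short Java sketch. The hash values are copied from the slide rather than computed, and the names CollisionDemo, slideHashCodes and firstCollision are hypothetical helpers introduced only for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: place each key at table[hash] and report the first key whose slot is taken.
final class CollisionDemo {
    // Hash values copied from the slide; a real hashCode() would compute them.
    static Map<String, Integer> slideHashCodes() {
        Map<String, Integer> codes = new LinkedHashMap<>(); // keeps insertion order
        codes.put("apple", 5);
        codes.put("watermelon", 3);
        codes.put("grapes", 8);
        codes.put("kiwi", 0);
        codes.put("strawberry", 9);
        codes.put("mango", 6);
        codes.put("banana", 2);
        codes.put("orange", 3);
        return codes;
    }

    // Returns the first key that finds its slot already occupied, or null if none does.
    static String firstCollision(Map<String, Integer> codes, int tableSize) {
        String[] table = new String[tableSize];
        for (Map.Entry<String, Integer> e : codes.entrySet()) {
            int index = Math.floorMod(e.getValue(), tableSize);
            if (table[index] != null) {
                return e.getKey(); // slot taken: collision
            }
            table[index] = e.getKey();
        }
        return null;
    }
}
```

With the eight keys above and a table of size 10, the first seven keys fill distinct slots and "orange" is the key reported as colliding.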


Types of Hash Tables

Open addressing

Store objects in each table entry

Chaining (bucket hashing)

Store lists of objects in each table entry


Open Addressing Hashing

Approach

Hash table contains objects

Probe: examine table entry

Collision

Move K entries past current location

Wrap around table if necessary

Find location for X

1. Examine entry at A[ key(X) ]

2. If entry = X, found

3. If entry = empty, X not in hash table

4. Else increment location by K, repeat
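The four steps above might look like this in Java. This is a sketch under assumptions: entries are Strings, null stands for an empty entry, the array is 0-based, and the names ProbeSearch, find, home and k are illustrative. The empty check (step 3) runs before the equality check (step 2) only to avoid calling equals on null, and the loop is bounded so a completely full table cannot cause an endless search.

```java
// Sketch of the open addressing lookup with probe step K.
final class ProbeSearch {
    // Returns the index of x in the table, or -1 if x is not in the table.
    // home is the slot given by the hash function; k is the probe step.
    static int find(String[] table, String x, int home, int k) {
        int n = table.length;
        int location = home;                        // step 1: start at A[key(X)]
        for (int probes = 0; probes < n; probes++) { // bound guards against a full table
            String entry = table[location];
            if (entry == null) {
                return -1;                           // step 3: empty, X not in table
            }
            if (entry.equals(x)) {
                return location;                     // step 2: found
            }
            location = (location + k) % n;           // step 4: move K entries, wrap around
        }
        return -1;
    }
}
```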


Open Addressing Hashing

Approach

Linear probing

K = 1

May form clusters of contiguous entries

Deletions

Find location for X

If X inside cluster, leave non-empty marker

Insertion

Find location for X

Insert if X not in hash table

Can insert X at first non-empty marker
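A possible Java sketch of linear probing with deletion markers follows. Several things here are assumptions for illustration: the class name LinearProbingSet, the DELETED sentinel standing in for the non-empty marker, and a caller-supplied hash function. As a simplification, delete always leaves a marker, even when X sits at the end of a cluster, and insert gives up when the table is full instead of growing it.

```java
import java.util.function.ToIntFunction;

// Sketch: open addressing with linear probing (K = 1) and deletion markers.
final class LinearProbingSet {
    private static final Object DELETED = new Object(); // the "non-empty marker"
    private final Object[] table;                       // null means empty entry
    private final ToIntFunction<String> hash;           // hypothetical caller-supplied hash

    LinearProbingSet(int size, ToIntFunction<String> hash) {
        this.table = new Object[size];
        this.hash = hash;
    }

    private int home(String x) {
        return Math.floorMod(hash.applyAsInt(x), table.length);
    }

    // Returns the slot holding x, or -1 if x is not in the table.
    int find(String x) {
        int i = home(x);
        for (int probes = 0; probes < table.length; probes++) {
            Object entry = table[i];
            if (entry == null) return -1;                    // empty: stop searching
            if (entry != DELETED && entry.equals(x)) return i;
            i = (i + 1) % table.length;                      // skip markers and other keys
        }
        return -1;
    }

    // Inserts x at the first empty entry or marker; false if present or table full.
    boolean insert(String x) {
        if (find(x) >= 0) return false;
        int i = home(x);
        for (int probes = 0; probes < table.length; probes++) {
            if (table[i] == null || table[i] == DELETED) {
                table[i] = x;
                return true;
            }
            i = (i + 1) % table.length;
        }
        return false;
    }

    // Replaces x with a marker so later searches keep probing past this slot.
    boolean delete(String x) {
        int i = find(x);
        if (i < 0) return false;
        table[i] = DELETED;
        return true;
    }
}
```

The marker matters because clearing the slot outright would cut a cluster in two, and keys stored past the gap would no longer be found.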


Open Addressing Example

Hash codes

H(A) = 6    H(C) = 6

H(B) = 7    H(D) = 7

Hash table

Size = 8 elements

(blank) = empty entry

* = non-empty marker

Linear probing

Collision: move 1 entry past current location

Index:   1   2   3   4   5   6   7   8
Entry:  [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ]


Open Addressing Example

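Operations

Insert A, Insert B, Insert C, Insert D

Index:            1   2   3   4   5   6   7   8
After Insert A:  [ ] [ ] [ ] [ ] [ ] [A] [ ] [ ]
After Insert B:  [ ] [ ] [ ] [ ] [ ] [A] [B] [ ]
After Insert C:  [ ] [ ] [ ] [ ] [ ] [A] [B] [C]
After Insert D:  [D] [ ] [ ] [ ] [ ] [A] [B] [C]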


Open Addressing Example

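Operations

Find A, Find B, Find C, Find D

Index:   1   2   3   4   5   6   7   8
Entry:  [D] [ ] [ ] [ ] [ ] [A] [B] [C]

Find A: probe 6, found
Find B: probe 7, found
Find C: probe 6, 7, 8, found
Find D: probe 7, 8, 1 (wrap around), found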


Open Addressing Example

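Operations

Delete A, Delete C, Find D, Insert C

Index:            1   2   3   4   5   6   7   8
After Delete A:  [D] [ ] [ ] [ ] [ ] [*] [B] [C]
After Delete C:  [D] [ ] [ ] [ ] [ ] [*] [B] [*]
After Find D:    [D] [ ] [ ] [ ] [ ] [*] [B] [*]
After Insert C:  [D] [ ] [ ] [ ] [ ] [C] [B] [*]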


Efficiency of Open Addressing

Load factor = entries / table size

Hashing is efficient for load factor < 90%
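As a small illustration, the load factor can be computed and checked against the 90% guideline like this. The names LoadFactor, of and shouldGrow are hypothetical, and growing the table once the threshold is reached is an assumption of this sketch, not something the slide prescribes.

```java
// Sketch: load factor = entries / table size, compared with the 90% guideline.
final class LoadFactor {
    static final double THRESHOLD = 0.9; // the slide's 90% rule of thumb

    static double of(int entries, int tableSize) {
        return (double) entries / tableSize; // cast avoids integer division
    }

    // Assumption: a table at or above the threshold should be enlarged.
    static boolean shouldGrow(int entries, int tableSize) {
        return of(entries, tableSize) >= THRESHOLD;
    }
}
```

For the 8-element example table holding 4 entries, the load factor is 0.5, well under the guideline.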


Chaining (Bucket Hashing)

Approach

Hash table contains lists of objects

Find location for X

Find hash code key for X

Examine list at table entry A[ key ]

Collision

Multiple entries in list for entry
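A minimal Java sketch of chaining, assuming String keys and the built-in String.hashCode(). The class name ChainedSet and its methods are illustrative only. New entries are added at the front of the list, which is one possible choice.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Sketch: each table entry holds a list of the keys that hash to it.
final class ChainedSet {
    private final List<LinkedList<String>> table = new ArrayList<>();

    ChainedSet(int size) {
        for (int i = 0; i < size; i++) {
            table.add(new LinkedList<>()); // every entry starts as an empty list
        }
    }

    // Find hash code for x, then pick the list at table entry A[key].
    private LinkedList<String> listFor(String x) {
        return table.get(Math.floorMod(x.hashCode(), table.size()));
    }

    boolean contains(String x) {
        return listFor(x).contains(x); // linear scan of one list only
    }

    // Colliding keys simply share a list; returns false if x is already present.
    boolean insert(String x) {
        LinkedList<String> list = listFor(x);
        if (list.contains(x)) return false;
        list.addFirst(x);
        return true;
    }
}
```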


Chaining Example

Hash codes

H(A) = 6    H(C) = 6

H(B) = 7    H(D) = 7

Hash table

Size = 8 elements

(blank) = empty entry

Index:   1   2   3   4   5   6   7   8
Entry:  [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ]


Chaining Example

Operations

Insert A, Insert B, Insert C

After Insert A:   entry 6: A
After Insert B:   entry 6: A        entry 7: B
After Insert C:   entry 6: C -> A   entry 7: B

All other entries are empty


Chaining Example

Operations

Find B, Find A

Table:   entry 6: C -> A   entry 7: B

Find B: examine list at entry 7; B found
Find A: examine list at entry 6; C, then A found


Efficiency of Chaining

Load factor = entries / table size

Average case

Evenly scattered entries

Operations = O( load factor )

Worst case

Entries mostly have same hash value

Operations = O( entries )
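The two cases can be compared with a small Java experiment. This is a sketch with hypothetical names (ChainLengths, longestChain). It hashes the integer keys 0 to entries-1 with a supplied function and reports the longest chain, which bounds the cost of a lookup.

```java
import java.util.function.IntUnaryOperator;

// Sketch: measure the longest chain produced by a given hash function.
final class ChainLengths {
    // Hashes keys 0..entries-1 into tableSize buckets and returns the longest chain.
    static int longestChain(int entries, int tableSize, IntUnaryOperator hash) {
        int[] chainLength = new int[tableSize];
        int longest = 0;
        for (int key = 0; key < entries; key++) {
            int bucket = Math.floorMod(hash.applyAsInt(key), tableSize);
            chainLength[bucket]++;
            longest = Math.max(longest, chainLength[bucket]);
        }
        return longest;
    }
}
```

With 100 entries and 10 buckets, the identity function k -> k spreads keys evenly and gives chains of length 10, which equals the load factor. The constant function k -> 0 puts all 100 entries in one chain.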


Hashing in Java

Collections

HashMap & HashSet implement hashing

Objects

Built-in support for hashing

boolean equals(Object o)

int hashCode()

Can override with own definitions

Must be careful to support Java contract
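Overriding both methods together might look like the sketch below. The Point class is a hypothetical example, not part of the slides. It derives hashCode() from exactly the fields that equals() compares, using java.util.Objects.hash, so equal points always share a hash code.

```java
import java.util.Objects;

// Sketch: a value class that overrides equals() and hashCode() consistently.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false; // also rejects null
        Point other = (Point) o;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        // Built from the same fields equals() compares.
        return Objects.hash(x, y);
    }
}
```

With these overrides, a HashSet that contains new Point(1, 2) reports that it contains any other Point with the same coordinates.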


Java Contract

hashCode()

Must return the same value for an object each time it is called during an execution, provided no information used in equals comparisons on the object is modified

equals()

If a.equals(b), then a.hashCode() must be the same as b.hashCode()

If a.hashCode() != b.hashCode(), then !a.equals(b)

a.hashCode() == b.hashCode()

Does not imply a.equals(b)

Though Java libraries will be more efficient if unequal objects have different hash codes
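The last point can be illustrated with a deliberately poor but legal hashCode(). The Token class below is a hypothetical example. It returns 0 for every object, so unequal tokens share a hash code, yet the contract still holds because equal tokens also share it.

```java
// Sketch: a constant hashCode() satisfies the contract but defeats the hash table.
final class Token {
    private final String text;

    Token(String text) {
        this.text = text;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Token && text.equals(((Token) o).text);
    }

    @Override
    public int hashCode() {
        return 0; // legal: equal objects get equal values; poor: so do unequal ones
    }
}
```

A HashSet of such tokens still gives correct answers, but every token lands in the same bucket, so lookups lose the speed that hashing is meant to provide.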