12-1 12 Hash-Table Data Structures Hash-table principles. Closed-bucket and open-bucket hash tables. Searching, insertion, deletion. Hash-table design. Implementations of sets and maps using hash tables. © 2001, D.A. Watt and D.F. Brown

Jan 14, 2016


Gabriel McCoy

12 Hash-Table Data Structures

• Hash-table principles.

• Closed-bucket and open-bucket hash tables.

• Searching, insertion, deletion.

• Hash-table design.

• Implementations of sets and maps using hash tables.

© 2001, D.A. Watt and D.F. Brown


Hash-table principles (1)

• If a map’s keys are small integers, we can represent the map by a key-indexed array. Search, insertion, and deletion then have time complexity O(1).

• Can we approach this performance with keys of other types? Yes!

• Hashing: translate each key to a small integer, and use that integer to index an array.

• A hash table is an array of m buckets, together with a hash function hash(k) that translates each key k to a bucket index (in the range 0…m–1).


Hash-table principles (2)

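• Illustration:

[Figure: the entries (k1, v1), (k2, v2), (k3, v3), (k4, v4), …, (kn, vn) of a key/value table are mapped by hashing (translating keys to bucket indices) to a hash table (array of buckets) indexed 0…m–1. Two keys that are translated to the same bucket index form a collision.]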


Hash-table principles (3)

• Each key k has a home bucket in the hash table, namely the bucket with index hash(k).

• To insert a new entry with key k into the hash table, assign that entry to k’s home bucket.

• To search for an entry with key k in the hash table, look in k’s home bucket.

• To delete an entry with key k from the hash table, look in k’s home bucket.


Hash-table principles (4)

• The hash function must be consistent:

k1 = k2 implies hash(k1) = hash(k2).

• In general, the hash function is many-to-one.

• Therefore different keys may share the same home bucket:

k1 ≠ k2 but hash(k1) = hash(k2).

This is called a collision.

• Always prefer a hash function that makes collisions relatively infrequent.


Example 1: a hash function for words

• Suppose that the keys are English words.

• Possible hash function:

m = 26

hash(w) = (initial letter of w) – ‘A’

• All words with initial letter ‘A’ share bucket 0; …; all words with initial letter ‘Z’ share bucket 25.

• This is a convenient choice for illustrative purposes.

• This is a poor choice for practical purposes: collisions are likely to be frequent in some buckets.
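The initial-letter hash function can be sketched in a few lines of Java. The class name `InitialLetterHash` and the sample words are illustrative assumptions; the sketch also assumes every word starts with an ASCII letter.

```java
final class InitialLetterHash {
    static final int M = 26; // one bucket per initial letter 'A'..'Z'

    // hash(w) = (initial letter of w) - 'A'.
    // Assumption: w is non-empty and starts with an ASCII letter.
    static int hash(String w) {
        return Character.toUpperCase(w.charAt(0)) - 'A';
    }

    public static void main(String[] args) {
        String[] words = {"apple", "avocado", "banana", "zebra"};
        for (String w : words) {
            // "apple" and "avocado" collide: both start with 'A'.
            System.out.println(w + " -> bucket " + hash(w));
        }
    }
}
```

Words sharing an initial letter always collide, which is why this function is only suitable for illustration.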


Hashing in Java (1)

• Instance method in class Object:

public int hashCode ();
// Translate this object to an integer, such that x.equals(y)
// implies x.hashCode() == y.hashCode().

• Note that hashCode is consistent. We can use it to implement a hash function for a hash table with m buckets:

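int hash (Object k) {
    return Math.floorMod(k.hashCode(), m);
}

hashCode may return a negative integer. Math.abs alone does not cure this, because Math.abs(Integer.MIN_VALUE) is itself negative.

Math.floorMod performs modulo-m arithmetic whose result has the sign of m, so for positive m it gives an integer in the range 0…m–1.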
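A bucket index must never be negative, even when hashCode() is. The following sketch contrasts `Math.abs(h) % m` with `Math.floorMod(h, m)` for the one awkward input, `Integer.MIN_VALUE`. The class and method names are illustrative assumptions.

```java
final class BucketIndex {
    // Map any int hash code to a bucket index in 0..m-1 (requires m > 0).
    static int indexOf(int hashCode, int m) {
        return Math.floorMod(hashCode, m);
    }

    public static void main(String[] args) {
        int m = 26;
        int h = Integer.MIN_VALUE;
        // Math.abs(Integer.MIN_VALUE) overflows and stays negative,
        // so the remainder is negative too.
        System.out.println("abs then %: " + (Math.abs(h) % m));
        // floorMod always lands in 0..m-1 for positive m.
        System.out.println("floorMod:   " + indexOf(h, m));
    }
}
```

An alternative with the same effect is to clear the sign bit before taking the remainder, i.e. `(h & 0x7FFFFFFF) % m`.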


Hashing in Java (2)

• Each subclass of Object that overrides equals should also override hashCode, so that equal objects have equal hash codes.

• Examples:

Class of k    Result of k.hashCode()
String        weighted sum of characters of k
Integer       integer value of k
Date          (high 32 bits of k) exclusive-or (low 32 bits of k), where k is expressed in milliseconds since 1970-01-01
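The three rows of the table can be checked against the standard library. The sketch below re-implements the String and Date rules by hand and compares them with the built-in results; the helper names `weightedSum` and `foldLong` are illustrative assumptions, while the weight 31 is the one documented for `String.hashCode`.

```java
import java.util.Date;

final class HashCodeExamples {
    // Weighted sum of the characters of s, with weights 31^(n-1), ..., 31, 1.
    // int overflow simply wraps around, as in String.hashCode.
    static int weightedSum(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // (high 32 bits) exclusive-or (low 32 bits) of a millisecond timestamp.
    static int foldLong(long millis) {
        return (int) (millis ^ (millis >>> 32));
    }

    public static void main(String[] args) {
        long t = 1_000_000_000_000L; // an arbitrary timestamp in milliseconds
        System.out.println(weightedSum("hash") == "hash".hashCode());
        System.out.println(Integer.valueOf(42).hashCode() == 42);
        System.out.println(foldLong(t) == new Date(t).hashCode());
    }
}
```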


Closed- vs open-bucket hash tables

• Closed-bucket hash table: Each bucket may be occupied by several entries.

Buckets are completely separate.

• Open-bucket hash table: Each bucket may be occupied by at most one entry.

Whenever there is a collision, displace the new entry to another bucket.


Closed-bucket hash tables (1)

• Closed-bucket hash table (CBHT): Each bucket may be occupied by several entries.

Buckets are completely separate.

• Simplest implementation: each bucket is an SLL.

• In the following illustrations, keys are names of chemical elements. Assume:

m = 26

hash(e) = (initial letter of e) – ‘A’


Closed-bucket hash tables (2)

• Illustration (with no collisions):

element  number
F        9
Ne       10
Cl       17
Ar       18
Br       35
Kr       36
I        53
Xe       54

is represented by

bucket  entries
0       Ar 18
1       Br 35
2       Cl 17
5       F 9
8       I 53
10      Kr 36
13      Ne 10
23      Xe 54
(all other buckets in 0…25 are empty)


Closed-bucket hash tables (3)

• Illustration (with collisions):

element  number
H        1
He       2
Li       3
Be       4
Na       11
Mg       12
K        19
Ca       20
Rb       37
Sr       38
Cs       55
Ba       56

is represented by

bucket  entries
1       Be 4, Ba 56
2       Ca 20, Cs 55
7       H 1, He 2
10      K 19
11      Li 3
12      Mg 12
13      Na 11
17      Rb 37
18      Sr 38
(all other buckets in 0…25 are empty)


Closed-bucket hash tables (4)

• Java class implementing CBHTs:

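public class CBHT {

    private BucketNode[] buckets;

    public CBHT (int m) {
        buckets = new BucketNode[m];
    }

    … // CBHT methods (see below)

    private int hash (Object key) {
        // floorMod keeps the index in 0…m–1 even when hashCode() is negative
        // (Math.abs(Integer.MIN_VALUE) is itself negative).
        return Math.floorMod(key.hashCode(), buckets.length);
    }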


Closed-bucket hash tables (5)

• Java class (continued):

    //////// Inner class for CBHT nodes ////////

    private static class BucketNode {

        private Object key, value;
        private BucketNode succ;

        private BucketNode (Object key, Object val, BucketNode succ) {
            …
        }
    }
}


CBHT search (1)

• CBHT search algorithm:

To find which if any node of a CBHT contains an entry whose key is equal to target-key:

1. Set b to hash(target-key).
2. Find which if any node of the SLL of bucket b contains an entry whose key is equal to target-key, and terminate with that node as answer.


CBHT search (2)

• Implementation (in class CBHT):

public BucketNode search (Object targetKey) {
    int b = hash(targetKey);
    for (BucketNode curr = buckets[b];
            curr != null; curr = curr.succ) {
        if (targetKey.equals(curr.key))
            return curr;
    }
    return null;
}


CBHT insertion (1)

• CBHT insertion algorithm:

To insert the entry (key, val) into a CBHT:

1. Set b to hash(key).
2. Insert the entry (key, val) into the SLL of bucket b, replacing any existing entry whose key is key.
3. Terminate.


CBHT insertion (2)

• Implementation (in class CBHT):

public void insert (Object key, Object val) {
    int b = hash(key);
    for (BucketNode curr = buckets[b];
            curr != null; curr = curr.succ) {
        if (key.equals(curr.key)) {
            // Key already present: replace its value.
            curr.value = val;
            return;
        }
    }
    // Key absent: add a new node at the front of bucket b's SLL.
    buckets[b] = new BucketNode(key, val, buckets[b]);
}


CBHT deletion (1)

• CBHT deletion algorithm:

To delete the entry (if any) whose key is equal to key from a CBHT:

1. Set b to hash(key).
2. Delete the entry (if any) whose key is equal to key from the SLL of bucket b.
3. Terminate.


CBHT deletion (2)

• Implementation (in class CBHT):

public void delete (Object key) {
    int b = hash(key);
    for (BucketNode pred = null, curr = buckets[b];
            curr != null;
            pred = curr, curr = curr.succ) {
        if (key.equals(curr.key)) {
            if (pred == null)
                buckets[b] = curr.succ;   // unlink the first node
            else
                pred.succ = curr.succ;    // unlink a later node
            return;
        }
    }
}


CBHTs: analysis

• Analysis of the CBHT search/insertion/deletion algorithms (counting comparisons):

Let the number of entries be n.

• In the best case, no bucket contains more than (say) 2 entries:

Max. no. of comparisons = 2

Best-case time complexity is O(1).

• In the worst case, one bucket contains all n entries:

Max. no. of comparisons = n

Worst-case time complexity is O(n).


CBHTs: design

• CBHT design consists of: choosing the number of buckets m

choosing the hash function hash.

• Design aims: collisions are infrequent

entries are distributed evenly among the buckets, such that few buckets contain more than about 2 entries.


CBHTs: choosing the no. of buckets

• The load factor of a hash table is the average number of entries per bucket, n/m.

• If n is (roughly) predictable, choose m such that the load factor is likely to be between 0.5 and 0.75. A low load factor wastes space.

A high load factor tends to cause some buckets to have many entries.

• Choose m to be a prime number. Typically the hash function performs modulo-m arithmetic. If m is prime, the entries are more likely to be distributed evenly over the buckets, regardless of any pattern in the keys.
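These two guidelines (a bounded load factor and a prime m) can be combined into a small helper. This is only a sketch under stated assumptions: the names `isPrime`, `nextPrime`, and `chooseM` are hypothetical, and trial division is used because m is small.

```java
final class BucketCount {
    // Trial division; adequate for the table sizes discussed here.
    static boolean isPrime(int x) {
        if (x < 2) return false;
        for (int d = 2; (long) d * d <= x; d++) {
            if (x % d == 0) return false;
        }
        return true;
    }

    // Smallest prime that is >= x.
    static int nextPrime(int x) {
        int p = Math.max(x, 2);
        while (!isPrime(p)) p++;
        return p;
    }

    // Smallest prime m for which the load factor n/m does not exceed maxLoad.
    static int chooseM(int n, double maxLoad) {
        return nextPrime((int) Math.ceil(n / maxLoad));
    }

    public static void main(String[] args) {
        // For n = 23 entries and a load factor of at most 0.5.
        System.out.println(chooseM(23, 0.5));
    }
}
```

Choosing the smallest qualifying prime keeps the load factor close to the chosen bound, so little space is wasted.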


CBHTs: choosing the hash function

• The hash function should be efficient (performing few arithmetic operations).

• The hash function should distribute the entries evenly among the buckets, regardless of any patterns in the keys.

• Possible trade-off: Speed up the hash function by using only part of the key.

But beware of any patterns in that part of the key.


Example 2: hash table for English words (1)

• Suppose that a hash table will contain about 1000 common English words.

• Known patterns in the keys: Letters vary in frequency:

• A, E, I, N, S, T are common

• Q, X, Z are uncommon.

Word lengths vary in frequency:

• word lengths 4–8 are common

• other word lengths are less common.


Example 2 (2)

• hash(w) can depend on any of w’s letters and/or length.

• Consider m = 20, hash(w) = length of w – 1.

– Far too few buckets. Load factor = 1000/20 = 50.

– Very uneven distribution.

• Consider m = 26, hash(w) = initial letter of w – ‘A’.

– Far too few buckets.

– Very uneven distribution.


Example 2 (3)

• Consider m = 520, hash(w) = 26 × (length of w – 1) + (initial letter of w – ‘A’).

– Too few buckets. Load factor = 1000/520 ≈ 1.9.

– Very uneven distribution. Since few words have length 1–2, buckets 0–51 will be sparsely populated. Since initial letter Z is uncommon, buckets 25, 51, 77, 103, … will be sparsely populated. And so on.

• Consider m = 1499, hash(w) = (weighted sum of letters of w) modulo m

i.e., (c1 × 1st letter of w + c2 × 2nd letter of w + …) modulo m

+ Good number of buckets. Load factor ≈ 0.67.

+ Reasonably even distribution.
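The last two candidate hash functions might be sketched as follows. The slides do not specify the weights c1, c2, …, so the use of powers of 31 here is an assumption, as are the class and method names. The first function assumes words of 1 to 20 uppercase letters.

```java
final class WordHashes {
    // m = 520: 26 × (length of w - 1) + (initial letter of w - 'A').
    // Assumption: w consists of 1..20 uppercase ASCII letters.
    static int byLengthAndInitial(String w) {
        return 26 * (w.length() - 1) + (w.charAt(0) - 'A');
    }

    // m = 1499: weighted sum of the letters of w, modulo m.
    // Assumption: the weights are powers of 31 (not given in the slides).
    static int weightedSum(String w) {
        final int m = 1499;
        int h = 0;
        for (int i = 0; i < w.length(); i++) {
            h = (31 * h + w.charAt(i)) % m; // reduce at each step to avoid overflow
        }
        return h;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"HASH", "HUSH", "TABLE"}) {
            System.out.println(w + ": " + byLengthAndInitial(w) + ", " + weightedSum(w));
        }
    }
}
```

Because the first function looks only at the length and the initial letter, words such as HASH and HUSH always collide under it, whereas the weighted sum depends on every letter.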


Open-bucket hash tables (1)

• Open-bucket hash table (OBHT): Each bucket may be occupied by at most one entry.

Whenever there is a collision, displace the new entry to another bucket.

• Each bucket has three possible states: never-occupied (has never contained an entry)

occupied (currently contains an entry)

formerly-occupied (previously contained an entry, which has been deleted and not yet replaced).


Open-bucket hash tables (2)

• In the following illustrations, keys are names of chemical elements. Assume:

m = 26

hash(e) = (initial letter of e) – ‘A’

• On a collision, insert the new entry in the next unoccupied bucket (treating the array as cyclic).


Open-bucket hash tables (3)

• Illustration (with no collisions):

element  number
F        9
Ne       10
Cl       17
Ar       18
Br       35
Kr       36
I        53
Xe       54

is represented by

bucket  state     entry
0       occupied  Ar 18
1       occupied  Br 35
2       occupied  Cl 17
5       occupied  F 9
8       occupied  I 53
10      occupied  Kr 36
13      occupied  Ne 10
23      occupied  Xe 54
(all other buckets in 0…25 are never-occupied)


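Open-bucket hash tables (4)

• Illustration (with collisions):

element  number
H        1
He       2
Li       3
Be       4
Na       11
Mg       12
K        19
Ca       20
Rb       37
Sr       38
Cs       55
Ba       56

is represented by

bucket  entry
1       Be 4
2       Ca 20
3       Cs 55
4       Ba 56
7       H 1
8       He 2
10      K 19
11      Li 3
12      Mg 12
13      Na 11
17      Rb 37
18      Sr 38
(all other buckets in 0…25 are never-occupied)

Runs of adjacent occupied buckets, such as buckets 1–4 and 10–13, are clusters.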


Open-bucket hash tables (5)

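• Java class implementing OBHTs:

public class OBHT {

    private BucketEntry[] buckets;

    public OBHT (int m) {
        buckets = new BucketEntry[m];
    }

    private int hash (Object k) {
        // floorMod keeps the index in 0…m–1 even when hashCode() is negative
        // (Math.abs(Integer.MIN_VALUE) is itself negative).
        return Math.floorMod(k.hashCode(), buckets.length);
    }

    … // OBHT methods (see below)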


Open-bucket hash tables (6)

• Java class (continued):

    //////// Inner class for OBHT entries ////////

    private static class BucketEntry {

        private Object key, value;

        private BucketEntry (Object k, Object v) {
            this.key = k;
            this.value = v;
        }

        // Shared marker object for formerly-occupied buckets.
        private static final BucketEntry FORMER =
            new BucketEntry(null, null);
    }
}


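Example 3: populating an OBHT

• Animation: the entries (H, 1), (He, 2), (Li, 3), (Be, 4), (Na, 11), (Mg, 12), (K, 19), (Ca, 20), (Rb, 37), (Sr, 38), (Cs, 55), (Ba, 56) are inserted one at a time into an initially empty OBHT with m = 26.

Final state: buckets 1 Be 4; 2 Ca 20; 3 Cs 55; 4 Ba 56; 7 H 1; 8 He 2; 10 K 19; 11 Li 3; 12 Mg 12; 13 Na 11; 17 Rb 37; 18 Sr 38 (all other buckets are never-occupied).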


OBHT search (1)

• OBHT search algorithm:

To find which if any bucket of an OBHT is occupied by an entry whose key is equal to target-key:

1. Set b to hash(target-key).
2. Repeat:
    2.1. If bucket b is never-occupied:
        2.1.1. Terminate with answer none.
    2.2. If bucket b is occupied by an entry whose key is equal to target-key:
        2.2.1. Terminate with answer b.
    2.3. If bucket b is formerly-occupied, or is occupied by an entry whose key is not equal to target-key:
        2.3.1. Increment b modulo m.


OBHT search (2)

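• Illustrations:

Table searched: buckets 1 Be 4; 2 Ca 20; 3 Cs 55; 4 Ba 56; 7 H 1; 8 He 2; 10 K 19; 11 Li 3; 12 Mg 12; 13 Na 11; 17 Rb 37; 18 Sr 38 (all other buckets are never-occupied).

Searching for Mg: hash(Mg) = 12. Bucket 12 is occupied by Mg, so the answer is 12.

Searching for Ba: hash(Ba) = 1. Buckets 1, 2, 3 are occupied by Be, Ca, Cs. Bucket 4 is occupied by Ba, so the answer is 4.

Searching for He: hash(He) = 7. Bucket 7 is occupied by H. Bucket 8 is occupied by He, so the answer is 8.

Searching for Ra: hash(Ra) = 17. Buckets 17, 18 are occupied by Rb, Sr. Bucket 19 is never-occupied, so the answer is none.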


OBHT search (3)

• Implementation (in class OBHT):

public int search (Object targetKey) {
    int b = hash(targetKey);
    for (;;) {
        BucketEntry old = buckets[b];
        if (old == null)
            return NONE;
        else if (old != BucketEntry.FORMER
                && targetKey.equals(old.key))
            return b;
        else
            b = (b + 1) % buckets.length;
    }
}


OBHT insertion (1)

• OBHT insertion algorithm:

To insert the entry (key, val) into an OBHT:

1. Set b to hash(key).
2. Repeat:
    2.1. If bucket b is never-occupied:
        2.1.1. If bucket b is the last never-occupied bucket, treat the OBHT as full.
        2.1.2. Make bucket b occupied by (key, val).
        2.1.3. Terminate.
    2.2. If bucket b is formerly-occupied, or is occupied by an entry whose key is equal to key:
        2.2.1. Make bucket b occupied by (key, val).
        2.2.2. Terminate.
    2.3. …


OBHT insertion (2)

• OBHT insertion algorithm (continued):

    2.3. If bucket b is occupied by an entry whose key is not equal to key:
        2.3.1. Increment b modulo m.


OBHT insertion (3)

• Illustrations:

Initial table: buckets 1 Be 4; 2 Ca 20; 3 Cs 55; 4 Ba 56; 7 H 1; 8 He 2; 10 K 19; 11 Li 3; 12 Mg 12; 13 Na 11; 17 Rb 37; 18 Sr 38 (all other buckets are never-occupied).

Inserting (Fr, 87): hash(Fr) = 5. Bucket 5 is never-occupied, so Fr 87 is placed in bucket 5.

Inserting (B, 5): hash(B) = 1. Buckets 1–5 are occupied by entries with other keys (Be, Ca, Cs, Ba, Fr). Bucket 6 is never-occupied, so B 5 is placed in bucket 6.


OBHT insertion (4)

• Implementation (in class OBHT):

private int load = 0;
    // no. of occupied or formerly-occupied buckets in this OBHT

public void insert (Object key, Object val) {
    BucketEntry newest = new BucketEntry(key, val);
    int b = hash(key);
    for (;;) {
        BucketEntry old = buckets[b];
        if (old == null) {
            if (++load == buckets.length) …;
            buckets[b] = newest;
            return;


OBHT insertion (5)

• Implementation (continued):

        } else if (old == BucketEntry.FORMER
                || key.equals(old.key)) {
            buckets[b] = newest;
            return;
        } else
            b = (b + 1) % buckets.length;
    }
}


OBHT deletion (1)

• OBHT deletion algorithm:

To delete the entry (if any) whose key is equal to key from an OBHT:

1. Set b to hash(key).
2. Repeat:
    2.1. If bucket b is never-occupied:
        2.1.1. Terminate.
    2.2. If bucket b is occupied by an entry whose key is equal to key:
        2.2.1. Make bucket b formerly-occupied.
        2.2.2. Terminate.
    2.3. If bucket b is formerly-occupied, or is occupied by an entry whose key is not equal to key:
        2.3.1. Increment b modulo m.


OBHT deletion (2)

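• Illustrations:

Initial table: buckets 1 Be 4; 2 Ca 20; 3 Cs 55; 4 Ba 56; 7 H 1; 8 He 2; 10 K 19; 11 Li 3; 12 Mg 12; 13 Na 11; 17 Rb 37; 18 Sr 38 (all other buckets are never-occupied).

Deleting Ca: hash(Ca) = 2. Bucket 2 is occupied by Ca, so bucket 2 becomes formerly-occupied.

Deleting Ba: hash(Ba) = 1. Bucket 1 is occupied by Be, bucket 2 is formerly-occupied, and bucket 3 is occupied by Cs, so the search continues. Bucket 4 is occupied by Ba, so bucket 4 becomes formerly-occupied.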


OBHT deletion (3)

• Implementation (in class OBHT):

public void delete (Object key) {
    int b = hash(key);
    for (;;) {
        BucketEntry old = buckets[b];
        if (old == null)
            return;
        else if (old != BucketEntry.FORMER
                && key.equals(old.key)) {
            buckets[b] = BucketEntry.FORMER;
            return;
        } else
            b = (b + 1) % buckets.length;
    }
}
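The OBHT fragments can be assembled into a compact, self-contained sketch that replays the chemical-element example. Everything named here is an assumption made for illustration: the class `ObhtSketch`, the value `NONE = -1`, and the use of String keys with the initial-letter hash (m = 26). The sketch also departs from the insertion algorithm above in two labelled ways: it bounds every probe loop by m instead of counting the load, and on insertion it keeps probing past a formerly-occupied bucket until it is sure the key is not stored further along, so that a key is never stored twice.

```java
final class ObhtSketch {
    static final int NONE = -1;                 // assumed "no such bucket" answer
    static final Object FORMER = new Object();  // marks formerly-occupied buckets

    final Object[] keys = new Object[26];       // null means never-occupied
    final int[] values = new int[26];

    static int hash(String e) { return e.charAt(0) - 'A'; }

    int search(String key) {
        int b = hash(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[b] == null) return NONE;        // never-occupied: key is absent
            if (key.equals(keys[b])) return b;       // FORMER never equals a String
            b = (b + 1) % keys.length;               // step length s = 1
        }
        return NONE;                                 // every bucket was probed
    }

    void insert(String key, int val) {
        int b = hash(key);
        int free = NONE;                             // first reusable bucket seen
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[b] == null) { if (free == NONE) free = b; break; }
            if (key.equals(keys[b])) { values[b] = val; return; }
            if (keys[b] == FORMER && free == NONE) free = b;
            b = (b + 1) % keys.length;
        }
        if (free == NONE) throw new IllegalStateException("OBHT is full");
        keys[free] = key;
        values[free] = val;
    }

    void delete(String key) {
        int b = search(key);
        if (b != NONE) keys[b] = FORMER;             // leave a marker, not null
    }

    // Builds the table of Example 3.
    static ObhtSketch elements() {
        String[] names = {"H", "He", "Li", "Be", "Na", "Mg", "K", "Ca", "Rb", "Sr", "Cs", "Ba"};
        int[] numbers = {1, 2, 3, 4, 11, 12, 19, 20, 37, 38, 55, 56};
        ObhtSketch t = new ObhtSketch();
        for (int i = 0; i < names.length; i++) t.insert(names[i], numbers[i]);
        return t;
    }

    public static void main(String[] args) {
        ObhtSketch t = elements();
        System.out.println("Ba is in bucket " + t.search("Ba"));
        t.delete("Ca");
        System.out.println("after deleting Ca, Ba is in bucket " + t.search("Ba"));
    }
}
```

Marking a deleted bucket as formerly-occupied, rather than resetting it to never-occupied, is what lets the search for Ba continue past bucket 2 after Ca has been deleted.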


OBHTs: analysis

• Analysis of the OBHT search/insertion/deletion algorithms (counting comparisons):

Let the number of entries be n.

• In the best case, no cluster contains more than (say) 4 entries:

Max. no. of comparisons = 4

Best-case time complexity is O(1).

• In the worst case, one cluster contains all n entries:

Max. no. of comparisons = n

Worst-case time complexity is O(n).


OBHTs: design

• OBHT design consists of: choosing the number of buckets m

choosing the hash function hash

choosing the step length s (explained later).

• Design aims: collisions are infrequent

entries are distributed evenly over the hash table, such that few clusters contain more than about 4 entries.


OBHTs: choosing the no. of buckets

• Recall: The load factor of a hash table is the average number of entries per bucket, n/m.

• If n is (roughly) predictable, choose m such that the load factor is likely to be between 0.5 and 0.75. A low load factor wastes space.

A high load factor tends to result in long clusters.

• Choose m to be a prime number.


OBHTs: choosing the hash function

• The hash function should be efficient.

• The hash function should distribute the entries evenly over the buckets, with few long clusters. In an OBHT with s = 1, a cluster will form when several entries fall into the same or adjacent buckets.


OBHTs: choosing the step length

• To resolve a collision, the search/insertion/deletion algorithm increments the bucket index and tries again.

• The step length, s, is the amount by which the bucket index is incremented. So far we have assumed s = 1.

• Alternatively, we can use a fixed s > 1.

• Choose m to be prime, and choose s to be in the range 2…m–1. This ensures that s and m have no common factors.

Otherwise, if (say) m = 10 and s = 2, a typical search path would be 6–8–0–2–4, never reaching the remaining buckets!
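The effect of a common factor between s and m can be checked directly by listing the buckets a search path visits. The class and method names below are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class ProbeSequence {
    // Distinct buckets visited when starting at 'start' and repeatedly
    // adding the step length s, modulo m, until the path repeats.
    static List<Integer> visit(int start, int s, int m) {
        List<Integer> seen = new ArrayList<>();
        int b = start;
        while (!seen.contains(b)) {
            seen.add(b);
            b = (b + s) % m;
        }
        return seen;
    }

    public static void main(String[] args) {
        // m = 10 and s = 2 share the factor 2: only half the buckets are reached.
        System.out.println(visit(6, 2, 10));
        // m = 11 is prime, so s = 2 reaches every bucket.
        System.out.println(visit(6, 2, 11));
    }
}
```

In general the path visits m / gcd(s, m) buckets, so it covers the whole table exactly when s and m have no common factor.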


OBHTs: double hashing (1)

• Better still, let different keys have different step lengths. (But each key always has the same step length.)

• Double hashing: To search/insert/delete key k, compute s from k, using a second hash function s = step(k).

• In the following illustration, keys are names of chemical elements. Assume:

m = 23

hash(e) = (initial letter of e – ‘A’) modulo m

step(e) = 1, if e has a single letter; otherwise 2 + ((second letter of e – ‘a’) modulo 21)


OBHTs: double hashing (2)

• Illustrations:

Inserting (Ba, 56): hash(Ba) = 1, step(Ba) = 2. Bucket 1 is occupied by Be, so b is incremented by 2. Bucket 3 is never-occupied, so Ba 56 is placed in bucket 3.

Inserting (Cs, 55): hash(Cs) = 2, step(Cs) = 20. Bucket 2 is occupied by Ca, so b is incremented by 20. Bucket 22 is never-occupied, so Cs 55 is placed in bucket 22.


OBHTs: double hashing (3)

• OBHT insertion algorithm with double hashing:

To insert the entry (key, val) into an OBHT:

1. Set b to hash(key), and set s to step(key).
2. Repeat:
    2.1. If bucket b is never-occupied:
        …
    2.2. If bucket b is formerly-occupied, or is occupied by an entry whose key is equal to key:
        …
    2.3. If bucket b is occupied by an entry whose key is not equal to key:
        2.3.1. Increment b by s, modulo m.
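A minimal sketch of insertion with double hashing, using the hash and step functions assumed for the chemical-element illustration (m = 23). The class name `DoubleHashingSketch` is hypothetical, deletion (and hence the formerly-occupied state) is left out for brevity, and the probe loop is bounded by m rather than tracking the load.

```java
final class DoubleHashingSketch {
    static final int M = 23;                 // prime number of buckets
    final String[] keys = new String[M];     // null means never-occupied

    static int hash(String e) {
        return (e.charAt(0) - 'A') % M;
    }

    // Step length in 1..22, so it has no factor in common with M = 23.
    static int step(String e) {
        return e.length() == 1 ? 1 : 2 + (e.charAt(1) - 'a') % 21;
    }

    // Inserts key and returns the index of the bucket it occupies.
    int insert(String key) {
        int b = hash(key), s = step(key);
        for (int probes = 0; probes < M; probes++) {
            if (keys[b] == null || keys[b].equals(key)) {
                keys[b] = key;
                return b;
            }
            b = (b + s) % M;                 // increment b by s, modulo m
        }
        throw new IllegalStateException("table is full");
    }

    public static void main(String[] args) {
        DoubleHashingSketch t = new DoubleHashingSketch();
        for (String e : new String[] {"H", "He", "Li", "Be", "Mg", "K", "Ca", "Rb", "Sr"}) {
            t.insert(e);
        }
        System.out.println("Ba -> bucket " + t.insert("Ba"));
        System.out.println("Cs -> bucket " + t.insert("Cs"));
    }
}
```

Because Ba and Cs have different step lengths, they are displaced to widely separated buckets instead of extending the same cluster.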


Hash tables in practice (1)

• The following trials compare the performance of three hash tables, each with m buckets: a CBHT

an OBHT with step length s = 1

an OBHT with step length s determined by double hashing.

• The hash function distributes keys uniformly among the buckets (i.e., the probability that a key is mapped to any particular bucket is 1/m).

• In each trial, all three hash tables are loaded with the same set of n randomly-generated keys.


Hash tables in practice (2)

• Trial with m = 47 and n = 23 (load factor 0.5):

[Figure: bucket occupancy of the three hash tables. OBHT (s = 1): a long cluster. OBHT (double hashing): a shorter cluster. CBHT: most buckets have 0–1 entries.]


Hash tables in practice (3)

• Trial with m = 47 and n = 36 (load factor 0.75):

[Figure: bucket occupancy of the three hash tables. OBHT (s = 1): very long clusters. OBHT (double hashing): shorter clusters. CBHT: most buckets have 0–2 entries.]


Example 4: hash table for student records (1)

• Consider a hypothetical university’s student records. About 1500 new students register each year, and most students stay for 4 years.

• Each student has a unique id, of the form yydddd (where yy are the last two digits of the year of first registration, and where dddd is a serial number).

• Suppose that the student records will be held in a hash table.

• hash(id) can depend on any or all of id’s digits.


Example 4 (2)

• Consider m = 100, hash(id) = first two digits of id.

– Far too few buckets. Load factor = 6000/100 = 60.

– Very uneven distribution. E.g., in academic year 2001–02, most ids start with 98, 99, 00, or 01.

• Consider m = 10000, hash(id) = last four digits of id.

+ Good number of buckets. Load factor = 6000/10000 = 0.6.

– Uneven distribution. Most ids end with 0000…1500.

• Consider m = 9973, hash(id) = id modulo m.

+ Good number of buckets. Load factor = 6000/9973 ≈ 0.6.

+ Even distribution (since m is prime).


Example 4 (3)

• Consider OBHT with s = 1.

– Four clusters of about 1500 entries.

• Consider OBHT with s = 2000.

+ Should avoid clustering.

• Consider OBHT with double hashing.

+ Should avoid clustering.
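The clustering with s = 1 follows from the shape of the ids: serial numbers within a year are consecutive, and hash(id) = id modulo m sends consecutive ids to consecutive home buckets. A small sketch, in which the class name, the sample ids, and the demo value of m (9973, a prime just below 10000) are assumptions:

```java
final class StudentIdHash {
    // hash(id) = id modulo m; ids of the form yydddd are non-negative.
    static int hash(int id, int m) {
        return id % m;
    }

    public static void main(String[] args) {
        int m = 9973; // assumed prime number of buckets
        // Five students registered in year 01 with serial numbers 0000..0004.
        for (int id = 10000; id <= 10004; id++) {
            System.out.println("id " + id + " -> home bucket " + hash(id, m));
        }
    }
}
```

Each year's intake therefore fills one contiguous run of home buckets, which with s = 1 behaves as a single long cluster. A larger fixed step or double hashing spreads the displaced entries away from that run.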


Implementations of sets using CBHTs

• Similar to implementation of maps (with set members instead of map entries).

• Summary of algorithms:

Operation   Algorithm        Time complexity
contains    CBHT search      O(1) best, O(n) worst
add         CBHT insertion   O(1) best, O(n) worst
remove      CBHT deletion    O(1) best, O(n) worst
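The three set operations in the table might be sketched as follows, with each bucket an SLL of members. The class name `CbhtSet` and its method signatures are assumptions for illustration rather than the authors' implementation.

```java
final class CbhtSet {
    private static final class Node {
        final Object member;
        Node succ;
        Node(Object member, Node succ) { this.member = member; this.succ = succ; }
    }

    private final Node[] buckets;

    CbhtSet(int m) { buckets = new Node[m]; }

    private int hash(Object x) {
        return Math.floorMod(x.hashCode(), buckets.length);
    }

    // CBHT search: scan only the SLL of x's home bucket.
    boolean contains(Object x) {
        for (Node n = buckets[hash(x)]; n != null; n = n.succ) {
            if (x.equals(n.member)) return true;
        }
        return false;
    }

    // CBHT insertion: add x at the front of its home bucket unless already present.
    void add(Object x) {
        if (!contains(x)) {
            int b = hash(x);
            buckets[b] = new Node(x, buckets[b]);
        }
    }

    // CBHT deletion: unlink the node holding x, if any.
    void remove(Object x) {
        int b = hash(x);
        Node pred = null;
        for (Node curr = buckets[b]; curr != null; pred = curr, curr = curr.succ) {
            if (x.equals(curr.member)) {
                if (pred == null) buckets[b] = curr.succ;
                else pred.succ = curr.succ;
                return;
            }
        }
    }

    public static void main(String[] args) {
        CbhtSet s = new CbhtSet(47);
        s.add("He");
        s.add("H");
        s.remove("He");
        System.out.println(s.contains("H") + " " + s.contains("He"));
    }
}
```

With m = 1 every member shares one bucket, which reproduces the O(n) worst case in the table.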


Implementations of maps using CBHTs

• Summary of algorithms:

Operation   Algorithm                                                 Time complexity
get         CBHT search                                               O(1) best, O(n) worst
remove      CBHT deletion                                             O(1) best, O(n) worst
put         CBHT insertion                                            O(1) best, O(n) worst
putAll      merge on corresponding buckets of both CBHTs              O(m) best, O(n1 × n2) worst
equals      equality test on corresponding buckets of both CBHTs      O(m) best, O(n1 × n2) worst