Hash Tables Briana B. Morrison Adapted from William Collins

Mar 31, 2015


Kelly Sibbett
Page 1: Hash Tables Briana B. Morrison Adapted from William Collins.

Hash Tables

Briana B. Morrison

Adapted from William Collins

Hashing 2

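averageTimeS(n): THE AVERAGE TIME FOR A SUCCESSFUL SEARCH

averageTimeU(n): THE AVERAGE TIME FOR AN UNSUCCESSFUL SEARCH

worstTimeS(n): THE WORST TIME FOR A SUCCESSFUL SEARCH

worstTimeU(n): THE WORST TIME FOR AN UNSUCCESSFUL SEARCH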


LET’S START WITH A REVIEW OF EARLIER SEARCH TECHNIQUES:


Sequential Search

Given a vector of integers:

v = {12, 15, 18, 3, 76, 9, 14, 33, 51, 44}

What is the best case for sequential search? O(1), when the value is the first element.

What is the worst case? O(n), when the value is the last element or is not in the list.

What is the average case? About n/2 comparisons for a successful search, which is O(n).
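To make the three cases concrete, here is a small C++ sketch (the names `SearchResult` and `sequential_search` are illustrative, not from the slides) that counts comparisons while scanning the vector v above. Finding 12 costs 1 comparison, finding 44 costs 10, and a value that is not present also costs 10.

```cpp
#include <cstddef>
#include <vector>

// Result of a sequential search: the index found (or -1) and how many
// elements were compared along the way.
struct SearchResult {
    int index;
    std::size_t comparisons;
};

// Scan v from the front, comparing each element with target.
SearchResult sequential_search(const std::vector<int>& v, int target) {
    std::size_t comparisons = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        ++comparisons;
        if (v[i] == target)
            return {static_cast<int>(i), comparisons};
    }
    return {-1, comparisons};  // not found: every element was examined
}
```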


SEQUENTIAL SEARCH IN STL

    // Postcondition: if there is an item in the range of iterators
    //                from first (inclusive) through last (exclusive)
    //                that is equal to value, the iterator returned is
    //                the first iterator i in that range such that
    //                *i == value. Otherwise, last is returned.
    //                The worstTime(n) is O(n).
    template <typename InputIterator, typename T>
    InputIterator find (InputIterator first, InputIterator last, const T& value)
    {
        while (first != last && *first != value)
            ++first;
        return first;
    }


THE worstTimeU(n) IS LINEAR IN n.

DITTO FOR worstTimeS(n), averageTimeU(n), AND averageTimeS(n).


Binary Search

Given a sorted vector of integers:

v = {3, 9, 12, 14, 15, 18, 33, 44, 51, 76}

What is the best case for binary search? O(1), when the value is the middle element.

What is the worst case? O(log n), for example when the value is not in the list.

What is the average case? O(log n)
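A small C++ sketch (the function name `binary_search_index` is hypothetical) that counts how many middle elements are examined. On the 10-element vector above, the first middle element is 15, so finding 15 takes 1 probe, and no search needs more than 4 probes, which is floor(log2 10) + 1.

```cpp
#include <cstddef>
#include <vector>

// Classic binary search on a sorted vector. Returns the index of target,
// or -1 if absent; probes receives how many middle elements were examined.
int binary_search_index(const std::vector<int>& v, int target, std::size_t& probes) {
    probes = 0;
    int low = 0;
    int high = static_cast<int>(v.size()) - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;  // avoids overflow of low + high
        ++probes;
        if (v[mid] == target)
            return mid;
        if (v[mid] < target)
            low = mid + 1;   // target can only be in the upper half
        else
            high = mid - 1;  // target can only be in the lower half
    }
    return -1;
}
```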


BINARY SEARCH OF A SORTED CONTAINER:

    template <typename ForwardIterator, typename T>
    inline bool binary_search (ForwardIterator first, ForwardIterator last,
                               const T& value);

example (with v the sorted vector above):

    if (binary_search (v.begin(), v.end(), value))


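Do you remember how binary search works? Here is the core loop; it returns the first position whose element is not less than value:

    #include <iterator>

    template <typename RandomAccessIterator, typename T>
    RandomAccessIterator lower_bound (RandomAccessIterator first,
                                      RandomAccessIterator last, const T& value)
    {
        typedef typename std::iterator_traits<RandomAccessIterator>::difference_type Distance;
        Distance len = last - first;
        Distance half;
        RandomAccessIterator middle;
        while (len > 0)
        {
            half = len / 2;
            middle = first + half;
            if (*middle < value)
            {
                first = middle + 1;      // value is in the upper part
                len = len - half - 1;
            }
            else
                len = half;              // value is at middle or below
        }
        return first;
    }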
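The loop above is the standard lower-bound search. A sketch of how a yes/no binary search can be layered on top of it, using the real `std::lower_bound` from `<algorithm>`; the wrapper name `contains_sorted` is made up for illustration:

```cpp
#include <algorithm>
#include <vector>

// lower_bound returns the first position whose element is not less than
// value, so value is present exactly when that position exists and holds
// an element that is also not greater than value.
template <typename ForwardIterator, typename T>
bool contains_sorted(ForwardIterator first, ForwardIterator last, const T& value) {
    ForwardIterator it = std::lower_bound(first, last, value);
    return it != last && !(value < *it);
}
```

Only `operator<` is used, which is why the check is written as `!(value < *it)` rather than `*it == value`.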


THE worstTimeU(n) IS LOGARITHMIC IN n.

DITTO FOR worstTimeS(n), averageTimeU(n), AND averageTimeS(n).


NOW LET’S FOCUS ON AN UNUSUAL BUT VERY EFFICIENT SEARCH TECHNIQUE:

HASHING


THE CLASS IN WHICH HASHING IS

IMPLEMENTED IS THE hash_map

CLASS. THIS IS NOT YET IN THE

STANDARD TEMPLATE LIBRARY.


TO A USER, THE hash_map CLASS

IS SIMILAR TO THE map CLASS,

EXCEPT hash_map HAS ONLY A FEW

METHODS, SUCH AS insert, erase, AND

find. AND THE TIMING ESTIMATES

FOR THOSE METHODS ARE LOWER THAN IN THE map CLASS.


RECALL THAT EACH VALUE (THAT IS, ITEM) IN A MAP IS A PAIR WHOSE

FIRST COMPONENT IS OF TYPE Key

AND WHOSE SECOND COMPONENT IS

OF TYPE T. THE KEYS ARE UNIQUE, THAT IS, NO TWO DISTINCT VALUES HAVE THE SAME KEY.


HERE ARE THE METHOD

INTERFACES FOR THE hash_map

CLASS:


1. // Postcondition: this hash_map is empty.
    hash_map( );

2. // Postcondition: the number of items in this hash_map
    //                has been returned.
    int size( );


3. // Postcondition: If an item with x's key had already been
    //                inserted into this hash_map, the pair
    //                returned consists of an iterator positioned
    //                at the previously inserted item, and false.
    //                Otherwise, the pair returned consists of
    //                an iterator positioned at the newly inserted
    //                item, and true. Timing estimates are
    //                discussed later.
    pair<iterator, bool> insert (const value_type<const key_type, T>& x);


4. // Postcondition: if this hash_map already contains a value
    //                whose key part is key, a reference to that
    //                value's second component has been returned.
    //                Otherwise, a new value, <key, T( )>, is
    //                inserted into this hash_map, and a reference
    //                to its second component has been returned.
    //                Timing estimates are discussed later.
    T& operator[ ] (const key_type& key);


5. // Postcondition: If this hash_map contains a value whose
    //                first component equals key, an iterator
    //                positioned at that value has been returned.
    //                Otherwise, an iterator at the same
    //                position as end() has been returned.
    //                Timing estimates are discussed later.
    iterator find (const key_type& key);

6. // Precondition:  itr is positioned at a value in this hash_map.
    // Postcondition: the value that itr is positioned at has been
    //                deleted from this hash_map. Timing
    //                estimates are discussed later in this chapter.
    void erase (iterator itr);


7. // Postcondition: an iterator positioned at the beginning
    //                of this hash_map has been returned.
    //                Timing estimates are discussed later.
    iterator begin( );

8. // Postcondition: an iterator has been returned that can be
    //                used in comparisons to terminate iterating
    //                through this hash_map.
    iterator end( );

9. // Postcondition: the space for this hash_map object has
    //                been deallocated.
    ~hash_map( );
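hash_map was not part of the standard library when these slides were written; C++11 later standardized `std::unordered_map`, whose `insert`, `operator[]`, `find`, `erase`, `begin`, `end` and `size` follow essentially the postconditions listed above. A sketch exercising them (the function name `interface_demo` is illustrative):

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Walks through the interface above using std::unordered_map.
// Returns the final number of items (0 signals a failed check).
std::size_t interface_demo() {
    std::unordered_map<int, std::string> m;            // 1. starts empty

    auto first = m.insert({5520, "Yvonne"});           // 3. new key: .second is true
    auto again = m.insert({5520, "Someone else"});     //    existing key: .second is false
    bool insert_ok = first.second && !again.second &&
                     again.first->second == "Yvonne";  //    and the old value is kept

    m[5415] = "Jim";                                   // 4. absent key: inserts {5415, ""},
                                                       //    then assigns through the reference
    auto it = m.find(5520);                            // 5. present: iterator to the item
    if (it != m.end())
        m.erase(it);                                   // 6. erase by iterator

    bool find_ok = (m.find(5520) == m.end());          // 5. absent: end() is returned
    return (insert_ok && find_ok) ? m.size() : 0;      // 2. number of items
}
```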


Map vs. Hashmap

What are the differences between a map and a hash_map? Compare their interface, efficiency, applications, and implementation.


WE’LL STUDY THE TIME ESTIMATES

AFTER WE DEFINE THE METHODS.

BUT BASICALLY, FOR find, insert, AND

erase,

averageTime(n) IS CONSTANT!


FIELDS IN THE hash_map CLASS


CONTIGUOUS

array? vector? deque? heap?

LINKED

Linked? list? map?

BUT NONE OF THESE WILL GIVE

CONSTANT AVERAGE TIME FOR

SEARCHES, INSERTIONS AND

REMOVALS.


HERE IS THE BASIC IDEA:

buckets // an array of values

count // the number of values in the hash_map


LET’S SEE WHERE THAT LEADS.

SUPPOSE persons IS A HASH MAP THAT WILL HOLD UP TO 1000 VALUES. EACH VALUE CONSISTS OF A UNIQUE 3-DIGIT INTEGER (THE KEY), AND A NAME.


(Figure: the array buckets, with indexes 0, 1, 2, . . ., 999, and the field count.)


persons [351] = “Prashant”;

persons [108] = “Barrett”;

persons [435] = “Lin”;

WHERE SHOULD WE STORE THE VALUE WHOSE KEY IS 351?


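(Figure: each value is stored at the index equal to its key. buckets [108] holds 108 Barrett, buckets [351] holds 351 Prashant, and buckets [435] holds 435 Lin; the other slots, from 0 through 999, hold undefined contents (? ?); count = 3.)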
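Because every key here is a 3-digit integer, the key itself can serve as the index. A sketch of this direct addressing, assuming keys in 0..999 (no bounds check) and using an empty string to mark an unused slot; both are assumptions of the sketch, and `DirectTable` is a hypothetical name:

```cpp
#include <array>
#include <string>

// Direct addressing: with 3-digit keys (0..999), the key itself is the
// index into a 1000-slot array, so no hash function or collision handler
// is needed. An empty string marks an unused slot.
struct DirectTable {
    std::array<std::string, 1000> buckets;
    int count = 0;

    void put(int key, const std::string& name) {
        if (buckets[key].empty())
            ++count;           // a new key, not an overwrite
        buckets[key] = name;
    }
    const std::string& get(int key) const { return buckets[key]; }
};
```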


NOW FOR SOMETHING SLIGHTLY

DIFFERENT: SUPPOSE persons IS A

HASH MAP THAT HOLDS UP TO 1000

VALUES. EACH VALUE CONSISTS OF

A 10-DIGIT TELEPHONE NUMBER

(THE KEY), AND A NAME.


persons [9876543210] = “Prashant”;

persons [6103301256] = “Barrett”;

persons [6103309816] = “Lin”;

persons [4153576256] = “Sutey”;

WHERE SHOULD THESE VALUES

BE STORED?


To make these values fit into the table, we need to mod by the table size; i.e., key % 1000:

    9876543210 % 1000 = 210
    6103301256 % 1000 = 256
    6103309816 % 1000 = 816
    4153576256 % 1000 = 256    OOPS! (same index as 6103301256)
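The arithmetic above can be checked with a few lines of C++ (the helper `index_for` is hypothetical). A 10-digit telephone number does not fit in a 32-bit integer, so the sketch uses `std::uint64_t`:

```cpp
#include <cstdint>

// Reduce a telephone-number key modulo the table size to get its index.
std::uint64_t index_for(std::uint64_t phone, std::uint64_t table_size = 1000) {
    return phone % table_size;
}
```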


WHEN TWO DIFFERENT KEYS MAP TO THE SAME INDEX, THAT IS CALLED A COLLISION.

KEYS THAT MAP TO THE SAME INDEX ARE CALLED SYNONYMS.


HASHING:

AN ALGORITHM THAT TRANSFORMS A KEY INTO AN ARRAY INDEX.


THE ALGORITHM HAS TWO PARTS:

1. A HASH FUNCTION: AN EASILY COMPUTABLE OPERATION ON THE KEY THAT RETURNS AN unsigned long, WHICH IS THEN CONVERTED INTO AN INDEX IN THE ARRAY buckets;

2. A COLLISION HANDLER.


Hash Codes

Suppose we have a table of size N. A hash code is a number in the range 0 to N-1, computed from the key. You can think of it as a “default position” when inserting, or a “position hint” when looking up.

A hash function is a way of computing a hash code.

Desire: the set of keys should spread evenly over the N values.

When two keys have the same hash code: collision.


Hash Functions

A hash function should be quick and easy to compute.

A hash function should achieve an even distribution of the keys that actually occur across the range of indices for both random and non-random data.

Calculation should involve the entire search key.


Examples of Hash Functions

A hash function usually takes the key, chops it up, and mixes the pieces together in various ways. Examples:

Truncation: ignore part of the key and use the remaining part as the index.

Folding: partition the key into several parts and combine the parts in a convenient way (adding, etc.).

After calculating the index, use modular arithmetic: divide by the size of the index range and take the remainder as the result.
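A sketch of folding applied to the telephone keys from earlier (the 3-3-4 digit split and the name `fold_hash` are illustrative choices, not from the slides). With a table of 1000 slots, 6103301256 and 4153576256, which collided under plain key % 1000, now land at 196 and 28; folding changes which keys collide rather than eliminating collisions.

```cpp
#include <cstdint>

// Folding: split a 10-digit phone number into its first 3 digits, next
// 3 digits, and last 4 digits, add the parts, then reduce the sum
// modulo the table size.
std::uint64_t fold_hash(std::uint64_t phone, std::uint64_t table_size) {
    std::uint64_t line = phone % 10000;                // last 4 digits
    std::uint64_t exchange = (phone / 10000) % 1000;   // middle 3 digits
    std::uint64_t area = phone / 10000000;             // first 3 digits
    return (area + exchange + line) % table_size;
}
```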


Example Hash Function

hf(22) = 22 % 7 = 1, so 22 is stored in tableEntry[1].

hf(4) = 4 % 7 = 4, so 4 is stored in tableEntry[4].

(Figure: a table with entries 0 through 6.)


Devising Hash Functions

Simple functions often produce many collisions ... but complex functions may not be good either! It is often an empirical process.

Adding letter values in a string: same hash for strings with the same letters in a different order.

Better approach:

    size_t hash = 0;
    for (size_t i = 0; i < s.size(); ++i)
        hash = hash * 31 + s[i];
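Two runnable versions of the string hashes just described (the function names are hypothetical). The additive hash gives "stop" and "pots" the same value; the multiply-by-31 version does not. Characters are cast to `unsigned char` so that non-ASCII bytes never contribute negative values.

```cpp
#include <cstddef>
#include <string>

// Adding character codes ignores order: anagrams always collide.
std::size_t additive_hash(const std::string& s) {
    std::size_t hash = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        hash += static_cast<unsigned char>(s[i]);
    return hash;
}

// Multiply the running total by 31 before adding each character, so the
// position of each character matters. Unsigned overflow wraps around,
// which is well defined in C++.
std::size_t poly31_hash(const std::string& s) {
    std::size_t hash = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        hash = hash * 31 + static_cast<unsigned char>(s[i]);
    return hash;
}
```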


Devising Hash Functions (2)

This string hash is good in that: every letter affects the value; the order of the letters affects the value; and the values tend to be spread well over the integers.


Devising Hash Functions (3)

Guidelines for good hash functions:

Spread values evenly: as if “random”

Cheap to compute

Generally, the number of possible hash values is much greater than the table size


Hash Code Maps

Memory address: we reinterpret the memory address of the key object as an integer. Good in general, except for numeric and string keys.

Integer cast: we reinterpret the bits of the key as an integer. Suitable for keys of length less than or equal to the number of bits of the integer type (e.g., char, short, int and float on many machines).

Component sum: we partition the bits of the key into components of fixed length (e.g., 16 or 32 bits) and we sum the components (ignoring overflows). Suitable for numeric keys of fixed length greater than or equal to the number of bits of the integer type (e.g., long and double on many machines).
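A sketch of the component-sum map for a 64-bit key split into two 32-bit components (`component_sum` is an illustrative name):

```cpp
#include <cstdint>

// Component sum: split a 64-bit key into two 32-bit halves and add them.
// The result is reduced modulo 2^32, which discards any carry out of the
// top bit (the "ignoring overflows" above).
std::uint32_t component_sum(std::uint64_t key) {
    std::uint32_t high = static_cast<std::uint32_t>(key >> 32);
    std::uint32_t low = static_cast<std::uint32_t>(key);  // keeps the low 32 bits
    return static_cast<std::uint32_t>(high + low);
}
```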


Hash Code Maps (cont.)

Polynomial accumulation: we partition the bits of the key into a sequence of components of fixed length (e.g., 8, 16 or 32 bits), $a_0, a_1, \ldots, a_{n-1}$.

We evaluate the polynomial

$$p(z) = a_0 + a_1 z + a_2 z^2 + \cdots + a_{n-1} z^{n-1}$$

at a fixed value $z$, ignoring overflows.

Especially suitable for strings (e.g., the choice $z = 33$ gives at most 6 collisions on a set of 50,000 English words).

Polynomial $p(z)$ can be evaluated in $O(n)$ time using Horner's rule: the following polynomials are successively computed, each from the previous one in $O(1)$ time:

$$p_0(z) = a_{n-1}$$

$$p_i(z) = a_{n-i-1} + z\,p_{i-1}(z) \qquad (i = 1, 2, \ldots, n-1)$$

We have $p(z) = p_{n-1}(z)$.
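Horner's rule translates directly into a loop. This sketch (function names hypothetical) treats each character of a string as one component a_i, with a_0 the first character, uses z = 33, and lets unsigned 64-bit arithmetic wrap to ignore overflows; a direct evaluation of the polynomial is included for comparison.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Horner's rule: start from a_{n-1} and work backward, computing
// p_i = a_{n-i-1} + z * p_{i-1}. Arithmetic wraps modulo 2^64.
std::uint64_t poly_hash_horner(const std::string& s, std::uint64_t z = 33) {
    std::uint64_t p = 0;
    for (std::size_t i = s.size(); i-- > 0; )
        p = static_cast<unsigned char>(s[i]) + z * p;
    return p;
}

// Direct evaluation of a_0 + a_1 z + a_2 z^2 + ..., for comparison.
std::uint64_t poly_hash_direct(const std::string& s, std::uint64_t z = 33) {
    std::uint64_t sum = 0, power = 1;
    for (std::size_t i = 0; i < s.size(); ++i) {
        sum += static_cast<unsigned char>(s[i]) * power;
        power *= z;
    }
    return sum;
}
```

Horner's version uses one multiplication per component instead of two, which is why it is the usual choice.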


HERE IS THE START OF THE

hash_map CLASS:

template<typename Key, typename T, typename HashFunc> class hash_map {

THE THIRD TEMPLATE PARAMETER

IS A FUNCTION CLASS: A CLASS IN

WHICH THE FUNCTION-CALL

OPERATOR, operator( ), IS

OVERLOADED. THIS IS THE HASH

FUNCTION CLASS.


THE HEADING FOR operator( ) IS

unsigned long operator( ) (const key_type& key)

FOR EXAMPLE, WE CAN DEFINE A

SIMPLE HASH FUNCTION CLASS IF

EACH KEY IS AN int:

    class hash_func
    {
    public:
        unsigned long operator( ) (const int& key)
        {
            return (unsigned long) key;
        } // overloaded operator( )
    }; // class hash_func


HERE IS A PROGRAM WITH A

hash_map CLASS IN WHICH EACH VALUE CONSISTS OF A TELEPHONE

EXTENSION AND THE PERSON AT THAT EXTENSION. THE ABOVE

hash_func IS USED.


    int main()
    {
        typedef hash_map<int, string, hash_func> hash_class;

        hash_class extensions;
        hash_class::iterator itr;

        extensions [5520] = "Yvonne";
        extensions [5415] = "Jim";
        extensions [5416] = "Penny";
        extensions [5537] = "Chun Wai";
        extensions [5273] = "Jim";
        for (itr = extensions.begin(); itr != extensions.end(); itr++)
            cout << (*itr).first << " " << (*itr).second << endl;
        cout << "The number of items is " << extensions.size() << endl;


        if (extensions.find (5537) != extensions.end())
        {
            cout << endl << "At extension " << 5537 << " is "
                 << extensions [5537] << endl;
            extensions.erase (extensions.find (5537));
        } // if
        for (itr = extensions.begin( ); itr != extensions.end( ); itr++)
            cout << (*itr).first << " " << (*itr).second << endl;
        return 0;
    } // main
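The course's hash_map is not a standard class, so the program above does not compile on its own. A sketch of the same data using C++11's `std::unordered_map`, with the slides' hash_func adapted as the third template argument: the standard container expects the hasher to return `std::size_t` and to be callable on a const object, hence the `const` on `operator()`. The helper `make_extensions` is hypothetical. As with hash_map, the iteration order is unspecified.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// The slides' hash_func for int keys, adapted for std::unordered_map.
struct hash_func {
    std::size_t operator()(const int& key) const {
        return static_cast<std::size_t>(key);
    }
};

typedef std::unordered_map<int, std::string, hash_func> hash_class;

// Builds the same extensions table as the program above.
hash_class make_extensions() {
    hash_class extensions;
    extensions[5520] = "Yvonne";
    extensions[5415] = "Jim";
    extensions[5416] = "Penny";
    extensions[5537] = "Chun Wai";
    extensions[5273] = "Jim";
    return extensions;
}
```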


HERE IS THE OUTPUT:

    5520 Yvonne
    5537 Chun Wai
    5415 Jim
    5416 Penny
    5273 Jim
    The number of items is 5

    At extension 5537 is Chun Wai
    5520 Yvonne
    5415 Jim
    5416 Penny
    5273 Jim


THERE IS NO OBVIOUS ORDER OF THE KEYS. IF THE CONTAINER MUST

ALWAYS BE IN ORDER, USE A map

INSTEAD OF A hash_map.


HERE IS ANOTHER hash_func CLASS, ONE IN WHICH THE KEY IS A STRING OF UP TO 20 CHARACTERS. BASICALLY, WE ADD UP THE ASCII VALUES OF THE KEY’S CHARACTERS. TO FURTHER SPREAD OUT THE RESULT, PARTIAL TOTALS ARE MULTIPLIED BY 13, AND THE FINAL TOTAL IS MULTIPLIED BY A BIG PRIME.


    class hash_func
    {
    public:
        unsigned long operator( ) (const string& key)
        {
            const unsigned long BIG_PRIME = 4294967291UL;
            unsigned long total = 0;

            // multiply the partial total by 13 before adding each
            // character, so the order of the characters matters
            for (unsigned i = 0; i < key.length(); i++)
                total = total * 13 + key [i];
            return total * BIG_PRIME;
        } // operator( )
    }; // class hash_func


THE hash_func CLASS IS SUPPLIED BY

THE USER / CLIENT PROGRAMMER.

THE hash_map CLASS CONVERTS THE

unsigned long RETURNED BY operator( )

INTO AN ARRAY INDEX BY TAKING

THE REMAINDER % CAPACITY OF

buckets.


EXERCISE: SUPPOSE THE CAPACITY OF buckets IS 203, AND FOR key1, key2, AND key3, THE unsigned long NUMBERS RETURNED BY hash_func (const string& key) ARE 202, 203, AND 204 RESPECTIVELY. AT WHAT LOCATIONS WOULD THE VALUES WITH KEYS key1, key2, AND key3 BE STORED?


AS YOU MIGHT HAVE GUESSED,

HASHING IS INEFFICIENT WHEN

THERE ARE A LOT OF COLLISIONS.


USERS OF THE hash_map CLASS “HOPE” THAT THE KEYS ARE SCATTERED RANDOMLY THROUGHOUT THE TABLE. THIS HOPE IS FORMALLY STATED AS FOLLOWS:


THE UNIFORM HASHING ASSUMPTION

EACH KEY IS EQUALLY LIKELY TO HASH TO ANY ONE OF THE TABLE ADDRESSES, INDEPENDENTLY OF WHERE THE OTHER KEYS HAVE HASHED.


EVEN IF THE UNIFORM HASHING ASSUMPTION HOLDS, THERE MAY STILL BE COLLISIONS.


Collision Handlers

NOW WE’LL LOOK AT SPECIFIC COLLISION HANDLERS:

Chaining

Linear Probing (Open Addressing)

Double Hashing

Quadratic Hashing


Collision Handling

Collisions occur when different elements are mapped to the same cell

Chaining: let each cell in the table point to a linked list of elements that map there

Chaining is simple, but requires additional memory outside the table

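(Figure: a table with cells 0 through 4; cell 1 points to a list holding 025-612-0001, and cell 4 points to a list holding 451-229-0004 and 981-101-0004.)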


CHAINING (ALSO CALLED CHAINED

HASHING): AT INDEX i IN buckets,

STORE THE LIST OF ALL VALUES

WHOSE KEYS HASH TO i. HERE ARE THE FIELDS FOR CHAINED

HASHING:


    list <value_type< const key_type, T> >* buckets; // at each index in the array buckets,
                                                     // we will store the list of all
                                                     // items whose keys hashed to that index
    int count,   // number of items in this hash_map
        length;  // number of buckets in this hash_map
    // these two fields are used to calculate the load, to know
    // when to increase the size of the table
    hash_func hash; // hash is a function object


Chaining with Separate Lists Example

(Figure, general layout: buckets 0, 1, 2, . . ., n - 1, each holding a list.)

(Figure, example with 11 buckets and index = key % 11; the number after each key is its position in that bucket's list:)

    Bucket 0:  77 (1)
    Bucket 1:  89 (1), 45 (2)
    Bucket 2:  35 (1)
    Bucket 3:  14 (1)
    Bucket 6:  94 (1)
    Bucket 10: 54 (1), 76 (2)
    Buckets 4, 5, 7, 8 and 9 are empty.


Chaining Picture

Two items hashed to bucket 3

Three items hashed to bucket 4


INSERT VALUES WITH THESE KEYS:

2155551612, 7178626358, 6103309358, 6103309000, 7178621359, 7178627451, 2155554358, 6103300451

ASSUME length = 1000. IGNORE THE 2ND COMPONENT IN EACH VALUE, IGNORE THE prev FIELD, AND USE ‘X’ TO MARK THE END OF EACH LIST.


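(Figure: the result, with count = 8; each list is in insertion order.)

    buckets [0]:    6103309000 X
    buckets [358]:  7178626358 -> 6103309358 -> 2155554358 X
    buckets [359]:  7178621359 X
    buckets [451]:  7178627451 -> 6103300451 X
    buckets [612]:  2155551612 X
    (all other buckets are empty)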
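A minimal sketch of chained hashing in C++ (the class `ChainedMap` and its members are illustrative, not the hash_map class from the slides): each bucket holds a `std::list` of key/value pairs, the hash of a key is the key itself, and new items are appended to the end of their chain. Inserting the eight keys above with length 1000 yields a chain of length 3 at index 358 and of length 2 at index 451, matching the picture. The sketch never resizes; a real table would grow when the load factor count / length gets too high.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

class ChainedMap {
public:
    typedef std::pair<unsigned long long, std::string> Item;

    explicit ChainedMap(std::size_t length) : buckets(length), count(0) {}

    // Appends (key, value) to its chain; returns false if key is already present.
    bool insert(unsigned long long key, const std::string& value) {
        std::list<Item>& chain = buckets[index(key)];
        for (const Item& item : chain)
            if (item.first == key)
                return false;
        chain.push_back(Item(key, value));
        ++count;
        return true;
    }

    // Returns a pointer to the value stored with key, or nullptr if absent.
    const std::string* find(unsigned long long key) const {
        for (const Item& item : buckets[index(key)])
            if (item.first == key)
                return &item.second;
        return nullptr;
    }

    // Removes key's item from its chain; returns false if key is absent.
    bool erase(unsigned long long key) {
        std::list<Item>& chain = buckets[index(key)];
        for (auto it = chain.begin(); it != chain.end(); ++it) {
            if (it->first == key) {
                chain.erase(it);
                --count;
                return true;
            }
        }
        return false;
    }

    std::size_t chain_length(std::size_t i) const { return buckets[i].size(); }
    std::size_t size() const { return count; }
    double load_factor() const { return static_cast<double>(count) / buckets.size(); }

private:
    // The key itself is used as its hash value.
    std::size_t index(unsigned long long key) const { return key % buckets.size(); }

    std::vector<std::list<Item>> buckets;
    std::size_t count;  // number of items stored
};
```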


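FOR THE find METHOD, WITH n ITEMS IN m BUCKETS, A SUCCESSFUL SEARCH EXAMINES ABOUT HALF OF A LIST WHOSE AVERAGE LENGTH IS n / m:

averageTimeS(n, m) ≈ n / (2m) ITERATIONS.

IF THE TABLE IS ENLARGED WHENEVER THE LOAD FACTOR n / m WOULD EXCEED 0.75, THEN n / (2m) <= 0.75 / 2.

SO averageTimeS(n, m) <= A CONSTANT; THAT IS, averageTimeS(n, m) IS CONSTANT.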


EVEN IF THE UNIFORM HASHING

ASSUMPTION HOLDS, IT IS POSSIBLE

FOR EACH KEY TO HASH TO THE

SAME INDEX. TO SEARCH THE LIST

AT THAT INDEX TAKES LINEAR-IN-n

TIME.

SO worstTimeS(n, m) IS LINEAR IN n.


THE SAME RESULTS, CONSTANT

AVERAGE TIME AND LINEAR WORST

TIME, HOLD FOR insert AND erase.


The next collision handler is Linear Probing (OPEN-ADDRESS HASHING). AT MOST ONE VALUE IS STORED AT

EACH INDEX IN buckets.


HERE IS HOW THE unsigned long RETURNED BY THE hash_func OBJECT hash IS CONVERTED INTO AN INDEX:

    int index = hash (key) % length;

THIS IS DONE IN THE hash_map CLASS, BECAUSE ONLY THE hash_map CLASS KNOWS THE LENGTH OF THE ARRAY.


WHEN A COLLISION OCCURS: SEARCH THE TABLE UNTIL AN “OPEN” SLOT IN buckets IS FOUND.

THIS IS ALSO KNOWN AS THE “OFFSET-OF-1” COLLISION HANDLER.


OFFSET-OF-1 COLLISION HANDLER:

IF buckets [index] ALREADY HAS

ANOTHER ELEMENT, TRY

buckets [index + 1], buckets [index + 2], …,

buckets [length – 1], buckets [0],

buckets [1], …, buckets [index – 1].
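A sketch of the offset-of-1 handler (the `Slot` struct and `linear_probe_insert` are hypothetical names; the `occupied` flag anticipates the approach described a few slides later). Keys are assumed to be non-negative and distinct, and the table is assumed not to be full. Run on the keys of the example that follows with a table of 11 slots, it places 45, 35 and 76 after 2, 3 and 7 probes.

```cpp
#include <cstddef>
#include <vector>

// Each slot records whether it is occupied, since any int could be a key.
struct Slot {
    int key = 0;
    bool occupied = false;
};

// Inserts key by linear probing. Returns the index used; probes receives
// the number of slots inspected.
std::size_t linear_probe_insert(std::vector<Slot>& buckets, int key, int& probes) {
    const std::size_t length = buckets.size();
    std::size_t index = static_cast<std::size_t>(key) % length;
    probes = 1;
    while (buckets[index].occupied) {
        index = (index + 1) % length;  // wrap from length - 1 back to 0
        ++probes;
    }
    buckets[index].key = key;
    buckets[index].occupied = true;
    return index;
}
```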


Hash Table Using Open Probe Addressing Example

(Figure: a table of 11 slots, index = key % 11; the number beside each key is how many probes its insertion took.)

(a) Insert 54, 77, 94, 89, 14: one probe each (54 at 10, 77 at 0, 94 at 6, 89 at 1, 14 at 3).

(b) Insert 45: 2 probes (slot 1 is taken, so 45 goes to 2).

(c) Insert 35: 3 probes (slots 2 and 3 are taken, so 35 goes to 4).

(d) Insert 76: 7 probes (slots 10, 0, 1, 2, 3 and 4 are taken, so 76 goes to 5).

Insert 45

(mod by table size … % 11)


Insert 35


Insert 76


Linear Probing

Open addressing: the colliding item is placed in a different cell of the table.

Linear probing handles collisions by placing the colliding item in the next (circularly) available table cell.

Each table cell inspected is referred to as a “probe”.

Colliding items lump together, causing future collisions to cause a longer sequence of probes.

Example: h(x) = x mod 13. Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order:

    index:  0   1   2   3   4   5   6   7   8   9   10  11  12
    key:    -   -   41  -   -   18  44  59  32  22  31  73  -


WE NEED TO KNOW WHEN A SLOT IS FULL

OR OCCUPIED.

HOW?

INSTEAD OF JUST T() STORED IN THE BUCKETS (BECAUSE T() COULD BE A VALID VALUE), THE BUCKET WILL STORE AN INSTANCE OF THE VALUE_TYPE CLASS.


TO INDICATE WHETHER A LOCATION

IS OCCUPIED, THE value_type CLASS

WILL HAVE bool occupied; IN ADDITION TO T key;


    index   key    occupied
    0       ?      false
    …              false
    54      1069   true       (1069 % 203 = 54)
    55      460    true       (460 % 203 = 54)
    56      1070   true       (1070 % 203 = 55)
    109     312    true       (312 % 203 = 109)
    201     607    true       (607 % 203 = 201)
    202            false


Retrieve

What about when we want to retrieve?

Consider the previous example….


(Figure: the open-probe example table after all insertions, with index = key % 11: 0: 77, 1: 89, 2: 45, 3: 14, 4: 35, 5: 76, 6: 94, 7 to 9: empty, 10: 54.)

Find the value 35. (% 11)

Now find the value 76.

Now find the value 33.


Now delete 35. (% 11)

Now find the value 76.

Now find the value 33.


Linear Probing

Probe by incrementing the index. If you “fall off the end”, wrap around to the beginning. Take care not to cycle forever!

1. Compute index as hash_fcn() % table.size().

2. If table[index] == NULL, the item is not in the table.

3. If table[index] matches the item, the item is found (done).

4. Increment index circularly and go to step 2.

Why must we probe repeatedly? hashCode may produce collisions, and taking the remainder by table.size may produce collisions.


Search Termination

Ways to obtain proper termination:

Stop when you come back to your starting point.

Stop after probing N slots, where N is the table size.

Stop when you reach the bottom the second time.

Ensure the table is never full: reallocate when occupancy exceeds a threshold.
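Steps 1-4 above, with the "stop after probing N slots" rule, might look like this in C++ (names hypothetical; an unoccupied slot plays the role of NULL, and deletion is ignored here). With the h(x) = x mod 13 example from earlier, searching for 31 follows the probe sequence 5, 6, 7, 8, 9, 10 and succeeds, while searching for 33 probes 7 through 11 and stops at the empty slot 12.

```cpp
#include <cstddef>
#include <vector>

// One slot of an open-address table; occupied == false plays the role of NULL.
struct Slot {
    int key = 0;
    bool occupied = false;
};

// Builds the table by linear probing (assumes distinct, non-negative keys
// and at least one free slot).
void linear_probe_insert(std::vector<Slot>& table, int key) {
    std::size_t index = static_cast<std::size_t>(key) % table.size();
    while (table[index].occupied)
        index = (index + 1) % table.size();
    table[index].key = key;
    table[index].occupied = true;
}

// Stops at an empty slot (key absent), at the key, or after probing all
// N slots, so it cannot cycle forever on a full table.
// Returns the key's index, or -1 if it is not in the table.
int linear_probe_find(const std::vector<Slot>& table, int key) {
    const std::size_t n = table.size();
    std::size_t index = static_cast<std::size_t>(key) % n;  // step 1
    for (std::size_t probes = 0; probes < n; ++probes) {
        if (!table[index].occupied)                         // step 2
            return -1;
        if (table[index].key == key)                        // step 3
            return static_cast<int>(index);
        index = (index + 1) % n;                            // step 4
    }
    return -1;  // probed all N slots without finding key
}
```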


IN THE SECOND EXAMPLE, SUPPOSE itr IS POSITIONED AT INDEX 54 AND THE MESSAGE IS my_map.erase (itr);


    index   key    occupied
    0       ?      false
    …              false
    54      1069   true       (1069 % 203 = 54)
    55      460    true       (460 % 203 = 54)
    56      1070   true       (1070 % 203 = 55)
    109     312    true       (312 % 203 = 109)
    201     607    true       (607 % 203 = 201)
    202            false

Erase value 1069.


    index   key    occupied
    0       ?      false
    …              false
    54      1069   false      (1069 % 203 = 54)
    55      460    true       (460 % 203 = 54)
    56      1070   true       (1070 % 203 = 55)
    109     312    true       (312 % 203 = 109)
    201     607    true       (607 % 203 = 201)
    202            false

Now search for 460.


NOW A SEARCH FOR 460 WOULD BE UNSUCCESSFUL BECAUSE 460 INITIALLY HASHES TO 54, AN UNOCCUPIED LOCATION.


SOLUTION: bool marked_for_removal;

THE CONSTRUCTOR FOR value_type SETS EACH bucket’s marked_for_removal FIELD TO false. insert SETS marked_for_removal TO false; erase SETS marked_for_removal TO true. SO AFTER THE INSERTIONS:


    index   key    occupied   marked_for_removal
    0       ?      false      false
    …              false      false
    54      1069   true       false      (1069 % 203 = 54)
    55      460    true       false      (460 % 203 = 54)
    56      1070   true       false      (1070 % 203 = 55)
    109     312    true       false      (312 % 203 = 109)
    201     607    true       false      (607 % 203 = 201)
    202            false      false


AFTER DELETING THE VALUE WITH

KEY 1069:


    index   key    occupied   marked_for_removal
    0       ?      false      false
    …              false      false
    54      1069   true       true       (1069 % 203 = 54)
    55      460    true       false      (460 % 203 = 54)
    56      1070   true       false      (1070 % 203 = 55)
    109     312    true       false      (312 % 203 = 109)
    201     607    true       false      (607 % 203 = 201)
    202            false      false


FOR find, AN UNSUCCESSFUL

SEARCH CANNOT STOP UNTIL buckets

[index].marked_for_removal = false.
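A sketch of open addressing with the marked_for_removal field, assuming that erase clears occupied and sets marked_for_removal (the function names `lazy_find`, `lazy_insert` and `lazy_erase` are hypothetical). On the size-203 example, after erasing 1069 from index 54, a search for 460 skips the marked slot and still finds 460 at index 55.

```cpp
#include <cstddef>
#include <vector>

// Open-address slot with lazy deletion. A slot is never used (both flags
// false), in use (occupied), or erased (marked_for_removal).
struct Slot {
    int key = 0;
    bool occupied = false;
    bool marked_for_removal = false;
};

// Returns the index holding key, or -1. An unsuccessful search may stop
// only at a slot that is unoccupied and not marked for removal.
int lazy_find(const std::vector<Slot>& b, int key) {
    const std::size_t n = b.size();
    std::size_t i = static_cast<std::size_t>(key) % n;
    for (std::size_t probes = 0; probes < n; ++probes) {
        if (b[i].occupied && b[i].key == key)
            return static_cast<int>(i);
        if (!b[i].occupied && !b[i].marked_for_removal)
            return -1;    // never-used slot: key cannot be further along
        i = (i + 1) % n;  // another key, or an erased slot: keep probing
    }
    return -1;
}

// Assumes key is absent and a slot is free; reuses erased slots.
void lazy_insert(std::vector<Slot>& b, int key) {
    std::size_t i = static_cast<std::size_t>(key) % b.size();
    while (b[i].occupied)
        i = (i + 1) % b.size();
    b[i].key = key;
    b[i].occupied = true;
    b[i].marked_for_removal = false;  // insert resets the mark
}

// Erases the item at index i (like erase (itr) in the slides).
void lazy_erase(std::vector<Slot>& b, std::size_t i) {
    b[i].occupied = false;
    b[i].marked_for_removal = true;   // erase sets the mark
}
```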


CLUSTER: A SEQUENCE OF NON-EMPTY LOCATIONS

KEYS THAT HASH TO 54 FOLLOW THE SAME COLLISION-PATH AS KEYS THAT HASH TO 55, …


    index   key    occupied   marked_for_removal
    0       ?      false      false
    …              false      false
    54      1069   true       false      (1069 % 203 = 54)
    55      460    true       false      (460 % 203 = 54)
    56      1070   true       false      (1070 % 203 = 55)
    109     312    true       false      (312 % 203 = 109)
    201     607    true       false      (607 % 203 = 201)
    202            false      false


PRIMARY CLUSTERING: THE

PHENOMENON THAT OCCURS WHEN

THE COLLISION HANDLER ALLOWS

THE GROWTH OF CLUSTERS TO

ACCUMULATE.

THIS WILL OCCUR WITH OFFSET-OF-1 OR ANY CONSTANT OFFSET.


SOLUTION 1: DOUBLE HASHING, THAT IS, OBTAIN BOTH INDICES AND OFFSETS BY HASHING:

    unsigned long hash_int = hash (key);
    int index = hash_int % length,
        offset = hash_int / length;

NOW THE OFFSET DEPENDS ON THE KEY, SO DIFFERENT KEYS WILL USUALLY HAVE DIFFERENT OFFSETS, SO NO MORE PRIMARY CLUSTERING!

(The offset acts as a secondary hash function.)


TO GET A NEW INDEX:

index = (index + offset) % length;

Notice that if another collision occurs, you rehash from the NEW index value.

Hashing 102

EXAMPLE: length = 11

key   index   offset
15    4       1
19    8       1
16    5       1
58    3       5
27    5       2
35    2       3
30    8       2
47    3       4

WHERE WOULD THESE KEYS GO IN buckets?

index  key
0
1
2
3
4
5
6
7
8
9
10

Hashing 103

index  key
0      47
1
2      35
3      58
4      15
5      16
6
7      27
8      19
9
10     30
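The placements above can be checked with a short C++ sketch of the slides' scheme. It assumes hash(key) == key for non-negative int keys (which matches the example's numbers) and uses -1 to mark an empty bucket; the cap of length probes and the name double_hash_insert are additions of this sketch, not part of the slides.

```cpp
#include <cassert>
#include <vector>

// Double hashing as on the slides: index = hash % length, offset = hash / length.
// Assumes hash(key) == key and key >= 0; -1 marks an empty bucket.
// Returns the bucket used, or -1 if no empty bucket was found in length probes.
int double_hash_insert(std::vector<int>& buckets, int key) {
    const int length = static_cast<int>(buckets.size());
    assert(length > 0 && key >= 0);
    unsigned long hash_int = static_cast<unsigned long>(key);
    int index = static_cast<int>(hash_int % length);
    int offset = static_cast<int>(hash_int / length);
    for (int probes = 0; probes < length; ++probes) {
        if (buckets[index] == -1) {
            buckets[index] = key;
            return index;
        }
        index = (index + offset) % length;  // rehash from the NEW index
    }
    return -1;
}
```

Running it on the eight keys in order leaves 47 in bucket 0, 35 in 2, 58 in 3, 15 in 4, 16 in 5, 27 in 7, 19 in 8, and 30 in 10.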

Hashing 104

PROBLEM: WHAT IF OFFSET IS A MULTIPLE OF length?

EXAMPLE: length = 11

key   index   offset
15    4       1
19    8       1
16    5       1
58    3       5
27    5       2
35    2       3
47    3       4
246   4       22     // BUT 15 IS AT INDEX 4

FOR KEY 246, NEW INDEX = (4 + 22) % 11 = 4. OOPS!

index  key
0      47
1
2      35
3      58
4      15
5      16
6
7      27
8      19
9
10

Hashing 105

SOLUTION:

if (offset % length == 0)
    offset = 1;

ON AVERAGE, offset % length WILL EQUAL 0 FOR ONLY ABOUT ONE KEY IN EVERY length KEYS.
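Adding the check to the probing loop gives the following C++ sketch. It assumes hash(key) == key for non-negative int keys and -1 for an empty bucket; insert_with_offset_fix is a hypothetical name. With the check, key 246 from the previous example gets offset 1 instead of 22 and lands in the first free bucket after index 4, which is 6.

```cpp
#include <cassert>
#include <vector>

// Double hashing with the slide's fix for offsets that are multiples of length.
// Assumes hash(key) == key and key >= 0; -1 marks an empty bucket.
int insert_with_offset_fix(std::vector<int>& buckets, int key) {
    const int length = static_cast<int>(buckets.size());
    assert(length > 0 && key >= 0);
    int index = key % length;
    int offset = key / length;
    if (offset % length == 0)   // 0, length, 2 * length, ... would never move
        offset = 1;
    for (int probes = 0; probes < length; ++probes) {
        if (buckets[index] == -1) {
            buckets[index] = key;
            return index;
        }
        index = (index + offset) % length;
    }
    return -1;  // no empty bucket found within length probes
}
```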

Hashing 106

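FINAL PROBLEM: WHAT IF length HAS SEVERAL FACTORS?

EXAMPLE: length = 20

key   index   offset
20    0       1
25    5       1
30    10      1
35    15      1
110   10      5      // BUT 30 IS AT INDEX 10

FOR KEY 110, NEW INDEX = (10 + 5) % 20 = 15, WHICH IS OCCUPIED, SO NEW INDEX = (15 + 5) % 20 = 0, WHICH IS OCCUPIED, SO NEW INDEX = ... BECAUSE 5 DIVIDES 20, THE PROBES CYCLE THROUGH ONLY INDICES 10, 15, 0, AND 5, ALL OCCUPIED, SO THE INSERTION NEVER FINDS AN EMPTY BUCKET.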

Hashing 107

SOLUTION: MAKE length A PRIME.
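To see why a prime length helps: with a fixed offset, the probe sequence index, index + offset, index + 2 * offset, ... (mod length) visits only length / gcd(offset, length) distinct buckets, so every bucket is reachable exactly when offset and length have no common factor. If length is prime, that holds for any offset that is not a multiple of length, which the earlier offset fix already rules out. The hypothetical helper below counts the reachable buckets by probing until an index repeats.

```cpp
#include <cassert>
#include <set>

// Counts the distinct buckets visited by index = (index + offset) % length,
// starting at start, before the sequence repeats.
int reachable_buckets(int start, int offset, int length) {
    assert(length > 0 && start >= 0 && offset >= 0);
    std::set<int> seen;
    int index = start % length;
    while (seen.insert(index).second)  // insert returns false once index repeats
        index = (index + offset) % length;
    return static_cast<int>(seen.size());
}
```

For key 110 with length 20, only buckets 10, 15, 0, and 5 are ever probed; with a prime length such as 23, the same offset of 5 reaches all 23 buckets.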

Hashing 108

Example of Double Hashing

Consider a hash table storing integer keys that handles collisions with double hashing:

N = 13
h(k) = k mod 13
d(k) = 7 - (k mod 7)

Each collision moves the index forward by d(k), modulo 13. Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order:

k     h(k)   d(k)   Probes
18    5      3      5
41    2      1      2
22    9      6      9
44    5      5      5, 10
59    7      4      7
32    6      3      6
31    5      4      5, 9, 0
73    8      4      8

index  0   1   2   3   4   5   6   7   8   9   10  11  12
key    31      41          18  32  59  73  22  44
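The trace above can be reproduced with a small C++ sketch. It assumes -1 marks an empty cell and that the j-th probe examines (h(k) + j * d(k)) mod 13, which matches the Probes column; insert_double_hash is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

const int N = 13;  // table size (prime)
const int Q = 7;   // prime smaller than N, used by the secondary hash

int h(int k) { return k % N; }        // primary hash
int d(int k) { return Q - (k % Q); }  // secondary hash: 1..Q, never 0

// Inserts k and returns the cells probed, in order; -1 marks an empty cell.
std::vector<int> insert_double_hash(std::vector<int>& table, int k) {
    assert(static_cast<int>(table.size()) == N && k >= 0);
    std::vector<int> probes;
    for (int j = 0; j < N; ++j) {
        int index = (h(k) + j * d(k)) % N;
        probes.push_back(index);
        if (table[index] == -1) {
            table[index] = k;
            break;
        }
    }
    return probes;
}
```

Because N is prime and d(k) is always between 1 and 7, the probe sequence of every key can reach all 13 cells.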

Hashing 109

THIS VERSION OF OPEN-ADDRESS HASHING IS FAST. IF THE UNIFORM HASHING ASSUMPTION HOLDS AND THE LOAD FACTOR STAYS BELOW A FIXED BOUND LESS THAN 1, averageTime(n, m) FOR SEARCHING, INSERTING AND REMOVING IS CONSTANT, THAT IS, O(1).

Hashing 110

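ANOTHER SOLUTION: QUADRATIC HASHING, THAT IS, ONCE A COLLISION OCCURS AT h, GO TO LOCATION h + 1; IF A COLLISION OCCURS THERE, GO TO LOCATION h + 4, THEN h + 9, THEN h + 16, ETC.

unsigned long hash_int = hash (key);
int index = hash_int % length,
    offset = i * i;   // the i-th probe examines (index + offset) % length

Notice that h stays at the same location: each offset i * i is added to the original h, not to the previous probe. This eliminates primary clustering, although keys that hash to the same h still follow the same probe sequence (secondary clustering).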

Hashing 111

QUADRATIC REHASHING EXAMPLE: length = 11

key   index   offset   final place index
15    4
19    8
16    5
58    3
27    5       1        6
35    2
30    8       1        9
47    3       4        7

BUCKETS AFTER THE FIRST FOUR INSERTIONS:

index  key
0
1
2
3      58
4      15
5      16
6
7
8      19
9
10
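A C++ sketch of quadratic probing reproduces the final placements. It assumes hash(key) == key for non-negative keys and -1 for an empty bucket; quadratic_insert is a hypothetical name. The loop gives up after length probes: quadratic probing is only guaranteed to find an empty bucket when length is prime and the table is less than half full.

```cpp
#include <cassert>
#include <vector>

// Quadratic probing: the i-th probe examines (h + i * i) % length.
// Assumes hash(key) == key and key >= 0; -1 marks an empty bucket.
int quadratic_insert(std::vector<int>& buckets, int key) {
    const int length = static_cast<int>(buckets.size());
    assert(length > 0 && key >= 0);
    const int h = key % length;              // h never changes
    for (int i = 0; i < length; ++i) {
        int index = (h + i * i) % length;    // offsets 0, 1, 4, 9, ...
        if (buckets[index] == -1) {
            buckets[index] = key;
            return index;
        }
    }
    return -1;  // no empty bucket found in length probes
}
```

Inserting the eight keys in order puts 27 at index 6, 30 at index 9, and 47 at index 7, matching the final place indices above.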

Hashing 112

Performance

HOW DOES DOUBLE-HASHING COMPARE WITH CHAINED HASHING?

Hashing 113

Performance of Hash Tables

- Load factor = # filled cells / table size
  - Between 0 and 1 for open addressing
- Load factor has the greatest effect on performance
  - Lower load factor, better performance
  - Sparsely populated tables have fewer collisions
- Knuth gives the expected number of probes p for a successful search:
  - open addressing with linear probing, load factor L: p = ½(1 + 1/(1 - L)). As L approaches 1, p zooms up.
  - chaining: p = 1 + (L/2). Note: here L can be greater than 1!

Hashing 114

Performance of Hash Tables (2)

Number of Probes
L      Linear Probing   Chaining
0      1.00             1.00
0.25   1.17             1.13
0.5    1.50             1.25
0.75   2.50             1.38
0.85   3.83             1.43
0.9    5.50             1.45
0.95   10.50            1.48
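The table's values come straight from the two formulas on the previous slide. A minimal C++ sketch, with hypothetical function names, evaluates them; for example, the 0.75 row is 2.5 and 1.375 before rounding.

```cpp
#include <cassert>

// Expected probes for a successful search with linear probing: ½(1 + 1/(1 - L)).
double linear_probing_probes(double L) {
    assert(L >= 0.0 && L < 1.0);     // open addressing: L must stay below 1
    return 0.5 * (1.0 + 1.0 / (1.0 - L));
}

// Expected probes for a successful search with chaining: 1 + L/2.
double chaining_probes(double L) {
    assert(L >= 0.0);                // chaining: L may exceed 1
    return 1.0 + L / 2.0;
}
```

The asserts encode the slide's caveat: the linear probing formula blows up as L approaches 1, while chaining still works for L greater than 1.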

Hashing 115

Performance of Hash Tables (3)

- Hash table: Insert: average O(1); Search: average O(1)
- Sorted array: Insert: average O(n); Search: average O(log n)
- Binary search tree: Insert: average O(log n); Search: average O(log n)
  - But balanced trees can guarantee O(log n) even in the worst case

Hashing 116

We know that hashing becomes inefficient as the table fills up. What to do?

EXPAND!

Hashing 117

WHAT ABOUT THE SIZE OF buckets, AND SHOULD THAT ARRAY EVER BE RE-SIZED?

RE-SIZE WHENEVER THE LOAD FACTOR, THE RATIO OF count TO length, EXCEEDS 0.75.

Hashing 118

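TO RE-SIZE, WE WILL DOUBLE THE OLD CAPACITY, PLUS 1. WHY +1? DOUBLING ALONE ALWAYS GIVES AN EVEN length; ADDING 1 MAKES THE NEW length ODD, SO 2 IS NOT ONE OF ITS FACTORS. ANOTHER OPTION: FIND THE NEXT PRIME NUMBER AFTER DOUBLING.

NOTE THAT WE RE-SIZE WHENEVER THE LOAD FACTOR, THAT IS, THE AVERAGE LIST SIZE, EXCEEDS 0.75.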

Hashing 119

IN check_for_expansion, IF count >= length * 0.75, CREATE A NEW ARRAY WHOSE LENGTH IS DOUBLE THE OLD LENGTH, PLUS 1. ITERATE THROUGH THE OLD ARRAY AND RE-HASH EACH VALUE INTO THE NEW ARRAY (EACH INDEX MUST BE RECOMPUTED WITH THE NEW length). FINALLY, ERASE THE OLD ARRAY.
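A minimal C++ sketch of check_for_expansion for chained hashing, assuming a hypothetical ChainedSet of non-negative int keys with hash(key) == key and a default length of 11. In this sketch the check runs after each successful insertion; the course code may call it at a different point.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical chained hash set of non-negative int keys; hash(key) == key.
class ChainedSet {
public:
    explicit ChainedSet(std::size_t length = 11) : buckets(length) { assert(length > 0); }

    void insert(int key) {
        assert(key >= 0);
        std::list<int>& chain = buckets[index_for(key, buckets.size())];
        for (int k : chain)
            if (k == key) return;          // already present: nothing to do
        chain.push_back(key);
        ++count;
        check_for_expansion();
    }

    bool contains(int key) const {
        for (int k : buckets[index_for(key, buckets.size())])
            if (k == key) return true;
        return false;
    }

    std::size_t length() const { return buckets.size(); }
    std::size_t size() const { return count; }

private:
    std::vector<std::list<int>> buckets;
    std::size_t count = 0;

    static std::size_t index_for(int key, std::size_t table_length) {
        return static_cast<std::size_t>(key) % table_length;
    }

    // If count >= length * 0.75, re-hash every value into 2 * length + 1 lists.
    void check_for_expansion() {
        if (count < buckets.size() * 0.75) return;
        std::vector<std::list<int>> bigger(2 * buckets.size() + 1);
        for (const std::list<int>& chain : buckets)
            for (int key : chain)
                bigger[index_for(key, bigger.size())].push_back(key);
        buckets.swap(bigger);  // the old lists are destroyed along with bigger
    }
};
```

With the default length of 11, the ninth insertion (9 >= 8.25) triggers the first expansion, to length 23.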

Hashing 120

GROUP EXERCISE: ASSUME THAT length = 13. INSERT THE FOLLOWING KEYS INTO A HASH TABLE USING 1) OPEN ADDRESSING WITH AN OFFSET OF 1, 2) DOUBLE HASHING, AND 3) CHAINING: 20, 33, 49, 22, 26, 140, 38, 9, 7, 3, 0, 1

Hashing 121

Summary Slide 1

- Hash Table
  - simulates the fastest searching technique (knowing the index of the required value in a vector or array and using that index to access the value directly) by applying a hash function that converts the data to an integer
  - After obtaining an index by dividing the value from the hash function by the table size and taking the remainder, access the table. Normally, the number of table locations is much smaller than the number of distinct data values, so collisions occur.
  - To handle collisions, we must place a value that collides with an existing table element into the table in such a way that we can efficiently access it later.

Hashing 122

Summary Slide 2

- Hash Table (Cont…)
  - the average running time for a search of a hash table is O(1)
  - the worst case is O(n)

Hashing 123

Summary Slide 3

- Collision Resolution
  - Types:
    1) open addressing with linear probing
      - the table is a vector or array of static size
      - After using the hash function to compute a table index, look up the entry in the table.
      - If the values match, perform an update if necessary.
      - If the table entry is empty, insert the value in the table.

Hashing 124

Summary Slide 4

- Collision Resolution (Cont…)
  - Types:
    1) open addressing with linear probing
      - Otherwise, probe forward circularly, looking for a match or an empty table slot.
      - If the probe returns to the original starting point, the table is full.
      - A search may have to examine table items that hashed to different table locations.
      - Deleting an item is difficult.

Hashing 125

Summary Slide 5

- Collision Resolution (Cont…)
  2) chaining with separate lists
    - the hash table is a vector of list objects
    - Each list is a sequence of colliding items.
    - After applying the hash function to compute the table index, search the list for the data value.
    - If it is found, update its value; otherwise, insert the value at the back of the list.
    - You search only items that collided at the same table location.

Hashing 126

Summary Slide 6

- Collision Resolution (Cont…)
  - there is no limitation on the number of values in the table, and deleting an item from the table involves only erasing it from its corresponding list
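The chaining operations in summary slides 5 and 6 can be sketched in C++. This is a minimal sketch assuming a hypothetical ChainedMap with non-negative int keys, std::string values, and hash(key) == key; it is not the course's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical chained map: int keys (>= 0), std::string values, hash(key) == key.
class ChainedMap {
public:
    using Chain = std::list<std::pair<int, std::string>>;

    explicit ChainedMap(std::size_t length) : buckets(length) { assert(length > 0); }

    // Search only the key's list: update on a match, else append at the back.
    void insert_or_update(int key, const std::string& value) {
        Chain& chain = buckets[index_for(key)];
        for (auto& entry : chain)
            if (entry.first == key) {
                entry.second = value;
                return;
            }
        chain.emplace_back(key, value);
    }

    // Returns a pointer to the stored value, or nullptr if key is absent.
    const std::string* find(int key) const {
        for (const auto& entry : buckets[index_for(key)])
            if (entry.first == key) return &entry.second;
        return nullptr;
    }

    // Deleting only erases the entry from its own list; no other list changes.
    bool erase(int key) {
        Chain& chain = buckets[index_for(key)];
        for (auto it = chain.begin(); it != chain.end(); ++it)
            if (it->first == key) {
                chain.erase(it);
                return true;
            }
        return false;
    }

private:
    std::vector<Chain> buckets;
    std::size_t index_for(int key) const {
        return static_cast<std::size_t>(key) % buckets.size();
    }
};
```

Keys 5 and 18 both land in list 5 of a 13-list table, so erasing 5 leaves 18 reachable without any marked_for_removal bookkeeping.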