Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3 1 Hash Tables,
Post on 31-Mar-2015
221 Views
Preview:
Transcript
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3
1
Hash Tables, Maps and SetsChapter 15
Wherein we throw all the data into random array slots and somehow obtain
O(1) retrieval time
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3
2
Hash Tables
• Recall order of magnitude of searches– Linear search O(n)
– Binary search O(log2n)
– Balanced binary tree search O(log2n)
– Unbalanced binary tree can degrade to O(n)
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3
3
Hash Tables
• In some situations faster search is needed– Solution is to use a hash function– Value of key field given to hash function– Location in a hash table is calculated
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3
4
Hash Functions
• Simple function could be to mod the value of the key by some arbitrary integerint h(int i){ return i % someInt; }
• Note the max number of locations in the table will be same as someInt
• Note that we have traded speed for wasted space– Table must be considerably larger than number
of items anticipated
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3
5
Hash Functions
• Observe the problem with same value returned by h(i) for different values of i– Called collisions
• A simple solution is linear probing– Linear search begins at
collision location– Continues until empty
slot found for insertion
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3
6
Hash Functions
• When retrieving a valuelinear probe until found– If empty slot encountered
then value is not in table
• If deletions permitted – Slot can be marked so
it will not be empty and cause an invalid linear probe
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3
7
Hash Functions
• Strategies for improved performance– Increase table capacity (less collisions)– Use different collision resolution technique– Devise different hash function
• Hash table capacity– Size of table must be 1.5 to 2 times the size of
the number of items to be stored– Otherwise probability of collisions is too high
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3
8
Collision Strategies
• Linear probing can result in primary clustering
• Consider quadratic probing– Probe sequence from location i is
i + 1, i – 1, i + 4, i – 4, i + 9, i – 9, …– Secondary clusters can still form
• Double hashing – Use a second hash function to determine probe
sequence
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3
9
Collision Strategies
• Chaining– Table is a list or vector of head nodes to linked
lists– When item hashes to location, it is added to that
linked list
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3
10
Improving the Hash Function
• Ideal hash function– Simple to evaluate– Scatters items uniformly throughout table
• Modulo arithmetic not so good for strings– Possible to manipulate numeric (ASCII) value of
first and last characters of a name
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3
11
Implementations
Hash TableResizable
ArrayBalanced
TreeLinked
List
Hash Table + Linked
List
Interfaces
Set HashSet TreeSet LinkedHashSet
List ArrayList LinkedList
Map HashMap TreeMap LinkedHashMap
• HashMap -- an associative data structure
• HashSet -- a container for holding values
• TreeMap – just like HashMap only items are ordered by key (internal red-black tree)
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3
12
Hash Tables 13
the Map Interface
• Map ADT methods:– get(k): if the map M has an entry with key k, return
its associated value; else, return null – put(k, v): insert entry (k, v) into the map M; if key k
is not already in M, then return null; else, return old value associated with k
– remove(k): if the map M has an entry with key k, remove it from M and return its associated value; else, return null
– size(), isEmpty()– keys(): return an iterator of the keys in M– values(): return an iterator of the values in M
Hash Tables 14
the Set Interface
• Set ADT methods:– add(o): Adds the specified element to this set if it
is not already present. Returns true if the set does not already contain element (new item added)
– contains(o): returnts true if the set contains the specified element
– remove(o): Removes the specified element from this set if it is present. Returns true if the set contained the specified element
– size(), isEmpty()
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3
15
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3
16
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3
17
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3
18
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3
19
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3
20
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3
21
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3
22
Using “Chaining” to resolve collisions
• A linked list in every cell of array
• No rehashing required
• Unlimited collisions can be handled
• Lousy hash codes will still reduce performanceNyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson
Education, Inc. All rights reserved. 0-13-140909-3 23
Nyhoff, ADTs, Data Structures and Problem Solving with C++, Second Edition, © 2005 Pearson Education, Inc. All rights reserved. 0-13-140909-3
24
top related