Hashing 1 Maps and hashing
Mar 21, 2016
Hashing 2Dr.Alagoz
Maps
A map models a searchable collection of key-value entries.
Typically a key is a string with an associated value (e.g., salary info).
The main operations of a map are searching, inserting, and deleting items.
Multiple entries with the same key are not allowed.
Applications: address book, student-record database, salary information, etc.

The Map ADT
Map ADT methods:
  get(k): if the map M has an entry with key k, return its associated value; else, return null.
  put(k, v): insert entry (k, v) into the map M; if key k is not already in M, return null; else, return the old value associated with k.
  remove(k): if the map M has an entry with key k, remove it from M and return its associated value; else, return null.
  size(), isEmpty()
  keys(): return an iterator of the keys in M.
  values(): return an iterator of the values in M.

Example
  Operation   Output   Map                        Note
  isEmpty()   true     Ø
  put(5,A)    null     (5,A)                      key 5 was not in M, so null is returned
  put(7,B)    null     (5,A),(7,B)
  put(2,C)    null     (5,A),(7,B),(2,C)
  put(8,D)    null     (5,A),(7,B),(2,C),(8,D)
  put(2,E)    C        (5,A),(7,B),(2,E),(8,D)    old value of key 2 is returned
  get(7)      B        (5,A),(7,B),(2,E),(8,D)
  get(4)      null     (5,A),(7,B),(2,E),(8,D)    4 is not in M
  get(2)      E        (5,A),(7,B),(2,E),(8,D)
  size()      4        (5,A),(7,B),(2,E),(8,D)
  remove(5)   A        (7,B),(2,E),(8,D)          remove 5 and return A
  remove(2)   E        (7,B),(8,D)
  get(2)      null     (7,B),(8,D)                2 is no longer in M
  isEmpty()   false    (7,B),(8,D)                there are 2 entries in M

Comparison to java.util.Map
  Map ADT Methods   java.util.Map Methods
  size()            size()
  isEmpty()         isEmpty()
  get(k)            get(k)
  put(k,v)          put(k,v)
  remove(k)         remove(k)
All the same except:
  keys()            keySet().iterator()
  values()          values().iterator()
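
As a rough illustration of the correspondence above, the earlier operation sequence can be replayed against java.util.Map. This is only a sketch: the class name MapTraceDemo and the choice of LinkedHashMap (used so that entries print in insertion order) are assumptions made for this example, not part of the slides.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

class MapTraceDemo {
    /** Replays the put/remove sequence from the example slide and returns the final map. */
    static Map<Integer, String> replay() {
        // LinkedHashMap keeps insertion order, which makes the output easy to compare with the slide.
        Map<Integer, String> m = new LinkedHashMap<>();
        m.put(5, "A");
        m.put(7, "B");
        m.put(2, "C");
        m.put(8, "D");
        m.put(2, "E");   // key 2 already present: the old value "C" is returned
        m.remove(5);     // returns "A"
        m.remove(2);     // returns "E"
        return m;
    }

    public static void main(String[] args) {
        Map<Integer, String> m = replay();
        System.out.println(m.get(2));      // key 2 was removed, so this is null
        System.out.println(m.isEmpty());
        // keys() of the Map ADT corresponds to keySet().iterator()
        Iterator<Integer> keys = m.keySet().iterator();
        while (keys.hasNext()) {
            System.out.println(keys.next());
        }
    }
}
```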

Performance of a List-Based Map
Performance:
  put takes O(1) time, since we can insert the new item at the beginning or at the end of the sequence.
  get and remove take O(n) time, since in the worst case (the item is not found) we traverse the entire sequence to look for an item with the given key.
The unsorted list implementation is effective only for maps of small size, or for maps in which puts are the most common operations while searches and removals are rarely performed (e.g., a historical record of logins to a workstation).
Hashing

Content
Idea: the hash table ADT supports only a subset of the operations of a binary search tree (insert, delete, find), but performs them in constant average time.
Hashing: the implementation of hash tables for insert and find operations is called hashing.
Collision: when two keys hash to the same value.
Resolving techniques for collisions, e.g., using linked lists.
Rehashing: when a hash table gets too full, operations take longer. Then we need to build a new, double-sized hash table. (HWLA: why double-sized?)
Extendible hashing: for data that is too large to fit in main memory.

Hashing as a Data Structure
Performs these operations in O(1):
  Insert
  Delete
  Find
Is not suitable for:
  FindMin
  FindMax
  Sort or output as sorted

General Idea
Array of fixed size (TableSize).
Search is performed on some part of the item (the key).
Each key is mapped into some number between 0 and TableSize - 1.
This mapping is called a hash function.
Ideally, two distinct keys get different cells.
Problem: since there is a finite number of cells and a virtually inexhaustible supply of keys, we need a hash function that distributes the keys evenly among the cells.

Hash Functions and Hash Tables
A hash function h maps keys of a given type to integers in a fixed interval [0, N - 1].
Example: h(x) = x mod N is a hash function for integer keys.
The integer h(x) is called the hash value of key x.
A hash table for a given key type consists of:
  a hash function h
  an array (called the table) of size N
When implementing a map with a hash table, the goal is to store item (k, o) at index i = h(k).

Example
We design a hash table for a map storing entries as (SSN, Name), where SSN (social security number) is a nine-digit positive integer.
Our hash table uses an array of size N = 10,000 and the hash function h(x) = last four digits of x.
  Index   Entry
  0       (empty)
  1       025-612-0001
  2       981-101-0002
  3       (empty)
  4       451-229-0004
  ...
  9997    (empty)
  9998    200-751-9998
  9999    (empty)

Hash Functions
A hash function should be easy to compute.
Key is of type Integer:
  a reasonable strategy is to return Key mod TableSize (a value between 0 and TableSize - 1).
Key is of type String (mostly used in practice):
  the hash function needs to be chosen carefully, e.g., adding up the ASCII values of the characters in the string.
Proper selection of the hash function is required.

Hash Function (Integer)
Simply return Key % TableSize.
Choose TableSize carefully: if TableSize is 10 and all keys end in zero, every key hashes to cell 0.
To avoid such pitfalls, choose TableSize to be a prime number.
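
The pitfall above can be checked with a small sketch. The class name TableSizeDemo, the helper distinctCells, and the sample keys are hypothetical and chosen only for illustration; keys are assumed to be non-negative.

```java
class TableSizeDemo {
    /** Counts how many distinct cells the keys occupy under key % tableSize. */
    static int distinctCells(int[] keys, int tableSize) {
        boolean[] used = new boolean[tableSize];
        int count = 0;
        for (int key : keys) {
            int cell = key % tableSize;   // assumes non-negative keys
            if (!used[cell]) {
                used[cell] = true;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] keys = {10, 20, 30, 40, 50, 60, 70, 80};   // hypothetical keys, all ending in zero
        System.out.println(distinctCells(keys, 10));     // every key lands in cell 0
        System.out.println(distinctCells(keys, 11));     // a prime size spreads these keys out
    }
}
```

With TableSize 10 all eight keys share one cell, while with the prime 11 each key gets its own cell.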

Hash Function I (String)
Adds up the ASCII values of the characters in the string.
Advantage: simple to implement and computes quickly.
Disadvantage: if TableSize is large (see Fig. 5.2 in your book), the function does not distribute keys well.
  Example: keys are at most 8 characters long.
  An ASCII character has a value of at most 127, so the maximum sum is 8 * 127 = 1016, but TableSize is 10007. Only about 10 percent of the table could be filled.

Hash Function II (String)
Assumption: Key has at least 3 characters.
Hash function (26 characters for the alphabet + blank = 27):
  key[0] + 27 * key[1] + 27^2 * key[2]
Advantage: distributes better than Hash Function I, and is easy to compute.
Disadvantage:
  There are 26^3 = 17,576 possible combinations of 3 characters. However, a dictionary check shows that English has only 2,851 different combinations. HWLA: explain why. Read p. 157 for the answer.
  Similar to Hash Function I, it is not appropriate if the hash table is reasonably large.

Hash Function III (String)
Idea: compute a polynomial function of the key's characters.
  P(Key with n+1 characters) = Key[0] + 37 Key[1] + 37^2 Key[2] + ... + 37^n Key[n]
If each power 37^i is computed separately and the terms are then summed, the complexity is O(n^2).
Using Horner's rule, the complexity drops to O(n):
  ((Key[n] * 37 + Key[n-1]) * 37 + ... + Key[1]) * 37 + Key[0]
This is a very simple and reasonably fast method, but computation time becomes a problem if the keys are very long.
A common practice is to avoid using all the characters of the key. E.g., the keys could be complete street addresses; the hash function might include a couple of characters from the street address and perhaps a couple of characters from the city name or zip code.
Think of other options.
Quiz: lately 31 has been proposed instead of 37. Why not 19?

Hash Function III (String)
    public static int hash( String key, int tableSize )
    {
        int hashVal = 0;

        // Horner's rule: one multiplication and one addition per character
        for( int i = 0; i < key.length( ); i++ )
            hashVal = 37 * hashVal + key.charAt( i );

        hashVal %= tableSize;
        // int overflow can make hashVal negative; shift it back into [0, tableSize - 1]
        if( hashVal < 0 )
            hashVal += tableSize;

        return hashVal;
    }

Collision
Collisions occur when different elements are mapped to the same cell.
  Index   Entries
  0       (empty)
  1       025-612-0001
  2       (empty)
  3       (empty)
  4       451-229-0004, 981-101-0004   (collision)

Collision
When an element is inserted and it hashes to the same value as an already inserted element, we have a collision.
Example: with the hash function Key % 10, the keys 564 and 824 collide at cell 4.

Solving Collision
Separate Chaining: keep a list of all elements hashing to the same value, and traverse the list to find the element being searched for.
  Hint: the table should be large, with a prime table size, to ensure a good distribution.
  (Limited use due to the extra space taken by the lists, and it needs linked lists.)
Open Addressing: at a collision, search for alternative cells until an empty cell is found.
  Linear Probing
  Quadratic Probing
  Double Hashing

Solving Collision
Define the load factor: Lambda = number of elements / TableSize.
HWLA: search the Internet for other techniques (is there any algorithm that does not use a linked list?):
  A) Binary search tree
  B) Using another hash table
Explain why we do not use A and B.
Solution: if the table size is large and a proper hash function is used, then the lists should be short. Therefore, it is not worthwhile to use anything more complicated.

Separate Chaining
Separate chaining: let each cell in the table point to a linked list of the entries that map there.
Keep a list of all elements that hash to the same value.
Each element of the hash table is a linked list.
Separate chaining is simple, but requires additional memory outside the table.
Insert keys [9, 8, 7, 6, 5, 4, 3, 2, 1, 0] into the hash table using h(x) = x^2 % TableSize.
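
The insertion exercise above can be sketched as follows. This is an illustrative sketch, not the course's implementation: the class name ChainingDemo is hypothetical, TableSize = 10 is an assumption (the slide does not state it), and java.util.LinkedList stands in for the list class used on the following slides. New keys are inserted at the front of their list.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainingDemo {
    /** Builds a separate-chaining table using h(x) = x^2 % tableSize. */
    static List<LinkedList<Integer>> build(int[] keys, int tableSize) {
        List<LinkedList<Integer>> table = new ArrayList<>();
        for (int i = 0; i < tableSize; i++) {
            table.add(new LinkedList<>());
        }
        for (int x : keys) {
            int cell = (x * x) % tableSize;
            LinkedList<Integer> chain = table.get(cell);
            if (!chain.contains(x)) {
                chain.addFirst(x);   // insert new keys at the head of the list
            }
        }
        return table;
    }

    public static void main(String[] args) {
        int[] keys = {9, 8, 7, 6, 5, 4, 3, 2, 1, 0};
        List<LinkedList<Integer>> table = build(keys, 10);   // TableSize 10 is assumed
        for (int i = 0; i < table.size(); i++) {
            if (!table.get(i).isEmpty()) {
                System.out.println(i + ": " + table.get(i));
            }
        }
    }
}
```

Under these assumptions only cells 0, 1, 4, 5, 6 and 9 are used, since a square can only end in one of those digits.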

Load Factor (Lambda)
Lambda: the number of elements in the hash table divided by TableSize (e.g., Lambda = 1 for the table in the previous example).
The time to perform a search is the constant time required to evaluate the hash function plus the time to traverse the list.
Unsuccessful search: Lambda nodes are examined, on average.
Successful search: about 1 + Lambda/2 links are traversed.
Note: with N elements in a table of M lists, the average number of other nodes in the searched list is (N-1)/M = N/M - 1/M = Lambda - 1/M, which is approximately Lambda for large M. That is, the table size itself is not important, but the load factor is.
So, in separate chaining, Lambda should be kept near 1, i.e., make the hash table about as large as the number of elements expected.
Remember also that the table size should be prime to ensure a good distribution.

Separate Chaining
    /**
     * Construct the hash table.
     */
    public SeparateChainingHashTable( )
    {
        this( DEFAULT_TABLE_SIZE );
    }

    /**
     * Construct the hash table.
     * @param size approximate table size.
     */
    public SeparateChainingHashTable( int size )
    {
        theLists = new LinkedList[ nextPrime( size ) ];
        for( int i = 0; i < theLists.length; i++ )
            theLists[ i ] = new LinkedList( );
    }

Separate Chaining: Find
Use the hash function to determine which list to traverse.
Traverse the list to find the element.
    public Hashable find( Hashable x )
    {
        return (Hashable)theLists[ x.hash( theLists.length ) ].find( x ).retrieve( );
    }

Separate Chaining: Insert
Use the hash function to determine in which list to insert.
Insert the element at the head of the list (if it is not already present).
    public void insert( Hashable x )
    {
        LinkedList whichList = theLists[ x.hash( theLists.length ) ];
        LinkedListItr itr = whichList.find( x );

        if( itr.isPastEnd( ) )
            whichList.insert( x, whichList.zeroth( ) );
    }

Separate Chaining: Delete
Use the hash function to determine from which list to delete.
Search for the element in the list and delete it.
    public void remove( Hashable x )
    {
        theLists[ x.hash( theLists.length ) ].remove( x );
    }

Separate Chaining
Advantages:
  Solves the collision problem totally.
  Elements can be inserted anywhere.
Disadvantages:
  Needs linked lists.
  All lists must be short to get O(1) time complexity; otherwise operations take too long.

Separate Chaining needs extra space
Alternatives to using linked lists:
  Binary trees
  Hash tables
However, if the table size is large and a good hash function is used, all the lists are expected to be short already, i.e., there is no need to complicate things.
Instead of the above alternative techniques, we use open addressing.

Open Addressing
Solves collisions without using any other data structure such as a linked list (allocating list cells is a significant cost, especially in some other languages).
Idea: if a collision occurs, alternative cells are tried until an empty cell is found.
  Cells h0(x), h1(x), ... are tried in succession, where
  hi(x) = (hash(x) + f(i)) % TableSize
The function f is the collision resolution strategy, with f(0) = 0.
Since all data go inside the table, open addressing requires a bigger table than separate chaining.
Lambda should be less than 0.5 (it was 1 for separate chaining).

Open Addressing
Depending on the collision resolution strategy f, we have:
  Linear Probing: f(i) = i
  Quadratic Probing: f(i) = i^2
  Double Hashing: f(i) = i * hash2(x)

Linear Probing
f(i) = i: cells are tried sequentially (with wraparound) in search of an empty cell.
Advantages:
  Easy to compute
Disadvantages:
  The table must be big enough to get a free cell.
  The time to get a free cell may be quite large.
  Primary clustering: any key that hashes into the cluster will require several attempts to resolve the collision.

Example: Linear Probing
Open addressing: the colliding item is placed in a different cell of the table.
Linear probing handles collisions by placing the colliding item in the next (circularly) available table cell.
Each table cell inspected is referred to as a "probe".
Colliding items lump together, so future collisions cause longer sequences of probes.
Example:
  h(x) = x mod 13
  Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order.
Index  0   1   2   3   4   5   6   7   8   9   10  11  12
Key            41          18  44  59  32  22  31  73
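
The table above can be reproduced with a short sketch. The class name LinearProbingDemo and the use of null to mark empty cells are assumptions for illustration; deletion is not handled, and keys are assumed to be non-negative.

```java
import java.util.Arrays;

class LinearProbingDemo {
    /** Inserts the keys into a table of size n using h(x) = x mod n and linear probing. */
    static Integer[] insertAll(int[] keys, int n) {
        Integer[] table = new Integer[n];   // null marks an empty cell
        for (int key : keys) {
            int i = key % n;
            int probes = 0;
            while (table[i] != null) {
                if (++probes == n) {
                    throw new IllegalStateException("table is full");
                }
                i = (i + 1) % n;            // next cell, wrapping around circularly
            }
            table[i] = key;
        }
        return table;
    }

    public static void main(String[] args) {
        int[] keys = {18, 41, 22, 44, 59, 32, 31, 73};
        System.out.println(Arrays.toString(insertAll(keys, 13)));
    }
}
```

Note how 31 hashes to cell 5 but ends up in cell 10, after probing the whole cluster in cells 5 to 9.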

Example in the book: Linear Probing
Insert keys [89, 18, 49, 58, 69] into a hash table using hi(x) = (hash(x) + i) % TableSize.

The first collision occurs when 49 is inserted (it is then put in the next available cell, i.e., cell 0).
58 collides with 18, 89, and then 49 before an empty cell is found three cells away.
The collision for 69 is handled in the same way.
Note: insertions and unsuccessful searches require the same number of probes.
Primary clustering:
  If the table is big enough, a free cell will always be found (even if it takes a long time).
  Even if the table is relatively empty (low Lambda), blocks of occupied cells start forming. Any key that hashes into such a cluster requires several attempts to resolve the collision, and then it is added to the cluster.

Quadratic Probing
Eliminates the primary clustering problem.
Theorem: if quadratic probing is used and the table size is prime, then a new element can always be inserted if the table is at least half empty.
Secondary clustering: elements that hash to the same position will probe the same alternative cells.

Quadratic Probing
Insert keys [89, 18, 49, 58, 69] into a hash table using hi(x) = (hash(x) + i^2) % TableSize.
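
A minimal sketch of this insertion sequence follows. It assumes TableSize = 10 and hash(x) = x % TableSize, which the slide does not state explicitly; the class name QuadraticProbingDemo is hypothetical, and null marks an empty cell.

```java
import java.util.Arrays;

class QuadraticProbingDemo {
    /** Inserts keys using h_i(x) = (x % n + i^2) % n for i = 0, 1, 2, ... */
    static Integer[] insertAll(int[] keys, int n) {
        Integer[] table = new Integer[n];
        for (int key : keys) {
            int home = key % n;
            boolean placed = false;
            for (int i = 0; i < n && !placed; i++) {
                int cell = (home + i * i) % n;
                if (table[cell] == null) {
                    table[cell] = key;
                    placed = true;
                }
            }
            if (!placed) {
                // Can happen with quadratic probing even when free cells remain.
                throw new IllegalStateException("no free cell found for " + key);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        int[] keys = {89, 18, 49, 58, 69};
        System.out.println(Arrays.toString(insertAll(keys, 10)));   // table size 10 is assumed
    }
}
```

Under these assumptions 49 lands in cell 0, 58 in cell 2, and 69 in cell 3, so the colliding keys are spread further apart than with linear probing.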

Quadratic Probing
    /**
     * Construct the hash table.
     */
    public QuadraticProbingHashTable( )
    {
        this( DEFAULT_TABLE_SIZE );
    }

    /**
     * Construct the hash table.
     * @param size the approximate initial size.
     */
    public QuadraticProbingHashTable( int size )
    {
        allocateArray( size );
        makeEmpty( );
    }

Quadratic Probing
    /**
     * Method that performs quadratic probing resolution.
     * @param x the item to search for.
     * @return the position where the search terminates.
     */
    private int findPos( Hashable x )
    {
        int collisionNum = 0;
        int currentPos = x.hash( array.length );

        while( array[ currentPos ] != null &&
                !array[ currentPos ].element.equals( x ) )
        {
            currentPos += 2 * ++collisionNum - 1;
            if( currentPos >= array.length )
                currentPos -= array.length;
        }

        return currentPos;
    }

Double Hashing
Double hashing uses a secondary hash function d(k) and handles collisions by placing an item in the first available cell of the series
  (i + j d(k)) mod N  for j = 0, 1, ..., N - 1, where i = h(k).
The secondary hash function d(k) cannot have zero values.
The table size N must be a prime to allow probing of all the cells.
Common choice of compression function for the secondary hash function:
  d2(k) = q - (k mod q)
  where q < N and q is a prime.
The possible values for d2(k) are 1, 2, ..., q.

Double Hashing
Popular choice: f(i) = i * hash2(x), i.e., apply a second hash function to x and probe at distances hash2(x), 2 hash2(x), 3 hash2(x), ...
  hash2(x) = R - (x % R)
A poor choice of hash2(x) could be disastrous.
Observe: hash2(x) = x mod 9 would not work if 99 were inserted into the input in the previous example, because hash2(99) = 0.
R should also be a prime number smaller than TableSize.
If double hashing is correctly implemented, simulations imply that the expected number of probes is almost the same as for a random collision resolution strategy.

Example of Double Hashing
Consider a hash table storing integer keys that handles collisions with double hashing:
  N = 13
  h(k) = k mod 13
  d(k) = 7 - (k mod 7)
Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order.
  k    h(k)   d(k)   Probes
  18   5      3      5
  41   2      1      2
  22   9      6      9
  44   5      5      5, 10
  59   7      4      7
  32   6      3      6
  31   5      4      5, 9, 0
  73   8      4      8
Index  0   1   2   3   4   5   6   7   8   9   10  11  12
Key    31      41          18  32  59  73  22  44
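
The example above can be replayed with a small sketch. The class name DoubleHashingDemo and the parameter names n and q are hypothetical; null marks an empty cell and keys are assumed to be non-negative.

```java
import java.util.Arrays;

class DoubleHashingDemo {
    /** Inserts keys using h(k) = k mod n and step d(k) = q - (k mod q). */
    static Integer[] insertAll(int[] keys, int n, int q) {
        Integer[] table = new Integer[n];
        for (int key : keys) {
            int home = key % n;
            int step = q - (key % q);       // always in 1..q, never zero
            boolean placed = false;
            for (int j = 0; j < n && !placed; j++) {
                int cell = (home + j * step) % n;
                if (table[cell] == null) {
                    table[cell] = key;
                    placed = true;
                }
            }
            if (!placed) {
                throw new IllegalStateException("table is full");
            }
        }
        return table;
    }

    public static void main(String[] args) {
        int[] keys = {18, 41, 22, 44, 59, 32, 31, 73};
        System.out.println(Arrays.toString(insertAll(keys, 13, 7)));
    }
}
```

Because N = 13 is prime and the step is never zero, the probe sequence of a key visits every cell before repeating.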

Example in the book: Double Hashing
  Hi(x) = (x + i (R - (x mod R))) % N, with R = 7 and N = 10
The first collision occurs when 49 is inserted: hash2(49) = 7 - 0 = 7, thus 49 is inserted in position 6.
hash2(58) = 7 - 2 = 5, so 58 is inserted at location 3.
Finally, 69 collides and is inserted at a distance hash2(69) = 7 - 6 = 1 away.
Observe a bad scenario: if we had an input 60, then what happens?
  First, 60 collides with 69 in position 0.
  Since hash2(60) = 7 - 4 = 3, we would then try positions 3, 6, 9, and then 2, until an empty cell is found.

Rehashing
If the hash table gets too full, the operations start taking too long.
Insertions might fail for open addressing with quadratic probing.
Solution: rehashing, i.e., build another table that is about twice as big.
Rehashing is especially useful when there are many removals intermixed with insertions.

Rehashing
Build another table that is about twice as big, e.g., if N = 11, then N' = 23.
Associate a new hash function.
Scan down the entire original hash table.
Compute the new hash value for each nondeleted element.
Insert it in the new table.

Rehashing
Very expensive operation: O(N).
The good news is that rehashing occurs very infrequently.
If the data structure is part of a larger program, the effect is not noticeable.
If hashing is performed as part of an interactive system, then the unfortunate user whose insertion caused a rehash could observe a slowdown.

Rehashing
When to apply rehashing?
  Strategy 1: as soon as the table is half full
  Strategy 2: only when an insertion fails
  Strategy 3: when the table reaches a certain load factor
There is no certain rule for the best strategy. Since the load factor is directly related to the performance of the system, the 3rd strategy may work better. Then what is the threshold?

Example: Rehashing
Suppose 13, 15, 23, 24, and 6 are inserted into a hash table of size 7.
Assume h(x) = x mod 7; using linear probing we get the table on the left.
The table is now 5/7 full, so rehashing is required.
The new table size is 17, since 17 is the first prime number greater than 2 * 7.
The new hash function is h(x) = x mod 17. We scan the old table and insert the elements 6, 15, 23, 24, 13 into the new table (as shown in the table on the right).
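
Since the two tables of this example are figures, the following sketch recomputes them. It is illustrative only: the class name RehashDemo and its helper methods are hypothetical, null marks an empty cell, and the sketch assumes the table never becomes completely full.

```java
import java.util.Arrays;

class RehashDemo {
    /** Inserts key with h(x) = x mod table.length and linear probing (assumes a free cell exists). */
    static void insert(Integer[] table, int key) {
        int i = key % table.length;
        while (table[i] != null) {
            i = (i + 1) % table.length;
        }
        table[i] = key;
    }

    static Integer[] build(int[] keys, int size) {
        Integer[] table = new Integer[size];
        for (int key : keys) {
            insert(table, key);
        }
        return table;
    }

    /** Scans the old table front to back and reinserts every element into a new table. */
    static Integer[] rehash(Integer[] old, int newSize) {
        Integer[] bigger = new Integer[newSize];
        for (Integer key : old) {
            if (key != null) {
                insert(bigger, key);
            }
        }
        return bigger;
    }

    public static void main(String[] args) {
        Integer[] small = build(new int[] {13, 15, 23, 24, 6}, 7);
        System.out.println(Arrays.toString(small));
        System.out.println(Arrays.toString(rehash(small, 17)));   // 17: first prime after 2 * 7
    }
}
```

In the old table 6 is displaced to cell 0; after rehashing, 23 and 24 are displaced to cells 7 and 8 because 6 already occupies cell 6.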

Rehashing
    private void allocateArray( int arraySize )
    {
        array = new HashEntry[ arraySize ];
    }

    private void rehash( )
    {
        HashEntry [ ] oldArray = array;

        // Create a new double-sized, empty table
        allocateArray( nextPrime( 2 * oldArray.length ) );
        currentSize = 0;

        // Copy table over
        for( int i = 0; i < oldArray.length; i++ )
            if( oldArray[ i ] != null && oldArray[ i ].isActive )
                insert( oldArray[ i ].element );

        return;
    }

Extendible Hashing (Why?)
The amount of data is too large to fit in main memory.
The main consideration is the number of disk accesses required to retrieve data.
Assume we have N records to store, and at most M records fit in one disk block (assume M = 4).

Extendible Hashing (Why?)
If open addressing or separate chaining is used, collisions could cause several disk blocks to be examined during a find, even for a well-distributed hash table.
When the table gets too full, rehashing requires O(N) disk accesses (very expensive).
Instead, we use extendible hashing, where a find requires only two disk accesses. Similarly, insertions require only a few accesses.

Extendible Hashing
Use the idea of B-trees, which have depth O(log_{M/2} N). Choose M so large that the B-tree has a depth of 1.
Now a find needs one disk access, assuming that the root node can be stored in main memory. However, we have a problem here.
Problem: the branching factor is too high; it takes too much time to determine which leaf the data is in.
This strategy works in practice only if the time to perform this step is reduced. This is exactly what we do with the extendible hashing strategy.

Example: Extendible Hashing
Assume our source data consists of several 6-bit integers.
The root of the "tree" contains four links determined by the leading two bits of the data.
Each leaf has at most M = 4 elements, based on the earlier assumption.
D denotes the number of bits used by the root, which is known as the directory (here D = 2).
The number of entries in the directory is 2^D.
dL is the number of leading bits that all the elements of some leaf L have in common; dL <= D.

Extendible Hashing
Suppose we want to insert 100100. Since the leading two bits are 10, this would go to the 3rd leaf.
But the 3rd leaf is already full (due to M = 4). Thus, we split this leaf into two leaves, which are now determined by three bits.
The directory size needs to be increased.
Note: although the entire directory is rewritten, none of the other leaves (1, 2, 4) is actually accessed.

Extendible Hashing
Suppose we want to insert 000000. Since the leading two bits are 00, this will split the 1st leaf as shown below. The only change in the directory is updating the 000 and 001 entries.
Therefore, this is a very good and fast strategy for insert and find operations on large databases.
However, READ pages 175-176 for scenarios in which this algorithm does not work, and how to avoid possible problems.

HWLAs
Problem 2 in the book: when rehashing, we choose a table size that is roughly twice as large and prime. In our case, the appropriate new table size is 19, with hash function h(x) = x (mod 19).
(a) Scanning down the separate chaining hash table, the new locations are 4371 in list 1, 1323 in list 12, 6173 in list 17, 4344 in list 12, 4199 in list 0, 9679 in list 8, and 1989 in list 13.
(b) (linear probing) The new locations are 9679 in bucket 8, 4371 in bucket 1, 1989 in bucket 13, 1323 in bucket 12, 6173 in bucket 17, 4344 in bucket 14 because both 12 and 13 are already occupied, and 4199 in bucket 0.
(c) (quadratic probing) The new locations are 9679 in bucket 8, 4371 in bucket 1, 1989 in bucket 13, 1323 in bucket 12, 6173 in bucket 17, 4344 in bucket 16 because both 12 and 13 are already occupied, and 4199 in bucket 0.
(d) (double hashing) The new locations are 9679 in bucket 8, 4371 in bucket 1, 1989 in bucket 13, 1323 in bucket 12, 6173 in bucket 17, 4344 in bucket 15 because 12 is already occupied, and 4199 in bucket 0.
Problems in CHP5: 1, 4, 5, 11, 16
Improved Merkle Cryptosystem

Java Example: hash table with linear probing (*)
    /** A hash table with linear probing and the MAD hash function */
    public class HashTable implements Map {
      protected static class HashEntry implements Entry {
        Object key, value;
        HashEntry () { /* default constructor */ }
        HashEntry(Object k, Object v) { key = k; value = v; }
        public Object key() { return key; }
        public Object value() { return value; }
        protected Object setValue(Object v) {
          // set a new value, returning old
          Object temp = value;
          value = v;
          return temp; // return old value
        }
      }
      /** Nested class for a default equality tester */
      protected static class DefaultEqualityTester implements EqualityTester {
        DefaultEqualityTester() { /* default constructor */ }
        /** Returns whether the two objects are equal. */
        public boolean isEqualTo(Object a, Object b) { return a.equals(b); }
      }
      protected static Entry AVAILABLE = new HashEntry(null, null); // empty marker
      protected int n = 0;         // number of entries in the dictionary
      protected int N;             // capacity of the bucket array
      protected Entry[] A;         // bucket array
      protected EqualityTester T;  // the equality tester
      protected int scale, shift;  // the shift and scaling factors
      /** Creates a hash table with initial capacity 1023. */
      public HashTable() {
        N = 1023; // default capacity
        A = new Entry[N];
        T = new DefaultEqualityTester(); // use the default equality tester
        java.util.Random rand = new java.util.Random();
        scale = rand.nextInt(N-1) + 1;
        shift = rand.nextInt(N);
      }
      /** Creates a hash table with the given capacity and equality tester. */
      public HashTable(int bN, EqualityTester tester) {
        N = bN;
        A = new Entry[N];
        T = tester;
        java.util.Random rand = new java.util.Random();
        scale = rand.nextInt(N-1) + 1;
        shift = rand.nextInt(N);
      }

Java Example (cont. *)
      /** Determines whether a key is valid. */
      protected void checkKey(Object k) {
        if (k == null) throw new InvalidKeyException("Invalid key: null.");
      }
      /** Hash function applying MAD method to default hash code. */
      public int hashValue(Object key) {
        return Math.abs(key.hashCode()*scale + shift) % N;
      }
      /** Returns the number of entries in the hash table. */
      public int size() { return n; }
      /** Returns whether or not the table is empty. */
      public boolean isEmpty() { return (n == 0); }
      /** Helper search method - returns index of found key or -index-1,
       * where index is the index of an empty or available slot. */
      protected int findEntry(Object key) throws InvalidKeyException {
        int avail = 0;
        checkKey(key);
        int i = hashValue(key);
        int j = i;
        do {
          if (A[i] == null) return -i - 1; // entry is not found
          if (A[i] == AVAILABLE) {         // bucket is deactivated
            avail = i;                     // remember that this slot is available
            i = (i + 1) % N;               // keep looking
          }
          else if (T.isEqualTo(key,A[i].key())) // we have found our entry
            return i;
          else // this slot is occupied--we must keep looking
            i = (i + 1) % N;
        } while (i != j);
        return -avail - 1; // entry is not found
      }
      /** Returns the value associated with a key. */
      public Object get (Object key) throws InvalidKeyException {
        int i = findEntry(key);  // helper method for finding a key
        if (i < 0) return null;  // there is no value for this key
        return A[i].value();     // return the found value in this case
      }
      /** Put a key-value pair in the map, replacing previous one if it exists. */
      public Object put (Object key, Object value) throws InvalidKeyException {
        if (n >= N/2) rehash();  // rehash to keep the load factor <= 0.5
        int i = findEntry(key);  // find the appropriate spot for this entry
        if (i < 0) {             // this key does not already have a value
          A[-i-1] = new HashEntry(key, value); // convert to the proper index
          n++;
          return null;           // there was no previous value
        }
        else                     // this key has a previous value
          return ((HashEntry) A[i]).setValue(value); // set new value & return old
      }
      /** Doubles the size of the hash table and rehashes all the entries. */
      protected void rehash() {
        N = 2*N;
        Entry[] B = A;
        A = new Entry[N]; // allocate a new version of A twice as big as before
        java.util.Random rand = new java.util.Random();
        scale = rand.nextInt(N-1) + 1; // new hash scaling factor
        shift = rand.nextInt(N);       // new hash shifting factor
        for (int i=0; i<B.length; i++)
          if ((B[i] != null) && (B[i] != AVAILABLE)) { // if we have a valid entry
            int j = findEntry(B[i].key()); // find the appropriate spot
            A[-j-1] = B[i];                // copy into the new array
          }
      }
      /** Removes the key-value pair with a specified key. */
      public Object remove (Object key) throws InvalidKeyException {
        int i = findEntry(key);  // find this key first
        if (i < 0) return null;  // nothing to remove
        Object toReturn = A[i].value();
        A[i] = AVAILABLE;        // mark this slot as deactivated
        n--;
        return toReturn;
      }
      /** Returns an iterator of keys. */
      public java.util.Iterator keys() {
        List keys = new NodeList();
        for (int i=0; i<N; i++)
          if ((A[i] != null) && (A[i] != AVAILABLE))
            keys.insertLast(A[i].key());
        return keys.elements();
      }
      // ... values() is similar to keys() and is omitted here ...
    }

Hash Functions (*)
A hash function is usually specified as the composition of two functions:
  Hash code: h1: keys → integers
  Compression function: h2: integers → [0, N - 1]
The hash code is applied first, and the compression function is applied next on the result, i.e.,
  h(x) = h2(h1(x))
The goal of the hash function is to "disperse" the keys in an apparently random way.

Performance of Hashing (*)
In the worst case, searches, insertions and removals on a hash table take O(n) time.
The worst case occurs when all the keys inserted into the map collide.
The load factor Lambda = n/N affects the performance of a hash table.
Assuming that the hash values are like random numbers, it can be shown that the expected number of probes for an insertion with open addressing is
  1 / (1 - Lambda)
The expected running time of all the dictionary ADT operations in a hash table is O(1).
In practice, hashing is very fast provided the load factor is not close to 100%.
Applications of hash tables:
  small databases
  compilers
  browser caches

Hash Codes (*)
Memory address:
  We reinterpret the memory address of the key object as an integer (default hash code of all Java objects).
  Good in general, except for numeric and string keys.
Integer cast:
  We reinterpret the bits of the key as an integer.
  Suitable for keys of length less than or equal to the number of bits of the integer type (e.g., byte, short, int and float in Java).
Component sum:
  We partition the bits of the key into components of fixed length (e.g., 16 or 32 bits) and we sum the components (ignoring overflows).
  Suitable for numeric keys of fixed length greater than or equal to the number of bits of the integer type (e.g., long and double in Java).

Hash Codes (* cont.)
Polynomial accumulation:
  We partition the bits of the key into a sequence of components of fixed length (e.g., 8, 16 or 32 bits):
    a_0 a_1 ... a_{n-1}
  We evaluate the polynomial
    p(z) = a_0 + a_1 z + a_2 z^2 + ... + a_{n-1} z^{n-1}
  at a fixed value z, ignoring overflows.
  Especially suitable for strings (e.g., the choice z = 33 gives at most 6 collisions on a set of 50,000 English words).
Polynomial p(z) can be evaluated in O(n) time using Horner's rule:
  The following polynomials are successively computed, each from the previous one in O(1) time:
    p_0(z) = a_{n-1}
    p_i(z) = a_{n-i-1} + z p_{i-1}(z)   (i = 1, 2, ..., n - 1)
  We have p(z) = p_{n-1}(z).
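
The recurrence above translates almost directly into a loop. In this sketch the components a_i are taken to be the characters of a string, which is an assumption for illustration; the class and method names are hypothetical, and int overflow is simply ignored as on the slide.

```java
class PolynomialHashDemo {
    /** Evaluates p(z) = a_0 + a_1 z + ... + a_{n-1} z^{n-1} with Horner's rule, where a_i = s.charAt(i). */
    static int polynomialHashCode(String s, int z) {
        int p = 0;
        // Work from the last character down to the first:
        // p_0 = a_{n-1}, then p_i = a_{n-i-1} + z * p_{i-1}.
        for (int i = s.length() - 1; i >= 0; i--) {
            p = s.charAt(i) + z * p;   // overflow wraps around and is ignored
        }
        return p;
    }

    public static void main(String[] args) {
        System.out.println(polynomialHashCode("ab", 33));     // 'a' + 33 * 'b'
        System.out.println(polynomialHashCode("stop", 33));
        System.out.println(polynomialHashCode("pots", 33));   // same letters, different code
    }
}
```

Unlike a plain component sum, the polynomial gives different codes to strings that are permutations of each other, because each position is weighted by a different power of z.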

Compression Functions (*)
Division:
  h2(y) = y mod N
  The size N of the hash table is usually chosen to be a prime.
  The reason has to do with number theory and is beyond the scope of this course.
Multiply, Add and Divide (MAD):
  h2(y) = (ay + b) mod N
  a and b are nonnegative integers such that a mod N ≠ 0.
  Otherwise, every integer would map to the same value b.
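
The condition on a can be seen in a small sketch. The class name MadCompressionDemo and the sample values of a, b and N are hypothetical; long arithmetic is used here only to sidestep int overflow, which is a choice of this sketch rather than something the slide prescribes.

```java
class MadCompressionDemo {
    /** MAD compression: h2(y) = (a*y + b) mod n, with the result in [0, n-1]. */
    static int compress(int y, int a, int b, int n) {
        // floorMod keeps the result non-negative even if y is negative.
        return (int) Math.floorMod((long) a * y + b, (long) n);
    }

    public static void main(String[] args) {
        int n = 13, b = 5;
        for (int y = 0; y < 4; y++) {
            // a = 3 spreads the values; a = 26 (a mod N = 0) sends everything to b
            System.out.println(y + " -> " + compress(y, 3, b, n) + " vs " + compress(y, 26, b, n));
        }
    }
}
```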

Map Methods with Separate Chaining used for Collisions (*)
Delegate operations to a list-based map at each cell:

Algorithm get(k):
  Output: The value associated with the key k in the map, or null if there is no entry with key equal to k in the map
  return A[h(k)].get(k)        {delegate the get to the list-based map at A[h(k)]}

Algorithm put(k,v):
  Output: If there is an existing entry in our map with key equal to k, then we return its value (replacing it with v); otherwise, we return null
  t = A[h(k)].put(k,v)         {delegate the put to the list-based map at A[h(k)]}
  if t = null then             {k is a new key}
    n = n + 1
  return t

Algorithm remove(k):
  Output: The (removed) value associated with key k in the map, or null if there is no entry with key equal to k in the map
  t = A[h(k)].remove(k)        {delegate the remove to the list-based map at A[h(k)]}
  if t ≠ null then             {k was found}
    n = n - 1
  return t

Search with Linear Probing (*)
Consider a hash table A that uses linear probing.
get(k):
  We start at cell h(k).
  We probe consecutive locations until one of the following occurs:
    An item with key k is found, or
    An empty cell is found, or
    N cells have been unsuccessfully probed.

Algorithm get(k)
  i ← h(k)
  p ← 0
  repeat
    c ← A[i]
    if c = Ø
      return null
    else if c.key() = k
      return c.element()
    else
      i ← (i + 1) mod N
      p ← p + 1
  until p = N
  return null

Updates with Linear Probing (*)
To handle insertions and deletions, we introduce a special object, called AVAILABLE, which replaces deleted elements.
remove(k):
  We search for an entry with key k.
  If such an entry (k, o) is found, we replace it with the special item AVAILABLE and we return element o.
  Else, we return null.
put(k, o):
  We throw an exception if the table is full.
  We start at cell h(k).
  We probe consecutive cells until one of the following occurs:
    A cell i is found that is either empty or stores AVAILABLE, or
    N cells have been unsuccessfully probed.
  We store entry (k, o) in cell i.

A Simple List-Based Map (*)
We can efficiently implement a map using an unsorted list.
We store the items of the map in a list S (based on a doubly-linked list), in arbitrary order.
[Figure: doubly-linked list with header and trailer sentinels; the nodes/positions hold the entries with keys 9, 6, 5, 8]

The get(k) Algorithm
Algorithm get(k):
  B = S.positions()            {B is an iterator of the positions in S}
  while B.hasNext() do
    p = B.next()               {the next position in B}
    if p.element().key() = k then
      return p.element().value()
  return null                  {there is no entry with key equal to k}

The put(k,v) Algorithm
Algorithm put(k,v):
  B = S.positions()
  while B.hasNext() do
    p = B.next()
    if p.element().key() = k then
      t = p.element().value()
      B.replace(p,(k,v))
      return t                 {return the old value}
  S.insertLast((k,v))
  n = n + 1                    {increment variable storing number of entries}
  return null                  {there was no previous entry with key equal to k}

The remove(k) Algorithm
Algorithm remove(k):
  B = S.positions()
  while B.hasNext() do
    p = B.next()
    if p.element().key() = k then
      t = p.element().value()
      S.remove(p)
      n = n - 1                {decrement number of entries}
      return t                 {return the removed value}
  return null                  {there is no entry with key equal to k}
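
The three list-based algorithms can be sketched in Java roughly as follows. This is not the textbook's implementation: the class name UnsortedListMap is hypothetical, java.util.LinkedList (a doubly-linked list) stands in for the list S, and java.util.AbstractMap.SimpleEntry stands in for the entry type. Keys are assumed to be non-null.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Map;

class UnsortedListMap<K, V> {
    // Entries are kept in arbitrary (insertion) order.
    private final LinkedList<Map.Entry<K, V>> entries = new LinkedList<>();

    /** Scans the list for key k; O(n) in the worst case. */
    V get(K k) {
        for (Map.Entry<K, V> e : entries) {
            if (e.getKey().equals(k)) return e.getValue();
        }
        return null;   // no entry with key equal to k
    }

    /** Replaces the value if k is present, otherwise appends a new entry. */
    V put(K k, V v) {
        for (Map.Entry<K, V> e : entries) {
            if (e.getKey().equals(k)) return e.setValue(v);   // returns the old value
        }
        entries.addLast(new SimpleEntry<>(k, v));
        return null;   // there was no previous entry
    }

    /** Removes the entry with key k, if any, and returns its value. */
    V remove(K k) {
        Iterator<Map.Entry<K, V>> it = entries.iterator();
        while (it.hasNext()) {
            Map.Entry<K, V> e = it.next();
            if (e.getKey().equals(k)) {
                it.remove();
                return e.getValue();
            }
        }
        return null;
    }

    int size() { return entries.size(); }

    public static void main(String[] args) {
        UnsortedListMap<Integer, String> m = new UnsortedListMap<>();
        m.put(5, "A");
        m.put(2, "C");
        System.out.println(m.put(2, "E"));   // old value of key 2
        System.out.println(m.remove(5));
        System.out.println(m.size());
    }
}
```

As in the pseudocode, put scans for an existing key before appending, so this version of put is O(n); the O(1) bound quoted earlier holds only if that check is skipped.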