Hashing: It’s not just for breakfast anymore!

Dec 20, 2015


Hashing: the facts

• Approach that involves both storing and searching for values

• Search time is linear in the worst case, but hashing is a strong competitor to binary search in the average case

• Hashing makes it easy to add and delete elements, an advantage over binary search (since the latter requires sorted array)

Dictionary ADT

• Previously, we have seen a dictionary ADT implemented as a binary search tree

• A hash table can be used to provide an array-based dictionary implementation

• Abstract properties of a dictionary:
  – every item has a key
  – to retrieve an item, specify its key; the retrieval process fetches the associated data
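The two abstract properties can be written down as a small Java interface. This is only a sketch: the names Dictionary, MapDictionary, and DictionaryDemo are hypothetical, and the java.util.HashMap backing is a stand-in used to exercise the interface, not the implementation developed in these slides.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface capturing the abstract properties of a dictionary.
interface Dictionary<K, V> {
    void put(K key, V value); // every item is stored under a key
    V get(K key);             // fetch the data associated with key, or null
}

// Placeholder implementation backed by java.util.HashMap; a binary search
// tree or a hand-written hash table would satisfy the same interface.
class MapDictionary<K, V> implements Dictionary<K, V> {
    private final Map<K, V> items = new HashMap<>();

    public void put(K key, V value) {
        items.put(key, value);
    }

    public V get(K key) {
        return items.get(key);
    }
}

class DictionaryDemo {
    public static void main(String[] args) {
        Dictionary<Integer, String> d = new MapDictionary<>();
        d.put(1234, "Ada");              // made-up sample record
        System.out.println(d.get(1234)); // prints Ada
        System.out.println(d.get(9999)); // prints null (no such key)
    }
}
```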

Setting up the array

• One approach to an array-based dictionary would be to create consecutive keys, storing the records so that each key corresponds to its index -- this is the method used in MS Access, for example

• An alternative would be to use an existing attribute of the data to be stored as the key value; this approach is more typical of hashing

Setting up the array

• Use of existing key field presents challenges:
  – Value may be too large for indexing: e.g. social security number
  – No guarantee that individual values will be close enough together for effective indexing: e.g. last 4 digits of social security numbers of students in a class

Solution: hashing

• Instead of direct use of data field, a function is applied to the original value to produce a valid index: this is called the hash function

• The hash function maps the key to an index that can be used to insert data into the array or to retrieve data based on a given key

• An array that uses hashing for indexing is called a hash table

Operations on a hash table

• Inserting an item
  – calculate hash value (index) from item key
  – check index to determine if space is open
    • if open, insert item
    • if not open, collision occurs; search through array for next open slot
  – requires some mechanism for recognizing an empty space; can’t just start with uninitialized array

Open-address hashing

• The insertion scheme just described uses open-address hashing

• In open addressing, collisions are resolved by placing a new item in the next open spot in the array

• Scheme requires that the key field of each array element be initialized to some known value; -1, for example

Inserting a New Record

• In order to insert a new record, the key must somehow be converted to an array index.

• The index is called the hash value of the key.

[Figure: an array with slots [0] to [700], already holding records with keys 506643548, 233667136, 281942902, and 155778322; a new record with key 580625685 is waiting to be inserted.]

Inserting a New Record

• Typical hash function: (Number mod 701)
  – 701 is the number of slots in the array
  – Number is the original key value

[Figure: the same array with slots [0] to [700]; the new record has key 580625685.]

What is (580625685 mod 701)? 3
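The arithmetic on this slide can be checked directly. The sketch below applies Number mod 701 to the key being inserted and to the four keys already shown in the figure; the class and method names (ModHashDemo, hash) are illustrative only, and the keys are assumed to be non-negative ints.

```java
class ModHashDemo {
    static final int TABLE_SIZE = 701; // number of slots, as on the slide

    // Division-method hash: for a non-negative key the remainder
    // always lies in the range 0..TABLE_SIZE-1.
    static int hash(int key) {
        return key % TABLE_SIZE;
    }

    public static void main(String[] args) {
        int[] keys = {281942902, 233667136, 506643548, 155778322, 580625685};
        for (int key : keys) {
            System.out.println(key + " -> [" + hash(key) + "]");
        }
        // The last line printed is: 580625685 -> [3]
    }
}
```

The other four keys hash to [1], [2], [4], and [700] respectively, so no two of these five records compete for the same slot.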

Inserting a New Record

• The hash value is used for the location of the new record.

[Figure: the new record, Number 580625685, is placed at [3].]

Collisions

• Here is another new record to insert, with a hash value of 2.

[Figure: the array now also holds Number 580625685; the new record, Number 701466868, says "My hash value is [2]."]

Collisions

• This is called a collision, because there is already another valid record at [2].

• When a collision occurs, move forward until you find an empty spot.

[Figure: Number 701466868 moves forward through the array, starting from [2].]

Collisions

• The new record goes in the empty spot.

[Figure: Number 701466868 is stored in the first empty slot after [2]; with Number mod 701 as the hash function, slots [2], [3], and [4] are occupied, so that slot is [5].]
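The collision walk-through can be reproduced with a few lines of Java. This is a minimal sketch, not the Table class presented later: it stores bare int keys, uses -1 as the "never used" sentinel mentioned under open-address hashing, and assumes non-negative keys, no duplicates, and at least one empty slot. The names LinearProbingInsert, newTable, and insert are made up for the example.

```java
import java.util.Arrays;

class LinearProbingInsert {
    static final int EMPTY = -1; // sentinel marking a never-used slot

    // Every slot must start out holding the sentinel.
    static int[] newTable(int capacity) {
        int[] keys = new int[capacity];
        Arrays.fill(keys, EMPTY);
        return keys;
    }

    // Stores key and returns the index where it ended up.
    static int insert(int[] keys, int key) {
        int i = key % keys.length;        // hash value
        while (keys[i] != EMPTY) {
            i = (i + 1) % keys.length;    // collision: move forward, wrapping at the end
        }
        keys[i] = key;
        return i;
    }

    public static void main(String[] args) {
        int[] keys = newTable(701);
        int[] existing = {281942902, 233667136, 506643548, 155778322, 580625685};
        for (int k : existing) {
            insert(keys, k);
        }
        // Hash value is 2, but [2], [3], and [4] are taken.
        System.out.println(insert(keys, 701466868)); // prints 5
    }
}
```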

Operations on a hash table

• Retrieving an item
  – calculate hash value based on desired key
  – search array, beginning at calculated index, for desired data
  – search is finished when:
    • item is found; successful search
    • an empty index is encountered; unsuccessful search

Searching for a Key

• The data that's attached to a key can be found fairly quickly.

[Figure: the array after both insertions; the key being searched for is Number 701466868.]

Searching for a Key

• Calculate the hash value.

• Check that location of the array for the key.

[Figure: the search key, Number 701466868, says "My hash value is [2]."; the record at [2] answers "Not me."]

Searching for a Key

• Keep moving forward until you find the key, or you reach an empty spot.

[Figure: starting from [2], successive records answer "Not me." until the record holding Number 701466868 answers "Yes!"]

Searching for a Key

• When the item is found, the information can be copied to the necessary location.

[Figure: the record for Number 701466868 has been found and its information is copied out.]
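The search just described might look like the following in Java. As a sketch, it fills a 701-slot int array by hand with the six keys from the figures, placing each at the slot that Number mod 701 and forward probing would give it, and then probes forward from the hash value. LinearProbingSearch, slideTable, and find are hypothetical names; keys are assumed non-negative and -1 marks a never-used slot.

```java
import java.util.Arrays;

class LinearProbingSearch {
    static final int EMPTY = -1; // sentinel marking a never-used slot

    // The array from the figures, filled in by hand.
    static int[] slideTable() {
        int[] keys = new int[701];
        Arrays.fill(keys, EMPTY);
        keys[1] = 281942902;
        keys[2] = 233667136;
        keys[3] = 580625685;
        keys[4] = 506643548;
        keys[5] = 701466868;   // hash value 2, displaced by collisions
        keys[700] = 155778322;
        return keys;
    }

    // Returns the index holding key, or -1 if the search is unsuccessful.
    static int find(int[] keys, int key) {
        int i = key % keys.length;
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[i] == key) return i;     // successful search
            if (keys[i] == EMPTY) return -1;  // empty slot: key cannot be further on
            i = (i + 1) % keys.length;        // "Not me." -- keep moving forward
        }
        return -1; // every slot examined without success
    }

    public static void main(String[] args) {
        int[] keys = slideTable();
        System.out.println(find(keys, 701466868)); // prints 5
        System.out.println(find(keys, 703));       // prints -1 (hash value 2, stops at empty [6])
    }
}
```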

Operations on a hash table

• Deleting an item:
  – find index based on hashed key, as with insertion and retrieval
  – mark record at index to indicate the spot is open
    • can’t use ordinary “empty” designation -- this could interfere with record retrieval
    • use alternative “open” designation: indicates the slot is open for insertion, but won’t stop a search

Deleting a Record

• Records may also be deleted from a hash table.

• But the location must not be left as an ordinary "empty spot" since that could interfere with searches.

• The location must be marked in some special way so that a search can tell that the spot used to have something in it.

[Figure: the same array; one of the records says "Please delete me."]
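To see why an ordinary "empty" marker interferes with searches, the sketch below deletes a record twice, once each way, and then searches for a key that was stored past the deleted slot. It is illustrative only: the figure does not make clear which record asks to be deleted, so the choice of Number 580625685 is an assumption, as are the names TombstoneDemo and DELETED and the sentinel values -1 and -2.

```java
import java.util.Arrays;

class TombstoneDemo {
    static final int EMPTY = -1;   // never used: stops a search
    static final int DELETED = -2; // used previously: open for insertion, does not stop a search

    static int[] slideTable() {
        int[] keys = new int[701];
        Arrays.fill(keys, EMPTY);
        keys[1] = 281942902;
        keys[2] = 233667136;
        keys[3] = 580625685;
        keys[4] = 506643548;
        keys[5] = 701466868;   // hash value 2, reached by probing past [3]
        keys[700] = 155778322;
        return keys;
    }

    static int find(int[] keys, int key) {
        int i = key % keys.length;
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[i] == key) return i;
            if (keys[i] == EMPTY) return -1; // DELETED slots fall through and are skipped
            i = (i + 1) % keys.length;
        }
        return -1;
    }

    // Overwrites the slot holding key with the given marker.
    static boolean delete(int[] keys, int key, int marker) {
        int i = find(keys, key);
        if (i == -1) return false;
        keys[i] = marker;
        return true;
    }

    public static void main(String[] args) {
        int[] naive = slideTable();
        delete(naive, 580625685, EMPTY);            // wrong: leaves a plain empty spot at [3]
        System.out.println(find(naive, 701466868)); // prints -1: the record at [5] is now unreachable

        int[] marked = slideTable();
        delete(marked, 580625685, DELETED);          // right: special "used to hold something" marker
        System.out.println(find(marked, 701466868)); // prints 5
    }
}
```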

Hash tables & Java: a breakfast classic

• Java classes have built-in support for hash tables; all classes inherit the hashCode() method from Object

• The hashCode() method returns an int that can be used as input to a hash function; the default version in Object typically returns distinct values for distinct objects

• One caution: if you define an equals() method for a new class, you should also override the hashCode() method, because equal objects must have equal hash codes
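A short sketch of the caution above: a class that defines equals() on its fields and overrides hashCode() to match. StudentId and its fields are invented for illustration; Objects.equals and Objects.hash are standard java.util helpers.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical key class: two StudentId objects are equal when their fields match.
final class StudentId {
    private final int number;
    private final String campus;

    StudentId(int number, String campus) {
        this.number = number;
        this.campus = campus;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof StudentId)) return false;
        StudentId that = (StudentId) other;
        return number == that.number && Objects.equals(campus, that.campus);
    }

    // Built from the same fields as equals(), so equal objects get equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(number, campus);
    }
}

class StudentIdDemo {
    public static void main(String[] args) {
        Set<StudentId> seen = new HashSet<>();
        seen.add(new StudentId(7, "north"));
        // A distinct but equal object is found only because hashCode() agrees with equals().
        System.out.println(seen.contains(new StudentId(7, "north"))); // prints true
    }
}
```

Without the hashCode() override, the two objects would usually land in different buckets and the lookup would usually fail even though equals() says they match.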


A Table class implementation

public class Table {
    private int manyItems;
    private Object[] data;  // hash table data
    private Object[] keys;  // array of keys parallel to data array
    // another parallel array used to indicate the status of each index:
    // if not currently in use, but has been used previously,
    // an index should not stop a search operation
    private boolean[] hasBeenUsed;

Constructor

    public Table(int capacity) {
        if (capacity <= 0)
            throw new IllegalArgumentException("Capacity must be positive");
        keys = new Object[capacity];
        data = new Object[capacity];
        hasBeenUsed = new boolean[capacity];
    }

The put method (adds item to table)

    public Object put(Object key, Object element) {
        int index = findIndex(key);
        Object answer;
        if (index != -1) {
            // The key is already in the table.
            answer = data[index];
            data[index] = element;
            return answer;
        }

put method continued

        else if (manyItems < data.length) {
            // The key is not yet in the table.
            index = hash(key);
            while (keys[index] != null)
                index = nextIndex(index);
            keys[index] = key;
            data[index] = element;
            hasBeenUsed[index] = true;
            manyItems++;
            return null;
        }
        else {
            // The table is full.
            throw new IllegalStateException("Table is full.");
        }
    }

remove method

    public Object remove(Object key) {
        int index = findIndex(key);
        Object answer = null;
        if (index != -1) {
            answer = data[index];
            // hasBeenUsed[index] stays true, so later searches continue past this slot
            keys[index] = null;
            data[index] = null;
            manyItems--;
        }
        return answer;
    }

findIndex method

    private int findIndex(Object key) {
        int count = 0;
        int i = hash(key);
        while (count < data.length && hasBeenUsed[i]) {
            if (key.equals(keys[i]))
                return i;
            count++;
            i = nextIndex(i);
        }
        return -1;
    }

nextIndex & containsKey methods

    private int nextIndex(int i) {
        if (i + 1 == data.length)
            return 0;
        else
            return i + 1;
    }

    public boolean containsKey(Object key) {
        return findIndex(key) != -1;
    }

hash method

    private int hash(Object key) {
        // floorMod keeps the index in 0..data.length-1 for any hashCode();
        // Math.abs(Integer.MIN_VALUE) is itself negative, so abs-then-% can fail.
        return Math.floorMod(key.hashCode(), data.length);
    }
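One edge case is worth knowing when a hash code is reduced to an array index: hashCode() may return any int, including Integer.MIN_VALUE, whose absolute value does not fit in an int. The sketch below compares two ways of computing the index for that value; HashIndexDemo, absIndex, and floorModIndex are names invented for the comparison.

```java
class HashIndexDemo {
    // Takes the absolute value first, then the remainder.
    static int absIndex(int hashCode, int capacity) {
        return Math.abs(hashCode) % capacity;
    }

    // Math.floorMod returns a result with the sign of the divisor,
    // so for a positive capacity it is always in 0..capacity-1.
    static int floorModIndex(int hashCode, int capacity) {
        return Math.floorMod(hashCode, capacity);
    }

    public static void main(String[] args) {
        int h = Integer.MIN_VALUE;
        // Math.abs(Integer.MIN_VALUE) overflows and stays negative,
        // so the first index is negative and would throw if used on an array.
        System.out.println(absIndex(h, 701) < 0);        // prints true
        System.out.println(floorModIndex(h, 701) >= 0);  // prints true
    }
}
```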

Java’s Hashtable class

• The Java API contains a Hashtable class in the java.util package

• Unlike the implementation just presented, the API Hashtable grows automatically when the number of entries exceeds the product of its capacity and load factor

• This has performance implications: when the array grows, the entire table must be rehashed
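The reason growth forces a full rehash is that the index depends on the capacity: the same key generally maps to a different slot in a larger array. The sketch below shows this for a bare int table with forward probing; it is not how java.util.Hashtable is implemented internally, and RehashDemo, insert, and grow are hypothetical names.

```java
import java.util.Arrays;

class RehashDemo {
    static final int EMPTY = -1; // sentinel marking a never-used slot

    // Forward-probing insert; assumes non-negative keys and a free slot.
    static void insert(int[] keys, int key) {
        int i = key % keys.length;
        while (keys[i] != EMPTY) {
            i = (i + 1) % keys.length;
        }
        keys[i] = key;
    }

    // Builds a larger table and re-inserts every key, because
    // key % capacity changes when the capacity changes.
    static int[] grow(int[] old, int newCapacity) {
        int[] bigger = new int[newCapacity];
        Arrays.fill(bigger, EMPTY);
        for (int key : old) {
            if (key != EMPTY) {
                insert(bigger, key);
            }
        }
        return bigger;
    }

    public static void main(String[] args) {
        int[] small = new int[5];
        Arrays.fill(small, EMPTY);
        insert(small, 12); // 12 % 5 = 2
        insert(small, 7);  //  7 % 5 = 2, collides, lands at [3]
        insert(small, 22); // 22 % 5 = 2, collides twice, lands at [4]

        int[] big = grow(small, 11);
        // In the larger table the three keys no longer collide:
        // 12 % 11 = 1, 7 % 11 = 7, 22 % 11 = 0.
        System.out.println(big[1] + " " + big[7] + " " + big[0]); // prints 12 7 22
    }
}
```

Every stored key is touched once during the move, so a single growth step costs time proportional to the number of entries.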
