Hash Discrete Mathematics and Its Applications Baojian Hua bjhua@ustc.edu.cn.
Post on 21-Dec-2015
Hash
Discrete Mathematics and Its Applications
Baojian Hua
bjhua@ustc.edu.cn
Searching
A dictionary-like data structure:
  contains a collection of tuple data: <k1, v1>, <k2, v2>, …
  keys are comparable and pairwise distinct
  supports these operations:
    new ()
    insert (dict, k, v)
    lookup (dict, k)
    delete (dict, k)
Examples
Application      Purpose       Key         Value
Phone book       phone         name        phone No.
Bank             transaction   visa        $$$
Dictionary       lookup        word        meaning
Compiler         symbol        variable    type
www.google.com   search        key words   contents
…                …             …           …
Summary So Far
Operation   array   sorted array   linked list   sorted linked list   binary search tree
lookup()    O(n)    O(lg n)        O(n)          O(n)                 O(n)
insert()    O(n)    O(n)           O(n)          O(n)                 O(n)
delete()    O(n)    O(n)           O(n)          O(n)                 O(n)
What’s the Problem?
For every mapping (k, v): after we insert it into the dictionary dict, we don’t know its position!
Ex: insert (d, “li”, 97), (d, “wang”, 99), (d, “zhang”, 100), … and then lookup (d, “zhang”);
[Figure: an array holding (“li”, 97), …, (“wang”, 99), (“zhang”, 100)]
Basic Plan
Start from the array-based approach
Use an array A to hold the elements (k, v)
For every key k:
  if we can compute its position (array index) i from k,
  then lookup, insert and delete are simple: A[i]
  each is done in constant time, O(1)
[Figure: array A with (k, v) stored at index i]
Example
Ex: insert (d, “li”, 97), (d, “wang”, 99), (d, “zhang”, 100), …; and then lookup (d, “zhang”);
[Figure: an array where the index of (“li”, 97) is marked “?”]
Problem #1: How to calculate the index from the given key?
Problem #2: How long should the array be?
Basic Plan
Save the (k, v) pairs in an array, with index i calculated from key k
Hash function: a method for computing an index from a given key
[Figure: (“li”, 97) stored at index hash (“li”)]
Hash Function
Given any key, compute an index
  efficiently computable
Ideal goals: for any key, the index is uniformly distributed
  different keys map to different indexes
However, this is a hard research problem :-(
Next, we assume that the array is of infinite length, so the hash function has type:
  int hash (key k);
To get some idea, we next perform a “case analysis” on how different key types affect “hash”
Hash Function On “int”
// If the key of hash is of “int” type, the hash
// function is trivial:
int hash (int i)
{
return i;
}
Hash Function On “char”
// If the key of hash is of “char” type, the hash
// function comes with type conversion:
int hash (char c)
{
return c;
}
Hash Function On “float”
// Also type conversion:
int hash (float f)
{
  return (int)f;
}
// how to deal with 0.aaa, say 0.5? (the cast truncates it to 0)
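One possible answer to the 0.5 question, offered as a sketch rather than as the slides' own method: hash the bit pattern of the float instead of its truncated value. The name hashFloat, the assumption that float is 32 bits wide, and the xor-fold step are all illustrative choices that do not come from the slides.

```c
#include <stdint.h>
#include <string.h>

/* Hash a float through its bit pattern rather than its integer part.
   Assumption: float is 32 bits wide (IEEE 754 single precision). */
int hashFloat (float f)
{
  uint32_t bits;
  if (f == 0.0f)
    return 0;   /* +0.0 and -0.0 compare equal, so they must hash equally */
  memcpy (&bits, &f, sizeof bits);   /* well-defined way to read the bits */
  bits ^= bits >> 16;                /* fold the high half into the low half */
  return (int)(bits & 0x7fffffffu);  /* keep the result non-negative */
}
```

With this version 0.5 and 0.25 no longer share a hash value, whereas (int)f maps both of them to 0.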
Hash Function On “string”
// Example: “BillG”:
// A trivial one, but not so good:
int hash (char *s)
{
  int i=0, sum=0;
  while (s[i]) {
    sum += s[i];
    i++;
  }
  return sum;
}
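The summing hash is "not so good" partly because any two strings with the same characters in a different order collide. A common alternative, shown here only as a sketch, is a polynomial hash that multiplies the running value before adding each character. The name hashString and the multiplier 31 are assumptions for illustration, not something the slides prescribe.

```c
/* Polynomial string hash: h = h * 31 + c for each character c.
   Unsigned arithmetic is used so that wrap-around is well defined. */
int hashString (const char *s)
{
  unsigned int h = 0;
  while (*s)
    {
      h = h * 31u + (unsigned char)*s;
      s++;
    }
  return (int)(h & 0x7fffffffu);   /* keep the result non-negative */
}
```

Because position now matters, “ab” and “ba” get different hash values, which the character sum cannot achieve.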
Hash Function On “Point”
// Suppose we have a user-defined type:
struct Point2d
{
  int x;
  int y;
};
int hash (struct Point2d pt)
{
  // ???
}
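The slide leaves the body open. One way to fill it in, given as a hedged sketch and not as the intended answer, is to combine the hashes of the fields the same way a polynomial string hash combines characters. The name hashPoint2d, the seed 17 and the multiplier 31 are assumptions for illustration.

```c
struct Point2d
{
  int x;
  int y;
};

/* Combine the two fields so that (x, y) and (y, x) usually differ.
   Unsigned arithmetic avoids signed-overflow problems. */
int hashPoint2d (struct Point2d pt)
{
  unsigned int h = 17u;
  h = h * 31u + (unsigned int)pt.x;
  h = h * 31u + (unsigned int)pt.y;
  return (int)(h & 0x7fffffffu);   /* keep the result non-negative */
}
```

Simply returning pt.x + pt.y would also be a valid hash, but it maps (1, 2) and (2, 1) to the same value.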
From “int” Hash to Index
Recall the type:
  int hash (T data);
Problems with the “int” return type:
  at any time, the array is finite
  there is no negative index (say -10)
Our goal: int i ==> [0, N-1]
Ok, that’s easy! It’s just:
  abs(i) % N
Bug! Note the range of “int”: -2^31 ~ 2^31-1
So abs(-2^31) = 2^31, which does not fit in an “int”: overflow!
The key step is to wipe off the sign bit:
  int t = i & 0x7fffffff;
  int hc = t % N;
In summary:
  hc = (i & 0x7fffffff) % N;
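As a small worked example of the masking step, the sketch below wraps the summary formula in a function and can be tried on a few values, including the smallest int. The name toIndex is hypothetical, and the code assumes a 32-bit two's-complement int and n > 0.

```c
#include <limits.h>

/* Map an arbitrary int hash value to a bucket index in [0, n-1].
   Assumptions: int is 32 bits wide (two's complement) and n > 0. */
int toIndex (int i, int n)
{
  int t = i & 0x7fffffff;   /* wipe off the sign bit */
  return t % n;
}
```

For instance, toIndex (-10, 8) gives 6 and toIndex (INT_MIN, 8) gives 0, both inside [0, 7], without ever calling abs.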
Collision
Given two keys k1 and k2, we compute two hash codes hc1, hc2 ∈ [0, N-1]
If k1 != k2, but hc1 == hc2, then a collision occurs
[Figure: (k1, v1) and (k2, v2) both map to index i]
Collision Resolution
Open addressing
Re-hash
Chaining (multi-map)
Chaining
For a collision at index i, we keep a separate linear list (chain) at index i
[Figure: index i points to a chain holding k1 and k2, the keys of (k1, v1) and (k2, v2)]
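A minimal sketch of one such chain follows, assuming C-string keys and int values. The type chainNode and the functions chainInsertHead, chainSearch and chainDelete are hypothetical names for illustration; they are not the linkedList module used by the slides.

```c
#include <stdlib.h>
#include <string.h>

/* One node of a bucket's chain. Assumption: keys are C strings. */
struct chainNode
{
  const char *key;
  int value;
  struct chainNode *next;
};

/* Insert (k, v) at the head of the chain and return the new head.
   If allocation fails, the chain is returned unchanged. */
struct chainNode *chainInsertHead (struct chainNode *head, const char *k, int v)
{
  struct chainNode *n = malloc (sizeof (*n));
  if (n == NULL)
    return head;
  n->key = k;
  n->value = v;
  n->next = head;
  return n;
}

/* Walk the chain comparing keys; return the matching node or NULL. */
struct chainNode *chainSearch (struct chainNode *head, const char *k)
{
  for (; head != NULL; head = head->next)
    if (strcmp (head->key, k) == 0)
      return head;
  return NULL;
}

/* Unlink and free the first node whose key equals k; return the new head. */
struct chainNode *chainDelete (struct chainNode *head, const char *k)
{
  struct chainNode **link = &head;
  while (*link != NULL)
    {
      if (strcmp ((*link)->key, k) == 0)
        {
          struct chainNode *dead = *link;
          *link = dead->next;
          free (dead);
          break;
        }
      link = &(*link)->next;
    }
  return head;
}
```

A hash table with chaining is then an array of such chain heads, one per index, and every operation first picks the chain by hash code and then works on that chain only.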
General Scheme
[Figure: a bucket array whose slots point to chains holding k1, k2, k5, k8 and k43]
Load Factor
loadFactor = numItems / numBuckets
defaultLoadFactor: default value of the load factor
[Figure: a bucket array with chains holding k1, k2, k5, k8 and k43]
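The load factor formula can be turned into a one-line check that decides when the bucket array should grow. This is only a sketch; the function name needsResize is an assumption, and the threshold is passed in instead of being read from a table struct.

```c
/* Return 1 when numItems / numBuckets has reached the threshold, else 0.
   Multiplying by 1.0 forces floating-point division. */
int needsResize (int numItems, int numBuckets, double loadFactor)
{
  return 1.0 * numItems / numBuckets >= loadFactor;
}
```

With 16 buckets and a threshold of 0.25, the check is false for 3 items (3/16 = 0.1875) and becomes true at 4 items (4/16 = 0.25).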
“hash” ADT: interface
#ifndef HASH_H
#define HASH_H

typedef void *poly;
typedef poly key;
typedef poly value;

typedef struct hashStruct *hash;

hash newHash ();
hash newHash2 (double lf);
void insert (hash h, key k, value v);
poly lookup (hash h, key k);
void delete (hash h, key k);

#endif
Hash Implementation
#include "hash.h"
#define EXT_FACTOR 2
#define INIT_BUCKETS 16
struct hashStruct
{
linkedList *buckets;
int numBuckets;
int numItems;
double loadFactor;
};
In Figure
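[Figure: h points to a struct with the fields buckets, numBuckets, numItems and loadFactor; buckets points to the array of chains holding k1, k2, k5, k8 and k43]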
“newHash ()”
hash newHash ()
{
  hash h = (hash)malloc (sizeof (*h));
  h->buckets = malloc (INIT_BUCKETS * sizeof (linkedList));
  for (…) // init the array
  h->numBuckets = INIT_BUCKETS;
  h->numItems = 0;
  h->loadFactor = 0.25;
  return h;
}
“newHash2 ()”
hash newHash2 (double lf)
{
  hash h = (hash)malloc (sizeof (*h));
  h->buckets = (linkedList *)malloc (INIT_BUCKETS * sizeof (linkedList));
  for (…) // init the array
  h->numBuckets = INIT_BUCKETS;
  h->numItems = 0;
  h->loadFactor = lf;
  return h;
}
“lookup (hash, key)”
value lookup (hash h, key k, compTy cmp)
{
int i = k->hashCode (); // how to perform this?
int hc = (i & 0x7fffffff) % (h->numBuckets);
value t =linkedListSearch ((h->buckets)[hc], k);
return t;
}
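The comment "how to perform this?" points at a real gap: a void * key in C has no hashCode method. One common workaround, sketched here under stated assumptions, is to hand the table a pair of function pointers that know the concrete key type. The typedefs hashTy and equalsTy and the functions stringHashCode, stringEquals and bucketIndex are hypothetical names, not part of the slides' interface.

```c
#include <string.h>

typedef void *poly;
typedef int (*hashTy) (poly);          /* hypothetical: key -> int hash */
typedef int (*equalsTy) (poly, poly);  /* hypothetical: 1 if keys are equal */

/* Example callback for C-string keys: polynomial hash. */
int stringHashCode (poly k)
{
  const unsigned char *s = k;
  unsigned int h = 0;
  while (*s)
    h = h * 31u + *s++;
  return (int)(h & 0x7fffffffu);
}

/* Example callback for C-string keys: equality test. */
int stringEquals (poly a, poly b)
{
  return strcmp (a, b) == 0;
}

/* Compute the bucket index of key k with the supplied hash callback. */
int bucketIndex (hashTy hashCode, poly k, int numBuckets)
{
  int i = hashCode (k);
  return (i & 0x7fffffff) % numBuckets;
}
```

A table could store such callbacks when it is created, or accept them per call in the way lookup already accepts cmp, so that the same table code serves any key type.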
Ex: lookup (ha, k43)
hc = (hash (k43) & 0x7fffffff) % 8;
// hc = 1
[Figure: ha’s buckets array with chains holding k1, k2, k5, k8 and k43]
Step 1: compute hc = 1 and go to bucket 1
Step 2: compare k43 with k8, no match
Step 3: compare k43 with k43, found!
“insert”
void insert (hash h, poly k, poly v)
{
  if (1.0 * h->numItems / h->numBuckets >= h->loadFactor)
  {
    // buckets extension & items re-hash
  }
  int i = k->hashCode (); // how to perform this?
  int hc = (i & 0x7fffffff) % (h->numBuckets);
  tuple t = newTuple (k, v);
  linkedListInsertHead ((h->buckets)[hc], t);
  h->numItems++; // keep the count used by the load factor test up to date
  return;
}
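The "buckets extension & items re-hash" step is left as a comment in the slides. The sketch below shows one way it could work: allocate a bucket array EXT_FACTOR times larger and move every node to the bucket given by its hash code modulo the new size. It uses int keys and its own small node and table types, all of which are assumptions for illustration and not the slides' hashStruct or linkedList.

```c
#include <stdlib.h>

#define EXT_FACTOR 2

/* Hypothetical chain node with an int key. */
struct node
{
  int key;
  int value;
  struct node *next;
};

/* Hypothetical table: an array of chain heads. */
struct table
{
  struct node **buckets;
  int numBuckets;
  int numItems;
};

static int indexOf (int key, int numBuckets)
{
  return (key & 0x7fffffff) % numBuckets;
}

/* Grow the bucket array by EXT_FACTOR and re-hash every node.
   Returns 1 on success, 0 if allocation fails (table left unchanged). */
int tableGrow (struct table *t)
{
  int newSize = t->numBuckets * EXT_FACTOR;
  struct node **fresh = calloc (newSize, sizeof (*fresh));
  if (fresh == NULL)
    return 0;
  for (int b = 0; b < t->numBuckets; b++)
    {
      struct node *n = t->buckets[b];
      while (n != NULL)
        {
          struct node *next = n->next;   /* save before relinking */
          int hc = indexOf (n->key, newSize);
          n->next = fresh[hc];           /* insert at head of new chain */
          fresh[hc] = n;
          n = next;
        }
    }
  free (t->buckets);
  t->buckets = fresh;
  t->numBuckets = newSize;
  return 1;
}
```

Re-hashing is needed because the index depends on the number of buckets: keys 1 and 3 share bucket 1 when there are 2 buckets, but land in buckets 1 and 3 once there are 4.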
Ex: insert (ha, k13)
hc = (hash (k13) & 0x7fffffff) % 8;
// suppose hc==4
[Figure, before: ha’s buckets array with chains holding k1, k2, k5, k8 and k43]
[Figure, after: k13 has been inserted at the head of the chain in bucket 4]
Complexity
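Operation   array   sorted array   linked list   sorted linked list   hash
lookup()    O(n)    O(lg n)        O(n)          O(n)                 O(1)
insert()    O(n)    O(n)           O(n)          O(n)                 O(1)
delete()    O(n)    O(n)           O(n)          O(n)                 O(1)
Note: the O(1) bounds for hash are expected costs, assuming the hash function spreads keys evenly and the load factor stays bounded; in the worst case all keys fall into one chain and each operation is O(n).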