Static Dictionaries
Post on 08-Jan-2016
• Collection of items.
• Each item is a (key, element) pair. Different pairs have different keys.
• Operations are: initialize/create, get (search).
Hashing
• Perfect hashing (no collisions).
• Minimal perfect hashing (space = n).
• CHD (compress, hash, and displace) algorithm.
• O(n) time to construct the perfect or minimal perfect hash function.
• O(1) search time.
• Botelho, Belazzougui & Dietzfelbinger. Hash, displace, and compress. 17th European Symposium on Algorithms, 2009.
Search Tree
• Hashing is not efficient for extended operations such as range search and nearest match.
• Will examine a binary search tree structure for static dictionaries.
• Each item/key/element has an estimated access frequency (or probability).
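Example
Key   Probability
a     0.8
b     0.1
c     0.1

a < b < c

[Figure: tree with b at the root, a as its left child, and c as its right child.]
Cost = 0.8 * 2 + 0.1 * 1 + 0.1 * 2 = 1.9

[Figure: tree with a at the root, b as the right child of a, and c as the right child of b.]
Cost = 0.8 * 1 + 0.1 * 2 + 0.1 * 3 = 1.3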
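The two costs in this example can be checked with a short Python sketch. It assumes the root is at level 1, which is what the arithmetic in the example implies; the function name `expected_cost` and the dictionary layout are illustrative, not part of the slides.

```python
def expected_cost(levels, probs):
    """Sum of probability * level over all keys (root is at level 1)."""
    return sum(probs[k] * levels[k] for k in probs)

probs = {"a": 0.8, "b": 0.1, "c": 0.1}

# Level of each key in the two trees of the example.
b_at_root = {"a": 2, "b": 1, "c": 2}   # b is the root, a and c are its children
a_at_root = {"a": 1, "b": 2, "c": 3}   # a is the root, then b, then c

print(round(expected_cost(b_at_root, probs), 6))
print(round(expected_cost(a_at_root, probs), 6))
```

The skewed tree is cheaper here because the most frequently accessed key sits at the root.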
Search Types
• Successful. Search for a key that is in the dictionary. Terminates at an internal node.
• Unsuccessful. Search for a key that is not in the dictionary. Terminates at an external/failure node.
[Figure: binary search tree on keys a, b, c with external (failure) nodes f0, f1, f2, f3.]
Internal And External Nodes
• A binary tree with n internal nodes has n + 1 external nodes.
• Let $s_1, s_2, \ldots, s_n$ be the internal nodes, in inorder.
• $\mathrm{key}(s_1) < \mathrm{key}(s_2) < \cdots < \mathrm{key}(s_n)$.
• Let $\mathrm{key}(s_0) = -\infty$ and $\mathrm{key}(s_{n+1}) = \infty$.
• Let $f_0, f_1, \ldots, f_n$ be the external nodes, in inorder.
• $f_i$ is reached iff $\mathrm{key}(s_i) <$ search key $< \mathrm{key}(s_{i+1})$.
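The rule for which node a search ends at can be illustrated without building a tree, since it depends only on the sorted key order. The sketch below is an assumption-laden illustration: it uses a sorted Python list and `bisect_left` in place of a tree walk, and the function name `classify` is hypothetical.

```python
from bisect import bisect_left

def classify(keys, x):
    """keys must be sorted. Returns ("s", i) if x equals the i-th key
    (1-based), otherwise ("f", i) where key(s_i) < x < key(s_{i+1})."""
    pos = bisect_left(keys, x)          # number of keys strictly less than x
    if pos < len(keys) and keys[pos] == x:
        return ("s", pos + 1)           # successful search: internal node
    return ("f", pos)                   # unsuccessful search: external node

keys = ["a", "b", "c"]
print(classify(keys, "b"))    # key b is s_2
print(classify(keys, "aa"))   # falls between a and b, so f_1
print(classify(keys, "d"))    # larger than every key, so f_3
```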
Cost Of Binary Search Tree
• Let $p_i$ = probability for $\mathrm{key}(s_i)$.
• Let $q_i$ = probability for $\mathrm{key}(s_i) <$ search key $< \mathrm{key}(s_{i+1})$.
• Sum of ps and qs = 1.
• Cost of tree = $\sum_{0 \le i \le n} q_i \, (\mathrm{level}(f_i) - 1) + \sum_{1 \le i \le n} p_i \cdot \mathrm{level}(s_i)$
• Cost = weighted path length.
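As a rough illustration of the cost formula, the following sketch computes the weighted path length of a given tree. The tree encoding (nested tuples `(left, i, right)` with `None` for an external node), the 1-based key index `i`, and the function name are all assumptions made for this example; the root is taken to be at level 1.

```python
def weighted_path_length(tree, p, q):
    """tree: nested (left, i, right) tuples, i = 1-based key index,
    None = external node. p[1..n] are key probabilities (p[0] unused),
    q[0..n] are the gap probabilities."""
    next_external = 0   # externals are met in inorder: f_0, f_1, ..., f_n

    def visit(node, level):
        nonlocal next_external
        if node is None:
            cost = q[next_external] * (level - 1)
            next_external += 1
            return cost
        left, i, right = node
        cost = visit(left, level + 1)      # left subtree first (inorder)
        cost += p[i] * level
        cost += visit(right, level + 1)
        return cost

    return visit(tree, 1)

# The two trees on a < b < c from the earlier example, with all q_i = 0.
p = [0, 0.8, 0.1, 0.1]
q = [0, 0, 0, 0]
b_root = ((None, 1, None), 2, (None, 3, None))
a_root = (None, 1, (None, 2, (None, 3, None)))
print(round(weighted_path_length(b_root, p, q), 6))
print(round(weighted_path_length(a_root, p, q), 6))
```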
Brute Force Algorithm
• Generate all binary search trees with n internal nodes.
• Compute the weighted path length of each.
• Determine tree with minimum weighted path length.
• Number of trees to examine is $O(4^n / n^{1.5})$.
• Brute force approach is impractical for large n.
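To get a feel for this growth, the number of distinct binary search trees on n keys is the n-th Catalan number, C(2n, n) / (n + 1), which grows on the order of 4^n / n^1.5. A small sketch (the function name `num_bsts` is illustrative; `math.comb` requires Python 3.8 or later):

```python
from math import comb

def num_bsts(n):
    """Number of distinct binary search trees with n internal nodes
    (the n-th Catalan number)."""
    return comb(2 * n, n) // (n + 1)

for n in (3, 10, 20):
    print(n, num_bsts(n))
```

Even at n = 20 there are already billions of candidate trees, which is why enumeration is not a practical option.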
Dynamic Programming
• Keys are $a_1 < a_2 < \cdots < a_n$.
• Let $T_{ij}$ = least cost tree for $a_{i+1}, a_{i+2}, \ldots, a_j$.
• $T_{0n}$ = least cost tree for $a_1, a_2, \ldots, a_n$.
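• $T_{ij}$ includes $p_{i+1}, p_{i+2}, \ldots, p_j$ and $q_i, q_{i+1}, \ldots, q_j$.
[Figure: $T_{2,7}$, a binary search tree with internal nodes $a_3, \ldots, a_7$ and external nodes $f_2, \ldots, f_7$.]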
Terminology
• $T_{ij}$ = least cost tree for $a_{i+1}, a_{i+2}, \ldots, a_j$.
• $c_{ij}$ = cost of $T_{ij}$ = $\sum_{i \le u \le j} q_u \, (\mathrm{level}(f_u) - 1) + \sum_{i < u \le j} p_u \cdot \mathrm{level}(s_u)$.
• $r_{ij}$ = root of $T_{ij}$.
• $w_{ij}$ = weight of $T_{ij}$ = sum of ps and qs in $T_{ij}$ = $p_{i+1} + p_{i+2} + \cdots + p_j + q_i + q_{i+1} + \cdots + q_j$.
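The weights $w_{ij}$ can be tabulated incrementally, since $w_{ij} = w_{i,j-1} + p_j + q_j$ follows directly from the definition. The sketch below is one possible way to do this; the function name and the sample numbers are hypothetical, and the sample uses unnormalized integer frequencies rather than probabilities to keep the arithmetic exact.

```python
def weights(p, q):
    """p[1..n] key weights (p[0] unused), q[0..n] gap weights.
    Returns w with w[i][j] = p[i+1] + ... + p[j] + q[i] + ... + q[j]."""
    n = len(q) - 1
    w = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        w[i][i] = q[i]                           # T_ii holds q_i only
        for j in range(i + 1, n + 1):
            w[i][j] = w[i][j - 1] + p[j] + q[j]  # extend the range by one key
    return w

# Hypothetical sample data (frequencies, not normalized to sum to 1).
p = [0, 3, 3, 1, 1]
q = [2, 3, 1, 1, 1]
w = weights(p, q)
print(w[0][4])   # total weight of all ps and qs
```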
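i = j
• $T_{ij}$ includes $p_{i+1}, p_{i+2}, \ldots, p_j$ and $q_i, q_{i+1}, \ldots, q_j$.
• $T_{ii}$ includes $q_i$ only.
[Figure: $T_{ii}$ is the single external node $f_i$.]
• $c_{ii}$ = cost of $T_{ii}$ = 0.
• $r_{ii}$ = root of $T_{ii}$ = 0.
• $w_{ii}$ = weight of $T_{ii}$ = sum of ps and qs in $T_{ii}$ = $q_i$.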
i < j
• $T_{ij}$ = least cost tree for $a_{i+1}, a_{i+2}, \ldots, a_j$.
• $T_{ij}$ includes $p_{i+1}, p_{i+2}, \ldots, p_j$ and $q_i, q_{i+1}, \ldots, q_j$.
• Let $a_k$, $i < k \le j$, be in the root of $T_{ij}$.
[Figure: $T_{ij}$ with root $a_k$, left subtree L, and right subtree R.]
• L includes $p_{i+1}, p_{i+2}, \ldots, p_{k-1}$ and $q_i, q_{i+1}, \ldots, q_{k-1}$.
• R includes $p_{k+1}, p_{k+2}, \ldots, p_j$ and $q_k, q_{k+1}, \ldots, q_j$.
cost(L)
• L includes $p_{i+1}, p_{i+2}, \ldots, p_{k-1}$ and $q_i, q_{i+1}, \ldots, q_{k-1}$.
• cost(L) = weighted path length of L when viewed as a stand-alone binary search tree.
[Figure: subtree L within the example tree on $a_3, \ldots, a_7$.]
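Contribution To $c_{ij}$
• $c_{ij} = \sum_{i \le u \le j} q_u \, (\mathrm{level}(f_u) - 1) + \sum_{i < u \le j} p_u \cdot \mathrm{level}(s_u)$.
• When L is viewed as a subtree of $T_{ij}$, the level of each node is 1 more than when L is viewed as a stand-alone tree.
• Raising every level by 1 adds each p and q in L to the cost once more, for a total of $w_{i,k-1}$.
• So, contribution of L to $c_{ij}$ is cost(L) + $w_{i,k-1}$.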
$c_{ij}$
• Contribution of L to $c_{ij}$ is cost(L) + $w_{i,k-1}$.
• Contribution of R to $c_{ij}$ is cost(R) + $w_{kj}$.
• $c_{ij}$ = cost(L) + $w_{i,k-1}$ + cost(R) + $w_{kj}$ + $p_k$ = cost(L) + cost(R) + $w_{ij}$
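The last simplification uses the fact that the weights of L and R, together with the root's $p_k$, account for every p and q in $T_{ij}$ exactly once. Written out from the definition of $w_{ij}$ (with the root $a_k$ at level 1, so it contributes $p_k \cdot 1$):

```latex
\begin{aligned}
w_{i,k-1} + p_k + w_{kj}
  &= \Bigl(q_i + \sum_{u=i+1}^{k-1} (p_u + q_u)\Bigr) + p_k
     + \Bigl(q_k + \sum_{u=k+1}^{j} (p_u + q_u)\Bigr) \\
  &= q_i + \sum_{u=i+1}^{j} (p_u + q_u) \\
  &= w_{ij}.
\end{aligned}
```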
$c_{ij}$
• $c_{ij}$ = cost(L) + cost(R) + $w_{ij}$
• cost(L) = $c_{i,k-1}$
• cost(R) = $c_{kj}$
• $c_{ij} = c_{i,k-1} + c_{kj} + w_{ij}$
• Don't know k.
• $c_{ij} = \min_{i < k \le j} \{ c_{i,k-1} + c_{kj} \} + w_{ij}$
$c_{ij}$
• $c_{ij} = \min_{i < k \le j} \{ c_{i,k-1} + c_{kj} \} + w_{ij}$
• $r_{ij}$ = k that minimizes right side.
Computation Of $c_{0n}$ And $r_{0n}$
• Start with $c_{ii} = 0$, $r_{ii} = 0$, $w_{ii} = q_i$, $0 \le i \le n$ (zero-key trees).
• Use $c_{ij} = \min_{i < k \le j} \{ c_{i,k-1} + c_{kj} \} + w_{ij}$ to compute $c_{i,i+1}$, $r_{i,i+1}$, $0 \le i \le n - 1$ (one-key trees).
• Now use the equation to compute $c_{i,i+2}$, $r_{i,i+2}$, $0 \le i \le n - 2$ (two-key trees).
• Now use the equation to compute $c_{i,i+3}$, $r_{i,i+3}$, $0 \le i \le n - 3$ (three-key trees).
• Continue until $c_{0n}$ and $r_{0n}$ (n-key tree) have been computed.
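The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `optimal_bst`, the list-of-lists tables, and the sample data are hypothetical, and the sample uses unnormalized integer frequencies (the recurrence does not depend on the weights summing to 1). Ties are broken in favor of the smallest k.

```python
def optimal_bst(p, q):
    """p[1..n] key weights (p[0] unused), q[0..n] gap weights.
    Returns tables c, r, w indexed [i][j] for 0 <= i <= j <= n."""
    n = len(q) - 1
    c = [[0] * (n + 1) for _ in range(n + 1)]
    r = [[0] * (n + 1) for _ in range(n + 1)]
    w = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        w[i][i] = q[i]            # zero-key trees: c_ii = 0, r_ii = 0
    for size in range(1, n + 1):  # one-key trees, two-key trees, ...
        for i in range(n - size + 1):
            j = i + size
            w[i][j] = w[i][j - 1] + p[j] + q[j]
            # Try every key a_k, i < k <= j, as the root.
            best_k = min(range(i + 1, j + 1),
                         key=lambda k: c[i][k - 1] + c[k][j])
            c[i][j] = c[i][best_k - 1] + c[best_k][j] + w[i][j]
            r[i][j] = best_k
    return c, r, w

# Hypothetical sample data with n = 4 keys.
p = [0, 3, 3, 1, 1]
q = [2, 3, 1, 1, 1]
c, r, w = optimal_bst(p, q)
print(c[0][4], r[0][4])   # cost and root of the tree on all four keys
```

For this sample the minimum cost is 32 with $a_2$ at the root.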
[Figure: table of $c_{ij}$, $r_{ij}$ entries for $i \le j$.]
Complexity
• $c_{ij} = \min_{i < k \le j} \{ c_{i,k-1} + c_{kj} \} + w_{ij}$
• O(n) time to compute one $c_{ij}$.
• $O(n^2)$ $c_{ij}$s to compute.
• Total time is $O(n^3)$.
• May be reduced to $O(n^2)$ by using
$c_{ij} = \min_{r_{i,j-1} \le k \le r_{i+1,j}} \{ c_{i,k-1} + c_{kj} \} + w_{ij}$
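A sketch of the restricted-range variant follows. It assumes nonnegative weights and handles one-key trees separately, because $r_{ii} = 0$ is not a usable bound for them. The function name `optimal_bst_fast` and the sample data are hypothetical.

```python
def optimal_bst_fast(p, q):
    """Same tables as the cubic method, but the root of T_ij is searched
    only among k with r[i][j-1] <= k <= r[i+1][j]."""
    n = len(q) - 1
    c = [[0] * (n + 1) for _ in range(n + 1)]
    r = [[0] * (n + 1) for _ in range(n + 1)]
    w = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        w[i][i] = q[i]
    for i in range(n):                    # one-key trees: the root must be a_j
        j = i + 1
        w[i][j] = q[i] + p[j] + q[j]
        c[i][j] = w[i][j]
        r[i][j] = j
    for size in range(2, n + 1):
        for i in range(n - size + 1):
            j = i + size
            w[i][j] = w[i][j - 1] + p[j] + q[j]
            lo, hi = r[i][j - 1], r[i + 1][j]   # restricted root range
            best_k = min(range(lo, hi + 1),
                         key=lambda k: c[i][k - 1] + c[k][j])
            c[i][j] = c[i][best_k - 1] + c[best_k][j] + w[i][j]
            r[i][j] = best_k
    return c, r

# Hypothetical sample data with n = 4 keys (unnormalized frequencies).
p = [0, 3, 3, 1, 1]
q = [2, 3, 1, 1, 1]
c, r = optimal_bst_fast(p, q)
print(c[0][4], r[0][4])
```

For a fixed tree size, the candidate ranges of consecutive entries overlap only at their endpoints, so the work per size is O(n) and the total is $O(n^2)$.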
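Construct $T_{0n}$
• Root is $r_{0n}$.
• Suppose that $r_{0n} = 10$.
[Figure: $T_{0n}$ with root $a_{10}$, left subtree $T_{0,9}$, and right subtree $T_{10,n}$.]
• Construct $T_{0,9}$ and $T_{10,n}$ recursively.
• Time is O(n).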
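The recursive construction can be sketched as below. The nested-tuple tree encoding, the function name `build_tree`, and the hard-coded root table are assumptions for illustration; the table is the one obtained for a hypothetical four-key instance with key weights (3, 3, 1, 1) and gap weights (2, 3, 1, 1, 1).

```python
def build_tree(r, i, j):
    """Rebuild T_ij from the root table r. Returns nested tuples
    (left, k, right), where k is the index of the root key a_k and
    None stands for an external node."""
    if i == j:
        return None                       # T_ii is a single external node
    k = r[i][j]
    return (build_tree(r, i, k - 1), k, build_tree(r, k, j))

# Root table r[i][j] for the hypothetical n = 4 instance (entries with i >= j unused).
r = [
    [0, 1, 1, 2, 2],
    [0, 0, 2, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 0, 4],
    [0, 0, 0, 0, 0],
]
tree = build_tree(r, 0, 4)
print(tree)
```

Each call creates one internal node or returns one external node, so the number of calls is 2n + 1 and the running time is linear in n.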