© 2004 Goodrich, Tamassia
Dictionaries
1. Dictionaries
[Title slide figure: a binary search tree]

2. Dictionary ADT
The dictionary ADT models a searchable collection of key-element items. The main operations of a dictionary are searching, inserting, and deleting items. Multiple items with the same key are allowed.
Applications:
- address book
- credit card authorization
- mapping host names (e.g., cs16.net) to internet addresses (e.g., 128.148.34.101)
Dictionary ADT methods:
- findElement(k): if the dictionary has an item with key k, returns its element; else, returns the special element NO_SUCH_KEY
- insertItem(k, o): inserts item (k, o) into the dictionary
- removeElement(k): if the dictionary has an item with key k, removes it from the dictionary and returns its element; else, returns the special element NO_SUCH_KEY
- size(), isEmpty()
- keys(), elements()

3. Log File
A log file is a dictionary implemented by means of an unsorted sequence. We store the items of the dictionary in a sequence (based on a doubly-linked list or a circular array), in arbitrary order.
Performance:
- insertItem takes O(1) time, since we can insert the new item at the beginning or at the end of the sequence.
- findElement and removeElement take O(n) time, since in the worst case (the item is not found) we traverse the entire sequence to look for an item with the given key.
The log file is effective only for dictionaries of small size or for dictionaries on which insertions are the most common operations, while searches and removals are rarely performed (e.g., historical record of logins to a workstation).
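A minimal Java sketch of the log-file idea follows. It assumes a hypothetical class `LogFileDictionary` and uses a private `Object` sentinel to stand in for NO_SUCH_KEY; it is an illustration, not the book's implementation. Items sit in a `java.util.LinkedList` (a doubly-linked list) in arrival order, so insertion is constant time while search and removal are linear scans.

```java
import java.util.Iterator;
import java.util.LinkedList;

// Hypothetical class: a dictionary stored as an unsorted sequence ("log file").
class LogFileDictionary<K, V> {
    // Sentinel standing in for the special element NO_SUCH_KEY.
    static final Object NO_SUCH_KEY = new Object();

    private static final class Item<A, B> {
        final A key;
        final B element;
        Item(A key, B element) { this.key = key; this.element = element; }
    }

    // Doubly-linked list; items are kept in arbitrary (arrival) order.
    private final LinkedList<Item<K, V>> items = new LinkedList<>();

    // O(1): append the new item at the end of the sequence.
    void insertItem(K k, V o) {
        items.addLast(new Item<>(k, o));
    }

    // O(n): in the worst case (key absent) every item is examined.
    Object findElement(K k) {
        for (Item<K, V> it : items) {
            if (it.key.equals(k)) return it.element;
        }
        return NO_SUCH_KEY;
    }

    // O(n): scan for the key, then unlink that item through the iterator.
    Object removeElement(K k) {
        Iterator<Item<K, V>> iter = items.iterator();
        while (iter.hasNext()) {
            Item<K, V> it = iter.next();
            if (it.key.equals(k)) {
                iter.remove();
                return it.element;
            }
        }
        return NO_SUCH_KEY;
    }

    int size() { return items.size(); }
    boolean isEmpty() { return items.isEmpty(); }
}
```

Because duplicate keys are allowed, `removeElement` removes whichever matching item the scan reaches first.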
4. Lookup Table
A lookup table is a dictionary implemented by means of a sorted sequence:
- We store the items of the dictionary in an array-based sequence, sorted by key.
- We use an external comparator for the keys.
Performance:
- findElement takes O(log n) time, using binary search.
- insertItem takes O(n) time, since in the worst case we have to shift all n items (about n/2 on average) to make room for the new item.
- removeElement takes O(n) time, since in the worst case we have to shift all remaining items (about n/2 on average) to compact the items after the removal.
The lookup table is effective only for dictionaries of small size or for dictionaries on which searches are the most common operations, while insertions and removals are rarely performed (e.g., credit card authorizations).

5. Binary Search Tree
A binary search tree is a binary tree storing keys (or key-element pairs) at its internal nodes and satisfying the following property:
- Let u, v, and w be three nodes such that u is in the left subtree of v and w is in the right subtree of v. We have $key(u) \le key(v) \le key(w)$.
- External nodes do not store items.
An inorder traversal of a binary search tree visits the keys in increasing order.
[Figure: a binary search tree with root 6; 6 has children 2 and 9; 2 has children 1 and 4; 9 has left child 8]

6. Search
To search for a key k, we trace a downward path starting at the root. The next node visited depends on the outcome of the comparison of k with the key of the current node. If we reach a leaf, the key is not found and we return NO_SUCH_KEY.

    Algorithm findElement(k, v)
        if T.isExternal(v)
            return NO_SUCH_KEY
        if k < key(v)
            return findElement(k, T.leftChild(v))
        else if k = key(v)
            return element(v)
        else { k > key(v) }
            return findElement(k, T.rightChild(v))

Example: findElement(4) on the tree of slide 5: 4 < 6, so go left; 4 > 2, so go right; 4 = 4, so the element is found.
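The lookup-table trade-off can be sketched in Java as follows. The class name `LookupTable` and the helper `lowerBound` are hypothetical; keys and elements live in two parallel `ArrayList`s kept sorted by an external `Comparator`, and an `Object` sentinel stands in for NO_SUCH_KEY. `findElement` is a binary search (O(log n)), while `ArrayList.add(index, e)` and `ArrayList.remove(index)` shift the later items, which is where the O(n) cost of insertion and removal comes from.

```java
import java.util.ArrayList;
import java.util.Comparator;

// Hypothetical class: a dictionary stored as an array-based sequence sorted by key.
class LookupTable<K, V> {
    static final Object NO_SUCH_KEY = new Object();          // stands in for NO_SUCH_KEY

    private final ArrayList<K> keys = new ArrayList<>();      // sorted by comp
    private final ArrayList<V> elements = new ArrayList<>();  // elements.get(i) belongs to keys.get(i)
    private final Comparator<? super K> comp;                 // external comparator

    LookupTable(Comparator<? super K> comp) { this.comp = comp; }

    // Binary search: smallest index i with keys[i] >= k (keys.size() if none). O(log n).
    private int lowerBound(K k) {
        int lo = 0, hi = keys.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (comp.compare(keys.get(mid), k) < 0) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    private boolean matches(int i, K k) {
        return i < keys.size() && comp.compare(keys.get(i), k) == 0;
    }

    // O(log n): one binary search.
    Object findElement(K k) {
        int i = lowerBound(k);
        return matches(i, k) ? elements.get(i) : NO_SUCH_KEY;
    }

    // O(n): add(index, x) shifts every later item one slot to the right.
    void insertItem(K k, V o) {
        int i = lowerBound(k);
        keys.add(i, k);
        elements.add(i, o);
    }

    // O(n): remove(index) shifts every later item one slot to the left.
    Object removeElement(K k) {
        int i = lowerBound(k);
        if (!matches(i, k)) return NO_SUCH_KEY;
        keys.remove(i);
        return elements.remove(i);
    }

    int size() { return keys.size(); }
}
```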
7. Insertion
To perform operation insertItem(k, o), we search for key k. Assume k is not already in the tree, and let w be the leaf reached by the search. We insert k at node w and expand w into an internal node.
Example: insert 5. The search goes left at 6 (5 < 6), right at 2 (5 > 2), and right at 4 (5 > 4), ending at leaf w, the right child of 4. Key 5 is stored at w, which becomes an internal node.

8. Deletion
To perform operation removeElement(k), we search for key k. Assume key k is in the tree, and let v be the node storing k. If node v has a leaf child w, we remove v and w from the tree with operation removeAboveExternal(w).
Example: remove 4. Node v stores 4; its left child w is a leaf and its right child stores 5. Removing v and w moves 5 into the place of v, leaving root 6 with children 2 and 9, where 2 has children 1 and 5 and 9 has left child 8.

9. Deletion (cont.)
We consider the case where the key k to be removed is stored at a node v whose children are both internal:
- we find the internal node w that follows v in an inorder traversal
- we copy key(w) into node v
- we remove node w and its left child z (which must be a leaf) by means of operation removeAboveExternal(z)
Example: remove 3. In the tree with root 1, whose right child v stores 3 (children 2 and 8; 8 has children 6 and 9; 6 has left child 5), the node w that follows v in an inorder traversal stores 5. Key 5 is copied into v, and w and its leaf child z are removed, leaving root 1 with right child 5, whose children are 2 and 8; 8 keeps children 6 and 9.

10. Performance
Consider a dictionary with n items implemented by means of a binary search tree of height h:
- the space used is O(n)
- methods findElement, insertItem and removeElement take O(h) time
The height h is O(n) in the worst case and O(log n) in the best case.

11. Ordered Dictionaries
Keys are assumed to come from a total order. New operations:
- first(): first entry in the dictionary ordering
- last(): last entry in the dictionary ordering
- successors(k): iterator of entries with keys greater than or equal to k; increasing order
- predecessors(k): iterator of entries with keys less than or equal to k; decreasing order

12. Hash Tables
[Figure: a hash table with cells 0 to 4; cell 1 holds 025-612-0001, cell 2 holds 981-101-0002, cell 4 holds 451-229-0004; cells 0 and 3 are empty]
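The search, insertion, and both deletion cases above can be combined into one small Java sketch. Assumptions: keys are `int`s; a hypothetical `BinarySearchTree` class represents external nodes explicitly as childless `Node` objects (so "expand w into an internal node" is literal); equal keys are sent to the right subtree; and `removeAboveExternal` is a private helper modeled on the operation named in the slides, not a library call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: int keys; internal nodes hold items, external nodes are empty leaves.
class BinarySearchTree {
    static final Object NO_SUCH_KEY = new Object();

    static final class Node {
        Integer key;              // null in an external node
        Object element;
        Node left, right, parent;
        boolean isExternal() { return left == null; }  // internal nodes always have two children
    }

    private Node root = new Node();   // an empty tree is a single external node
    private int size = 0;

    private static Node externalChild(Node parent) {
        Node n = new Node();
        n.parent = parent;
        return n;
    }

    // Trace a downward path from v; stop at the node holding k or at a leaf.
    private Node treeSearch(int k, Node v) {
        if (v.isExternal() || k == v.key) return v;
        return (k < v.key) ? treeSearch(k, v.left) : treeSearch(k, v.right);
    }

    Object findElement(int k) {
        Node v = treeSearch(k, root);
        return v.isExternal() ? NO_SUCH_KEY : v.element;
    }

    // Search down to a leaf w (equal keys go right), then expand w into an internal node.
    void insertItem(int k, Object o) {
        Node w = root;
        while (!w.isExternal()) w = (k < w.key) ? w.left : w.right;
        w.key = k;
        w.element = o;
        w.left = externalChild(w);
        w.right = externalChild(w);
        size++;
    }

    Object removeElement(int k) {
        Node v = treeSearch(k, root);
        if (v.isExternal()) return NO_SUCH_KEY;
        Object result = v.element;
        if (v.left.isExternal()) {
            removeAboveExternal(v.left);              // v has a leaf child
        } else if (v.right.isExternal()) {
            removeAboveExternal(v.right);
        } else {
            Node w = v.right;                         // inorder successor of v: the
            while (!w.left.isExternal()) w = w.left;  // leftmost internal node of v's right subtree
            v.key = w.key;                            // copy w's item into v
            v.element = w.element;
            removeAboveExternal(w.left);              // z = w.left is a leaf
        }
        size--;
        return result;
    }

    // Remove leaf z and its parent; z's sibling takes the parent's place.
    private void removeAboveExternal(Node z) {
        Node v = z.parent;
        Node sibling = (v.left == z) ? v.right : v.left;
        Node g = v.parent;
        sibling.parent = g;
        if (g == null) root = sibling;
        else if (g.left == v) g.left = sibling;
        else g.right = sibling;
    }

    int size() { return size; }

    List<Integer> inorderKeys() {
        List<Integer> out = new ArrayList<>();
        inorder(root, out);
        return out;
    }

    private void inorder(Node v, List<Integer> out) {
        if (v.isExternal()) return;
        inorder(v.left, out);
        out.add(v.key);
        inorder(v.right, out);
    }
}
```

Replaying the slides' examples: inserting 6, 2, 9, 1, 4, 8 and then 5 gives the inorder key sequence 1, 2, 4, 5, 6, 8, 9; removing 3 from the tree built by inserting 1, 3, 2, 8, 6, 9, 5 copies 5 into the node that held 3.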
13. Recall the Map ADT
Map ADT methods:
- get(k): if the map M has an entry with key k, return its associated value; else, return null
- put(k, v): insert entry (k, v) into the map M; if key k is not already in M, then return null; else, return the old value associated with k
- remove(k): if the map M has an entry with key k, remove it from M and return its associated value; else, return null
- size(), isEmpty()
- keys(): return an iterator of the keys in M
- values(): return an iterator of the values in M

14. Hash Functions and Hash Tables
A hash function h maps keys of a given type to integers in a fixed interval $[0, N-1]$.
Example: $h(x) = x \bmod N$ is a hash function for integer keys.
The integer h(x) is called the hash value of key x.
A hash table for a given key type consists of:
- a hash function h
- an array (called table) of size N
When implementing a map with a hash table, the goal is to store item (k, o) at index i = h(k).

15. Example
We design a hash table for a map storing entries as (SSN, Name), where SSN (social security number) is a nine-digit positive integer. Our hash table uses an array of size N = 10,000 and the hash function h(x) = last four digits of x.
[Figure: table cells 0 to 9999; cell 1 holds 025-612-0001, cell 2 holds 981-101-0002, cell 4 holds 451-229-0004, cell 9998 holds 200-751-9998; the other cells shown (0, 3, 9997, 9999) are empty]

16. Hash Functions
A hash function is usually specified as the composition of two functions:
- Hash code: $h_1$: keys $\to$ integers
- Compression function: $h_2$: integers $\to [0, N-1]$
The hash code is applied first, and the compression function is applied next on the result, i.e., $h(x) = h_2(h_1(x))$.
The goal of the hash function is to "disperse" the keys in an apparently random way.
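The SSN example can be written as the composition $h(x) = h_2(h_1(x))$. In this Java sketch (hypothetical class `SsnHashing`), the hash code $h_1$ strips the dashes from the SSN string and parses the digits as a `long`, and the compression function $h_2$ is division by N = 10,000, which keeps exactly the last four digits. Treating the key as a dashed string is an assumption made to match the way the example writes SSNs.

```java
// Hypothetical helper class; SSNs are given as dashed digit strings, as in the example.
class SsnHashing {
    static final int N = 10_000;   // table size from the example

    // h1 (hash code): drop the dashes and read the digits as an integer.
    static long h1(String ssn) {
        return Long.parseLong(ssn.replace("-", ""));
    }

    // h2 (compression): y mod N; with N = 10,000 this keeps the last four digits.
    static int h2(long y) {
        return (int) Math.floorMod(y, (long) N);
    }

    // h(x) = h2(h1(x))
    static int h(String ssn) {
        return h2(h1(ssn));
    }
}
```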
17. Hash Codes
Memory address:
- We reinterpret the memory address of the key object as an integer (default hash code of all Java objects).
- Good in general, except for numeric and string keys.
Integer cast:
- We reinterpret the bits of the key as an integer.
- Suitable for keys of length less than or equal to the number of bits of the integer type (e.g., byte, short, int and float in Java).
Component sum:
- We partition the bits of the key into components of fixed length (e.g., 16 or 32 bits) and we sum the components (ignoring overflows).
- Suitable for numeric keys of fixed length greater than or equal to the number of bits of the integer type (e.g., long and double in Java).

18. Hash Codes (cont.)
Polynomial accumulation:
- We partition the bits of the key into a sequence of components of fixed length (e.g., 8, 16 or 32 bits): $a_0, a_1, \ldots, a_{n-1}$.
- We evaluate the polynomial $p(z) = a_0 + a_1 z + a_2 z^2 + \cdots + a_{n-1} z^{n-1}$ at a fixed value z, ignoring overflows.
- Especially suitable for strings (e.g., the choice z = 33 gives at most 6 collisions on a set of 50,000 English words).
Polynomial p(z) can be evaluated in O(n) time using Horner's rule. The following polynomials are successively computed, each from the previous one in O(1) time:
$p_0(z) = a_{n-1}$
$p_i(z) = a_{n-i-1} + z\,p_{i-1}(z) \quad (i = 1, 2, \ldots, n-1)$
We have $p(z) = p_{n-1}(z)$.

19. Compression Functions
Division:
- $h_2(y) = y \bmod N$
- The size N of the hash table is usually chosen to be a prime.
- The reason has to do with number theory and is beyond the scope of this course.
Multiply, Add and Divide (MAD):
- $h_2(y) = (ay + b) \bmod N$
- a and b are nonnegative integers such that $a \bmod N \neq 0$.
- Otherwise, every integer would map to the same value $b \bmod N$.
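Polynomial accumulation with Horner's rule and the two compression functions can be sketched in Java as below. The class `PolyHash` and its method names are hypothetical; each character of a string is treated as one component $a_i$, and `int` arithmetic wraps around on overflow, which is one concrete way of "ignoring overflows". `Math.floorMod` keeps the compressed value in $[0, N-1]$ even when the hash code is negative.

```java
// Hypothetical helper class illustrating a polynomial hash code and two compression functions.
class PolyHash {
    // Polynomial accumulation: p(z) = a0 + a1*z + ... + a_{n-1}*z^{n-1}, where a_i is the
    // i-th character of s. Evaluated with Horner's rule; int overflow simply wraps around.
    static int polyHashCode(String s, int z) {
        int n = s.length();
        if (n == 0) return 0;
        int p = s.charAt(n - 1);                 // p_0(z) = a_{n-1}
        for (int i = 1; i < n; i++) {
            p = s.charAt(n - i - 1) + z * p;     // p_i(z) = a_{n-i-1} + z * p_{i-1}(z)
        }
        return p;                                // p(z) = p_{n-1}(z)
    }

    // Division: h2(y) = y mod N, using floorMod so negative hash codes land in [0, N-1].
    static int division(int y, int N) {
        return Math.floorMod(y, N);
    }

    // MAD: h2(y) = (a*y + b) mod N, computed in long to avoid intermediate overflow.
    static int mad(int y, int a, int b, int N) {
        if (Math.floorMod(a, N) == 0) {
            throw new IllegalArgumentException("a mod N must be nonzero");
        }
        return (int) Math.floorMod((long) a * y + b, (long) N);
    }
}
```

Java's own `String.hashCode()` is also a polynomial hash, with z = 31 but with the coefficients in the opposite order (the first character gets the highest power), so `polyHashCode` applied to the reversed string with z = 31 equals `s.hashCode()`.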
20. Example (ideal) hash function
Suppose our hash function gave us the following values:
hashCode("apple") = 5
hashCode("watermelon") = 3
hashCode("grapes") = 8
hashCode("cantaloupe") = 7
hashCode("kiwi") = 0
hashCode("strawberry") = 9
hashCode("mango") = 6
hashCode("banana") = 2
Resulting table (index: entry): 0: kiwi, 1: empty, 2: banana, 3: watermelon, 4: empty, 5: apple, 6: mango, 7: cantaloupe, 8: grapes, 9: strawberry

21. Collisions
When two values hash to the same array location, this is called a collision. Collisions are normally treated as "first come, first served": the first value that hashes to the location gets it. We have to find something to do with the second and subsequent values that hash to this same location.

22. Collision Handling
Collisions occur when different elements are mapped to the same cell.
Separate chaining: let each cell in the table point to a linked list of entries that map there.
Separate chaining is simple, but requires additional memory outside the table.
[Figure: cells 0 to 4; cell 1 holds 025-612-0001; cell 4 points to a list holding 451-229-0004 and 981-101-0004; the other cells are empty]

23. Linear probing
A simple open addressing collision handling strategy is called linear probing. In this strategy, if we try to insert an item (k, e) into a bucket A[i] that is already occupied, where i = h(k), then we try next at A[(i+1) mod N]. If A[(i+1) mod N] is occupied, then we try A[(i+2) mod N], and so on, until we find an empty bucket in A that can accept the new item.

24. Example
Keys 26, 5, 21, 16, 13, 37 in a table with cells 0 to 10: cell 2: 13, cell 4: 26, cell 5: 5, cell 6: 16, cell 7: 37, cell 10: 21 (all other cells empty).
A new element with key 15 is to be inserted. Afterwards: cell 2: 13, cell 4: 26, cell 5: 5, cell 6: 16, cell 7: 37, cell 8: 15, cell 10: 21.
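Linear probing can be sketched in Java as follows, storing bare `int` keys as in the example. The class `LinearProbingTable` is hypothetical, and the hash function h(k) = k mod N is an assumption: the example table has 11 cells (0 to 10), and the placements shown are consistent with k mod 11.

```java
// Hypothetical sketch: open addressing with linear probing over bare int keys.
class LinearProbingTable {
    private final Integer[] table;   // null marks an empty bucket

    LinearProbingTable(int capacity) {
        table = new Integer[capacity];
    }

    // Assumed hash function: h(k) = k mod N.
    private int h(int k) {
        return Math.floorMod(k, table.length);
    }

    // Try A[i], A[(i+1) mod N], A[(i+2) mod N], ... until an empty bucket is found.
    int insert(int k) {
        int n = table.length;
        int i = h(k);
        for (int j = 0; j < n; j++) {
            int idx = (i + j) % n;
            if (table[idx] == null) {
                table[idx] = k;
                return idx;            // bucket actually used
            }
        }
        throw new IllegalStateException("table is full");
    }

    // Search follows the same probe sequence and stops at the first empty bucket.
    int find(int k) {
        int n = table.length;
        int i = h(k);
        for (int j = 0; j < n; j++) {
            int idx = (i + j) % n;
            if (table[idx] == null) return -1;
            if (table[idx] == k) return idx;
        }
        return -1;
    }

    Integer get(int idx) {
        return table[idx];
    }
}
```

With N = 11, inserting 26, 5, 21, 16, 13, 37 puts 37 in cell 7 (cells 4, 5 and 6 are taken), and 15 then probes cells 4, 5, 6, 7 and lands in cell 8, matching the example. Removal is omitted: simply emptying a cell would cut the probe sequence that later keys depend on.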