CSE 326: Data Structures: Advanced Topics
Lecture 26: Wednesday, March 12th, 2003
Today
• Dynamic programming for ordering matrix multiplication
– Very similar to Query Optimization in databases
• String processing
• Final review
Ordering Matrix Multiplication
• One solution: (A B) (C D)
– here A is 3×2, B is 2×4, C is 4×2, D is 2×3 (the dimensions implied by the costs)
• Cost: (3·2·4) + (4·2·3) + (3·4·3) = 84
Ordering Matrix Multiplication
• Another solution: (A (B C)) D
• Cost: (2·4·2) + (3·2·2) + (3·2·3) = 46
Ordering Matrix Multiplication
Problem:
• Given A1 A2 . . . An, compute the optimal ordering
Solution:
• Dynamic programming
• Compute cost[i][j] – the minimum cost to compute Ai Ai+1 . . . Aj
• Proceed iteratively, increasing the gap = j – i
Ordering Matrix Multiplication

/* initialize */
for i = 1 to n do
    cost[i][i] = 0    /* why? a single matrix needs no multiplications */

/* dynamic programming */
for gap = 1 to n-1 do {
    for i = 1 to n – gap do {
        j = i + gap;
        c = ∞;
        for k = i to j-1 do
            /* how much would it cost to do (Ai . . . Ak) (Ak+1 . . . Aj) ? */
            /* note that A[k].columns = A[k+1].rows */
            c = min(c, cost[i][k] + cost[k+1][j]
                       + A[i].rows * A[k].columns * A[j].columns);
        cost[i][j] = c;
    }
}
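The same DP can be sketched in executable form. In this Python version (an illustration, not lecture code), the chain is given as a dimension list p where matrix Ai is p[i-1] × p[i]; the dimensions below are the ones implied by the costs on the earlier slides:

```python
import math

def matrix_chain_cost(p):
    """Minimum scalar multiplications to compute A1 A2 ... An,
    where Ai has dimensions p[i-1] x p[i]."""
    n = len(p) - 1  # number of matrices
    # cost[i][j] = min cost to compute Ai ... Aj (1-based; cost[i][i] = 0)
    cost = [[0] * (n + 1) for _ in range(n + 1)]
    for gap in range(1, n):
        for i in range(1, n - gap + 1):
            j = i + gap
            c = math.inf
            for k in range(i, j):
                # split as (Ai ... Ak)(Ak+1 ... Aj)
                c = min(c, cost[i][k] + cost[k + 1][j]
                        + p[i - 1] * p[k] * p[j])
            cost[i][j] = c
    return cost[1][n]

# The slides' example: A is 3x2, B is 2x4, C is 4x2, D is 2x3
print(matrix_chain_cost([3, 2, 4, 2, 3]))  # → 46, the cost of (A (B C)) D
```

Running it on the four-matrix example reproduces the optimal cost 46 found by hand for (A (B C)) D.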
Ordering Matrix Multiplication
• Running time: O(n³)
Important variation:
• Database systems do join reordering
• A very similar algorithm
• Come to CSE 544...
String Matching
• The problem:
– Given a text T[1], T[2], ..., T[n] and a pattern P[1], P[2], ..., P[m]
– Find all positions s such that P “occurs” in T at position s: (T[s], T[s+1], ..., T[s+m-1]) = (P[1], ..., P[m])
• Where do we need this?
– text editors (e.g. emacs)
– grep
– XML processing
Naive String Matching
for i = 1 to n – m + 1 do
    if (T[i], T[i+1], ..., T[i+m-1]) = (P[1], P[2], ..., P[m])
        then print i
running time: O(mn)
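A direct Python transcription of the naive matcher (a hypothetical helper, using 0-based positions rather than the slides' 1-based s):

```python
def naive_match(T, P):
    """Return all 0-based positions where P occurs in T.

    Checks every candidate shift; O(mn) in the worst case."""
    n, m = len(T), len(P)
    return [i for i in range(n - m + 1) if T[i:i + m] == P]

# The text and pattern from the KMP example on the next slides:
print(naive_match("bacbabababacababacba", "ababaca"))  # → [6]
```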
Knuth-Morris-Pratt String Matching
• main idea: reuse the work after a failure

T = b a c b a b a b a b a c a b a b a c b a
P =         a b a b a c a              ← fail!
P =             a b a b a c a          ← reuse!  (precompute π on P)
Knuth-Morris-Pratt String Matching
• The Prefix Function π:

π[q] = the largest k < q s.t. (P[1], P[2], ..., P[k-1]) = (P[q-k+1], P[q-k+2], ..., P[q-1])
Example: P = a b a b a c a

q      1  2  3  4  5  6  7  8
P[q]   a  b  a  b  a  c  a
π[q]   0  1  1  2  3  4  1  2

• π[6] = 4: (P[1], P[2], P[3]) = (P[3], P[4], P[5]) = (a, b, a)
• π[8] = 2: P[1] = P[7] = a
• π[1] = 0 by convention (there is no k < 1); π[2] = π[3] = 1
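These values can be checked mechanically. The sketch below (illustrative, not lecture code) computes π for q = 2..m+1 by brute force, straight from the definition; π[1] comes out as 0, matching the π[1] = 0 initialization in the linear-time computation later in the lecture:

```python
def prefix_function_bruteforce(P):
    """pi[q] = largest k < q with P[1..k-1] = P[q-k+1..q-1] (1-based),
    computed by brute force straight from the definition."""
    m = len(P)
    pi = [0] * (m + 2)  # pi[1..m+1]; index 0 unused, pi[1] stays 0
    for q in range(2, m + 2):
        for k in range(q - 1, 0, -1):
            # compare P[1..k-1] with P[q-k+1..q-1] (0-based slices below)
            if P[0:k - 1] == P[q - k:q - 1]:
                pi[q] = k
                break
    return pi[1:m + 2]

print(prefix_function_bruteforce("ababaca"))  # → [0, 1, 1, 2, 3, 4, 1, 2]
```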
Knuth-Morris-Pratt String Matching
/* compute π */
. . . .

/* do the matching */
q = 0;    /* q = where we are in P */
for i = 1 to n do {
    q = q+1;
    while (q > 1 and P[q] != T[i])
        q = π[q];    /* on a mismatch, fall back using π */
    if (P[q] = T[i]) {
        if (q = m) print(i – m+1);    /* full occurrence found */
    } else
        q = 0;    /* nothing matches here; restart */
}
/* convention: P[m+1] matches no character, so after a full
   occurrence the while loop falls back using π[m+1] */
Time = O(n) (why ?)
Knuth-Morris-Pratt String Matching
/* compute π */
π[1] = 0;
for q = 2 to m+1 do {
    k = π[q – 1];
    while (k > 1 and P[k] != P[q – 1])
        k = π[k];
    if (k = 0 or P[k] = P[q – 1]) then k = k+1;
    π[q] = k;
}

/* do the matching */
. . .
Time = O(m) (why ?)
Total running time of the KMP algorithm: O(m+n)
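Putting both pieces together in Python (a sketch using the common 0-based form of the prefix function, which is off by one from the slides' π but carries the same information):

```python
def kmp_search(T, P):
    """Return all 0-based positions where P occurs in T, in O(n + m) time.

    Uses the common 0-based prefix function (off by one from the slides' pi)."""
    m = len(P)
    # fail[q] = length of the longest proper prefix of P[:q+1]
    # that is also a suffix of P[:q+1]
    fail = [0] * m
    k = 0
    for q in range(1, m):
        while k > 0 and P[k] != P[q]:
            k = fail[k - 1]
        if P[k] == P[q]:
            k += 1
        fail[q] = k

    hits = []
    q = 0  # number of pattern characters matched so far
    for i, c in enumerate(T):
        while q > 0 and P[q] != c:
            q = fail[q - 1]
        if P[q] == c:
            q += 1
        if q == m:
            hits.append(i - m + 1)
            q = fail[q - 1]  # keep scanning for overlapping occurrences
    return hits

print(kmp_search("bacbabababacababacba", "ababaca"))  # → [6]
```

On the example text it finds the single occurrence at 0-based position 6 (1-based position 7), matching the trace on the earlier slide.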
Final Review
• Basic math
– logs, exponents, summations
– proof by induction
• Asymptotic analysis
– big-oh, theta, omega
– how to estimate running times
• need sums
• need recurrences
Final Review
• Lists, stacks, queues
– ADT definition
– array vs. pointer implementation
– variations: headers, doubly linked, etc.
• Trees:
– definitions/terminology (root, parent, child, etc.)
– relationship between depth and size of a tree
• depth is between O(log N) and O(N)
Final Review
• Binary Search Trees
– basic implementations of find, insert, delete
– worst case performance: O(N)
– average case performance: O(log N) (inserts only)
• AVL trees
– balance factor: +1, 0, -1
– know single and double rotations to keep it balanced
– all operations are O(log N) worst case time
• Splay trees
– good amortized performance
– a single operation may take O(N)
– know the zig-zig, zig-zag, etc. rotations
• B-trees: know the basic idea behind insert/delete
Final Review
• Priority Queues
– binary heaps: insert/deleteMin, percolate up/down
– array implementation
– buildHeap takes only O(N)!! Used in HeapSort
• Binomial queues
– merge is fast: O(log N)
– insert, deleteMin are based on merge
Final Review
• Hashing
– hash functions based on the mod function
– collision resolution strategies
• chaining, linear and quadratic probing, double hashing
– load factor of a hash table
Final Review
• Sorting
– elementary sorting algorithms: bubble sort, selection sort, insertion sort
– heapsort: O(N log N)
– mergesort: O(N log N)
– quicksort: O(N log N) average
• fastest in practice, but O(N²) worst case performance
• pivot selection: median-of-three works well
– know which of these are stable and in-place
– lower bound on sorting
– bucket sort, radix sort
– external memory sort
Final Review
• Disjoint sets and Union-Find
– up-trees and their array-based implementation
– know how union-by-size and path compression work
– know the running time (not the proof)
Final Review
• Graph algorithms
– adjacency matrix vs. adjacency list representation
– topological sort in O(n+m) time using a queue
– Breadth-First Search (BFS) for unweighted shortest path
– Dijkstra’s shortest path algorithm
– DFS
– minimum spanning trees: Prim, Kruskal
Final Review
• Graph algorithms (cont’d)
– Euler vs. Hamiltonian circuits
– know what P, NP, and NP-completeness mean
Final Review
• Algorithm design techniques
– greedy: bin packing
– divide and conquer
• solving various types of recurrence relations for T(N)
– dynamic programming (memoization)
• DP-Fibonacci
• ordering matrix multiplication
– randomized data structures
• treaps
• primality testing
• string matching
– backtracking and game trees
The Final
• Details:
– covers chapters 1-10, 12.5, and some extra material
– closed book, closed notes, except:
• you may bring one sheet of notes
– time: 1 hour and 50 minutes
– Monday, 3/17/2003, 2:30 – 4:20, this room
– bring pens/pencils/etc
– sleep well the night before