8/3/2019 ALGO Notes
1/62
ENGRECE 235: Design and Analysis of Algorithms
Course Code: 15845
Quarter: Fall 2003
Professor Pai H. Chou
Electrical Engineering and Computer Science
September 26, 2003
Contents
1 Introduction to Analysis and Design of Algorithms 5
1.1 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.1 What's an algorithm? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.2 Classes of algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.3 Example: sorting problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.4 Many sorting algorithms available: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.1 Execution model assumption: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.2 Algorithm runtime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Algorithm design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.1 Approaches: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.2 Example: merge sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Problem complexity vs. Algorithm Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5 Data Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.6 Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 Growth of Functions, Data structures 8
2.1 Asymptotic Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.1 big O = asymptotic upper bound: (page 44) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.2 big Ω = asymptotic lower bound (page 45) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.3 big Θ = asymptotic tight bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.4 little-o, little-ω: not asymptotically tight . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Polynomials, log, factorial, fibonacci . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.1 Polynomial in n of degree d: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.2 logarithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.3 Factorial: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.4 Fibonacci . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.5 misc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 Python: Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4 Data Structures Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4.1 Array implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4.2 Stack and Calls/Recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4.3 Linked List x . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.4.4 Binary search tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.4.5 rooted trees w/ unbounded branches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3 Recurrences; QuickSort 11
3.1 Recurrence: by Substitution method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.2 Recursion tree, convert to summation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.3 The Master Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.4 QuickSort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.4.1 Worst case: uneven partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.4.2 Best case: balanced partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.4.3 Average case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
4 Heap sort, Priority Queues 15
4.1 Heap data structure: Binary heap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.2 Heapify: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.3 BuildHeap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4.3.1 Complexity: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4.3.2 HeapSort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4.4 Python Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4.5 Priority Queues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4.5.1 Heap Insert: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4.5.2 Extract max: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4.5.3 Heap Increase key . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
5 Linear Time sorting, Medians/Order Stat 18
5.1 Comparison-based sorting: prob. complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
5.2 Non-comparison sorts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
5.2.1 Counting sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
5.2.2 RadixSort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
5.2.3 Bucket sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
5.3 Median and Order statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
5.3.1 Find min? max? both min and max? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
5.3.2 How to find the i-th smallest w/out sorting? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
5.3.3 Can show that worst-case select is O(n) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
5.3.4 Justification: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
6 Binary Search Trees and Red-Black Trees 22
6.1 Binary Search Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
6.1.1 API for BST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
6.2 Python Programming with Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
6.2.1 Node data structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
6.2.2 Tuple-encoded Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
6.3 Red-Black Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
6.3.1 Five Properties: (important!!) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
6.3.2 How to maintain the Red-Black Tree properties? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
6.3.3 Insertion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
6.3.4 Deletion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
6.3.5 Tree data structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
7 Dynamic Programming 27
7.1 Optimization problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
7.1.1 Dynamic programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
7.2 Case study: Assembly line scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
7.3 Matrix-Chain multiply . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
7.3.1 optimal parenthesization to minimize # scalar mults . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
7.4 Longest common subsequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
7.5 Optimal Binary Search Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
8 Greedy Algorithm 31
8.1 Activity Selection problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
8.2 Greedy choice property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
8.3 Huffman codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
9 Basic Graph Algorithm 35
9.1 Graphs G = (V, E) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
9.2 Breadth-first search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
9.3 Depth-first search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
9.4 Topological sort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
9.5 Strongly Connected Components (SCC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
9.6 Python Programming: How to represent a graph!? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
9.6.1 Simple: Directed graph, no weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
9.6.2 Weighted graph, use dictionary of dictionaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
9.6.3 Wrap data structure in a class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
10 Minimum Spanning Trees 40
10.1 Min-spanning trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
10.2 Kruskal's algorithm: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
10.2.1 Runtime analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
10.3 Prim's algorithm: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
11 Single Src Shortest Paths 43
11.1 DAG shortest paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
11.1.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
11.2 Dijkstra's . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
11.2.1 Algorithm: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
11.2.2 Example: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
11.2.3 Correctness proof sketch: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
11.2.4 What about negative weights? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
11.3 Bellman-Ford . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
11.3.1 Algorithm: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
11.3.2 Correctness: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
11.3.3 Complexity: O(V E) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
11.4 Bellman-Ford: special case of linear programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
11.4.1 Bellman Ford as a Constraint Solver! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
12 All Pairs Shortest Paths 47
12.1 Special case Linear Programming (cont'd) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
12.1.1 Correctness sketch: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
12.2 All Pairs shortest path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
12.2.1 Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
12.2.2 Repackage this as a matrix structure: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
12.2.3 Floyd-Warshall: O(V^3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
12.2.4 Algorithm - very simple! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
12.3 Transitive closure: E* (Warshall's) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
12.4 Sparse graphs: Johnson's algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
12.4.1 Proof weight works: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
13 Flow Networks 50
13.1 Flow network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
13.1.1 Maxflow methods: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
13.2 FORD-FULKERSON . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
13.3 EDMONDS-KARP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
13.4 Maximal bipartite matching. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
14 NP-Completeness 53
14.1 Other Graph problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
14.1.1 Euler Tour . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
14.1.2 Hamiltonian cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
14.1.3 Traveling Salesman . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
14.1.4 Constructive vs. Decision vs. Verification Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
14.2 Problem complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
14.2.1 Formal Languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
14.2.2 Decision problem in Formal language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
14.2.3 Decision in polynomial time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
14.2.4 P, NP, co-NP in terms of formal languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
14.3 Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
14.4 NP, NP-complete, NP-hard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
14.4.1 How to show L is NP-Complete? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
14.4.2 Examples - Circuit Satisfiability: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
14.4.3 NP completeness proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
14.5 NPC Proof for (Formula) Satisfiability problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
14.5.1 3-CNF satisfiability (3-SAT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
14.6 NPC proof cont'd: k-clique problem (CLIQUE) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
15 Approx. Algorithms 58
15.1 k-VERTEX-COVER problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
15.1.1 Observation: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
15.1.2 Approximation algorithm for Vertex Cover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
15.2 Hamiltonian (HAM-CYCLE) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
15.3 Traveling Salesman Problem (TSP) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
15.3.1 Approximated TSP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
15.3.2 General TSP: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
15.4 Subset-sum problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
15.4.1 Exact solution: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
15.4.2 NP Complete proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
15.4.3 polynomial-time approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
15.5 Set Covering problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
15.5.1 Greedy approximation for SET-COVER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Lecture 1
Introduction to Analysis and Design of Algorithms
Administrative
Course syllabus: http://e3.uci.edu/03f/15845/syllabus.html
1.1 Algorithm
1.1.1 What's an algorithm?
input, output problem statement
sequence of steps
data structures
must always halt and output must satisfy property
1.1.2 Classes of algorithms
sorting (classified by problem)
tree traversal, graph algorithms (classified by data structures / representation)
dynamic programming (classified by technique)
incremental, greedy, divide-&-conquer algorithms (classified by decision criteria)
Why this class?
fundamental, cross-cutting applications
analysis aspect: problem complexity, what problems are easy/hard, how to quantify
many solutions to the same problem
many applications of a given solution
1.1.3 Example: sorting problem
input: array A = [a1, a2, . . . , an];
output: a permutation A' = [a'1, a'2, . . . , a'n] of A such that a'1 ≤ a'2 ≤ . . . ≤ a'n
1.1.4 Many sorting algorithms available:
insertion sort
See Chapter 2 Sec. 2.1 for explanation; we will revisit sorting later.
idea: partition A into two parts: A[1 . . . (j−1)] already nondecreasing, A[j . . . n] not yet sorted.
grow this sorted subarray by one more element
insert it into the tail of the sorted one, but need to make it sorted
shift everybody up by one until the nondecreasing property is satisfied again
done when j − 1 = n
[Figure: insertion-sort trace on input A = [5, 2, 4, 6, 1, 3]. At each step j = 2 . . . 6, A[1 . . . j−1] is sorted and A[j . . . n] is not yet sorted; the new A[j] is inserted into the sorted prefix, giving output [1, 2, 3, 4, 5, 6].]
1.2 Analysis
1.2.1 Execution model assumption:
RAM (random access machine) as opposed to stack, queue, or FSM!
sequential instruction execution
alternative assumptions: parallelism, communication, synchronization
associative memory (content-addressable)
data structures: pointers/linked lists instead of copying
1.2.2 Algorithm runtime
measure it, but want a formula T(n) where n is problem size
want to factor out processor speed as a scaling factor
worst case, best case, average case
algorithm memory consumption (visit later)
Chapter 3 deals with order of growth, e.g., Θ(n^2)
Insertion sort: O(n^2)
Quick sort: O(n^2) worst case, O(n lg n) average
Merge sort, Heap sort: Θ(n lg n)
1.3 Algorithm design
1.3.1 Approaches:
incremental
divide-and-conquer (e.g., MergeSort, QuickSort)
greedy (e.g., Min-Spanning Tree)
backtracking
randomization
genetic
1.3.2 Example: merge sort
divide array into two halves
sort left and sort right
interleave the two halves
uses Recursion
Θ(n lg n) complexity, where lg = log base 2
but uses at least twice as much memory in order to keep merge linear
[Figure: MergeSort call tree on input A[1:6] = [5, 2, 4, 6, 1, 3]. Steps (1)-(16) recursively split into A[1:3] and A[4:6] down to single elements, then Merge(A[1:1], A[2:2]), Merge(A[1:2], A[3:3]), Merge(A[4:5], A[6:6]), and finally Merge(A[1:3], A[4:6]) produce the output [1, 2, 3, 4, 5, 6].]
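The three steps above can be sketched in Python (a hypothetical sketch in the spirit of this course's later isort.py; MergeSort and Merge are our own names, and Merge copies the two halves, which is the extra memory noted above):

```python
def Merge(A, p, q, r):
    # interleave the two sorted halves A[p..q] and A[q+1..r]
    left, right = A[p:q+1], A[q+1:r+1]   # copies: the extra memory noted above
    i = j = 0
    for k in range(p, r + 1):
        if j >= len(right) or (i < len(left) and left[i] <= right[j]):
            A[k] = left[i]; i += 1
        else:
            A[k] = right[j]; j += 1

def MergeSort(A, p, r):
    if p < r:
        q = (p + r) // 2        # divide array into two halves
        MergeSort(A, p, q)      # sort left
        MergeSort(A, q + 1, r)  # sort right
        Merge(A, p, q, r)       # interleave the two halves

x = [5, 2, 4, 6, 1, 3]
MergeSort(x, 0, len(x) - 1)
# x is now [1, 2, 3, 4, 5, 6]
```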
1.4 Problem complexity vs. Algorithm Complexity
Measures how hard the problem is
Bounds the complexity of all algorithms for this problem
Example: can't do better than Ω(n lg n) for comparison-based sorting
But can do better (linear time) using non-comparison-based algorithms.
The problem itself may be hard! No efficient algorithm exists for those problems.
NP-Complete problems: guess an answer, verify (yes/no) in polynomial time; otherwise exponential or worse.
Approximation algorithms: not minimum/optimal cost, but come within a factor.
1.5 Data Structures
fixed: arrays, matrices
growable: stacks, queues, linked list, tree, heap
key addressable: hash table
structures: graphs, flow networks
1.6 Python
Scripting language, free!
Already installed on unix; download for any platform
Easy, powerful, interactive
Best way is to try it out; helps you debug algorithm
Tutorial at http://www.python.org/doc/current/tut/tut.html
Important features
No semicolons, no C-style { }
Indentation is significant
Loosely typed
# starts a comment, unless inside string
Interactive mode
% python
>>> 2 + 3
5
>>> x = (2 + 3) * 4
>>> x
20
>>> print "hello world"
hello world
Lists and Arrays
>>> x = [1, 2, 3]
>>> x
[1, 2, 3]
>>> x[0]
1
>>> x.append(20)
>>> x
[1, 2, 3, 20]
>>> len(x)    # get the number of elements
4
>>> x.pop()   # pop from tail like a stack
20
>>> x
[1, 2, 3]
>>> x.pop(0)  # pop from head like a queue
1
>>> x
[2, 3]
You can have other data types in lists and mix them:
>>> dayofweek = ['Sun', 'Mon', 'Tue', 'Wed']
>>> dayofweek[3]
'Wed'
Hash Table (Dictionary)
Q: What if we want to look up the number from the name?
Hash tables let you use the name string as an index. However, one big difference is that lists are ordered, whereas hash tables are not.
>>> d = {}    # make an empty hash table
>>> d['Sun'] = 0
>>> d['Mon'] = 1
>>> d['Tue'] = 2
>>> d['Wed'] = 3
>>> d
{'Wed': 3, 'Tue': 2, 'Mon': 1, 'Sun': 0}
>>> d['Tue']
2
>>> d.has_key('Mon')
1    # this means true
>>> d.has_key('Blah blah')
0    # this means false
>>> d.keys()
['Wed', 'Tue', 'Mon', 'Sun']
Lecture 2
Growth of Functions, Data structures
This lecture:
big O, big Ω, and big Θ; how they grow
fundamental data structures
2.1 Asymptotic Notations
purpose: express run time as T(n), n = input size
assumption: n ∈ N = {0, 1, 2, . . .}, the natural numbers
2.1.1 big O = asymptotic upper bound: (page 44)
f(n) = O(g(n)): the set of functions upper-bounded so that 0 ≤ f(n) ≤ c·g(n) for some constant c > 0 and all n ≥ n0
f(n) grows no faster than g(n); example: 2n^2 = O(n^3), for c = 1, n0 = 2
funny equality, more like a membership: 2n^2 ∈ O(n^3); n^2 and n^3 are actually functions, not values.
sloppy but convenient
2.1.2 big Ω = asymptotic lower bound (page 45)
swap the places of f(n) and c·g(n): 0 ≤ c·g(n) ≤ f(n) for some constant c > 0 and all n ≥ n0
example: √n = Ω(lg n), for c = 1, n0 = 16
2.1.3 big Θ = asymptotic tight bounds
need two constants c1 and c2 to sandwich f(n): 0 ≤ c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n0.
example: (1/2)n^2 − 2n = Θ(n^2) for c1 = 1/4, c2 = 1/2, n0 = 8
Theorem: f(n) = Θ(g(n)) if and only if f(n) = O(g(n)) and f(n) = Ω(g(n)).
[Figure: three plots of f(n) against c·g(n) curves, illustrating the upper bound O(g(n)), the lower bound Ω(g(n)), and the two-sided sandwich Θ(g(n)) beyond n0.]
Example: insertion sort
O(n^2) worst case, Ω(n) best case.
T(n) = c1·n (for loop)
 + c2·(n−1) (key ← A[j])
 + sum over j = 2 . . . n of Insert(j) (loop to insert j)
 + c8·(n−1) (A[i+1] ← key)
Best case of Insert(j) is Θ(1), worst is Θ(j).
Issue: Θ(n^2) + Θ(n) is still Θ(n^2)
Example: MergeSort(A, p, r), where p, r are the lower and upper bound indexes into array A.
if p < r then
    q := ⌊(p + r)/2⌋
    MergeSort(A, p, q); MergeSort(A, q + 1, r)
    Merge(A, p, q, r)
T(n) = Θ(1) if n = 1; T(n) = 2T(n/2) + Θ(n) if n > 1.
T(n) is expressed in terms of T itself! This is called a recurrence. We want a closed-form solution (no T on the right-hand side of the equation).
To solve the recurrence, rewrite as T(n) = c if n = 1; T(n) = 2T(n/2) + cn if n > 1. Expand:
T(n) = 2T(n/2) + cn
     = 2(2T(n/4) + c(n/2)) + cn = 4T(n/4) + 2cn
     = 8T(n/8) + 3cn = . . .
     = 2^i T(n/2^i) + i·cn
We can divide n by 2 at most lg n times before n = 1. So T(n) = (lg n)·cn + cn = Θ(n lg n). Chapter 4 will show how to solve recurrences systematically.
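The expansion can be sanity-checked numerically (a throwaway snippet with c = 1, so T(1) = 1 and T(n) = 2T(n/2) + n; for n a power of 2 the closed form is exactly n lg n + n):

```python
def T(n):
    # merge-sort recurrence with c = 1
    if n == 1:
        return 1
    return 2 * T(n // 2) + n

# at n = 2^k the expansion gives T(n) = n*lg(n) + n = n*k + n
for k in range(0, 11):
    n = 2 ** k
    assert T(n) == n * k + n
```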
2.1.4 little-o, little-ω: not asymptotically tight
o(g(n)) = { f(n) : for every c > 0 there exists n0 > 0 such that 0 ≤ f(n) < c·g(n) for all n ≥ n0 }
ω(g(n)) = { f(n) : for every c > 0 there exists n0 > 0 such that 0 ≤ c·g(n) < f(n) for all n ≥ n0 }
Example
2n^2 = O(n^2) is a tight bound, so 2n^2 ≠ o(n^2). 2n = O(n^2) is not tight, so 2n = o(n^2). It does not make sense to have a little-θ.
2.2 Polynomials, log, factorial, fibonacci
2.2.1 Polynomial in n of degree d:
p(n) = sum over i = 0 . . . d of a_i·n^i = Θ(n^d), assuming a_d > 0
Polynomially bounded: f(n) = O(nk) for constant k.
2.2.2 logarithm
useful identity: a^(log_b c) = c^(log_b a).
polylogarithmically bounded if f(n) = O(lg^k n) for constant k
grows slower than any polynomial: lg^b n = o(n^a) for any constant a > 0.
log-star: lg* n = min{ i ≥ 0 : lg^(i) n ≤ 1 } (lg of lg of lg . . . of lg of n); grows very, very slowly.
Example: lg*(2^65536) = 5.
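The iterated logarithm is easy to compute directly in modern Python (a throwaway snippet; log_star is our own name for lg*):

```python
import math

def log_star(n):
    # count how many times lg must be applied before the value drops to <= 1
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count
```

math.log2 accepts Python's arbitrary-precision integers, so even log_star(2 ** 65536) can be evaluated; it returns 5, matching the example above.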
2.2.3 Factorial:
Loose bound: n! grows faster than 2^n but slower than n^n
more precisely, n! = Θ(n^(n+1/2) e^(−n)) (Stirling's approximation)
lg(n!) = Θ(n lg n)
2.2.4 Fibonacci
grows exponentially
2.2.5 misc
2^(2^n) grows faster than n!
(n + 1)! grows faster than n!
n! grows faster than n·2^n, which grows faster than 2^n
2^(lg n) = n
2.3 Python: Functions
function definition: def functionName(paramList):
variables have local scope by default
control flow: for, while, if
range(a, b) function: returns the list [a, . . . , b−1]
>>> range(1,5)
[1, 2, 3, 4]
Algorithm from book (p. 24), and the Python code (file isort.py):

INSERTION-SORT(A)
1 for j ← 2 to length[A]
2     do key ← A[j]
3        i ← j − 1
4        while i > 0 and A[i] > key
5            do A[i + 1] ← A[i]
6               i ← i − 1
7        A[i + 1] ← key

def InsertionSort(A):
    for j in range(1, len(A)):
        key = A[j]
        i = j - 1
        while (i >= 0) and (A[i] > key):
            A[i+1] = A[i]
            i = i - 1
        A[i+1] = key
Don't forget the colon ':' for def, for, and while constructs
Book A[1 . . . n] corresponds to Python A[0 . . . n−1]; book's 2 . . . n corresponds to Python's 1 . . . n−1.
line 1: range(1, len(A)) gives [1, ..., len(A)-1]
% python
>>> from isort import *    # read file isort.py
>>> x = [2,7,3,8,1]        # create test case
>>> InsertionSort(x)       # call routine
>>> x                      # look at result
[1, 2, 3, 7, 8]
Works for other data types, too!
2.4 Data Structures Review
Concepts:
Stacks: Last-in First-out
Queue: First-in First-out
Trees: Root, children
2.4.1 Array implementation
Stack S: need a stack pointer (top)
Push(x): S[++top] ← x
Pop(): return S[top--]
Note: Actually stacks can grow downward also.
Note: To be safe, check stack underflow/overflow.
Queue Q: need 2 pointers (head, tail)
Enqueue(x): Q[tail] ← x; tail ← (tail mod len(Q)) + 1
Dequeue(): temp ← Q[head]; head ← (head mod len(Q)) + 1; return temp
Note: To be safe, check queue underflow/overflow.
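The circular-array queue above can be sketched as a small Python class (a hypothetical sketch; ArrayQueue and the explicit count field are our own choices, and it uses 0-based indexing rather than the 1-based mod arithmetic above):

```python
class ArrayQueue:
    """Bounded FIFO queue in a fixed-size array."""
    def __init__(self, capacity):
        self.q = [None] * capacity
        self.head = 0    # index of the oldest element
        self.tail = 0    # index of the next free slot
        self.count = 0   # explicit count makes under/overflow checks easy

    def enqueue(self, x):
        if self.count == len(self.q):
            raise OverflowError("queue overflow")
        self.q[self.tail] = x
        self.tail = (self.tail + 1) % len(self.q)  # wrap around
        self.count += 1

    def dequeue(self):
        if self.count == 0:
            raise IndexError("queue underflow")
        x = self.q[self.head]
        self.head = (self.head + 1) % len(self.q)  # wrap around
        self.count -= 1
        return x
```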
2.4.2 Stack and Calls/Recursion
Calling a routine: push locals, parameters, return address
Return: pop and continue
Example: MergeSort (cf. MergeSort Fig. from last lecture)
L (Left), R (Right), M (Merge)
[Table: call-stack snapshots at steps (1)-(8) of MergeSort on A[1:6]; each frame is a subrange [p, r] tagged with its phase, top of stack at the right. The stack evolves:
[1,6] → [1,6]L [1,3] → [1,6]L [1,3]L [1,2] → [1,6]L [1,3]L [1,2]L [1,1] → [1,6]L [1,3]L [1,2]R → [1,6]L [1,3]L [1,2]R [2,2] → [1,6]L [1,3]L [1,2]M → [1,6]L [1,3]R → [1,6]L [1,3]R [3,3] → [1,6]L [1,3]M → [1,6]R → . . .]
2.4.3 Linked List x
Singly linked (next) or Doubly linked (next, prev)
need a key
need a NIL value
How to implement linked lists?
Pre-allocate arrays prev[1:n], next[1:n], key[1:n]; x is an index in {0 . . . n}, where 0 is NIL.
Records (structures in C, classes in Java): x.prev, x.next, x.key, and NULL for NIL.
Rearrange by adjusting links, not moving data (key)
Recall: INSERTION-SORT with an array needs to copy objects to insert.
Linked-list representation: once we know where to insert, adjust links in O(1) time.
Array version: best case O(1), worst case O(n), average case O(n/2) = O(n).
[Figure: inserting 3 into the sorted linked list 1 → 2 → 4 → 5 → 6: (1) allocate a node for the data, (2) adjust links around the insertion point; no keys are moved.]
Search for the item (e.g., where to insert?)
list is unsorted: worst/average case O(n) time to find it (compare keys one at a time).
list is sorted:
Array: binary search in O(lg n) time
Singly linked list: O(n) average/worst case; need to do linear search
Doubly linked list: does not help! still O(n); can't jump to the middle of the list, because the links are only to the head/tail.
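The operations described above can be sketched with record-style nodes (a hypothetical sketch; Node and LinkedList are our own names): search is a linear scan, while insertion and deletion only adjust links.

```python
class Node:
    """Doubly linked list node."""
    def __init__(self, key):
        self.key = key
        self.prev = None
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None

    def insert_front(self, key):
        # O(1): adjust links, no data moves
        x = Node(key)
        x.next = self.head
        if self.head is not None:
            self.head.prev = x
        self.head = x
        return x

    def search(self, key):
        # O(n): linear scan, comparing keys one at a time
        x = self.head
        while x is not None and x.key != key:
            x = x.next
        return x

    def delete(self, x):
        # O(1) given the node itself: splice it out
        if x.prev is not None:
            x.prev.next = x.next
        else:
            self.head = x.next
        if x.next is not None:
            x.next.prev = x.prev
```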
2.4.4 Binary search tree
pointer to root; each node has pointer to left-child, right-child
optional pointer to parent
left smaller, right bigger key value; O(lg n) search, if BALANCED
Can still be O(n) if not balanced!!
Red-black tree (Ch.13) is a way to keep BST balanced
a heap (Ch. 6) implements a balanced binary tree using an array; no pointers
[Figure: the same keys stored three ways: a doubly linked list with head and tail pointers; a binary (search) tree with a root pointer and left/right children; and an n-ary tree stored with left-child, right-sibling pointers.]
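A minimal sketch of the pointer structure just described (our own names; no balancing, so search degrades to O(n) on skewed input):

```python
class BSTNode:
    """Binary search tree node: key plus left/right child pointers."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def bst_insert(root, key):
    # walk down: left if smaller, right otherwise; O(h), h = tree height
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)
    else:
        root.right = bst_insert(root.right, key)
    return root

def bst_search(root, key):
    # same walk; returns the node or None
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root
```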
2.4.5 rooted trees w/ unbounded branches
Could have arbitrary number of children!
left-child: head of linked list to children (down one level)
right-sibling: link to next on same level
optional link back to parent
Lecture 3
Recurrences; QuickSort
This lecture:
Chapter 4: Recurrences, Master method
Substitution method
Recursion tree
Master method: T(n) = aT(n/b) + f(n)
Quicksort (Ch. 7.1-7.2) (if we have time today)
Summation review (Appendix A)
3.1 Recurrence: by Substitution method
guess-and-verify
rename variables if necessary.
mathematical induction: assume the bound holds for smaller values (down to the base case n0), show it holds for n
issue: boundary conditions, floor/ceiling; use inequalities to bound those cases.
avoid pitfall with constant terms.
Example w/ floor
T(n) = Θ(1) if n = 1; T(n) = 2T(⌊n/2⌋) + n if n > 1.
Guess T(n) = O(n lg n).
sufficient to show T(n) ≤ cn lg n for some c > 0.
Assume T(⌊n/2⌋) ≤ c⌊n/2⌋ lg(⌊n/2⌋).
T(n) ≤ 2(c⌊n/2⌋ lg(⌊n/2⌋)) + n   (substitution)
     ≤ cn lg(n/2) + n             (drop floor)
     = cn lg n − cn lg 2 + n      (lg(a/b) = lg a − lg b)
     = cn lg n − cn + n           (lg 2 = 1)
     ≤ cn lg n   for c ≥ 1.
Avoid Pitfalls
What if we guessed T(n) = O(n)?
Try to show T(n) ≤ cn:
T(n) ≤ 2(c⌊n/2⌋) + n     (subst. into T(n) ≤ 2T(⌊n/2⌋) + n)
     ≤ cn + n            (drop floor)
     = (c + 1)n          (factor out const)
but (c + 1)n is not ≤ cn. Fails!
must be careful not to drop constants arbitrarily.
subtractive term in guessed solution
Example: T(n) = T(⌊n/2⌋) + T(⌈n/2⌉) + 1
Guess: O(n). First attempt: verify T(n) ≤ cn:
T(n) ≤ c⌊n/2⌋ + c⌈n/2⌉ + 1 = cn + 1, which is not ≤ cn.
Oops! But this does not mean T(n) ≠ O(n).
It works if we guess T(n) ≤ cn − b for some b ≥ 0:
T(n) ≤ (c⌊n/2⌋ − b) + (c⌈n/2⌉ − b) + 1
     = cn − 2b + 1
     ≤ cn − b, which works for b ≥ 1.
Variable Renaming
introduce a new variable to replace a difficult expression, to make it easier to solve.
Example: T(n) = 2T(⌊√n⌋) + lg n
Rename: let m = lg n (so n = 2^m): T(2^m) = 2T(2^(m/2)) + m
Rename: let S(m) = T(2^m): S(m) = 2S(m/2) + m
This looks like S(m) = O(m lg m).
Substitute back, m = lg n: T(n) = T(2^m) = S(m) = O(m lg m) = O((lg n) lg(lg n)).
3.2 Recursion tree, convert to summation
Example: T(n) = 3T(⌊n/4⌋) + Θ(n²).
Visualize as a recursion tree: T(n) = 3T(n/4) + cn² for some c > 0.
[Recursion tree: the root costs cn²; each of its 3 children is a T(n/4) costing c(n/4)², so level 1 totals (3/16)cn²; level 2 totals (3/16)²cn²; the leaves are T(1).]
Convert to summation:
T(n) = cn² + (3/16)cn² + (3/16)²cn² + ... + (3/16)^(#Levels − 1)cn² + (leaf-level cost)
# Levels = 1 + log₄ n, by an argument similar to MergeSort: the root is 1 level, and we divide log₄ n times.
Width at level i is 3^i, so at the bottom level the width is 3^(log₄ n).
Apply the identity a^(log_b c) = c^(log_b a) to rewrite 3^(log₄ n) = n^(log₄ 3).
T(n) = Σ_{i=0..(log₄ n) − 1} (3/16)^i cn² + Θ(n^(log₄ 3))
     < Σ_{i=0..∞} (3/16)^i cn² + Θ(n^(log₄ 3)) = (1/(1 − 3/16)) cn² + Θ(n^(log₄ 3)) = O(n²).
3.3 Master method
intuition: divide-and-conquer
a-way branch (recursive calls) per level,
each subproblem size = n/b,
Combine cost is f(n).
#levels = logb n
#nodes(at level i) = ai
#nodes(at leaf level) = alogb n = nlogb a
Example: MERGESORT
a = 2, b = 2, f(n) = Θ(n); floors/ceilings don't matter
Three cases: compare f(n) to n^(log_b a).
1. if f(n) = O(n^(log_b a − ε)) for some ε > 0, then T(n) = Θ(n^(log_b a)).
2. if f(n) = Θ(n^(log_b a)), then T(n) = Θ(n^(log_b a) lg n).
3. if f(n) = Ω(n^(log_b a + ε)) for some ε > 0, and a·f(n/b) ≤ c·f(n) for some const c < 1 and all n ≥ n₀, then T(n) = Θ(f(n)).
Restated:
1. if f(n) is polynomially slower than #leaves = n^(log_b a), then the leaves dominate the cost; it's Θ(a^(log_b n)) = Θ(n^(log_b a)).
2. f(n) (conquer) and n^(log_b a) (divide) grow equally fast; with lg n levels, it's Θ(n^(log_b a) lg n).
3. The conquer cost f(n) is dominant.
Important note:
The Master method does not cover all recurrences!
f(n) and n^(log_b a) must differ polynomially (i.e., by a factor n^ε); smaller differences don't apply.
Actually, a more general result for case 2:
if f(n)/n^(log_b a) = Θ(lg^k n) for constant k ≥ 0,
then T(n) = Θ(n^(log_b a) lg^(k+1) n).
Examples
T(n) = 9T(n/3) + n
f(n) = n, a = 9, b = 3, n^(log_b a) = n^(log₃ 9) = Θ(n²)
Case 1, because f(n) = O(n^(log₃ 9 − ε)) = O(n^(2 − 1)): smaller by a polynomial factor.
T(n) = Θ(n^(log_b a)) = Θ(n²).
T(n) = T(2n/3) + 1
a = 1, b = 3/2, f(n) = 1, n^(log_b a) = n^(log_{3/2} 1) = n⁰ = 1
Case 2, because f(n) = 1 = Θ(n^(log_b a)).
T(n) = Θ(n⁰ lg n) = Θ(lg n)
T(n) = 3T(n/4) + n lg n
a = 3, b = 4, f(n) = n lg n, n^(log_b a) = n^(log₄ 3)
Case 3, because f(n) = n lg n = Ω(n^(log₄ 3 + ε)) (and the regularity condition a·f(n/b) ≤ c·f(n) holds).
T(n) = Θ(f(n)) = Θ(n lg n)
3.4 QuickSort
Divide and Conquer:
pick a pivot to serve as the dividing point
partition array A around the pivot: one side with elements smaller than the pivot, one side with elements larger
recursively sort two sublists
no need for merge!
[Example: input A[p..r] = [2, 8, 7, 1, 3, 5, 6, 4]; pick pivot x = 4 (from A[r]). After PARTITION we want (≤ x) x (> x); in this case [2, 1, 3] 4 [7, 5, 6, 8]. Then QUICKSORT each side recursively.]
QUICKSORT(A, p, r)
1  if p < r
2    then q ← PARTITION(A, p, r)
3         QUICKSORT(A, p, q − 1)
4         QUICKSORT(A, q + 1, r)
Algorithm from book (p.146), with Python code:

PARTITION(A, p, r)
1  x ← A[r]
2  i ← p − 1
3  for j ← p to r − 1
4    do if A[j] ≤ x
5         then i ← i + 1
6              exchange A[i] ↔ A[j]
7  exchange A[i+1] ↔ A[r]
8  return i + 1

def Partition(A, p, r):
    x = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= x:
            i = i + 1
            A[i], A[j] = A[j], A[i]
    A[i+1], A[r] = A[r], A[i+1]
    return i + 1
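A recursive Python driver in the same style (a sketch; partition is included so the snippet is self-contained):

```python
def partition(A, p, r):
    x = A[r]                           # pivot = last element
    i = p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]    # put the pivot between the two sides
    return i + 1

def quicksort(A, p=0, r=None):
    """In-place quicksort of A[p..r]; no merge step needed."""
    if r is None:
        r = len(A) - 1
    if p < r:
        q = partition(A, p, r)
        quicksort(A, p, q - 1)         # sort the <= pivot side
        quicksort(A, q + 1, r)         # sort the > pivot side
```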
3.4.1 Worst case: unbalanced partitioning
T(n) = max_{0≤q≤n−1} (T(q) + T(n − q − 1)) + Θ(n)
     ≤ max_{0≤q≤n−1} (cq² + c(n − q − 1)²) + Θ(n)
     = c · max_{0≤q≤n−1} (q² + (n − q − 1)²) + Θ(n)
     ≤ c(n − 1)² + Θ(n)
     = c(n² − 2n + 1) + Θ(n)
     = cn² − c(2n − 1) + Θ(n) ≤ cn²
3.4.2 Best case: balanced partitioning
T(n) ≤ 2T(n/2) + Θ(n). Can apply the 2nd case of the Master theorem:
if f(n) = Θ(n^(log_b a)), then T(n) = Θ(n^(log_b a) lg n).
T(n) = Θ(n^(log₂ 2) lg n) = Θ(n lg n)
3.4.3 Average case
(next time)
Lecture 4
Heap sort, Priority Queues
This lecture:
Quick sort (continued from last time)
Heap property, Heap sort
Python implementation (for homework)
4.1 Heap data structure: Binary heap
nearly balanced binary tree
use array instead of pointer data structure
Definition:
parent(i) = ⌊i/2⌋
leftChild(i) = 2i
rightChild(i) = 2i + 1
heap property
by default: Max Heap: A[parent(i)] ≥ A[i]
(Min Heap: A[parent(i)] ≤ A[i])
height of the heap as a binary tree is Θ(lg n).
[Figure: the (max-)heap shape in an array: index 1 is the root, indices 2 and 3 its children, indices 4-7 the next level; the heap property requires each node x ≥ both its children y, z, and y's and z's subtrees recursively satisfy the same property.]
Examples: are these (max)-heaps?
[7, 5, 6, 2, 3, 4]? (a) 7 ≥ 5, 6; (b) 5 ≥ 2, 3; (c) 6 ≥ 4. Yes.
[7, 6, 5, 4, 3, 2]? (a) 7 ≥ 6, 5; (b) 6 ≥ 4, 3; (c) 5 ≥ 2. Yes.
[2, 5, 7, 4, 3, 6]? No. However, it is almost a heap if we don't consider the root: (b) 5 ≥ 4, 3; (c) 7 ≥ 6. This can be fixed with a call to HEAPIFY.
Heap API
HEAPIFY(A, i) (incremental fix): assume the L and R subtrees of i are heaps; make the subtree rooted at i a heap.
BUILDHEAP(A): input an unsorted array, rearrange it to make it a heap.
HEAPSORT(A): sorting alg.; uses HEAPIFY and BUILDHEAP.
4.2 Heapify:
Assumption: called when L(i) and R(i) are heaps, but A[i] itself may not be ≥ its children.
After the call: the largest of the children is propagated up to the root position (and the violation pushed down), so that the whole subtree is a heap again.
MAX-HEAPIFY(A, i, n)   (see page 130, slightly different)
1  l ← LEFT(i)                      index for left child
2  r ← RIGHT(i)                     index for right child
3  if (l ≤ n) and (A[l] > A[i])
4    then largest ← l               left child exists and is bigger
5    else largest ← i               either no L, or root is bigger
6  if (r ≤ n) and (A[r] > A[largest])
7    then largest ← r               right child exists and is bigger
   (no else clause: "either no R or root is bigger" is already taken care of by line 5)
8  if largest ≠ i
9    then exchange A[i] ↔ A[largest]
10        MAX-HEAPIFY(A, largest, n)
[Figure: trace of MAX-HEAPIFY(A, 1, 6) on [2, 5, 7, 4, 3, 6]: the largest child 7 swaps with the root 2; then the largest child 6 swaps with 2 again; done: [7, 5, 6, 4, 3, 2].]
Correctness: by induction.
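The pseudocode above translates directly to Python (a sketch using the same 1-based indices, offset by one into a 0-based list; the function name is mine):

```python
def max_heapify(A, i, n):
    """Assume the subtrees of node i are max-heaps; sink A[i] until the
    subtree rooted at i is a heap. 1-based node indices: children of i
    are 2i and 2i+1; A[i-1] holds node i's key."""
    l, r = 2 * i, 2 * i + 1
    largest = l if l <= n and A[l - 1] > A[i - 1] else i
    if r <= n and A[r - 1] > A[largest - 1]:
        largest = r
    if largest != i:
        A[i - 1], A[largest - 1] = A[largest - 1], A[i - 1]
        max_heapify(A, largest, n)   # keep sinking the displaced key
```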
Time complexity of Heapify
O(lg n) binary tree argument.
Or, O(h) where h = height of the heap as a binary tree.
4.3 BuildHeap
input unsorted array, output a heap after the call
leaf = trivial heap, start with one level above the leaf
BUILD-MAX-HEAP(A)   (page 133)
1  n ← length[A]
2  for i ← ⌊n/2⌋ downto 1
3    do MAX-HEAPIFY(A, i, n)
[Figure: BUILD-MAX-HEAP on [2, 3, 4, 5, 6, 7]: the leaves are already trivial heaps; HEAPIFY at i = 3, then i = 2, then i = 1 (each time the larger child, marked "largest", bubbles up) yields the heap [7, 6, 4, 5, 3, 2].]
Correctness: the assumption for HEAPIFY always holds.
4.3.1 Complexity:
A loose upper bound is O((n/2) lg n) = O(n lg n): MAX-HEAPIFY is O(lg n), iterated ⌊n/2⌋ times.
However! A tighter bound is O(n): HEAPIFY near the leaf level is much cheaper than near the root level!
Height of an n-element heap = ⌊lg n⌋ (binary tree); # nodes at height h is ≤ ⌈n/2^(h+1)⌉.
Height 0 (leaves): ⌈n/2⌉ nodes. One level higher: half as many nodes.
HEAPIFY called on a node at height h is O(h) (see the last page if you forgot).
T(n) = Σ_{h=0..⌊lg n⌋} (# of HEAPIFY calls at height h) × (time of HEAPIFY at height h)
     = Σ_{h=0..⌊lg n⌋} ⌈n/2^(h+1)⌉ O(h)
     = O(n Σ_{h=0..⌊lg n⌋} h/2^h).
Apply the integrated/differentiated series (Appendix A):
Σ_{k=0..∞} k x^k = x/(1 − x)².
Plug in x = 1/2:
Σ_{h=0..∞} h/2^h = (1/2)/(1 − 1/2)² = 2.
T(n) = O(n · 2) = O(n).
4.3.2 HeapSort
Take the largest (at the root of the heap) and stick it at the end, A[n]; then
use HEAPIFY on A[1..n−1] to surface the second largest, and stick it in position A[n−1]; etc.
HEAPSORT(A)   (page 136, slightly different)
1  BUILD-MAX-HEAP(A, n)             O(n) time
2  for i ← n downto 2               n − 1 times
3    do exchange A[1] ↔ A[i]        O(1) time
4       MAX-HEAPIFY(A, 1, i − 1)    O(lg(i − 1)) time
Total time: O(n) (BUILDHEAP) + O(n) loop iterations × O(lg n) (HEAPIFY) = O(n lg n).
[Figure: HEAPSORT trace on input [2, 3, 4, 5, 6, 7]: line 1 builds the max-heap; each iteration of line 2 swaps A[1] with A[i], so the max joins the sorted suffix and the prefix is almost a heap (the root is wrong, but its children are OK); line 4's HEAPIFY(A, 1, i − 1) makes it a heap again. Continue until the whole array is sorted.]
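The whole pipeline can be sketched in Python (0-based indices instead of the pseudocode's 1-based ones; names are mine):

```python
def heapsort(A):
    """In-place heapsort: build a max-heap in O(n), then do n-1 rounds of
    swap-max-to-suffix + re-heapify, for O(n lg n) total."""
    def sift_down(i, n):                      # 0-based MAX-HEAPIFY on A[0:n]
        while True:
            l, r, largest = 2 * i + 1, 2 * i + 2, i
            if l < n and A[l] > A[largest]:
                largest = l
            if r < n and A[r] > A[largest]:
                largest = r
            if largest == i:
                return
            A[i], A[largest] = A[largest], A[i]
            i = largest
    n = len(A)
    for i in range(n // 2 - 1, -1, -1):       # BUILD-MAX-HEAP
        sift_down(i, n)
    for end in range(n - 1, 0, -1):           # shrink the heap one slot per round
        A[0], A[end] = A[end], A[0]           # move the max into the sorted suffix
        sift_down(0, end)
```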
4.4 Python Programming
See http://e3.uci.edu/01f/15545/heap-hint.html or, for PDF, http://e3.uci.edu/01f/15545/heap-hint.pdf
4.5 Priority Queues
want to get the highest-priority element
MAX-HEAP-INSERT(A, x)
EXTRACT-MAX(A)
Implementation choices:
unsorted array: O(n) to extract, O(1) to insert (anywhere)
sorted linked list: O(1) to extract, O(n) to insert
heap: O(lg n) to insert, O(lg n) to extract
4.5.1 Heap Insert:
assume A is already a heap
increase size ofA by one, put new element at the end, andpropagate up until its parent is larger (or its the largest)
MAX-HEAP-INSERT(A, key)   (diff. from book page 140)
1  heap-size[A] ← heap-size[A] + 1
2  i ← heap-size[A]
3  while i > 1 and A[Parent(i)] < key
4    do A[i] ← A[Parent(i)]
5       i ← Parent(i)
6  A[i] ← key
[Figure: MAX-HEAP-INSERT(A, 8) on the heap [7, 6, 4, 5, 3, 2]: append a slot to make room, pull parents 4 and then 7 down along the path, and place key 8 at the root.]
O(lg n) time (climb up from tail to at most root)
assume short-circuit evaluation: the index is checked first, so when i > 1 fails, A[Parent(i)] is not even evaluated!
Even though the algorithm says heap-size[A] ← heap-size[A] + 1, you probably can't do just that in most programming languages: you need to actually make room for A[n + 1], or else your program can crash. In Python, call A.append to increase the physical size of A.
In Python, heap-size[A] can be derived from len(A) rather than kept as a separate variable.
4.5.2 Extract max:
pull the max (already at the root)
move the last one to the front and heapify (since bothchildren are heaps)
in Python, use A.pop to accomplish heap-size[A] heap-size[A] 1.
HEAP-EXTRACT-MAX(A)
1  if heap-size[A] < 1
2    then error "heap underflow"
3  max ← A[1]
4  A[1] ← A[heap-size[A]]
5  heap-size[A] ← heap-size[A] − 1
6  MAX-HEAPIFY(A, 1)
7  return max
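Both priority-queue operations can be sketched in Python, using A.append and A.pop to grow and shrink the physical array as the notes suggest (function names are mine):

```python
def max_heap_insert(A, key):
    """Append a slot, then pull parents down until key's spot is found."""
    A.append(key)                     # make room for the new element
    i = len(A)                        # 1-based index of the new slot
    while i > 1 and A[i // 2 - 1] < key:
        A[i - 1] = A[i // 2 - 1]      # pull the parent down
        i //= 2
    A[i - 1] = key

def heap_extract_max(A):
    """Pop the root; move the last leaf to the root and sift it down."""
    if not A:
        raise IndexError("heap underflow")
    top = A[0]
    last = A.pop()                    # A.pop shrinks heap-size by one
    if A:
        A[0] = last
        i, n = 1, len(A)
        while True:                   # MAX-HEAPIFY(A, 1), iteratively
            l, r = 2 * i, 2 * i + 1
            largest = l if l <= n and A[l - 1] > A[i - 1] else i
            if r <= n and A[r - 1] > A[largest - 1]:
                largest = r
            if largest == i:
                break
            A[i - 1], A[largest - 1] = A[largest - 1], A[i - 1]
            i = largest
    return top
```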
4.5.3 Heap Increase key
HEA P-I NCREASE-K EY (A, i, key)
A is a heap, i is an index to an element.
Purpose: overwrite element A[i] with key (which is expected to be larger than A[i], hence "increase key"), while making minor changes to restore the heap property.
Strategy: propagate A[i] up towards the root, similar to MAX-HEAP-INSERT.
HEAP-INCREASE-KEY(A, i, key)   (page 140)
1  if key < A[i]
2    then error "new key is smaller than current key"
3  A[i] ← key
4  while i > 1 and A[Parent(i)] < A[i]
5    do exchange A[i] ↔ A[Parent(i)]
6       i ← Parent(i)
not a common routine in priority queues; used later in minimum-spanning-tree algorithms
it is very unusual to implement MAX-HEAP-INSERT as shown on page 140.
implies a max heap (HEAP-DECREASE-KEY would imply a min heap).
timing complexity similar to HEAP-INSERT, except no reallocation of storage
Lecture 5
Linear Time sorting, Medians/Order Stat
This lecture
problem complexity of comparison-based sorting (8.1)
noncomparison sorting: counting sort (8.2), radix sort(8.3), bucket sort (8.4)
5.1 Comparison-based sorting: problem complexity
Decision Tree argument
Use a tree to represent all possible execution traces of analgorithm (Fig. 8.1)
[Figure 8.1: decision tree for sorting three elements. Each internal node i:j compares a[i] with a[j] and branches two ways (≤, >); each leaf is the permutation consistent with the outcomes along its path, e.g., a[1] ≤ a[2] ≤ a[3] or a[3] < a[1] ≤ a[2].]
Every comparison is a two-way branch, so a tree of height h has at most 2^h leaves; since all n! permutations must appear as leaves, h ≥ lg(n!) = Ω(n lg n). So any comparison sort needs Ω(n lg n) comparisons in the worst case.
5.2 Non-comparison sorting

5.2.1 Counting sort
C[0..k] is the key count, overloaded to become an index.
Initialize C[0..k] ← 0
for each A[i]
  do C[A[i]] ← C[A[i]] + 1          count each key
for i ← 1 to k
  do C[i] ← C[i] + C[i − 1]         converts C from count to index to the end of its region
for i ← n downto 1
  do B[C[A[i]]] ← A[i]
     C[A[i]] ← C[A[i]] − 1
[Figure: counting sort of input A = [2, 5, 3, 0, 2, 3, 0, 3] with keys 0..5. Count how many times each key appears: C = [2, 0, 2, 3, 0, 1]; running sums give the index to the end of each key's region in the output layout: C = [2, 2, 4, 7, 7, 8]; then copy A[i] to B[C[A[i]]] in reverse order, decrementing C[A[i]] each time, yielding B = [0, 0, 2, 2, 3, 3, 3, 5].]
stable sort (relative positions preserved)
uses a second buffer B
complexity = O(n + k); effectively linear if k = O(n).
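The three loops above, as runnable Python (a sketch; the function name is mine):

```python
def counting_sort(A, k):
    """Stable counting sort of A with integer keys in 0..k: O(n + k)."""
    C = [0] * (k + 1)
    for key in A:
        C[key] += 1                   # count occurrences of each key
    for i in range(1, k + 1):
        C[i] += C[i - 1]              # C[key] = index just past key's region
    B = [None] * len(A)               # second buffer
    for key in reversed(A):           # reverse order keeps the sort stable
        C[key] -= 1
        B[C[key]] = key
    return B
```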
5.2.2 RadixSort
Idea: For keys that have digits
use stable sorting to sort by one digit at a time
start from least significant digit!
[Figure: radix sort of [329, 457, 657, 839, 436, 720, 355]. Stable-sort by the least significant digit first, then by the 2nd LSD, then by the MSD; after the third pass the list is fully sorted: 329, 355, 436, 457, 657, 720, 839.]
Important!! does not work if you start from MSD!!!
[Figure: the same input sorted MSD-first: the later, stable pass by the 2nd digit puts 457 before 436, undoing the MSD ordering, so the result 329, 355, 457, 436, 657, 720, 839 is not sorted. Oops!]
also useful for sorting by multiple keys (by name, by SAT score, etc.): sort by the secondary key before the primary key
runtime complexity: Θ(d(n + k)), where d = # of digits in a key and k = the radix (# possible values per digit)
reason: make d passes; each pass is Θ(n + k) (e.g., counting sort)
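An LSD radix sort in Python, using bucket lists as the stable per-digit pass (a sketch; names are mine):

```python
def radix_sort(A, d, k=10):
    """Sort d-digit, base-k integer keys with d stable passes: Theta(d(n+k))."""
    for digit in range(d):                 # least significant digit first!
        div = k ** digit
        buckets = [[] for _ in range(k)]
        for key in A:                      # appending in order keeps the pass stable
            buckets[(key // div) % k].append(key)
        A = [key for b in buckets for key in b]
    return A
```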
5.2.3 Bucket sort
assume values are (roughly uniformly) distributed within some range.
project each value into one of n buckets; (insertion-)sort each bucket
[Figure: bucket sort of 20 two-digit values: each value goes into the bucket numbered by its leading digit (e.g., 12 and 17 into bucket 1; 21, 23, 26 into bucket 2; ...; 94 into bucket 9), and each bucket is kept sorted.]
Timing Complexity: expected to be linear (to be revisited)
5.3 Median and Order statistics
5.3.1 Find min? max? both min and max?
n − 1 comparisons for min (keep the smallest so far)
also n − 1 comparisons for max
both min and max: 2n − 2, but could do better: 3⌊n/2⌋ comparisons
1. compare pairs (⌊n/2⌋ comparisons)
2. search for the min in the smaller set
3. search for the max in the bigger set
[Figure: input 3 7 5 8 6 1 9 4; comparing in pairs splits the elements into a "smaller" set {3, 5, 1, 4} and a "bigger" set {7, 8, 6, 9}; the min (1) is found in the smaller set, the max (9) in the bigger set. n/2 for the pairwise compares, n/2 − 1 for min, n/2 − 1 for max.]
5.3.2 How to find the i-th smallest w/out sorting?
keep track of min & max so far
find the median
Use QuickSort's partitioning: each pass places the pivot at its correct rank (absolute sorted position).
If the pivot lands at the position we want, return it;
otherwise, continue partitioning in whichever one of the two intervals contains the answer.
RANDOMIZED-SELECT(A, p, r, i)
1  if p = r
2    then return A[p]
3  q ← RANDOMIZED-PARTITION(A, p, r)
4  k ← q − p + 1
5  if i = k                 the pivot value is the answer
6    then return A[q]
7  else if i < k
8    then return RANDOMIZED-SELECT(A, p, q − 1, i)
9  else return RANDOMIZED-SELECT(A, q + 1, r, i − k)
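The same routine in Python, with the randomized partition inlined (a sketch; names are mine, and i is a 1-based rank as in the pseudocode):

```python
import random

def randomized_select(A, p, r, i):
    """Return the i-th smallest (1-based) element of A[p..r], in place."""
    if p == r:
        return A[p]
    pivot = random.randint(p, r)              # RANDOMIZED-PARTITION: random pivot
    A[pivot], A[r] = A[r], A[pivot]
    x, j = A[r], p - 1
    for m in range(p, r):
        if A[m] <= x:
            j += 1
            A[j], A[m] = A[m], A[j]
    A[j + 1], A[r] = A[r], A[j + 1]
    q = j + 1
    k = q - p + 1                             # rank of the pivot within A[p..r]
    if i == k:
        return A[q]                           # the pivot value is the answer
    elif i < k:
        return randomized_select(A, p, q - 1, i)
    else:
        return randomized_select(A, q + 1, r, i - k)
```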
Complexity of (randomized) Select:
Best case: O(n): very lucky on the first pass; the pivot is exactly what we want
Even partitioning: n + n/2 + n/4 + ... + 2 + 1 ≤ 2n = O(n)
Worst case: n + (n − 1) + (n − 2) + ... = O(n²)
Average time: use RANDOMIZED-PARTITION; each position i is the pivot with probability 1/n. The expected cost sums probability × cost of i as pivot:
T(n) ≤ Σ_{i=1..n−1} (1/n) T(max(i, n − i)) + O(n)
     ≤ Σ_{i=⌊n/2⌋..n−1} (2/n) T(i) + O(n)
Guess T(n) ≤ an; verify:
T(n) ≤ (2/n) Σ_{i=⌊n/2⌋..n−1} (a·i) + cn
     ≤ (2/n) · (n/2) · (a(n − 1) + a·n/2)/2 + cn ≤ (3/4)an + cn.
Choose a ≥ 4c: T(n) ≤ an = O(n).
5.3.3 Can show that worst-case select is O(n)
(Theoretical interest only)
Divide the n elements into sublists of LENGTH 5 each, giving ⌈n/5⌉ lists.
Find the median of each list: brute force (constant time) per list.
Recurse to find the median x of these ⌈n/5⌉ medians. Partition around x. Let k = rank(x).
if i = k, return x
if i < k, then Select recursively to find the i-th smallest in the first part
else Select recursively to find the (i − k)-th smallest in the second part.
5.3.4 Justification:
[Figure: the ⌈n/5⌉ columns of 5 arranged with their medians on the middle row and the pivot * = median of medians in the center; s = known smaller, L = known larger, x = don't know. About half the columns contribute at least 3 elements on each side of the pivot, so each recursive call excludes a constant fraction of the elements.]
Lecture 6
Binary Search Trees
Example: for the list [2, 3, 5, 5, 7, 8], one binary search tree has root 5, left child 3 (with children 2 and 5) and right child 7 (with child 8); another valid BST for the same list has a different, less balanced shape.
6.1.1 API for BST
INORDER-TREE-WALK(x): print keys of the BST in sorted order
TREE-SEARCH(x, k): find the tree node with key k.
TREE-MAX(x) and TREE-MIN(x); assume x is the root
TREE-SUCCESSOR(x) and TREE-PREDECESSOR(x), for any tree node (need not be the root)
TREE-INSERT(T, z) and TREE-DELETE(T, z)

INORDER-TREE-WALK(x)
1  if x ≠ NIL
2    then INORDER-TREE-WALK(left[x])
3         print key[x]
4         INORDER-TREE-WALK(right[x])
Note: variants of tree visits (for binary trees in general, not necessarily BSTs)
inorder = recursive left, root, recursive right
preorder = root, recursive left, recursive right
postorder = recursive left, recursive right, root
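All three visit orders can be sketched in Python, assuming a simple Node class (the class and function names are mine):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def inorder(x, out):
    if x is not None:
        inorder(x.left, out)       # recursive left
        out.append(x.key)          # root
        inorder(x.right, out)      # recursive right

def preorder(x, out):
    if x is not None:
        out.append(x.key)          # root first
        preorder(x.left, out)
        preorder(x.right, out)

def postorder(x, out):
    if x is not None:
        postorder(x.left, out)
        postorder(x.right, out)
        out.append(x.key)          # root last
```

On a BST, inorder yields the keys in sorted order.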
Timing complexity of INORDER-T RE E-WAL K:
Θ(n) time.
Assume root x; left-subtree[x] has k nodes and right-subtree[x] has n − k − 1 nodes; assume each call incurs overhead d.
Proof by substitution into T(n) = (c + d)n + c:
T(n) = T(k) + T(n − k − 1) + d
     = ((c + d)k + c) + ((c + d)(n − k − 1) + c) + d
     = (c + d)n + c.
TREE-SEARCH(x, k)
1  if x = NIL or k = key[x]
2    then return x
3  if k < key[x]
4    then return TREE-SEARCH(left[x], k)
5  else return TREE-SEARCH(right[x], k)

returns x if found (key[x] matches k), NIL if not found
if z has two children: find and cut successor y, paste y over z
successor y must be either a leaf or have one child; it can't have two children.
[Figure: Delete(5), where 5 has two children, in a BST rooted at 15. Step 1: cut 5's successor 6 by splicing (in the case of a leaf, just cut). Step 2: paste 6 over 5.]
Equivalently, we could find the predecessor x, cut x, and paste it over z.
Rotation:
transform a BST while still preserving the BST property.
[Figure: LEFT-ROTATE(x) and RIGHT-ROTATE(y) are inverses; they exchange the parent/child relation of x and y while preserving inorder key order.]
Right rotate: x is the left child of y; we want to make y the right child of x. x gets to keep its left child as before; x's old right subtree (whose keys all lie between x and y) becomes y's new left child.

6.2 Python Programming

6.2.1 Trees as node objects

>>> x = Node(3)   # makes a new node with key = 3
>>> x.key
3
Note that calling Node(3) actually calls the Node class's __init__ (constructor) routine.
Use x.key, x.left, etc., instead of key[x], left[x].
We can insert additional fields even after the constructor is called!
example: x.color = 'red'   # add a new color attribute
Example routine:
def TreeMaximum(x):
while x.right != None:
x = x.right
return x
6.2.2 Tuple-encoded Trees
Tuples vs Lists:
list: L = [1, 2, 'a', 'b', 'c']
tuple: T = (1, 2, 'a', 'b', 'c')
Similarities:
Both tuples and lists let you access elements by index: e.g., L[3] and T[3] both yield the value 'b'.
len works for both: len(L) and len(T) both yield 5.
Differences:
syntax: tuples use ( ); lists use [ ]
tuples are immutable (read-only): not allowed to say T[3] = 'z'
OK to do with lists: L[3] = 'z'
tuples can be compared lexicographically: (1, 2) < (1, 2, 3) < (1, 3)
Can have tuples within tuples:(1, (2, 3, (4, 5)), 6)
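The differences above, demonstrated directly:

```python
L = [1, 2, 'a', 'b', 'c']
T = (1, 2, 'a', 'b', 'c')
assert L[3] == T[3] == 'b'            # same indexing for both
assert len(L) == len(T) == 5          # len works for both
L[3] = 'z'                            # lists are mutable
try:
    T[3] = 'z'                        # tuples are not: raises TypeError
except TypeError:
    pass
assert (1, 2) < (1, 2, 3) < (1, 3)    # lexicographic comparison
nested = (1, (2, 3, (4, 5)), 6)       # tuples within tuples
assert nested[1][2] == (4, 5)
```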
Lecture 7
Dynamic Programming
This lecture: Dynamic programming
Optimization problems and Dynamic programming
Assembly line scheduling
matrix multiply
longest common subsequence
optimal binary search tree
7.1 Optimization problems
many possible solutions with different costs
want to maximize or minimize some cost function
unlike sorting, where it's either sorted or not (partially sorted doesn't quite count)
examples: matrix-chain multiply (same results, justfaster or slower)
knapsack problem (a thief filling up a sack), compression
7.1.1 Dynamic programming
programming here means tabular method
Instead of re-computing the same subproblem, save re-sults in a table and look up (in constant-time!)
significance: convert an otherwise exponential/factorial-time problem to a polynomial-time one!
Problem characteristic: Recursively decomposable
Search space: a lot of repeated sub-configurations
optimal substructure in solution
7.2 Case study: Assembly line scheduling
two assembly lines 1 and 2
each line i has n stations Si,j for n stages of assembly
process
each station takes time ai,j
a chassis at stage j must travel to stage (j + 1) next
option to stay in the same assembly line or switch to the other assembly line
time overhead of t_{i,j} if it decides to switch lines to go to S_{i,j}
[Figure: two assembly lines with 6 stages each, showing entry costs (2 and 4), per-station assembly times a_{1,j} = (7, 9, 3, 4, 8, 4) and a_{2,j} = (8, 5, 6, 4, 5, 7), the transfer costs between lines, and exit costs (3 and 2), matching the totals computed below.]
Want to minimize time for assembly
Line-1 only: 2 + 7 + 9 + 3 + 4 + 8 + 4 + 3 = 40
Line-2 only: 4 + 8 + 5 + 6 + 4 + 5 + 7 + 2 = 41
Optimal: 2 + 7 + (2) + 5 + (1) + 3 + (1) + 4 + 5 + (1) + 4 + 3 = 38, where parenthesized terms are line-transfer costs
How many possible paths? 2^n (two choices at each stage)
Optimal substructure
The global optimum contains optimal solutions to subproblems.
The fastest way through any station S_{i,j} must consist of:
the shortest path from the beginning to S_{i,j}
the shortest path from S_{i,j} to the end
That is, we cannot take a longer path to S_{i,j} and make up for it in stages (j + 1) . . . n.
Notation: f_i[j] = fastest possible time from the start through station S_{i,j} (but not continuing).
e_i, x_i are the entry/exit costs on line i. The goal is to find f*, the global optimum.
Initially, at stage 1 (for line l = 1 or 2): f_l[1] = e_l (entry time) + a_{l,1} (assembly time).
At any stage j > 1, on line l (m denotes the other line):
f_l[j] = min( f_l[j − 1],                  same line l
              f_m[j − 1] + t_{m,j−1} )     other line m + transfer
         + a_{l,j}                         assembly time at station S_{l,j}
Can write this as a recursive program:
F(i, j)
  if j = 1
    then return e_i + a_{i,1}
  else return min(F(i, j − 1),
                  F(i%2 + 1, j − 1) + t_{i%2+1, j−1}) + a_{i,j}
But! There are several problems:
many repeated evaluations of F(i, j): could take O(2^n) time. Fix: use a 2-D array f[i, j] to remember the running minimum.
it does not track the path. Fix: use an array l_i[j] to remember which line gave us this min.
The iterative version is shown in the book on p. 329.
Run time is Θ(n).
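The iterative recurrence can be sketched in Python, using station and transfer numbers consistent with the line-1, line-2, and optimal totals above (40, 41, 38); names are mine:

```python
def fastest_way(a, t, e, x):
    """Assembly-line DP: a[l][j] = time at station S_{l+1,j+1};
    t[l][j] = transfer cost when leaving line l+1 after stage j+1;
    e, x = entry/exit costs. Returns the minimum total time, Theta(n)."""
    n = len(a[0])
    f = [e[0] + a[0][0], e[1] + a[1][0]]          # stage 1 on each line
    for j in range(1, n):                         # stay, or switch + transfer
        f = [min(f[0], f[1] + t[1][j - 1]) + a[0][j],
             min(f[1], f[0] + t[0][j - 1]) + a[1][j]]
    return min(f[0] + x[0], f[1] + x[1])

a = [[7, 9, 3, 4, 8, 4], [8, 5, 6, 4, 5, 7]]      # station times per line
t = [[2, 3, 1, 3, 4], [2, 1, 2, 2, 1]]            # transfer costs between stages
e, x = [2, 4], [3, 2]                             # entry and exit costs
```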
7.3 Matrix-Chain multiply
Basic Matrix Multiply
A is p × q, B is q × r.
The product AB is a p × r matrix: c_{i,j} = Σ_{y=1..q} a_{i,y} b_{y,j}
Total number of scalar multiplications = p · q · r.
Multiply multiple matrices
matrix multiplication is associative: (AB)C = A(BC); with A: p × q, B: q × r, C: r × s, both yield a p × s matrix
but the total # of multiplications can be different!
(AB)C costs pqr to multiply AB first, + prs to multiply (AB) (p × r) with C (r × s): pqr + prs total multiplications
On the other hand, A(BC) costs qrs + pqs.
Example: if p = 10, q = 100, r = 5, s = 50, then
pqr + prs = 5000 + 2500 = 7500
pqs + qrs = 50000 + 25000 = 75000, ten times as many!
Generalize to a matrix chain A1 A2 A3 ... An. But there are many ways to parenthesize!
P(1) = 1: (A), nothing to multiply
P(2) = 1: (AB)
P(3) = 2: A(BC), (AB)C
P(4) = 5: A(B(CD)), A((BC)D), (AB)(CD), (A(BC))D, ((AB)C)D
P(5) = 14 ... exponential growth
P(n) = 1 if n = 1; Σ_{k=1..n−1} P(k) P(n − k) for n ≥ 2
     = Ω(4^n / n^{3/2}), at least exponential!
7.3.1 Optimal parenthesization to minimize # scalar mults
[Figure: the parenthesization ((A1 A2) A3)(A4 A5) drawn as a tree: the top-level split of A_{1..5} is A_{1..k} times A_{k+1..n}, here A_{1..3} times A_{4..5}.]
A1...3
Notation:
let A_{i..j} denote the matrix product A_i A_{i+1} ... A_j; matrix A_i has dimension p_{i−1} × p_i
Optimal substructure
if the optimal parenthesization of A_{1..j} at the top level is (L)(R) = (A_{1..k})(A_{k+1..j}), then
L must be optimal for A_{1..k}, and
R must be optimal for A_{k+1..j}
Proof by contradiction
Let M(i, j) = minimum cost to multiply the i-th through the j-th matrices:
M(i, j) = 0 if i = j
        = min_{i≤k<j} ( M(i, k) + M(k+1, j) + p_{i−1} p_k p_j ) if i < j
Observation
don't enumerate the space!
bottom up: no need to take the min so many times!
instead of recomputing M(i, k), remember it in array m[i, k]
bookkeeping to track the optimal partitioning point; see Fig. 7.1
O(n³) time, Θ(n²) space (for the m and s arrays)
7.4 Longest common subsequence
Example: sequences X = (A, B, C, B, D, A, B), Y = (B, D, C, A, B, A)
a subsequence of X is Z = (B, C, D, B)
Longest common subsequences (LCS), of length 4: (B, C, B, A) and (B, D, A, B)
This is a maximization, also over addition, but add costby 1 (length increment)
Brute force:
enumerate all subsequences of X (length m); check whether each is a subsequence of Y (length n)
# of subsequences of X = 2^m (a binary decision at each point: include the letter or not)
worst case time is Θ(n 2^m), because each candidate is checked against Y of length n
Better way:
Notation: X_k = the length-k prefix of string X; x_i is the i-th character of X.
Let Z = z_1 ... z_k be an LCS of X = x_1 ... x_m and Y = y_1 ... y_n:
if x_m = y_n, then z_k = x_m = y_n, and Z_{k−1} is an LCS of X_{m−1}, Y_{n−1}.
if x_m ≠ y_n, then
  if z_k ≠ x_m, then Z is an LCS of X_{m−1}, Y
  if z_k ≠ y_n, then Z is an LCS of X, Y_{n−1}.
c[i, j] = LCS length of X_i, Y_j:
c[i, j] = c[i − 1, j − 1] + 1              if x_i = y_j (match)
        = max(c[i, j − 1], c[i − 1, j])    otherwise (no match; advance either)
Algorithm
c[1..m, 0] ← 0; c[0, 1..n] ← 0
for i ← 1 to m
  do for j ← 1 to n
    do if x_i = y_j
         then c[i, j] ← c[i − 1, j − 1] + 1
              b[i, j] ← "match" (diagonal)
       else if c[i − 1, j] ≥ c[i, j − 1]
         then c[i, j] ← c[i − 1, j]        copy the longer length
              b[i, j] ← "dec i" (up)
       else (c[i, j − 1] > c[i − 1, j])
              c[i, j] ← c[i, j − 1]
              b[i, j] ← "dec j" (left)
[Figure: the (m+1) × (n+1) table c for X = A B C B D A B (rows, m = 7) and Y = B D C A B A (columns, n = 6), with row 0 and column 0 initialized to 0 and the rest filled row by row; the final entry c[7, 6] = 4 is the LCS length.]
Time Θ(mn), space Θ(mn).
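The table-filling algorithm in Python (a sketch returning only the length; the function name is mine):

```python
def lcs_length(X, Y):
    """Length of the longest common subsequence of X and Y: Theta(mn)."""
    m, n = len(X), len(Y)
    c = [[0] * (n + 1) for _ in range(m + 1)]   # row 0 and column 0 stay 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1   # match: extend the diagonal
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])  # advance either string
    return c[m][n]
```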
7.5 Optimal Binary Search Trees
input: n keys K = (k1, k2, ..., kn) and n + 1 dummy keys D = (d0, d1, ..., dn), with
d0 < k1 < d1 < k2 < d2 < ... < kn < dn
key k_i has probability p_i, dummy key d_i has probability q_i, and
Σ_{i=1..n} p_i + Σ_{i=0..n} q_i = 1
want: the binary search tree that yields the fastest search (fewest steps) for frequently used keys
the keys k_i should be internal nodes, and the dummy keys d_i should be leaves.
MATRIX-CHAIN-ORDER(p)
1  n ← length[p] − 1
2  for i ← 1 to n
3    do m[i, i] ← 0
4  for l ← 2 to n                    l = length of the interval considered
5    do for i ← 1 to n − l + 1       starting index, from 1 up to n − l + 1, for each length
6      do j ← i + l − 1              ending index, always l − 1 away from the starting index
7         m[i, j] ← ∞
8         for k ← i to j − 1         different partitions between i and j
9           do q ← m[i, k] + m[k + 1, j] + p_{i−1} p_k p_j
10             if q < m[i, j]
11               then m[i, j] ← q
12                    s[i, j] ← k    remember the best k between i and j
13 return m and s

Figure 7.1: Matrix-Chain-Order algorithm and graphical illustration: chain length l runs from 2 to n; starting index i of A_{i..j} from 1 to n − l + 1; ending index j = i + l − 1; split point k from i to j − 1.
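The same algorithm in Python (a sketch; names are mine, and p lists the n + 1 dimensions so that A_i is p[i−1] × p[i]):

```python
def matrix_chain_order(p):
    """m[i][j] = min scalar mults for A_i..A_j; s[i][j] = best split. O(n^3)."""
    n = len(p) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for l in range(2, n + 1):                 # chain length
        for i in range(1, n - l + 2):         # starting index
            j = i + l - 1                     # ending index
            m[i][j] = float('inf')
            for k in range(i, j):             # try each split point
                q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < m[i][j]:
                    m[i][j], s[i][j] = q, k
    return m, s
```

With the earlier example of A (10 × 100), B (100 × 5), C (5 × 50), the table reproduces the 7500-multiplication optimum for (AB)C.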
optimize for the common case; a balanced tree might not be best!
[Figure: an example BST (not optimal) over k1..k5 with key probabilities p = (0.15, 0.10, 0.05, 0.10, 0.20), dummy-key probabilities q = (0.05, 0.10, 0.05, 0.05, 0.05, 0.10), root k2, and the dummy keys as leaves.]
Expected search cost of tree T
Optimal substructure: if root = k_r, then L = (i .. r − 1) and R = (r + 1 .. j) must themselves be optimal subtrees.
Expected cost e[i, j]:
e[i, j] = q_{i−1}                      (a dummy leaf) if j = i − 1
        = min_{i≤r≤j} { e[i, r − 1]    (left subtree)
                      + e[r + 1, j]    (right subtree)
                      + w(i, j) }      (every node one level deeper) if i ≤ j
[Figures: building up e[i, j] on the example: e[1, 0] = q0 = 0.05 and e[2, 1] = q1 = 0.10 are dummy leaves; e[1, 1] = w[1, 1] + e[1, 0] + e[2, 1], where w[1, 1] = p1 = 0.15 and adding w accounts for every node in the left and right subtrees sinking one level deeper; larger subtrees such as e[1, 5] are built the same way from their left tree, right tree, and w[1, 5].]
use arrays to remember e[i, j] and w[i, j] instead of recomputing them
use array root[i, j] to remember the root positions
OPTIMAL-BST(p, q, n)   (page 361)
1  for i ← 1 to n + 1
2    do e[i, i − 1] ← q_{i−1}
3       w[i, i − 1] ← q_{i−1}
4  for l ← 1 to n
5    do for i ← 1 to n − l + 1
6      do j ← i + l − 1
7         e[i, j] ← ∞
8         w[i, j] ← w[i, j − 1] + p_j + q_j
9         for r ← i to j
10          do t ← e[i, r − 1] + e[r + 1, j] + w[i, j]
11             if t < e[i, j]
12               then e[i, j] ← t
13                    root[i, j] ← r
14 return e and root
Lecture 8
Greedy Algorithm
This lecture
greedy algorithms: make choice that looks best at themoment
activity selection problem
Huffman (with Python programming hints)
8.1 Activity Selection problem
Set S ofn activities
s_i = start time of activity i, f_i = finish time of i; assume 0 ≤ s_i < f_i < ∞ (finite, positive/nonzero execution delay)
i, j are compatible if their intervals do not overlap: f_i ≤ s_j or f_j ≤ s_i.
Goal: find a maximum-size subset A of mutually compatible activities
maximize |A| = # of activities
NOT maximize utilization!!
there may be several optimal solutions, but it's sufficient to find just one.
Greedy Solution: pick next compatible task with earliestfinish time.
Example
[Figure: a list of tasks on a timeline from 0 to 14, sorted by finish time f_i, with the optimal schedules highlighted.]
Notation
S = {a1 ... an}: the set of activities.
a0 = start and a_{n+1} = end are fake activities: f0 = 0, s_{n+1} = ∞.
S_{ij} = {a_k ∈ S : f_i ≤ s_k < f_k ≤ s_j} (i.e., the set of activities compatible with both a_i and a_j); S = S_{0,n+1}
assumption: sorted by finish time: f0 ≤ f1 ≤ f2 ≤ ... ≤ fn < f_{n+1}
Claim: if i ≥ j then S_{ij} = ∅ (impossible to start later and finish earlier).
Optimal substructure
Let A = an optimal set (activity selection) for S, and A_{i,j} = an optimal set for S_{i,j}.
If a_k ∈ A_{i,j}, then A_{i,k} must be an optimal solution to S_{i,k}.
Proof: assume there exists some A′_{i,k} with more activities than our A_{i,k} (that is, |A′_{i,k}| > |A_{i,k}|). Then we could construct a larger A_{i,j} by using A′_{i,k} as a prefix, contradicting the claim that A_{i,j} is optimal.
Formulation:
Want c[0, n + 1], where c[i, j] = max # of compatible activities in S_{i,j}:
c[i, j] = 0 if S_{i,j} = ∅
        = max_{i<k<j} ( c[i, k] + c[k, j] + 1 ) if S_{i,j} ≠ ∅
8.2 Knapsack problem

Example: three items worth $60 (10 kg), $100 (20 kg), $120 (30 kg); knapsack capacity 50 kg.
Value:weight ratios: $6, $5, $4 per kg.
Greedy solution: take item 1, then item 2, but item 3 won't fit: $160 for 30 kg.
Optimal: take item 2 and item 3, leave item 1: $220 for 50 kg.
[Figure: the three items ($60/10 kg, $100/20 kg, $120/30 kg) beside the 50 kg knapsack, with $:wt ratios 6, 5, 4.]
Greedy is not optimal for 0-1 knapsack: taking the highest $:wt ratio first and adding until the capacity is exceeded adds the $60/10 kg item, then the $100/20 kg item, but then the $120/30 kg item won't fit and must be left behind: $160 / 30 kg.
Optimal for 0-1 (in this case, the best fit): $100/20 kg + $120/30 kg = $220 / 50 kg.
Greedy would work for the fractional knapsack: take 2/3 of the $120/30 kg item to fill the remaining 20 kg: $160 + $80 = $240 / 50 kg.
How to solve 0-1 knapsack? Dynamic programming, in O(nW) time: iterate over total weights up to the capacity W (need to assume integer weights).
8.3 Huffman codes
fixed-length code:
same # bits for all characters
variable length code:
more frequent characters use fewer bits (e.g., vowels like a e i o u vs. q, z)
Goal: minimize Σ_{c ∈ C} frequency(c) · bitlength(codeword(c))
non-lossy! preserves the same information
Comparison example: 6 characters

char                      a    b    c    d    e    f
frequency (thousands)    45   13   12   16    9    5
fixed-length codeword   000  001  010  011  100  101
var-length codeword       0  101  100  111 1101 1100
Why is variable length better?
fixed (3 bits per char): 100,000 chars → 300,000 bits
variable length: (45·1 + (13 + 12 + 16)·3 + (9 + 5)·4) · 1000 = 224,000 bits (a 25% saving)
problem: how do we know where each codeword begins?
prefix codes: no codeword is a prefix of another codeword. E.g., only 'a' starts with 0, so "00" can only mean "aa".
On a 1 we need to look at more bits: 10 is a prefix for either b (101) or c (100); 11 is a prefix for d, e, f.
Example: 001011101 uniquely parses as 0(a) 0(a) 101(b) 1101(e); no ambiguity.
Algorithm
HUFFMAN(C)   (page 388; C is the set of nodes carrying (letter, frequency))
1  n ← |C|
2  Q ← C                       put all elements into a min-priority queue
3  for i ← 1 to n − 1
4    do z ← MAKE-NEW-NODE
5       left[z] ← x ← EXTRACT-MIN(Q)
6       right[z] ← y ← EXTRACT-MIN(Q)
7       f[z] ← f[x] + f[y]
8       INSERT(Q, z)
9  return EXTRACT-MIN(Q)
Python version
http://e3.uci.edu/02f/15845/huffman.html
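A self-contained sketch using the standard-library heapq module as the min-priority queue (names are mine; the counter breaks frequency ties so tree tuples are never compared):

```python
import heapq
import itertools

def huffman(freqs):
    """Build a Huffman code from a {char: frequency} dict.
    Internal tree nodes are (left, right) tuples; leaves are chars."""
    tiebreak = itertools.count()
    Q = [(f, next(tiebreak), c) for c, f in freqs.items()]
    heapq.heapify(Q)
    for _ in range(len(freqs) - 1):        # n - 1 merges
        fx, _, x = heapq.heappop(Q)        # two least frequent subtrees
        fy, _, y = heapq.heappop(Q)
        heapq.heappush(Q, (fx + fy, next(tiebreak), (x, y)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + '0')    # left edge = bit 0
            walk(node[1], prefix + '1')    # right edge = bit 1
        else:
            codes[node] = prefix or '0'
    walk(Q[0][2], '')
    return codes
```

On the 6-character example above this yields the expected weighted length of 224 (thousand bits).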
[Figure: Huffman tree construction for frequencies f:5, e:9, c:12, b:13, d:16, a:45, starting from a min-priority queue of character/frequency nodes. i = 1: extract the two smallest (f, e) and merge them into a node of frequency 14; i = 2: merge c and b into 25; i = 3: merge 14 and d into 30; continuing, 25 and 30 merge into 55, and finally a (45) and 55 merge into the root 100, with 'a' at depth 1 and 'f', 'e' deepest.]
Time complexity:
n − 1 iterations; each heap operation is O(lg n), so this is O(n lg n) for n characters
Correctness
Greedy property:
Let C be the alphabet and x, y ∈ C the two characters with the lowest frequencies (most rarely used).
Then there exists an optimal prefix code in which the codewords for x and y have the same length and differ only in the last bit.
If a, b are leaves at the deepest depth and have higher frequencies, then we can swap a, b with x, y and obtain a new tree whose cost Σ frequency × depth is no greater.
Optimal substructure:
T is an optimal tree for alphabet C if T′ is an optimal tree for alphabet C′, where:
C and C′ are the same except x, y ∈ C while z ∈ C′; the frequencies agree except f[z] = f[x] + f[y];
and we construct T from T′ by replacing the leaf node for z with an internal node that is the parent of x and y.
Lecture 9
Basic Graph Algorithm
This lecture
Graph data structure
Graph traversal ordering definition
9.1 Graphs G = (V,E)
V: vertices. Notation: u, v ∈ V

E: edges, each connects a pair of vertices. E ⊆ V × V; (u, v) ∈ E
Variations
undirected or directed graph (digraph)
weighted: by default, weighted edges, w : E → R. Notation: the weight of edge (u, v) is written w(u, v)
Also possible to have weighted vertices
Undirected Graphs
degree of a vertex: # of edges connected to the vertex
complete graph, aka clique: undirected graph where all pairs of vertices are connected

bipartite graph: undirected graph whose V = V1 ∪ V2, and E ⊆ V1 × V2
multigraph: can have multiple edges between the same pair of vertices (including self edges)
hypergraph: an edge can connect more than two vertices
Digraphs
in-degree: # incoming edges; out-degree: # outgoing edges
path: ⟨v0, v1, . . . , vk⟩, where (vi, vi+1) ∈ E

simple path: a path where all vi are distinct

cycle: a nontrivial simple path plus edge (vk, v0) to close the path
DAG: directed acyclic graph (no cycle)
strongly connected: digraph whose vertices are all reachable from each other.
Representation
Adjacency list
Associate each vertex with a linked list of its neighbors on its outgoing edges
good for sparse graphs
for weighted graph, need to store weight in linked list
Adjacency matrix
A bit matrix to indicate presence/absence of edge
weighted: can store weight in matrix
[Figure: a 5-vertex digraph with V = {u, v, w, x, y} and edges (u,v), (u,y), (v,x), (v,w), (w,x), (x,y), (y,v)]

Adjacency list:
u: v → y
v: x → w
w: x
x: y
y: v

Adjacency matrix (row = from, column = to):

    u v w x y
u   . 1 . . 1
v   . . 1 1 .
w   . . . 1 .
x   . . . . 1
y   . 1 . . .
tradeoffs:
Adj. List: easier to grow # of vertices, good for sparse graphs

Adj. Matrix: faster access, but O(V²) storage
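To make the tradeoff concrete, a sketch of both representations in Python for the 5-vertex digraph of the figure above: a dict of out-neighbor lists for the adjacency list, nested lists for the matrix.

```python
# Adjacency list: dict mapping each vertex to its out-neighbors
# (good for sparse graphs; easy to grow).
Adj = {'u': ['v', 'y'], 'v': ['x', 'w'], 'w': ['x'], 'x': ['y'], 'y': ['v']}

# Equivalent adjacency matrix: O(V^2) storage, but O(1) edge lookup.
V = sorted(Adj)                               # ['u', 'v', 'w', 'x', 'y']
idx = {name: i for i, name in enumerate(V)}
M = [[0] * len(V) for _ in V]
for u in Adj:
    for v in Adj[u]:
        M[idx[u]][idx[v]] = 1

print(M[idx['u']][idx['v']])   # 1: edge (u, v) exists
print(M[idx['v']][idx['u']])   # 0: the graph is directed
```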
9.2 Breadth-first search
distance here means # of edges, not edge weight! e.g., start from u:

δ = 0: u
δ = 1: v, y
δ = 2: w, x

[Figure: the example digraph with the BFS levels δ = 1 and δ = 2 marked from start vertex u]
visit vertices in increasing shortest distance: visit all vertices at distance d before visiting any at distance d + 1

BFS order may not be unique!!

Example: BFS(G, u) can be [u, v, y, w, x], [u, v, y, x, w], [u, y, v, w, x], or [u, y, v, x, w]
Algorithm
BFS(G, s)
 1  for each u ∈ V[G] − {s}
 2      do color[u] ← WHITE
 3         d[u] ← ∞
 4         π[u] ← NIL
 5  color[s] ← GRAY
 6  d[s] ← 0
 7  π[s] ← NIL
 8  Q ← ∅
 9  ENQUEUE(Q, s)
10  while Q ≠ ∅
11      do u ← DEQUEUE(Q)
12         for each v ∈ Adj[u]
13             do if color[v] = WHITE
14                then color[v] ← GRAY
15                     d[v] ← d[u] + 1
16                     π[v] ← u
17                     ENQUEUE(Q, v)
18         color[u] ← BLACK
main idea: for each vertex at distance δ = d, enqueue the vertices at δ = d + 1
avoid queuing more than once: use color
WHITE: never been enqueued
GRAY: currently in queue
BLACK: already visited
actually color is somewhat redundant... it is enough to test π (predecessor)
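A direct Python transcription of BFS(G, s), with collections.deque as the FIFO queue and dicts for color, d, and π. This is a sketch following the pseudocode, not the course's own code; Adj is the example digraph:

```python
from collections import deque

WHITE, GRAY, BLACK = 'white', 'gray', 'black'

def bfs(Adj, s):
    color = {u: WHITE for u in Adj}          # lines 1-4: initialize
    d = {u: float('inf') for u in Adj}
    pi = {u: None for u in Adj}
    color[s], d[s] = GRAY, 0                 # lines 5-7
    q = deque([s])                           # lines 8-9
    order = []                               # visit order, for illustration
    while q:                                 # lines 10-18
        u = q.popleft()
        order.append(u)
        for v in Adj[u]:
            if color[v] == WHITE:            # never enqueued before
                color[v], d[v], pi[v] = GRAY, d[u] + 1, u
                q.append(v)
        color[u] = BLACK                     # u is fully visited
    return d, pi, order

Adj = {'u': ['v', 'y'], 'v': ['x', 'w'], 'w': ['x'], 'x': ['y'], 'y': ['v']}
d, pi, order = bfs(Adj, 'u')
print(order)   # ['u', 'v', 'y', 'x', 'w'] with this neighbor ordering
```

With a different ordering of each neighbor list, the other BFS orders from the example are produced; the distances d are the same in every case.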
Lemma 22.1: for any edge (u, v) ∈ E, δ(s, v) ≤ δ(s, u) + 1

if we can reach u in δ(s, u) edges, we can reach v in δ(s, u) + 1 edges (by taking edge (u, v))

v might actually be closer than u, but that's ok
d[v] computed by BFS satisfies d[v] ≥ δ(s, v); each d[u] is initialized to ∞

base case: since we start with d[s] = 0, all those u with δ(s, u) = 1 will have d[u] = δ(s, u) = 1

induction: assume d[u] = δ(s, u) and consider edge (u, v). If color[v] ≠ WHITE then v has been visited before, and d[v] ≤ δ(s, u) + 1. If color[v] = WHITE then v is being visited for the first time: d[v] = d[u] + 1 = δ(s, u) + 1 = δ(s, v).
9.3 Depth-first search
depth is not necessarily the same as distance in BFS!
discover vertices (using a stack) before we visit!

push vertices on the stack (if we haven't pushed them before)
DFS order may not be unique!
Example: DFS with tree

[Figure: DFS from start vertex u; the DFS tree has root u with children v and w, v with children x and y, and y with child z]

Parenthesis structure; reading left to right, the k-th event gets time k (an opening "(u" is the discovery time d[u], a closing "u)" is the finish time f[u]):

(u (v (x x) (y (z z) y) v) (w w) u)

i.e., d[u]=1, d[v]=2, d[x]=3, f[x]=4, d[y]=5, d[z]=6, f[z]=7, f[y]=8, f[v]=9, d[w]=10, f[w]=11, f[u]=12

Output when closing parenthesis: [x, z, y, v, w, u]
DFS(G)
1  for each u ∈ V[G]
2      do color[u] ← WHITE
3         π[u] ← NIL
4  time ← 0
5  for each u ∈ V[G]
6      do if color[u] = WHITE
7         then DFS-VISIT(u)
DFS-VISIT(u)
1  color[u] ← GRAY          ▷ discovered u
2  time ← time + 1
3  d[u] ← time
4  for each v ∈ Adj[u]
5      do if color[v] = WHITE
6         then π[v] ← u
7              DFS-VISIT(v)
8  color[u] ← BLACK
9  f[u] ← time ← time + 1
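The same transcription for DFS and DFS-VISIT, with discovery/finish timestamps. The Adj below encodes the example as a tree, with the edges assumed from the parenthesis structure (u → v, w; v → x, y; y → z):

```python
WHITE, GRAY, BLACK = 'white', 'gray', 'black'

def dfs(Adj):
    color = {u: WHITE for u in Adj}
    pi = {u: None for u in Adj}
    d, f = {}, {}
    time = [0]                         # boxed so DFS-VISIT can update it

    def visit(u):                      # DFS-VISIT(u)
        color[u] = GRAY                # discovered u
        time[0] += 1
        d[u] = time[0]
        for v in Adj[u]:
            if color[v] == WHITE:
                pi[v] = u
                visit(v)
        color[u] = BLACK               # finished u
        time[0] += 1
        f[u] = time[0]

    for u in Adj:                      # main loop of DFS(G)
        if color[u] == WHITE:
            visit(u)
    return d, f, pi

# Edges assumed from the example's parenthesis structure.
Adj = {'u': ['v', 'w'], 'v': ['x', 'y'], 'w': [], 'x': [], 'y': ['z'], 'z': []}
d, f, pi = dfs(Adj)
print(sorted(f, key=f.get))   # finish order: ['x', 'z', 'y', 'v', 'w', 'u']
```

Sorting by finish time reproduces the "output when closing parenthesis" order from the example, and d[u]=1, f[u]=12 match the timestamps 1 through 12.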
(Lemma 22.13) Component graph G^SCC is a DAG
Notation:
C denotes a strongly connected component

d[C] = min{d[v] : v ∈ C} (earliest of the discovery times)
f[C] = max{f[v] : v ∈ C} (latest of the finish times)

C, C′ are two distinct SCCs of G
(Lemma 22.14) if edge (u, v) ∈ E with u ∈ C and v ∈ C′, then f(C) > f(C′).

Intuition: C′ is deeper than C and therefore finishes earlier.
(Lemma 22.15 paraphrased from book!) if (v, u) ∈ E^T with v ∈ C and u ∈ C′, then f(C) < f(C′).

Proof: (v, u) ∈ E^T ⇒ (u, v) ∈ E; because v ∈ C is deeper (finishes earlier), Lemma 22.14 gives f(C′) > f(C).

>>> Adj['t']
['x', 'y', 'z']

Vertices V of G:

>>> Adj.keys()
['t', 'z', 's', 'x', 'y']   # Note: could be in another order