Page 1

Dynamic Programming II

15-211

Fundamental Data Structures and Algorithms

Klaus Sutner, April 1, 2004

Page 2

Plan

Homework 6 ...

Quiz 2

Reading: Section 7.6

Page 3

Recall: Dynamic Programming

Page 4

Recomputation

Principle: Avoid recomputation.

Typically store already known results in a hash table.

If the computation has additional structure, a plain table may be better.

Space versus time trade-off.
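A minimal sketch of the hash-table idea in Python, with Fibonacci as the running example (illustrative code, not from the slides):

# Top-down: a hash table (Python dict) of already-known results.
memo = {0: 0, 1: 1}

def fib(n):
    if n not in memo:
        memo[n] = fib(n - 1) + fib(n - 2)   # each value is computed only once
    return memo[n]

print(fib(50))   # 12586269025, with O(n) additions instead of exponentially many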

Page 5

Subproblems

Divide the problem into smaller subproblems.

Since we use memoizing/tables the subproblems may well overlap – even so, there will be no recomputation.

Note: It is not always clear what the subproblems should be. Finding the right parameters can be challenging.

Page 6

Optimality

Optimal solutions of the problem are composed of optimal solutions to subproblems.

In many optimization problems this property is entirely natural:

- each segment of a shortest path is a shortest path
- an optimal knapsack is obtained from an optimal knapsack on fewer items
- an LCS is obtained from LCSs of truncated strings

Page 7

Recursion

Fibonacci: n → n-1, n-2
Fast Fibo.: n → n/2 (more or less)

Binary KP: (k,A) → (k-1,A), (k-1,A-s)

LCS: (i,j) → (i-1,j-1), (i,j-1), (i-1,j)

In all cases, recursion clearly terminates.

Though for Fast Fibonacci the number of subproblems is far from obvious.
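One standard way to realize the n → n/2 reduction is fast doubling, based on the identities F(2k) = F(k)(2F(k+1) - F(k)) and F(2k+1) = F(k)^2 + F(k+1)^2. A sketch (mine, not from the slides); note that each call spawns only one half-size call, so there are only O(log n) subproblems:

def fib_pair(n):
    # Returns the pair (F(n), F(n+1)).
    if n == 0:
        return (0, 1)
    a, b = fib_pair(n // 2)          # one subproblem of half the size
    c = a * (2 * b - a)              # F(2k)
    d = a * a + b * b                # F(2k + 1)
    return (c, d) if n % 2 == 0 else (d, c + d)

def fib_fast(n):
    return fib_pair(n)[0]

print(fib_fast(10))   # 55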

Page 8

Top-Down vs. Bottom-Up

Top-down places the burden on the storage mechanism (memoization) – it must keep track of known values. More efficient if only a few values are needed.

Bottom-up requires the algorithm designer to figure out the right order in which to compute values – no hash table needed, just a k-dimensional array (usually k = 1 or k = 2). More efficient if all (most) values in array are used.
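For contrast with the memoized sketch earlier, here is the bottom-up counterpart for Fibonacci: a plain one-dimensional array filled in increasing order (illustrative code):

def fib_bottom_up(n):
    # Fill F[0..n] in increasing order: a 1-dimensional array, no hash table.
    F = [0] * (n + 1)
    if n > 0:
        F[1] = 1
    for i in range(2, n + 1):
        F[i] = F[i - 1] + F[i - 2]
    return F[n]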

Page 9

Matrix Chain Multiplication

Page 10

Matrix Multiplication

Given a p x q matrix A and a q x r matrix B we can compute their product

C = A B

using p q r scalar multiplications.

This assumes the brute-force algorithm; there are better ways (clever divide-and-conquer methods).

[Figure: schematic pictures of C = A x B for two different matrix shapes.]
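A sketch of the brute-force product that makes the p·q·r count explicit (assuming A is p x q and B is q x r, stored as lists of rows; illustrative code):

def mat_mul(A, B):
    # Brute-force product of a p x q matrix A and a q x r matrix B.
    p, q, r = len(A), len(B), len(B[0])
    C = [[0] * r for _ in range(p)]
    for i in range(p):
        for j in range(r):
            for k in range(q):
                C[i][j] += A[i][k] * B[k][j]   # one scalar multiplication
    return C                                   # p*q*r scalar mults in total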

Page 11

Matrix Chain Multiplication

Given n matrices A1, A2, ..., An of suitable dimensions, we want to compute their product

C = A1 A2 ... An

Note: Matrix multiplication is associative (try to prove this), so we can parenthesize the expression any which way we like.

But, of course, we must not permute the matrices.

Page 12

N = 3

Who cares? You, the implementor: the total number of scalar multiplications depends on where we put the parens.

A1: 10 x 100
A2: 100 x 5
A3: 5 x 50

Which is better: (A1 A2) A3 or A1 (A2 A3)?

Page 13

N = 3

A1: 10 x 100
A2: 100 x 5
A3: 5 x 50

(A1 A2) A3 requires 7500 scalar mults.

A1 (A2 A3) requires 75000 scalar mults.

This assumes the standard algorithm. A surprising factor of 10.
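To spell out the arithmetic: (A1 A2) A3 costs 10·100·5 + 10·5·50 = 5000 + 2500 = 7500 scalar multiplications, while A1 (A2 A3) costs 100·5·50 + 10·100·50 = 25000 + 50000 = 75000.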

Page 14

The Problem

So the problem is: find the optimal (minimal number of scalar multiplications) parenthesization for a given sequence of matrices.

The only relevant input is the vector of dimensions of the matrices:

p[0], p[1], p[2], ..., p[n]

where matrix Ai has dimensions p[i-1] x p[i].

Page 15

Brute Force

Boils down to finding all full binary trees on n leaves.

[Figure: full binary trees on the four leaves A, B, C, D, one tree per parenthesization.]

Page 16

Hard Recurrence Equation

Let F(n) be the number of all full binary trees on n leaves.

F(1) = 1

F(n) = F(1)F(n-1) + F(2)F(n-2) + ... + F(n-1)F(1)

Convolution, hard to solve.

Claim: F(n) = Θ(n^(-3/2) · 4^n)

Page 17

Catalan Numbers

Incidentally, F(n) is closely related to another important sequence in combinatorics, the so-called Catalan numbers C(n): the number of binary trees on n nodes.

One can show that

C(n) = binom(2n,n)/(n+1)

Also

F(n) = C(n-1)
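A quick numerical sanity check of both formulas (a sketch in Python; math.comb needs Python 3.8+, and the helper names F and C are just the slide's notation):

from math import comb

def F(n):
    # Number of full binary trees on n leaves, via the convolution recurrence.
    f = [0, 1]
    for m in range(2, n + 1):
        f.append(sum(f[i] * f[m - i] for i in range(1, m)))
    return f[n]

def C(n):
    # Catalan numbers: number of binary trees on n nodes.
    return comb(2 * n, n) // (n + 1)

assert all(F(n) == C(n - 1) for n in range(1, 12))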

Page 18

Recursion

Let's write c(i,j) for the optimal cost of multiplying out matrices Ai Ai+1 ... Aj.

So c(i,i) = 0.

Then c(i,j) is the min over all k, i <= k < j, of

c(i,k) + c(k+1,j) + p[i-1] p[k] p[j]

This is the same idea as for the convolution: split into left and right subtree.

Page 19

Recursion

c(i,i) = 0.

c(i,j) = min over i ≤ k < j of ( c(i,k) + c(k+1,j) + p[i-1] p[k] p[j] )

With memoizing, done: this is essentially a recursive program to compute c(i,j) for i ≤ j.

We can simply call c(1,n).
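A minimal memoized transcription of this recurrence (my sketch, not from the slides; the dimension vector p is passed as a Python list p[0..n]):

from functools import lru_cache

def chain_cost(p):
    # p[0..n]: matrix A_i has dimensions p[i-1] x p[i].
    n = len(p) - 1

    @lru_cache(maxsize=None)
    def c(i, j):
        if i == j:
            return 0                      # a single matrix costs nothing
        return min(c(i, k) + c(k + 1, j) + p[i - 1] * p[k] * p[j]
                   for k in range(i, j))

    return c(1, n)

print(chain_cost([10, 100, 5, 50]))       # 7500, matching the N = 3 example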

Page 20

Explicit Table

// assumes C[i,i] = 0 for all i
for d = 1,..,n-1 do
  for i = 1,..,n-d do
    j = i + d;
    C[i,j] = infinity;
    for k = i,..,j-1 do {   // compute min over k
      c = C[i,k] + C[k+1,j] + p[i-1]*p[k]*p[j];
      if( c < C[i,j] ) C[i,j] = c;
    }

Running time is clearly cubic in n.

Page 21

Correctness

for d = 1,..,n-1 do
  for i = 1,..,n-d do
    j = i + d;
    C[i,j] = infinity;
    for k = i,..,j-1 do {
      c = C[i,k] + C[k+1,j] + p[i-1]*p[k]*p[j];
      if( c < C[i,j] ) C[i,j] = c;
    }

But why is it correct?

What is the exact order in which the table entries are produced?

Page 22

Correctness

for d = 1,..,n-1 do
  for i = 1,..,n-d do
    j = i + d;
    C[i,j] = infinity;
    for k = i,..,j-1 do {
      c = C[i,k] + C[k+1,j] + p[i-1]*p[k]*p[j];
      if( c < C[i,j] ) C[i,j] = c;
    }

Correctness proof is by induction on d: the terms C[i,k] and C[k+1,j] (shown in blue on the original slide) have smaller d-values.

Page 23

Correctness

 1  9 16 22 27 31 34 36
 0  2 10 17 23 28 32 35
 0  0  3 11 18 24 29 33
 0  0  0  4 12 19 25 30
 0  0  0  0  5 13 20 26
 0  0  0  0  0  6 14 21
 0  0  0  0  0  0  7 15
 0  0  0  0  0  0  0  8

(Entry (i,j) shows the step at which C[i,j] is filled in: first the diagonal C[i,i] = 0, then diagonal by diagonal for d = 1, 2, ...; the zeros below the diagonal are unused cells.)

c(i,j) = min over i ≤ k < j of ( c(i,k) + c(k+1,j) + p[i-1] p[k] p[j] )


Page 26

Actual Multiplication

So far we only have the optimal number of scalar multiplications. How do we figure out how to actually perform the matrix multiplications?

Keep track of the k for which the minimum value appears:

- Multiply matrices i through k, store result in T1.
- Multiply matrices k+1 through j, store in T2.
- Multiply T1 and T2.
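A sketch of that recipe in code, assuming K[i][j] records the minimizing k for each pair and mat_mul is the brute-force product sketched earlier (all names illustrative):

def multiply_chain(mats, K, i, j):
    # Multiply A_i .. A_j in the optimal order; mats[1..n] holds the matrices
    # (index 0 unused) and K[i][j] is the minimizing k recorded by the DP.
    if i == j:
        return mats[i]
    k = K[i][j]
    T1 = multiply_chain(mats, K, i, k)        # matrices i through k
    T2 = multiply_chain(mats, K, k + 1, j)    # matrices k+1 through j
    return mat_mul(T1, T2)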

Page 27

Faster MM Algorithms

What happens if we use a faster algorithm for the individual matrix multiplications?

Let's write M(p,q,r) for the cost of multiplying a p x q matrix by a q x r matrix.

There are clever divide-and-conquer algorithms that show that

M(p,q,r) = o( pqr )

How does this affect our algorithm?

Page 28

Faster MM Algorithms

It’s essentially irrelevant!

We can simply change the code a little:

c = C[i,k]+C[k+1,j] + M(p[i-1],p[k],p[j]);

That's it!

The optimal solution may be different, but we can still find it using the same dynamic programming approach.

Page 29

All-Pairs Shortest Path

Page 30

Shortest Paths, again

Recall: We are given a digraph G with a non-negative cost function cost(x,y) for all edges.

Dijkstra's algorithm nicely solves the single source problem.

But if we have to compute the distances dist[x,y] between any two vertices in a dense graph we might as well resort to dynamic programming.

A minor problem: Where on earth is the recursion?

Page 31

Floyd-Warshall-Kleene

To get a recursive solution we need a clever trick: constrain all intermediate vertices on a path to be in {1,2,...,k} (where V = {1,...,n}).

Then

c(x,y,0) = cost(x,y), possibly infinite (if there is no edge)

c(x,y,k) = min( c(x,y,k-1), c(x,k,k-1)+c(k,y,k-1) )

Bingo. Done by memoizing.

Page 32

Explicit Table

Disregarding memory:

// dist[i,j,0] = cost(i,j)
for( k = 1; k <= n; k++ )
  for( i = 1; i <= n; i++ )
    for( j = 1; j <= n; j++ )
      dist[i,j,k] = min( dist[i,j,k-1], dist[i,k,k-1] + dist[k,j,k-1] );

Clearly cubic in n. Constants are excellent.
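If memory is not disregarded, the k dimension can be dropped: during round k, row k and column k do not change (dist[k][k] = 0), so the updates can be done in place in a single 2-D array. A Python sketch of this standard version (my code, with INF marking missing edges):

INF = float('inf')

def floyd_warshall(dist):
    # dist[i][j]: edge cost, 0 on the diagonal, INF if there is no edge.
    # Invariant after round k: dist[i][j] is the cheapest path from i to j
    # whose intermediate vertices all lie among the first k vertices.
    n = len(dist)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist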

Page 33

Other Approaches?

Restricting the intermediate points is not particularly obvious. Are there other approaches?

How about this:

c(x,y,k) = shortest path from x to y using at most k edges.

So c(x,y,n-1) = dist(x,y), and c(x,y,1) is trivial.

Page 34

The Step

We need a way to get from c(x,y,<k) to c(x,y,k).

c(x,y,k) = min( c(x,y,k-1), c(x,z,s) + c(z,y,t) )

where the minimum also ranges over all vertices z and all splits s + t = k with s, t > 0.

But this update step is expensive, both in time and space.

Update in Floyd-Warshall-Kleene is O(1)!

Page 35

All-Pairs Longest Paths

Page 36

Longest Paths

Of course, we restrict the longest paths to be simple paths.

Let's try the same trick as in the FWK shortest path algorithm: constrain all intermediate vertices on a path to be in {1,2,...,k}.

That is, c(x,y,k) = max( c(x,y,k-1), c(x,k,k-1) + c(k,y,k-1) )

Does this work?

Page 37

No! The longest simple path from c to a is (c, b, d, a). But the subpath (c, b) is not the longest path from c to b!

Optimal longest paths do not have optimal solutions to subproblems!!!

[Figure: a small weighted graph on the vertices a, b, c, d illustrating the counterexample.]

You cannot use optimal paths from c to b and from b to a to get an optimal path from c to a. Instead you create a cycle.


Page 38

Slow Dynamic Programming

Page 39

TSP

Sometimes dynamic programming can be used to find relatively fast, though still exponential, algorithms.

Traveling Salesman Problem

Given: A distance table between n cities. Problem: Find a shortest tour.

A tour here is required to visit each city exactly once and return to the city of origin. The cost is the sum of the edge costs.

Page 40

Brute Force

A tour is in essence just a permutation of the cities.

So the brute-force algorithm has to inspect some n! permutations.

Actually: (n-1)! would be enough. Why?

This may sound horrible, but no polynomial time algorithm for TSP is known, and there are good reasons to believe that none exists (NP-completeness).
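For contrast, the brute-force search is only a few lines; fixing city 0 as the origin is exactly what brings n! down to (n-1)! (an illustrative sketch, with cost an n x n distance table):

from itertools import permutations

def tsp_brute_force(cost):
    # Try every tour; fixing city 0 as the origin leaves (n-1)! candidates.
    n = len(cost)
    best = float('inf')
    for perm in permutations(range(1, n)):
        tour = (0,) + perm + (0,)
        best = min(best, sum(cost[a][b] for a, b in zip(tour, tour[1:])))
    return best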

Page 41

Fred Uses Dynamic Programming

F.H. has decided that dynamic programming will do better than brute force.

The problem is: How do we describe the problem recursively?

Fred wants to borrow the idea from Floyd-Warshall-Kleene: essentially induction on the number of admissible nodes.

c(x,y,k) = shortest path from x to y using only {1,2,...,k} as intermediate nodes.

Page 42

Adapt

Adapted for TSP:

c(k) = cost of shortest tour on {1,2,...,k}.

So c(n) is what we are after and c(1) is trivial.

We can now apply dynamic programming to get from c(1) to c(n).

According to Fred, that is.

Page 43

Well …

But how do we get from c(<k) to c(k)?

We need some simple recursion.

Note that we only keep track of costs, not the actual permutations. But updating is difficult: we would have to insert a step to/from k in all possible places.

There appears to be no reasonable way to do this.

Page 44

Optimality Substructure

The tour goes along a path from 1 to k and then along a path from k back to 1, so that all vertices are included. But these two paths can divide the vertices {1,2,3,...,k} arbitrarily into two subsets, neither one being {1,2,3,...,j} for some j < k, where we could have used c(j).

There is no optimality substructure!

So we need a different kind of recursion. But which???

Page 45

Second Try

Here is a better (but much less obvious) line of attack.

For any subset S of {1,2,...,n}, i in S, let

c(S,i) = cost of a shortest Hamiltonian path from 1 to i in S.

Hamiltonian: touches every point in S exactly once.

Then the cost of an optimal tour is given by

min( c({1,...,n},i) + cost(i,1) | i )

Page 46

Recursion

Can we update now?

c({i},i) = cost(1,i)

c(S+j,j) = min( c(S,i) + cost(i,j) | i in S )

Here j is not in S.

Since we are dealing with subsets rather than permutations the running time is

O( n^2 · 2^n ) = o( n! )
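A compact sketch of this recursion (the Held-Karp algorithm) with subsets coded as bitmasks; city 0 plays the role of city 1 here, and all names are mine:

from functools import lru_cache

def tsp_dp(cost):
    # Held-Karp: c(S, i) = cost of a shortest path that starts at city 0,
    # visits every city in the set S exactly once, and ends at i (i in S).
    n = len(cost)

    @lru_cache(maxsize=None)
    def c(S, i):
        if S == 1 << i:                  # S = {i}: direct edge 0 -> i
            return cost[0][i]
        S_rest = S ^ (1 << i)            # drop i, recurse on the smaller set
        return min(c(S_rest, j) + cost[j][i]
                   for j in range(1, n) if S_rest & (1 << j))

    full = (1 << n) - 2                  # all cities except 0
    return min(c(full, i) + cost[i][0] for i in range(1, n))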

Page 47

Values versus Witnesses

As usual, we only get the cost of an optimal solution, not the actual tour.

To retrieve the tour, trace the history of the optimal value backwards. The back-trace only requires linearly many steps – but first we have to compute the exponential size table to get at the witnesses.

Page 48

The Recursion

In the TSP problem, the recursion is not over integers or strings (as is most often the case), but over

pointed sets (S,i) where i in S

We generate all pointed sets starting at singletons ({i},i) by adding one element at a time (induction on cardinality).