Page 1

Dynamic Programming II

15-211

Fundamental Data Structures and Algorithms

Klaus Sutner, April 1, 2004

Page 2

Plan

Homework 6 ...

Quiz 2

Reading: Section 7.6

Page 3

Recall: Dynamic Programming

Page 4

Recomputation

Principle: Avoid recomputation.

Typically store already known results in a hash table.

If the computation has additional structure, a plain table may be better.

Space versus time trade-off.
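A minimal sketch of the hash-table idea in Python, with Fibonacci as the running example (illustrative code, not from the slides):

# Top-down: a hash table (Python dict) of already-known results.
memo = {0: 0, 1: 1}

def fib(n):
    if n not in memo:
        memo[n] = fib(n - 1) + fib(n - 2)   # each value is computed only once
    return memo[n]

print(fib(50))   # 12586269025, with O(n) additions instead of exponentially many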

Page 5

Subproblems

Divide the problem into smaller subproblems.

Since we use memoizing/tables the subproblems may well overlap – even so, there will be no recomputation.

Note: It is not always clear what the subproblems should be. Finding the right parameters can be challenging.

Page 6

Optimality

Optimal solutions of the problem are composed of optimal solutions to subproblems.

In many optimization problems this property is entirely natural:

- each segment of a shortest path is a shortest path
- an optimal knapsack is obtained from an optimal knapsack on fewer items
- an LCS is obtained from LCSs of truncated strings

Page 7

Recursion

Fibonacci: n → n-1, n-2
Fast Fibo.: n → n/2 (more or less)

Binary KP: (k,A) → (k-1,A), (k-1,A-s)

LCS: (i,j) → (i-1,j-1), (i,j-1), (i-1,j)

In all cases, recursion clearly terminates.

Though for Fast Fibonacci the number of subproblems is far from obvious.
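One standard way to realize the n → n/2 reduction is fast doubling, based on the identities F(2k) = F(k)(2F(k+1) - F(k)) and F(2k+1) = F(k)^2 + F(k+1)^2. A sketch (mine, not from the slides); note that each call spawns only one half-size call, so there are only O(log n) subproblems:

def fib_pair(n):
    # Returns the pair (F(n), F(n+1)).
    if n == 0:
        return (0, 1)
    a, b = fib_pair(n // 2)          # one subproblem of half the size
    c = a * (2 * b - a)              # F(2k)
    d = a * a + b * b                # F(2k + 1)
    return (c, d) if n % 2 == 0 else (d, c + d)

def fib_fast(n):
    return fib_pair(n)[0]

print(fib_fast(10))   # 55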

Page 8

Top-Down vs. Bottom-Up

Top-down places the burden on the storage mechanism (memoization) – it must keep track of known values. More efficient if only a few values are needed.

Bottom-up requires the algorithm designer to figure out the right order in which to compute values – no hash table needed, just a k-dimensional array (usually k = 1 or k = 2). More efficient if all (most) values in array are used.
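For contrast with the memoized sketch earlier, here is the bottom-up counterpart for Fibonacci: a plain one-dimensional array filled in increasing order (illustrative code):

def fib_bottom_up(n):
    # Fill F[0..n] in increasing order: a 1-dimensional array, no hash table.
    F = [0] * (n + 1)
    if n > 0:
        F[1] = 1
    for i in range(2, n + 1):
        F[i] = F[i - 1] + F[i - 2]
    return F[n]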

Page 9

Matrix Chain Multiplication

Page 10

Matrix Multiplication

Given a p x q matrix A and a q x r matrix B we can compute their product

C = A B

using p q r scalar multiplications.

This assumes the brute-force algorithm; there are better ways (clever divide-and-conquer methods).

[Figure: schematic pictures of C = A x B for two different matrix shapes.]
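A sketch of the brute-force product that makes the p·q·r count explicit (assuming A is p x q and B is q x r, stored as lists of rows; illustrative code):

def mat_mul(A, B):
    # Brute-force product of a p x q matrix A and a q x r matrix B.
    p, q, r = len(A), len(B), len(B[0])
    C = [[0] * r for _ in range(p)]
    for i in range(p):
        for j in range(r):
            for k in range(q):
                C[i][j] += A[i][k] * B[k][j]   # one scalar multiplication
    return C                                   # p*q*r scalar mults in total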

Page 11

Matrix Chain Multiplication

Given n matrices A1, A2, ..., An of suitable dimensions, we want to compute their product

C = A1 A2 ... An

Note: Matrix multiplication is associative (try to prove this), so we can parenthesize the expression any which way we like.

But, of course, we must not permute the matrices.

Page 12

N = 3

Who cares? You, the implementor: the total number of scalar multiplications depends on where we put the parens.

A1: 10 x 100
A2: 100 x 5
A3: 5 x 50

Which is better: (A1 A2) A3 or A1 (A2 A3)?

Page 13

N = 3

A1: 10 x 100
A2: 100 x 5
A3: 5 x 50

(A1 A2) A3 requires 7500 scalar mults.

A1 (A2 A3) requires 75000 scalar mults.

This assumes the standard algorithm. A surprising factor of 10.
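To spell out the arithmetic: (A1 A2) A3 costs 10·100·5 + 10·5·50 = 5000 + 2500 = 7500 scalar multiplications, while A1 (A2 A3) costs 100·5·50 + 10·100·50 = 25000 + 50000 = 75000.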

Page 14

The Problem

So the problem is: find the optimal (minimal number of scalar multiplications) parenthesization for a given sequence of matrices.

The only relevant input is the vector of dimensions of the matrices:

p[0], p[1], p[2], ..., p[n]

where matrix Ai has dimensions p[i-1] x p[i].

Page 15

Brute Force

Boils down to finding all full binary trees on n leaves.

[Figure: full binary trees on the four leaves A, B, C, D, one tree per parenthesization.]

Page 16

Hard Recurrence Equation

Let F(n) be the number of all full binary trees on n leaves.

F(1) = 1

F(n) = F(1)F(n-1) + F(2)F(n-2) + ... + F(n-1)F(1)

Convolution, hard to solve.

Claim: F(n) = Θ(n^(-3/2) · 4^n)

Page 17

Catalan Numbers

Incidentally, F(n) is closely related to another important sequence in combinatorics, the so-called Catalan numbers C(n): the number of binary trees on n nodes.

One can show that

C(n) = binom(2n,n)/(n+1)

Also

F(n) = C(n-1)
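A quick numerical sanity check of both formulas (a sketch in Python; math.comb needs Python 3.8+, and the helper names F and C are just the slide's notation):

from math import comb

def F(n):
    # Number of full binary trees on n leaves, via the convolution recurrence.
    f = [0, 1]
    for m in range(2, n + 1):
        f.append(sum(f[i] * f[m - i] for i in range(1, m)))
    return f[n]

def C(n):
    # Catalan numbers: number of binary trees on n nodes.
    return comb(2 * n, n) // (n + 1)

assert all(F(n) == C(n - 1) for n in range(1, 12))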

Page 18

Recursion

Let's write c(i,j) for the optimal cost of multiplying out matrices Ai Ai+1 ... Aj.

So c(i,i) = 0.

Then c(i,j) is the min over all k, i <= k < j, of

c(i,k) + c(k+1,j) + p[i-1] p[k] p[j]

This is the same idea as for the convolution: split into left and right subtree.

Page 19

Recursion

c(i,i) = 0.

c(i,j) = min over i ≤ k < j of ( c(i,k) + c(k+1,j) + p[i-1] p[k] p[j] )

With memoizing, done: this is essentially a recursive program to compute c(i,j) for i ≤ j.

We can simply call c(1,n).
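A minimal memoized transcription of this recurrence (my sketch, not from the slides; the dimension vector p is passed as a Python list p[0..n]):

from functools import lru_cache

def chain_cost(p):
    # p[0..n]: matrix A_i has dimensions p[i-1] x p[i].
    n = len(p) - 1

    @lru_cache(maxsize=None)
    def c(i, j):
        if i == j:
            return 0                      # a single matrix costs nothing
        return min(c(i, k) + c(k + 1, j) + p[i - 1] * p[k] * p[j]
                   for k in range(i, j))

    return c(1, n)

print(chain_cost([10, 100, 5, 50]))       # 7500, matching the N = 3 example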

Page 20

Explicit Table

// assumes C[i,i] = 0 for all i
for d = 1,..,n-1 do
  for i = 1,..,n-d do
    j = i + d;
    C[i,j] = infinity;
    for k = i,..,j-1 do {   // compute min over k
      c = C[i,k] + C[k+1,j] + p[i-1]*p[k]*p[j];
      if( c < C[i,j] ) C[i,j] = c;
    }

Running time is clearly cubic in n.

Page 21

Correctness

for d = 1,..,n-1 do
  for i = 1,..,n-d do
    j = i + d;
    C[i,j] = infinity;
    for k = i,..,j-1 do {
      c = C[i,k] + C[k+1,j] + p[i-1]*p[k]*p[j];
      if( c < C[i,j] ) C[i,j] = c;
    }

But why is it correct?

What is the exact order in which the table entries are produced?

Page 22

Correctness

for d = 1,..,n-1 do
  for i = 1,..,n-d do
    j = i + d;
    C[i,j] = infinity;
    for k = i,..,j-1 do {
      c = C[i,k] + C[k+1,j] + p[i-1]*p[k]*p[j];
      if( c < C[i,j] ) C[i,j] = c;
    }

Correctness proof is by induction on d: the terms C[i,k] and C[k+1,j] (shown in blue on the original slide) have smaller d-values.

Page 23

Correctness

 1  9 16 22 27 31 34 36
 0  2 10 17 23 28 32 35
 0  0  3 11 18 24 29 33
 0  0  0  4 12 19 25 30
 0  0  0  0  5 13 20 26
 0  0  0  0  0  6 14 21
 0  0  0  0  0  0  7 15
 0  0  0  0  0  0  0  8

(Entry (i,j) shows the step at which C[i,j] is filled in: first the diagonal C[i,i] = 0, then diagonal by diagonal for d = 1, 2, ...; the zeros below the diagonal are unused cells.)

c(i,j) = min over i ≤ k < j of ( c(i,k) + c(k+1,j) + p[i-1] p[k] p[j] )


Page 26

Actual Multiplication

So far we only have the optimal number of scalar multiplications. How do we figure out how to actually perform the matrix multiplications?

Keep track of the k for which the minimum value appears:

- Multiply matrices i through k, store result in T1.
- Multiply matrices k+1 through j, store in T2.
- Multiply T1 and T2.
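A sketch of that recipe in code, assuming K[i][j] records the minimizing k for each pair and mat_mul is the brute-force product sketched earlier (all names illustrative):

def multiply_chain(mats, K, i, j):
    # Multiply A_i .. A_j in the optimal order; mats[1..n] holds the matrices
    # (index 0 unused) and K[i][j] is the minimizing k recorded by the DP.
    if i == j:
        return mats[i]
    k = K[i][j]
    T1 = multiply_chain(mats, K, i, k)        # matrices i through k
    T2 = multiply_chain(mats, K, k + 1, j)    # matrices k+1 through j
    return mat_mul(T1, T2)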

Page 27

Faster MM Algorithms

What happens if we use a faster algorithm for the individual matrix multiplications?

Let's write M(p,q,r) for the cost of multiplying a p x q matrix by a q x r matrix.

There are clever divide-and-conquer algorithms that show that

M(p,q,r) = o( pqr )

How does this affect our algorithm?

Page 28

Faster MM Algorithms

It’s essentially irrelevant!

We can simply change the code a little:

c = C[i,k]+C[k+1,j] + M(p[i-1],p[k],p[j]);

That's it!

The optimal solution may be different, but we can still find it using the same dynamic programming approach.

Page 29

All-Pairs Shortest Path

Page 30

Shortest Paths, again

Recall: We are given a digraph G with a non-negative cost function cost(x,y) for all edges.

Dijkstra's algorithm nicely solves the single source problem.

But if we have to compute the distances dist[x,y] between any two vertices in a dense graph we might as well resort to dynamic programming.

A minor problem: Where on earth is the recursion?

Page 31

Floyd-Warshall-Kleene

To get a recursive solution we need a clever trick: constrain all intermediate vertices on a path to be in {1,2,...,k} (where V = {1,...,n}).

Then

c(x,y,0) = cost(x,y), possibly infinite (if there is no edge)

c(x,y,k) = min( c(x,y,k-1), c(x,k,k-1)+c(k,y,k-1) )

Bingo. Done by memoizing.

Page 32

Explicit Table

Disregarding memory:

// dist[i,j,0] = cost(i,j)
for( k = 1; k <= n; k++ )
  for( i = 1; i <= n; i++ )
    for( j = 1; j <= n; j++ )
      dist[i,j,k] = min( dist[i,j,k-1], dist[i,k,k-1] + dist[k,j,k-1] );

Clearly cubic in n. Constants are excellent.
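If memory is not disregarded, the k dimension can be dropped: during round k, row k and column k do not change (dist[k][k] = 0), so the updates can be done in place in a single 2-D array. A Python sketch of this standard version (my code, with INF marking missing edges):

INF = float('inf')

def floyd_warshall(dist):
    # dist[i][j]: edge cost, 0 on the diagonal, INF if there is no edge.
    # Invariant after round k: dist[i][j] is the cheapest path from i to j
    # whose intermediate vertices all lie among the first k vertices.
    n = len(dist)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist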

Page 33

Other Approaches?

Restricting the intermediate points is not particularly obvious. Are there other approaches?

How about this:

c(x,y,k) = shortest path from x to y using at most k edges.

So c(x,y,n-1) = dist(x,y), and c(x,y,1) is trivial.

Page 34

The Step

We need a way to get from c(x,y,<k) to c(x,y,k).

c(x,y,k) = min( c(x,y,k-1), c(x,z,s) + c(z,y,t) )

where the minimum also ranges over all vertices z and all splits s + t = k with s, t > 0.

But this update step is expensive, both in time and space.

Update in Floyd-Warshall-Kleene is O(1)!

Page 35

All-Pairs Longest Paths

Page 36

Longest Paths

Of course, we restrict the longest paths to be simple paths.

Let's try the same trick as in the FWK shortest path algorithm: constrain all intermediate vertices on a path to be in {1,2,...,k}.

That is, c(x,y,k) = max( c(x,y,k-1), c(x,k,k-1) + c(k,y,k-1) )

Does this work?

Page 37

No! The longest simple path from c to a is (c, b, d, a). But the subpath (c, b) is not the longest path from c to b!

Optimal longest paths do not have optimal solutions to subproblems!!!

[Figure: a small weighted graph on the vertices a, b, c, d illustrating the counterexample.]

You cannot use optimal paths from c to b and from b to a to get an optimal path from c to a. Instead you create a cycle.


Page 38

Slow Dynamic Programming

Page 39

TSP

Sometimes dynamic programming can be used to find relatively fast, though still exponential, algorithms.

Traveling Salesman Problem

Given: A distance table between n cities. Problem: Find a shortest tour.

A tour here is required to visit each city exactly once and return to the city of origin. The cost is the sum of the edge costs.

Page 40

Brute Force

A tour is in essence just a permutation of the cities.

So the brute-force algorithm has to inspect some n! permutations.

Actually: (n-1)! would be enough. Why?

This may sound horrible, but no polynomial time algorithm for TSP is known, and there are good reasons to believe that none exists (NP-completeness).
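For contrast, the brute-force search is only a few lines; fixing city 0 as the origin is exactly what brings n! down to (n-1)! (an illustrative sketch, with cost an n x n distance table):

from itertools import permutations

def tsp_brute_force(cost):
    # Try every tour; fixing city 0 as the origin leaves (n-1)! candidates.
    n = len(cost)
    best = float('inf')
    for perm in permutations(range(1, n)):
        tour = (0,) + perm + (0,)
        best = min(best, sum(cost[a][b] for a, b in zip(tour, tour[1:])))
    return best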

Page 41

Fred Uses Dynamic Programming

F.H. has decided that dynamic programming will do better than brute force.

The problem is: How do we describe the problem recursively?

Fred wants to borrow the idea from Floyd-Warshall-Kleene: essentially induction on the number of admissible nodes.

c(x,y,k) = shortest path from x to y using only {1,2,...,k} as intermediate nodes.

Page 42

Adapt

Adapted for TSP:

c(k) = cost of shortest tour on {1,2,...,k}.

So c(n) is what we are after and c(1) is trivial.

We can now apply dynamic programming to get from c(1) to c(n).

According to Fred, that is.

Page 43

Well …

But how do we get from c(<k) to c(k)?

We need some simple recursion.

Note that we only keep track of costs, not the actual permutations. But updating is difficult: we would have to insert a step to/from k in all possible places.

There appears to be no reasonable way to do this.

Page 44

Optimality Substructure

The tour goes along a path from 1 to k and then along a path from k back to 1, so that all vertices are included. But these two paths can divide the vertices {1,2,3,...,k} arbitrarily into two subsets, neither one being {1,2,3,...,j} for some j < k, where we could have used c(j).

There is no optimality substructure!

So we need a different kind of recursion. But which???

Page 45

Second Try

Here is a better (but much less obvious) line of attack.

For any subset S of {1,2,...,n}, i in S, let

c(S,i) = cost of a shortest Hamiltonian path from 1 to i in S.

Hamiltonian: touches every point in S exactly once.

Then the cost of an optimal tour is given by

min( c({1,...,n},i) + cost(i,1) | i )

Page 46

Recursion

Can we update now?

c({i},i) = cost(1,i)

c(S+j,j) = min( c(S,i) + cost(i,j) | i in S )

Here j is not in S.

Since we are dealing with subsets rather than permutations the running time is

O( n^2 · 2^n ) = o( n! )
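A compact sketch of this recursion (the Held-Karp algorithm) with subsets coded as bitmasks; city 0 plays the role of city 1 here, and all names are mine:

from functools import lru_cache

def tsp_dp(cost):
    # Held-Karp: c(S, i) = cost of a shortest path that starts at city 0,
    # visits every city in the set S exactly once, and ends at i (i in S).
    n = len(cost)

    @lru_cache(maxsize=None)
    def c(S, i):
        if S == 1 << i:                  # S = {i}: direct edge 0 -> i
            return cost[0][i]
        S_rest = S ^ (1 << i)            # drop i, recurse on the smaller set
        return min(c(S_rest, j) + cost[j][i]
                   for j in range(1, n) if S_rest & (1 << j))

    full = (1 << n) - 2                  # all cities except 0
    return min(c(full, i) + cost[i][0] for i in range(1, n))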

Page 47

Values versus Witnesses

As usual, we only get the cost of an optimal solution, not the actual tour.

To retrieve the tour, trace the history of the optimal value backwards. The back-trace only requires linearly many steps – but first we have to compute the exponential size table to get at the witnesses.

Page 48

The Recursion

In the TSP problem, the recursion is not over integers or strings (as is most often the case), but over

pointed sets (S,i) where i in S

We generate all pointed sets starting at singletons ({i},i) by adding one element at a time (induction on cardinality).