
Assignment 3 - Australian National University

Dec 18, 2021

Transcript
Page 1: Assignment 3 - Australian National University

Assignment 3
• Out: 5 October
• Due: 26 October 23:59
• Grace period: 27 October 13:00
• No cheating!!!

Page 2

Final Project
• Milestone 2: please read the description:
  https://cs.anu.edu.au/courses/comp3600/finalPrj-m2.pdf (the M2 description is on p. 4)
• Due: 8 Oct 23:59
• Grace period: 9 Oct 13:00

Page 3

Marking
• A2 XOR A3 is redeemable by the Final Project
• Total Score = max(0.15·A1 + 0.2·max(A2, FinalProject) + 0.25·A3 + 0.4·FinalProject ;
                    0.15·A1 + 0.2·A2 + 0.25·max(A3, FinalProject) + 0.4·FinalProject)
• But note that the Final Project Milestone-1 marking is extremely lenient. This leniency will decrease in the marking of Milestone-2 and decrease further in the marking of the final deliverables.
• Hence, skipping A3 to work solely on the Final Project may not be a good idea.
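As a sanity check, the redemption rule can be written directly in Python. This is a sketch: the variable names a1, a2, a3, and fp are mine, and marks are assumed to be on a 0–100 scale.

```python
def total_score(a1, a2, a3, fp):
    """Course total under the redemption rule: either A2 or A3
    (whichever helps more) may be replaced by the Final Project mark."""
    redeem_a2 = 0.15 * a1 + 0.2 * max(a2, fp) + 0.25 * a3 + 0.4 * fp
    redeem_a3 = 0.15 * a1 + 0.2 * a2 + 0.25 * max(a3, fp) + 0.4 * fp
    return max(redeem_a2, redeem_a3)
```

For example, a student who skipped A2 entirely but did well on the Final Project is scored via the first branch, where the Final Project mark stands in for A2.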

Page 4

COMP3600/6466 – Algorithms
Dynamic Programming (Cont.)

[CLRS sec. 15.2]

Hanna Kurniawati
https://cs.anu.edu.au/courses/comp3600/
[email protected]

Page 5

Topics
✓ What is it?
✓ Example: Fibonacci Sequence
✓ How to develop DP algorithms?
✓ Example: Shortest Path
✓ Example: Chain matrix multiplication
• Example: Longest Common Subsequence
• Example: Decision-making under uncertainty

Page 6

Longest Common Subsequence (LCS)
• The Problem: Given two strings X and Y, find a subsequence that appears in both X and Y and has the longest length
• Note: Here, the subsequence does not need to be contiguous, but the order must be preserved
• Example: Suppose X = (A, B, C, B, D, A, B) and Y = (B, D, C, A, B, A). Then, LCS(X, Y) = (B, C, A, B) OR (B, D, A, B)
• Applications:
  • Computational biology, e.g., comparing DNA
  • The diff utility in Linux

Page 7

LCS – DP Steps
1. Sub-problems:
• Recall the heuristic: when the problem requires processing a sequence, there are generally three ways to divide it into sub-problems: prefixes (sub-problems from the start to index k), suffixes (sub-problems from index k to the end), and substrings (sub-problems from index i to index j)
• In this case, the sub-problems are the LCSs of pairs of prefixes of the original strings, and their lengths

Page 8

LCS – DP Steps
2. Relation between sub-problems
• On the LCS itself (the optimal sub-structure of LCS):
Let X = (x1, x2, …, xm) and Y = (y1, y2, …, yn) be the input sequences and let Z = (z1, z2, …, zk) be any LCS of X and Y. Then there are 3 cases:
• If xm = yn, then zk = xm = yn and Zk−1 is an LCS of Xm−1 and Yn−1
• If xm ≠ yn and zk ≠ xm, then Z is an LCS of Xm−1 and Y
• If xm ≠ yn and zk ≠ yn, then Z is an LCS of X and Yn−1
• Here Xi, Yi, and Zi denote the prefixes of X, Y, and Z respectively, from index 1 to index i

Page 9

LCS – DP Steps
2. Relation between sub-problems
• On the length of the LCS:

  c[i, j] = 0                              if i = 0 or j = 0
            c[i−1, j−1] + 1                if i, j > 0 and x_i = y_j
            max(c[i, j−1], c[i−1, j])      if i, j > 0 and x_i ≠ y_j

• c[i, j]: the length of an LCS of Xi and Yj
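The recurrence translates almost line-for-line into a top-down memoised function. A Python sketch, with lru_cache playing the role of the memo table:

```python
from functools import lru_cache

def lcs_length(X, Y):
    # c(i, j) = length of an LCS of the prefixes X[:i] and Y[:j]
    @lru_cache(maxsize=None)
    def c(i, j):
        if i == 0 or j == 0:          # an empty prefix has an empty LCS
            return 0
        if X[i - 1] == Y[j - 1]:      # x_i = y_j: extend the diagonal
            return c(i - 1, j - 1) + 1
        return max(c(i, j - 1), c(i - 1, j))
    return c(len(X), len(Y))
```

On the slides' example, lcs_length("ABCBDAB", "BDCABA") returns 4, the length of (B, C, A, B).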

Page 10

LCS – DP Steps
3. Constructing the recurrence/topological order:
a. Is the sub-problem graph a DAG? Yes, because the sub-problem dependency goes one way: towards the prefixes.

Page 11

LCS – DP Steps
3. Constructing the recurrence/topological order:
b. The top-down recurrence is straightforward. Topological order for bottom-up? Start from the LCS of X0 and Y0 and proceed to the LCS of Xm and Yn.
Maintain a 2D matrix that keeps both the length of the LCS and the direction the LCS value came from.
[CLRS] p. 395

Page 12

Pseudo-code: [CLRS] p. 394
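The referenced pseudo-code is not reproduced in these slides; a Python sketch of the bottom-up construction might look like the following, where the strings "diag"/"up"/"left" are my own stand-ins for the direction arrows kept alongside the lengths:

```python
def lcs_tables(X, Y):
    """Bottom-up LCS: c[i][j] holds LCS lengths of prefixes,
    b[i][j] records which neighbour the optimum came from."""
    m, n = len(X), len(Y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:          # matching characters
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = "diag"
            elif c[i - 1][j] >= c[i][j - 1]:  # drop last char of X
                c[i][j] = c[i - 1][j]
                b[i][j] = "up"
            else:                             # drop last char of Y
                c[i][j] = c[i][j - 1]
                b[i][j] = "left"
    return c, b
```

Each cell is filled from cells with smaller indices only, matching the topological order from step 3b.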

Page 13

LCS – DP Steps
4. Compute the solution to the original problem:
• c[m, n] is the length of the LCS of the original problem.
• To get the LCS itself, traverse the matrix starting from b[m, n] and follow the directions. Whenever the direction is diagonal, we put the character corresponding to that cell into the LCS (e.g., a diagonal at b[i, j] means we add xi to the LCS). This strategy outputs the LCS from its last character backwards.
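The traversal described above can be sketched in Python. This self-contained version backtracks using the length table c alone (comparing c[i−1][j] against c[i][j−1] instead of storing a separate direction table b), which is an equivalent variant:

```python
def lcs(X, Y):
    m, n = len(X), len(Y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    # Walk back from c[m][n]; a "diagonal" step emits one character,
    # so the LCS comes out last character first and is reversed at the end.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            out.append(X[i - 1])
            i -= 1
            j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

Note that several distinct LCSs of maximal length may exist (the slides' example has at least two); the backtracking order determines which one is returned.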

Page 14

LCS – Time complexity
• #sub-problems: Θ(mn)
• Time per sub-problem: Θ(1)
• Total time for the DP (i.e., constructing the matrix): #sub-problems × time per sub-problem = Θ(mn)

Page 15

Topics
✓ What is it?
✓ Example: Fibonacci Sequence
✓ How to develop DP algorithms?
✓ Example: Shortest Path
✓ Example: Chain matrix multiplication
✓ Example: Longest Common Subsequence
• Example: Decision-making under uncertainty

Page 16

Dynamic Programming in Robust Planning / Decision-Making
• Dynamic Programming is a well-known approach (in fact, one of the two major approaches) in robust control and planning (aka sequential decision-making)
  • Robust: the system is affected by uncertainty
  • Sequential decision-making: the problem of deciding what a system should do now so as to get good long-term performance
• It relies on Bellman's Principle of Optimality: an optimal solution from the initial state must constitute an optimal solution from the state resulting from the first decision
• The notion of dynamic programming in algorithms and in planning and control is the same. In fact, the dynamic programming approach to algorithm design started exactly in the control and planning domain

Page 17

An Example: Solving a Markov Decision Process (MDP) Problem
• A framework to find the best mapping from states to actions when the outcome of each action is non-deterministic
• Many applications:
  • Games: Tic Tac Toe, Chess, Go
  • Robots: pedestrian avoidance in self-driving cars
  • Navigation

Page 18

Markov Decision Processes
• The non-determinism must be 1st-order Markov.
• 1st-order Markov means that given the present state, the future states are independent of the past states:
  P(s_{t+1} | s_t, a_t) = P(s_{t+1} | s_t, a_t, s_{t−1}, a_{t−1}, …, s_1, a_1, s_0)

Page 19

Defining an MDP Problem
• Formally defined as a 4-tuple (S, A, T, R):
• S: state space
• A: action space
• T: transition function
  T(s, a, s') = P(S_{t+1} = s' | S_t = s, A_t = a)
• R: reward function
  R(s) or R(s, a) or R(s, a, s')
[Figure: the grid world with goal cell G]
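A tiny concrete instance of the 4-tuple, written as plain Python dicts. The two-state MDP below is my own illustrative example, not from the slides:

```python
# A toy two-state MDP: states "s0"/"s1", actions "stay"/"go".
S = ["s0", "s1"]
A = ["stay", "go"]
T = {  # T[s][a][s'] = P(s' | s, a); each distribution sums to 1
    "s0": {"stay": {"s0": 1.0, "s1": 0.0}, "go": {"s0": 0.2, "s1": 0.8}},
    "s1": {"stay": {"s0": 0.0, "s1": 1.0}, "go": {"s0": 0.7, "s1": 0.3}},
}
R = {"s0": 0.0, "s1": 1.0}  # R(s): reward for being in state s
```

Here the reward depends on the state only, the R(s) variant from the slide; the other variants R(s, a) and R(s, a, s') would simply add keys for the action and next state.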

Page 20

Solving an MDP problem
• Means finding an optimal policy, usually denoted π*.
• Policy = strategy
• A mapping from states to actions, π : S → A.
• Meaning: for any state s in S, π(s) will tell us the best action the system should perform.
• Example: [Figure: the grid world with terminal rewards +1 and −1]

Page 21

Using a Policy

[Figure: the agent–environment loop — the policy maps the observed state to an action, which moves the agent through the grid towards G]

1. Start from the initial state.
2. Move according to the policy.
3. The system moves to a new state and receives a reward.
4. Repeat from step 2 until a stopping criterion is satisfied (e.g., the goal is reached).
Some notes:
• The new state the system ends up in may differ between runs.
• The goal of the system is to get the maximum possible total reward.
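The loop above can be sketched as a policy rollout. This is a hypothetical simulator; the dict-of-dicts encoding of T and R is my own assumption:

```python
import random

def rollout(policy, T, R, start, goal, max_steps=100, rng=None):
    """Follow a policy from `start`: act, sample the next state from T,
    collect R(s), and stop at `goal` or after max_steps."""
    rng = rng or random.Random(0)
    s, total = start, R[start]
    for _ in range(max_steps):
        if s == goal:                       # stopping criterion
            break
        a = policy[s]                       # 2. move according to the policy
        nxt, probs = zip(*T[s][a].items())  # 3. stochastic outcome
        s = rng.choices(nxt, weights=probs)[0]
        total += R[s]                       # receive the reward
    return s, total
```

Because the transitions are sampled, repeated runs from the same start state can visit different states and accumulate different total rewards, which is exactly the note above.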

Page 22

Solving an MDP is Solving an Optimization Problem
• Recall that the optimal policy maps each state to the best action. "Best" here means maximizing the following:

  V*(s) = max_a [ R(s) + γ Σ_{s'} T(s, a, s') V*(s') ]

• The bracketed term is Q(s, a); the equation is the Bellman equation
• Theorem: there is a unique function V* satisfying the above equation
• Notice the optimal sub-structure property

Page 23

Solving an MDP is Solving an Optimization Problem
• Optimal policy?
• If we know V*, the optimal policy can be generated easily:

  π*(s) = argmax_a [ R(s) + γ Σ_{s'} T(s, a, s') V*(s') ]
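The argmax can be sketched in Python as follows (assuming T and R are stored as nested dicts keyed by state and action; these names and the encoding are mine):

```python
def extract_policy(V, S, A, T, R, gamma=0.95):
    """Greedy policy from a value function:
    pi(s) = argmax_a [ R(s) + gamma * sum_s' T(s,a,s') * V(s') ]."""
    pi = {}
    for s in S:
        pi[s] = max(
            A,
            key=lambda a: R[s] + gamma * sum(T[s][a][s2] * V[s2] for s2 in S),
        )
    return pi
```

Since R(s) here does not depend on the action, it does not affect which action wins the argmax; it is kept inside the bracket for fidelity to the equation above.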

Page 24

Value Iteration: A way to compute the optimal value function
• Iteratively update the optimal value of every state until convergence.
• Algorithm:

  Initialize V_0(s) = R(s) for all s
  Loop
    For all s {
      V_{t+1}(s) = max_a [ R(s) + γ Σ_{s'} T(s, a, s') V_t(s') ]
    }
    t = t + 1
  Until V_{t+1}(s) = V_t(s) for all s (in practice: max_s |V_{t+1}(s) − V_t(s)| < ε, e.g., 1e-7)

• The update step is often called a value update, Bellman update, or Bellman backup.
• Essentially, bottom-up dynamic programming.
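A direct Python sketch of the algorithm above (assuming, as before, that T and R are nested dicts keyed by state and action, and that the reward depends on the state only):

```python
def value_iteration(S, A, T, R, gamma=0.95, eps=1e-7):
    V = {s: R[s] for s in S}                  # V_0(s) = R(s)
    while True:
        V_new = {}
        for s in S:                           # one Bellman backup per state
            V_new[s] = max(
                R[s] + gamma * sum(T[s][a][s2] * V[s2] for s2 in S)
                for a in A
            )
        # convergence test: max_s |V_{t+1}(s) - V_t(s)| < eps
        if max(abs(V_new[s] - V[s]) for s in S) < eps:
            return V_new
        V = V_new
```

With γ < 1 the backup is a contraction, so the loop is guaranteed to terminate; the returned V is within roughly ε/(1 − γ) of V*.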

Page 25

Example: Simple Navigation
• An agent moves in 4×3 grid cells.
• It can move to one of the four neighboring cells. The actions' accuracy is 70%. 30% of the time, the agent ends up in the cell to the left or right of its intended cell, or stays in the current cell, with equal probability. If there is no cell to the left or right of the intended cell, that probability mass is added to staying where it is.
• Collision with an obstacle/boundary results in no movement.
• Two terminal states, with rewards +1 and −1. Being in any other valid state incurs a cost of −0.04. Being in an invalid state incurs a cost of −10.
[Figure: the 4×3 grid with terminal cells +1 and −1 and start cell S]

Page 26

The MDP Model
• S: the set of cells, say {c1,1, c1,2, …, c3,4}
• A: {L, R, U, D}
[Figure: the 4×3 grid, rows 1–3 and columns 1–4, with terminal cells +1 and −1 and start cell S]

Page 27

The MDP Model
• T(s, a, s'): for each action, we have a probability matrix whose rows are the current state s and whose columns are the next state s'
[Figure: the |S| × |S| transition matrix for action Left, rows and columns indexed c1,1, c1,2, c1,3, …, c3,4]

Page 28

The MDP Model
• Reward function R(s): parameterized by states

  Column:    1       2       3       4
  Row 1:   −0.04   −0.04   −0.04    +1
  Row 2:   −0.04   −10     −0.04    −1
  Row 3:   −0.04   −0.04   −0.04   −0.04
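The reward table can be encoded as a dict keyed by (row, column), assuming row 1 is the top row of the grid as laid out above:

```python
# R(s) for the 4x3 grid: rows 1-3 (top to bottom), columns 1-4.
R = {(r, c): -0.04 for r in range(1, 4) for c in range(1, 5)}
R[(1, 4)] = 1.0    # +1 terminal state
R[(2, 4)] = -1.0   # -1 terminal state
R[(2, 2)] = -10.0  # invalid (obstacle) state
```

Starting from the uniform −0.04 living cost and overwriting the three special cells keeps the encoding short and makes the exceptions explicit.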

Page 29

Computing the Optimal Policy?
• Run the value iteration algorithm on the transition and reward functions that have been defined earlier
• Time complexity? O(T·|S|²·|A|), where T is the number of time steps until convergence: each iteration performs one Bellman backup per state, and each backup sums over all next states for every action.

Page 30

Topics
✓ What is it?
✓ Example: Fibonacci Sequence
✓ How to develop DP algorithms?
✓ Example: Shortest Path
✓ Example: Chain matrix multiplication
✓ Example: Longest Common Subsequence
✓ Example: Decision-making under uncertainty

Next: Greedy Approach for Designing Algorithms

Page 31

If time permits, W9 Tutorial Q1
• How many ways can you roll a sum of n by throwing a 6-sided die at most n times? Note that in this question, order matters.
1. Sub-problems: the number of ways to roll a sum of s by throwing the die at most k times; denote this W(s, k)
2. Relation between sub-problems:

  W(s, k) = 1                            if s = 0
            0                            if s < 0, or k = 0 and s > 0
            Σ_{i=1}^{6} W(s − i, k − 1)  if s > 0 and k > 0

Page 32

If time permits, W9 Tutorial Q1
• How many ways can you roll a sum of n by throwing a 6-sided die at most n times? Note that in this question, order matters.
3. Recurrence / topological order:
a. Is the sub-problem graph a DAG? Yes, because the sub-problem dependency goes one way: towards smaller s and k.
b. Top-down + memoization: simply transform the recurrence in step 2 into pseudo-code.
4. Compute the solution to the original problem: W(n, n)
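Steps 2–4 can be sketched as a memoised Python function, using the at-most-k reading of the sub-problem that makes W(n, n) the final answer:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def W(s, k):
    """Ways to roll a sum of exactly s with at most k throws of a
    6-sided die, where the order of the throws matters."""
    if s == 0:
        return 1                 # the empty (or early-stopped) sequence
    if s < 0 or k == 0:
        return 0                 # overshot, or out of throws
    return sum(W(s - i, k - 1) for i in range(1, 7))
```

For n = 3 the sequences are (3), (1, 2), (2, 1), and (1, 1, 1), so W(3, 3) = 4; more generally, for n ≤ 6 the count is the number of compositions of n, 2^(n−1).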