Page 1: Partially Observable Markov Decision Processes (POMDPs)

Partially Observable Markov Decision Processes (POMDPs)

Sachin Patil

Guest Lecture: CS287 Advanced Robotics. Slides adapted from Pieter Abbeel, Alex Lee.

Page 2: Partially Observable Markov Decision Processes (POMDPs)

- Introduction to POMDPs
- Locally Optimal Solutions for POMDPs
- Trajectory Optimization in (Gaussian) Belief Space
- Accounting for Discontinuities in Sensing Domains
- Separation Principle

Outline

Page 3: Partially Observable Markov Decision Processes (POMDPs)

Markov Decision Process (S, A, H, T, R)

Given

- S: set of states
- A: set of actions
- H: horizon over which the agent will act
- T: S × A × S × {0, 1, …, H} → [0, 1], T_t(s, a, s′) = P(s_{t+1} = s′ | s_t = s, a_t = a)
- R: S × A × S × {0, 1, …, H} → ℝ, R_t(s, a, s′) = reward for (s_{t+1} = s′, s_t = s, a_t = a)

Goal:

- Find π: S × {0, 1, …, H} → A that maximizes the expected sum of rewards (see the sketch below)
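Since the slides define the MDP only abstractly, here is a minimal sketch of finite-horizon value iteration over this tuple (discrete S and A; the dictionary layout for T and R is an illustrative assumption, not prescribed by the slide):

```python
def value_iteration(S, A, H, T, R):
    """Finite-horizon value iteration.
    T[t][(s, a, s2)] = P(s_{t+1}=s2 | s_t=s, a_t=a)
    R[t][(s, a, s2)] = reward for that transition.
    Returns policy pi[t][s] and values V[t][s]."""
    V = {H + 1: {s: 0.0 for s in S}}          # value after the horizon is zero
    pi = {}
    for t in range(H, -1, -1):                # backward recursion over time
        V[t], pi[t] = {}, {}
        for s in S:
            # expected immediate reward plus value-to-go, for each action
            q = {a: sum(T[t][(s, a, s2)] * (R[t][(s, a, s2)] + V[t + 1][s2])
                        for s2 in S)
                 for a in A}
            pi[t][s] = max(q, key=q.get)
            V[t][s] = q[pi[t][s]]
    return pi, V
```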

Page 4: Partially Observable Markov Decision Processes (POMDPs)

POMDP = MDP, BUT we don't get to observe the state itself; instead we get sensory measurements.

Now: what action to take given the current probability distribution over states, rather than given the current state.

POMDP – Partially Observable MDP

Page 5: Partially Observable Markov Decision Processes (POMDPs)

POMDPs: Tiger Example

Page 6: Partially Observable Markov Decision Processes (POMDPs)

Belief State
- Probability of S0 vs. S1 being the true underlying state
- Initial belief state: p(S0) = p(S1) = 0.5
- Upon listening, the belief state changes according to the Bayesian update (filtering), as sketched below

[Figure: the two doors, TL (tiger left) and TR (tiger right)]
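To make the filtering update concrete, here is a minimal sketch for the tiger example (the 0.85 listening accuracy is the value commonly used for this example, assumed here rather than stated on the slide):

```python
def listen_update(b_left, p_correct=0.85, heard="tiger-left"):
    """Bayesian (filtering) update of the belief that the tiger is behind the
    left door, after one 'listen' action.  p_correct is the assumed probability
    that listening reports the true side."""
    if heard == "tiger-left":
        num = p_correct * b_left                    # P(hear TL | TL) * P(TL)
        den = num + (1 - p_correct) * (1 - b_left)  # + P(hear TL | TR) * P(TR)
    else:
        num = (1 - p_correct) * b_left
        den = num + p_correct * (1 - b_left)
    return num / den

b = 0.5                # initial belief p(S0) = p(S1) = 0.5
b = listen_update(b)   # hear "tiger-left" once  -> b = 0.85
b = listen_update(b)   # hear it a second time   -> b ≈ 0.97
```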

Page 7: Partially Observable Markov Decision Processes (POMDPs)

Policy – Tiger Example
- Policy π is a map from [0, 1] → {listen, open-left, open-right}, i.e., from the belief p(S0) to an action
- What should the policy be?
- Roughly: listen until sufficiently sure, then open
- But where are the cutoffs?

Page 8: Partially Observable Markov Decision Processes (POMDPs)

- Canonical solution method 1: continuous-state "belief MDP"
- Run value iteration, but now the state space is the space of probability distributions
- → value and optimal action for every possible probability distribution
- → automatically trades off information-gathering actions versus actions that affect the underlying state
- Exact value iteration updates cannot be carried out because there is an uncountable number of belief states → approximation needed

Solving POMDPs

Page 9: Partially Observable Markov Decision Processes (POMDPs)

- Canonical solution method 2:
- Search over sequences of actions with limited look-ahead
- Branching over actions and observations (see the sketch below)

Solving POMDPs

Finite horizon: the search tree has on the order of (|A| · |O|)^H nodes
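A minimal sketch of such limited look-ahead search over beliefs. All function arguments are placeholders the caller must supply; the exponential node count noted above is exactly why the look-ahead must stay shallow:

```python
def lookahead_value(b, depth, A, O, reward, obs_prob, belief_update):
    """Depth-limited search over action/observation sequences.
    b: belief (e.g., dict state -> probability)
    reward(b, a): expected immediate reward of action a in belief b
    obs_prob(b, a, o): probability of observing o after taking a in belief b
    belief_update(b, a, o): Bayes-filtered successor belief."""
    if depth == 0:
        return 0.0
    best = float("-inf")
    for a in A:                                   # branch over actions ...
        value = reward(b, a)
        for o in O:                               # ... and over observations
            p = obs_prob(b, a, o)
            if p > 0:
                value += p * lookahead_value(belief_update(b, a, o),
                                             depth - 1, A, O,
                                             reward, obs_prob, belief_update)
        best = max(best, value)
    return best          # expands on the order of (|A| * |O|)^depth nodes
```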

Page 10: Partially Observable Markov Decision Processes (POMDPs)

- Approximate solution: becoming tractable for |S| in millions
- α-vector point-based techniques
- Monte Carlo Tree Search
- … beyond scope of course …

Solving POMDPs

Page 11: Partially Observable Markov Decision Processes (POMDPs)

- Canonical solution method 3:
- Plan in the underlying MDP
- Use probabilistic inference (filtering) to track the probability distribution over states
- Choose the optimal MDP action for the currently most likely state (see the sketch below)

Solving POMDPs
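A minimal sketch of this heuristic, assuming a discrete belief represented as a dict from state to probability and an MDP policy such as the pi returned by the value-iteration sketch above:

```python
def most_likely_state_controller(b, mdp_policy, t):
    """Act as the MDP-optimal policy would in the single most likely
    state under the current belief b (certainty-equivalence heuristic)."""
    s_ml = max(b, key=b.get)      # most likely state under the belief
    return mdp_policy[t][s_ml]    # the MDP action planned for that state
```

Note that, unlike methods 1 and 2, this heuristic never deliberately takes information-gathering actions.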

Page 12: Partially Observable Markov Decision Processes (POMDPs)

- Introduction to POMDPs
- Locally Optimal Solutions for POMDPs
- Trajectory Optimization in (Gaussian) Belief Space
- Accounting for Discontinuities in Sensing Domains
- Separation Principle

Outline

Page 13: Partially Observable Markov Decision Processes (POMDPs)

Facilitate reliable operation of cost-effective robots that use:

- Imprecise actuation mechanisms – series elastic actuators, cables
- Inaccurate encoders and sensors – gyros, accelerometers

Motivation

[Figure: cable-driven 7-DOF arms; perception (stereo, depth); motors connected to joints using cables]

Page 14: Partially Observable Markov Decision Processes (POMDPs)

Continuous state/action/observation spaces

Motivation

[Figure: cable-driven 7-DOF arms; perception (stereo, depth); motors connected to joints using cables]

Page 15: Partially Observable Markov Decision Processes (POMDPs)

Model Uncertainty As Gaussians

Uncertainty parameterized by mean and covariance

Page 16: Partially Observable Markov Decision Processes (POMDPs)

Dark-Light Domain

[Figure: state space plan from start to goal]

Problem Setup

[Example from Platt, Tedrake, Kaelbling, Lozano-Perez, 2010]

Page 17: Partially Observable Markov Decision Processes (POMDPs)

Dark-Light Domain

[Figure: belief space plan from start to goal]

Problem Setup

Tradeoff: information gathering vs. actions

Page 18: Partially Observable Markov Decision Processes (POMDPs)

Problem Setup

- Stochastic motion and observation models
- Non-linear
- User-defined objective / cost function
- Plan a trajectory that minimizes expected cost (formalized below)
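A hedged formalization of this setup; the symbols f, h, c and the Gaussian-noise form follow the standard belief-space planning literature rather than anything explicit on the slide:

```latex
% Stochastic, non-linear motion and observation models:
\[
  x_{t+1} = f(x_t, u_t, w_t), \qquad w_t \sim \mathcal{N}(0, Q_t),
\]
\[
  z_t = h(x_t, v_t), \qquad v_t \sim \mathcal{N}(0, R_t).
\]
% Objective: choose controls that minimize the expected user-defined cost
\[
  \min_{u_{0:H-1}} \; \mathbb{E}\!\left[\sum_{t=0}^{H} c(x_t, u_t)\right].
\]
```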

Page 19: Partially Observable Markov Decision Processes (POMDPs)

Locally Optimal Solutions

- Belief is Gaussian, parameterized by mean and covariance
- Belief dynamics – Bayesian filter → Kalman filter, mapping the underlying state space to the belief space

Page 20: Partially Observable Markov Decision Processes (POMDPs)

State Space – Trajectory Optimization

Page 21: Partially Observable Markov Decision Processes (POMDPs)

(Gaussian) Belief Space Planning

             

\[
\begin{aligned}
\min_{\mu,\Sigma,u} \quad & \sum_{t=0}^{H} c(\mu_t, \Sigma_t, u_t) \\
\text{s.t.} \quad & (\mu_{t+1}, \Sigma_{t+1}) = \mathrm{KF}(\mu_t, \Sigma_t, u_t, w_t, v_t) \\
& \mu_H = \text{goal} \\
& u \in \mathcal{U}
\end{aligned}
\]

Page 22: Partially Observable Markov Decision Processes (POMDPs)

(Gaussian) Belief Space Planning

             

= maximum likelihood assumption for observations (noise terms set to zero). Can now be solved by Sequential Convex Programming [Platt et al., 2010; also Roy et al.; van den Berg et al., 2011, 2012].

\[
\begin{aligned}
\min_{\mu,\Sigma,u} \quad & \sum_{t=0}^{H} c(\mu_t, \Sigma_t, u_t) \\
\text{s.t.} \quad & (\mu_{t+1}, \Sigma_{t+1}) = \mathrm{KF}(\mu_t, \Sigma_t, u_t, 0, 0) \\
& \mu_H = \text{goal} \\
& u \in \mathcal{U}
\end{aligned}
\]

Obstacles?
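A minimal sketch of the Kalman-filter belief dynamics used as the equality constraint above, with the maximum-likelihood observation assumption (the innovation is zero, so only the covariance changes in the update). The interface, with the models f, h and their Jacobians A, C passed in explicitly, is an illustrative assumption:

```python
import numpy as np

def belief_dynamics_ml(mu, Sigma, u, f, h, A, C, Q, R):
    """One step (mu, Sigma) -> (mu', Sigma') of KF belief dynamics with the
    maximum-likelihood observation z = h(mu_pred).
    f, h: (possibly nonlinear) dynamics and observation models
    A, C: their Jacobians at the current belief (assumed supplied)
    Q, R: process and measurement noise covariances."""
    # predict
    mu_pred = f(mu, u)
    Sigma_pred = A @ Sigma @ A.T + Q
    # update with the maximum-likelihood observation: innovation = 0,
    # so the mean is unchanged and only the covariance shrinks
    S = C @ Sigma_pred @ C.T + R
    K = Sigma_pred @ C.T @ np.linalg.inv(S)
    mu_new = mu_pred
    Sigma_new = (np.eye(len(mu)) - K @ C) @ Sigma_pred
    return mu_new, Sigma_new
```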

Page 23: Partially Observable Markov Decision Processes (POMDPs)

Dark-Light Domain

[Figure: belief space plan from start to goal]

Problem Setup

Tradeoff: information gathering vs. actions

Page 24: Partially Observable Markov Decision Processes (POMDPs)

- Prior work approximates robot geometry as points or spheres
- Articulated robots cannot be approximated as points/spheres
- Gaussian noise in joint space
- Need probabilistic collision avoidance w.r.t. robot links

Collision Avoidance

Van den Berg et al.

Page 25: Partially Observable Markov Decision Processes (POMDPs)

- Definition: the convex hull of a robot link transformed (in joint space) according to the sigma points
- Consider sigma points lying on a given standard-deviation contour of the uncertainty covariance (UKF); see the sketch below

Sigma Hulls
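A minimal sketch of the definition only (the slides note later that sigma hulls are never explicitly computed in the actual solver). The forward-kinematics function fk and the contour scaling k are placeholders:

```python
import numpy as np
from scipy.spatial import ConvexHull

def sigma_points(mu, Sigma, k=1.0):
    """Sigma points on the k-standard-deviation contour of N(mu, Sigma):
    mu +/- k * columns of a matrix square root of Sigma (UKF-style)."""
    L = np.linalg.cholesky(Sigma)
    pts = [mu]
    for i in range(len(mu)):
        pts.append(mu + k * L[:, i])
        pts.append(mu - k * L[:, i])
    return pts

def sigma_hull(mu, Sigma, link_vertices, fk, k=1.0):
    """Convex hull of a link's vertices transformed by forward kinematics fk
    at each joint-space sigma point.  fk(q, v) -> workspace position of link
    vertex v at joint configuration q (fk is a placeholder)."""
    pts = np.array([fk(q, v)
                    for q in sigma_points(mu, Sigma, k)
                    for v in link_vertices])
    return pts[ConvexHull(pts).vertices]   # hull vertices in the workspace
```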

Page 26: Partially Observable Markov Decision Processes (POMDPs)

Collision Avoidance Constraint

Consider the signed distance between obstacles and the sigma hulls.

Page 27: Partially Observable Markov Decision Processes (POMDPs)

- Gaussian belief state in joint space: b_t = (µ_t, Σ_t), i.e., mean and covariance
- Optimization problem:
  - Variables: means µ_t, covariances Σ_t, control inputs u_t
  - Constraints: belief dynamics (UKF), probabilistic collision avoidance, reach desired end-effector pose, control inputs are feasible

Belief space planning using trajectory optimization

Page 28: Partially Observable Markov Decision Processes (POMDPs)

- Robot trajectory should stay at least a specified safety distance from other objects

Collision avoidance constraint

Page 29: Partially Observable Markov Decision Processes (POMDPs)

- Robot trajectory should stay at least a specified safety distance from other objects
- Linearize the signed distance at the current belief

Collision avoidance constraint

Page 30: Partially Observable Markov Decision Processes (POMDPs)

- Robot trajectory should stay at least a specified safety distance from other objects
- Linearize the signed distance at the current belief
- Consider the case where the closest point lies on a face spanned by multiple vertices (see the linearization sketch below)

Collision avoidance constraint
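A hedged sketch of the linearized constraint; the safety margin d_safe, the contact normal n̂, and the Jacobian J_p are notation introduced here for illustration, not taken verbatim from the slides:

```latex
% Signed-distance constraint sd(b) >= d_safe, linearized around the current belief b^0:
\[
  \mathrm{sd}(b) \;\approx\; \mathrm{sd}(b^{0}) \;+\; \hat{n}^{\top} J_{p}(b^{0})\,(b - b^{0}) \;\ge\; d_{\mathrm{safe}},
\]
% where \hat{n} is the direction of closest approach between the sigma hull and the
% obstacle, and J_p is the Jacobian of the closest point p on the sigma hull w.r.t. b.
```

This linear inequality is what the SQP solver sees at each iteration; it is re-linearized around the updated belief trajectory after every step.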

Page 31: Partially Observable Markov Decision Processes (POMDPs)

- Discrete collision avoidance can lead to trajectories that collide with obstacles in between time steps
- Use the convex hull of the sigma hulls at consecutive time steps
- Advantages:
  - Solutions are collision-free in between time steps
  - The discretized trajectory can have fewer time steps

Continuous Collision Avoidance Constraint

Page 32: Partially Observable Markov Decision Processes (POMDPs)

- During execution, update the belief state based on the actual observation
- Re-plan after every belief state update
- This is effective feedback control, provided one can re-plan sufficiently fast (see the sketch below)

Model Predictive Control (MPC)
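A minimal sketch of this re-planning loop; every function argument is a placeholder for the planner and robot interfaces:

```python
def belief_space_mpc(b0, horizon, plan, execute, observe, kalman_update):
    """Model-predictive control in belief space.
    plan(b, H)             -> control sequence u_0 .. u_{H-1} from belief b
    execute(u)             -> applies one control on the robot
    observe()              -> returns the actual measurement z
    kalman_update(b, u, z) -> filtered belief after applying u and seeing z."""
    b = b0
    for t in range(horizon):
        u = plan(b, horizon - t)[0]   # re-plan from the current belief,
        execute(u)                    # apply only the first control,
        z = observe()                 # take the *actual* observation,
        b = kalman_update(b, u, z)    # update the belief, then repeat
    return b
```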

Page 33: Partially Observable Markov Decision Processes (POMDPs)

State space trajectory

Example: 4-DOF planar robot

Page 34: Partially Observable Markov Decision Processes (POMDPs)

1-standard deviation belief space trajectory

Example: 4-DOF planar robot

Page 35: Partially Observable Markov Decision Processes (POMDPs)

4-standard deviation belief space trajectory

Example: 4-DOF planar robot

Page 36: Partially Observable Markov Decision Processes (POMDPs)

Probability of collision

Experiments: 4-DOF planar robot

Page 37: Partially Observable Markov Decision Processes (POMDPs)

Mean distance from target

Experiments: 4-DOF planar robot

Page 38: Partially Observable Markov Decision Processes (POMDPs)

- Efficient trajectory optimization in Gaussian belief spaces to reduce task uncertainty
- Prior work approximates robot geometry as a point or a single sphere
- Pose collision constraints using signed distance between sigma hulls of robot links and obstacles
- Sigma hulls never explicitly computed – fast convex collision checking and analytical gradients
- Iterative re-planning in belief space (MPC)

Take-Away

Page 39: Partially Observable Markov Decision Processes (POMDPs)

- Introduction to POMDPs
- Locally Optimal Solutions for POMDPs
- Trajectory Optimization in (Gaussian) Belief Space
- Accounting for Discontinuities in Sensing Domains
- Separation Principle

Outline

Page 40: Partially Observable Markov Decision Processes (POMDPs)

Discontinuities in Sensing Domains

Zero gradient, hence local optimum

[Figure: start and goal in a domain with "dark" and "light" sensing regions]

Patil et al., under review

Page 41: Partially Observable Markov Decision Processes (POMDPs)

Increasing difficulty

≈ Noise level determined by (signed distance to sensing region) × (homotopy iteration)

Discontinuities in Sensing Domains

Page 42: Partially Observable Markov Decision Processes (POMDPs)

Signed Distance to Sensing Discontinuity

Field of view (FOV) discontinuity

Occlusion discontinuity

Page 43: Partially Observable Markov Decision Processes (POMDPs)

vs. Signed distance

Page 44: Partially Observable Markov Decision Processes (POMDPs)

Modified Belief Dynamics

δ: binary variable in {0, 1}; δ = 0 → no measurement, δ = 1 → measurement

Page 45: Partially Observable Markov Decision Processes (POMDPs)

Incorporating δ in SQP
- Binary non-convex program – difficult to solve
- Instead, solve a succession of smooth approximations (see the sketch below)
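One common way to realize such a smooth approximation is a sigmoid in the signed distance, sharpened by the homotopy parameter α; this particular form is an assumption for illustration, not something stated on the slides:

```latex
% Smooth relaxation of the binary measurement variable delta (illustrative form),
% with sd(x) > 0 taken to mean "inside the sensing region":
\[
  \delta_{\alpha}(x) \;=\; \frac{1}{1 + \exp\!\big(-\alpha\,\mathrm{sd}(x)\big)}
  \;\xrightarrow{\;\alpha \to \infty\;}\;
  \begin{cases} 1, & \mathrm{sd}(x) > 0 \ \text{(measurement)}\\[2pt]
                0, & \mathrm{sd}(x) < 0 \ \text{(no measurement)} \end{cases}
\]
```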

Page 46: Partially Observable Markov Decision Processes (POMDPs)

Algorithm Overview
- While δ is not within the desired tolerance (loop sketched below):
  - Solve the optimization problem with the current value of α
  - Increase α
  - Re-integrate the belief trajectory
  - Update δ
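A minimal sketch of this homotopy loop; the growth schedule for α and the near-binary stopping test are illustrative assumptions, and all function arguments are placeholders:

```python
import numpy as np

def homotopy_solve(solve_sqp, integrate_belief, measurement_indicator,
                   b0, alpha0=1.0, growth=2.0, tol=1e-2, max_iters=20):
    """alpha controls how sharply the smooth approximation of the binary
    measurement variable delta approaches {0, 1}."""
    alpha = alpha0
    plan = None
    for _ in range(max_iters):
        plan = solve_sqp(b0, alpha)              # solve the smoothed problem
        beliefs = integrate_belief(b0, plan)     # re-integrate belief trajectory
        delta = np.asarray(measurement_indicator(beliefs))
        if np.max(np.abs(np.round(delta) - delta)) < tol:
            return plan                          # delta is (near-)binary: done
        alpha *= growth                          # sharpen the approximation
    return plan
```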

Page 47: Partially Observable Markov Decision Processes (POMDPs)

Increasing difficulty

≈ Noise level determined by (signed distance to sensing region) × (homotopy iteration)

Discontinuities in Sensing Domains

Page 48: Partially Observable Markov Decision Processes (POMDPs)

“No measurement” Belief Update

Truncate the Gaussian belief if no measurement is obtained (see the sketch below)
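A 1-D sketch of the idea, assuming the sensing region is x ≤ boundary, so that "no measurement" implies the state lies at x > boundary; the truncated-Gaussian moment formulas below are the standard ones:

```python
import numpy as np
from scipy.stats import norm

def truncate_no_measurement(mu, sigma, boundary):
    """Replace the Gaussian belief N(mu, sigma^2) by a Gaussian with the
    moments of the distribution truncated to x > boundary, reflecting the
    knowledge that no measurement was obtained."""
    a = (boundary - mu) / sigma                     # standardized boundary
    lam = norm.pdf(a) / norm.sf(a)                  # inverse Mills ratio
    mu_new = mu + sigma * lam                       # E[x | x > boundary]
    var_new = sigma**2 * (1.0 + a * lam - lam**2)   # Var[x | x > boundary]
    return mu_new, np.sqrt(var_new)
```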

Page 49: Partially Observable Markov Decision Processes (POMDPs)

Without “No measurement” Belief Update

With “No measurement” Belief Update

Effect of Truncation

Page 50: Partially Observable Markov Decision Processes (POMDPs)

Experiments

Page 51: Partially Observable Markov Decision Processes (POMDPs)

Car and Landmarks (Active Exploration)

Page 52: Partially Observable Markov Decision Processes (POMDPs)

Arm Occluding (Static) Camera

[Figure panels: initial belief; state space plan execution; belief space plan execution (way-point, end)]

Page 53: Partially Observable Markov Decision Processes (POMDPs)

Arm Occluding (Moving) Camera

[Figure panels: initial belief; state space plan execution; belief space plan execution (way-point, end)]

Page 54: Partially Observable Markov Decision Processes (POMDPs)

- Introduction to POMDPs
- Locally Optimal Solutions for POMDPs
- Trajectory Optimization in (Gaussian) Belief Space
- Accounting for Discontinuities in Sensing Domains
- Separation Principle

Outline

Page 55: Partially Observable Markov Decision Processes (POMDPs)

- Assume:
\[
  x_{t+1} = A x_t + B u_t + w_t, \qquad w_t \sim \mathcal{N}(0, Q_t)
\]
\[
  z_t = C x_t + v_t, \qquad v_t \sim \mathcal{N}(0, R_t)
\]
- Goal:
\[
  \text{minimize} \;\; \mathbb{E}\left[ \sum_{t=0}^{H} u_t^{\top} U_t u_t + x_t^{\top} X_t x_t \right]
\]
- Then, the optimal control policy consists of:
  1. Offline/ahead of time: run LQR to find the optimal control policy for the fully observed case, which gives the sequence of feedback matrices K_1, K_2, …
  2. Online: run the Kalman filter to estimate the state, and apply the control u_t = K_t µ_{t|0:t}

Separation Principle
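A minimal sketch of the two steps; time-invariant A, B, C, Q, R, U, X are assumed for brevity (the slide allows time-varying ones), and the gains are indexed K_0 .. K_{H-1} here:

```python
import numpy as np

def lqr_gains(A, B, U, X, H):
    """Offline step: finite-horizon LQR backward Riccati recursion for the
    cost sum_t u_t' U u_t + x_t' X x_t.  Returns K_0..K_{H-1}, u_t = K_t x_t."""
    P = X.copy()
    Ks = [None] * H
    for t in range(H - 1, -1, -1):
        K = -np.linalg.solve(U + B.T @ P @ B, B.T @ P @ A)
        P = X + A.T @ P @ (A + B @ K)
        Ks[t] = K
    return Ks

def lqg_step(mu, Sigma, z, u_prev, A, B, C, Q, R, K):
    """Online step: Kalman filter for the state estimate, then apply the
    LQR gain to the estimate (separation principle: u_t = K_t * mu_t)."""
    # Kalman predict with the previously applied control
    mu_p = A @ mu + B @ u_prev
    Sigma_p = A @ Sigma @ A.T + Q
    # Kalman update with the actual measurement z
    S = C @ Sigma_p @ C.T + R
    Kf = Sigma_p @ C.T @ np.linalg.inv(S)
    mu_new = mu_p + Kf @ (z - C @ mu_p)
    Sigma_new = (np.eye(len(mu)) - Kf @ C) @ Sigma_p
    return K @ mu_new, mu_new, Sigma_new
```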

Page 56: Partially Observable Markov Decision Processes (POMDPs)

Extensions

- Current research directions:
  - Fast! belief space planning
  - Multi-modal belief spaces
  - Physical experiments with the Raven surgical robot

Page 57: Partially Observable Markov Decision Processes (POMDPs)

Recap

- POMDP = MDP, but with sensory measurements instead of exact state knowledge
- Locally optimal solutions in Gaussian belief spaces
  - Augmented state vector (mean, covariance)
  - Trajectory optimization
  - Sigma hulls for probabilistic collision avoidance
- Homotopy methods for dealing with discontinuities in sensing domains