Stochastic Optimal Control
Robert Stengel

Optimal Control and Estimation, MAE 546, Princeton University, 2013

Copyright 2013 by Robert Stengel. All rights reserved. For educational use only.
http://www.princeton.edu/~stengel/MAE546.html
http://www.princeton.edu/~stengel/OptConEst.html

• Nonlinear systems with random inputs and perfect measurements
• Nonlinear systems with random inputs and imperfect measurements
• Certainty equivalence and separation
• Stochastic neighboring-optimal control
• Linear-quadratic-Gaussian (LQG) control

Nonlinear Systems with Random Inputs and Perfect Measurements

Inputs and initial conditions are uncertain, but the state can be measured without error:

\dot{x}(t) = f[x(t), u(t), w(t), t]

z(t) = x(t)

E[x(0)] = x(0), \quad E\{[x(0) - x(0)][x(0) - x(0)]^T\} = 0

Assume that random disturbance effects are small and additive:

\dot{x}(t) = f[x(t), u(t), t] + L(t) w(t)

E[w(t)] = 0, \quad E[w(t) w^T(\tau)] = W(t) \, \delta(t - \tau)
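For intuition, the additive-disturbance model can be simulated directly. The sketch below is illustrative only: the dynamics f, gain L, and spectral density W are assumptions, not from the slides. It discretizes the white noise by drawing, over each step Δt, a sample with variance W/Δt, which approximates E[w(t)wᵀ(τ)] = W δ(t − τ).

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, u, t):
    """Assumed example dynamics: damped scalar system with control."""
    return -0.5 * x + u

W = 0.04           # white-noise spectral density
L = 1.0            # disturbance input gain (scalar here)
dt, tf = 0.01, 5.0
x = 1.0            # initial state, known without error (perfect measurements)

for k in range(int(tf / dt)):
    u = -0.2 * x                            # any control law; state measured exactly
    w = rng.normal(0.0, np.sqrt(W / dt))    # discrete-time sample of white noise w(t)
    x += (f(x, u, k * dt) + L * w) * dt     # Euler-Maruyama step
```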

Cost Must Be an Expected Value

A deterministic cost function cannot be minimized because:
• the disturbance effect on the state cannot be predicted
• the state and control are random variables

\min_{u(t)} J = \phi[x(t_f)] + \int_{t_0}^{t_f} L[x(t), u(t)] \, dt

However, the expected value of a deterministic cost function can be minimized:

\min_{u(t)} J = E\left\{ \phi[x(t_f)] + \int_{t_0}^{t_f} L[x(t), u(t)] \, dt \right\}
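Since each disturbance realization yields a different cost, E[J] must be estimated by averaging. A minimal Monte Carlo sketch, reusing the assumed scalar dynamics above (the quadratic φ and L here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
W, dt, tf = 0.04, 0.01, 5.0

def run_cost(gain):
    """Cost of one noisy run: J = phi[x(tf)] + integral of L[x(t), u(t)] dt."""
    x, J = 1.0, 0.0
    for _ in range(int(tf / dt)):
        u = -gain * x
        J += 0.5 * (x**2 + u**2) * dt               # running cost L[x, u]
        x += (-0.5 * x + u + rng.normal(0.0, np.sqrt(W / dt))) * dt
    return J + 0.5 * x**2                           # terminal cost phi[x(tf)]

# The expected cost is the average over many disturbance realizations
EJ = np.mean([run_cost(0.2) for _ in range(2000)])
print(f"estimated E[J] = {EJ:.3f}")
```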

Stochastic Euler-Lagrange Equations?

There is no single optimal trajectory, and expected values of the Euler-Lagrange necessary conditions may not be well defined:

1) E[\lambda(t_f)] = E\left\{ \left( \frac{\partial \phi[x(t_f)]}{\partial x} \right)^T \right\}

2) E[\dot{\lambda}(t)] = -E\left\{ \left( \frac{\partial H[x(t), u(t), \lambda(t), t]}{\partial x} \right)^T \right\}

3) E\left\{ \frac{\partial H[x(t), u(t), \lambda(t), t]}{\partial u} \right\} = 0


Stochastic Value Function for a Nonlinear System

However, a Hamilton-Jacobi-Bellman (HJB) equation based on expectations can be solved. Base the optimization on the Principle of Optimality. The optimal expected value function at t_1 is

V^*(t_1) = E\left\{ \phi[x^*(t_f)] + \int_{t_1}^{t_f} L[x^*(\tau), u^*(\tau)] \, d\tau \right\}
         = \min_u E\left\{ \phi[x^*(t_f)] + \int_{t_1}^{t_f} L[x^*(\tau), u(\tau)] \, d\tau \right\}

Rate of Change of the Value Function

\left. \frac{dV^*}{dt} \right|_{t = t_1} = -E\{ L[x^*(t_1), u^*(t_1)] \}

x(t) and u(t) can be known precisely; therefore, the total time derivative of V^* is

\left. \frac{dV^*}{dt} \right|_{t = t_1} = -L[x^*(t_1), u^*(t_1)]

Incremental Change in the Value Function

Apply the chain rule to the total derivative:

\frac{dV^*}{dt} = E\left\{ \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x} \dot{x} \right\}

Expand the incremental change in the value function, \Delta V^*, to second degree:

\Delta V^* = \frac{dV^*}{dt} \Delta t = E\left\{ \frac{\partial V^*}{\partial t} \Delta t + \frac{\partial V^*}{\partial x} \dot{x} \, \Delta t + \frac{1}{2} \dot{x}^T \frac{\partial^2 V^*}{\partial x^2} \dot{x} \, \Delta t^2 + \cdots \right\}

 = E\left\{ \frac{\partial V^*}{\partial t} \Delta t + \frac{\partial V^*}{\partial x} [f(\cdot) + L w(\cdot)] \Delta t + \frac{1}{2} [f(\cdot) + L w(\cdot)]^T \frac{\partial^2 V^*}{\partial x^2} [f(\cdot) + L w(\cdot)] \Delta t^2 + \cdots \right\}

Cancel \Delta t.

Introduction of the Trace

The trace of a matrix product is a scalar, invariant under cyclic permutation:

Tr(ABC) = Tr(CAB) = Tr(BCA)

Tr(x^T Q x) = Tr(x x^T Q) = Tr(Q x x^T), \quad \dim[Tr(\cdot)] = 1 \times 1
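These identities are easy to confirm numerically; a quick check with random matrices:

```python
import numpy as np

rng = np.random.default_rng(2)
A, B, C = rng.normal(size=(3, 4)), rng.normal(size=(4, 5)), rng.normal(size=(5, 3))
x, Q = rng.normal(size=(3, 1)), rng.normal(size=(3, 3))

# Cyclic permutation leaves the trace of a product unchanged
assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))
assert np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A))

# A quadratic form is a 1x1 matrix, equal to the trace of its cyclic rearrangements
assert np.isclose((x.T @ Q @ x).item(), np.trace(x @ x.T @ Q))
assert np.isclose((x.T @ Q @ x).item(), np.trace(Q @ x @ x.T))
```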

Dividing by \Delta t,

\frac{dV^*}{dt} \approx E\left\{ \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x} [f(\cdot) + L w(\cdot)] + \frac{1}{2} Tr\left( [f(\cdot) + L w(\cdot)]^T \frac{\partial^2 V^*}{\partial x^2} [f(\cdot) + L w(\cdot)] \right) \Delta t \right\}

 = E\left\{ \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x} [f(\cdot) + L w(\cdot)] + \frac{1}{2} Tr\left( \frac{\partial^2 V^*}{\partial x^2} [f(\cdot) + L w(\cdot)][f(\cdot) + L w(\cdot)]^T \right) \Delta t \right\}


Toward the Stochastic HJB Equation

\frac{dV^*}{dt} = E\left\{ \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x} [f(\cdot) + L w(\cdot)] + \frac{1}{2} Tr\left( \frac{\partial^2 V^*}{\partial x^2} [f(\cdot) + L w(\cdot)][f(\cdot) + L w(\cdot)]^T \right) \Delta t \right\}

 = \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x} f(\cdot) + E\left\{ \frac{\partial V^*}{\partial x} L w(\cdot) + \frac{1}{2} Tr\left( \frac{\partial^2 V^*}{\partial x^2} [f(\cdot) + L w(\cdot)][f(\cdot) + L w(\cdot)]^T \right) \Delta t \right\}

Because x(t) and u(t) can be measured, they can be taken outside the expectation.

Toward the Stochastic HJB Equation (continued)

The disturbance is assumed to be zero-mean white noise:

E[w(t)] = 0, \quad E[w(t) w^T(\tau)] = W(t) \, \delta(t - \tau)

\frac{dV^*}{dt} = \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x} f(\cdot) + \frac{1}{2} \lim_{\Delta t \to 0} Tr\left\{ \frac{\partial^2 V^*}{\partial x^2} \left[ E\left( f(\cdot) f^T(\cdot) \right) \Delta t + L \, E\left( w(\cdot) w^T(\cdot) \right) L^T \Delta t \right] \right\}

 = \frac{\partial V^*}{\partial t}(t) + \frac{\partial V^*}{\partial x}(t) f(\cdot) + \frac{1}{2} Tr\left[ \frac{\partial^2 V^*}{\partial x^2}(t) L(t) W(t) L^T(t) \right]

The uncertain disturbance input can only increase the value function's rate of change.

Stochastic Principle of Optimality (Perfect Measurements)

\frac{\partial V^*}{\partial t}(t) = -\min_u E\left\{ L[x^*(t), u(t), t] + \frac{\partial V^*}{\partial x}(t) \, f[x^*(t), u(t), t] + \frac{1}{2} Tr\left[ \frac{\partial^2 V^*}{\partial x^2}(t) L(t) W(t) L^T(t) \right] \right\}

Boundary (terminal) condition: V^*(t_f) = E\{\phi(t_f)\}

\frac{dV^*}{dt} = \frac{\partial V^*}{\partial t}(t) + \frac{\partial V^*}{\partial x}(t) \, f(\cdot) + \frac{1}{2} Tr\left[ \frac{\partial^2 V^*}{\partial x^2}(t) L(t) W(t) L^T(t) \right]

• Substitute for the total derivative, dV^*/dt = -L(x^*, u^*)
• Solve for the partial derivative, \partial V^*/\partial t
• The result is the stochastic HJB equation above

Observations of the Stochastic Principle of Optimality (Perfect Measurements)

\frac{\partial V^*}{\partial t}(t) = -\min_u E\left\{ L[x^*(t), u(t), t] + \frac{\partial V^*}{\partial x}(t) \, f[x^*(t), u(t), t] + \frac{1}{2} Tr\left[ \frac{\partial^2 V^*}{\partial x^2}(t) L(t) W(t) L^T(t) \right] \right\}

• Control has no effect on the disturbance input
• The criterion for optimality is the same as for the deterministic case
• Disturbance uncertainty increases the magnitude of the total optimal value function, V^*(0)


Information Sets and Expected Cost

• Sigma algebra (Wikipedia definitions)
  • The collection of sets over which a measure is defined
  • The collection of events that can be assigned probabilities
  • A measurable space
• Information available at the current time, t_1
  • All measurements from the initial time, t_0
  • All control commands from the initial time

I[t_0, t_1] = \{ z[t_0, t_1], u[t_0, t_1] \}

The Information Set, I

• Plus available model structure, parameters, and statistics:

I[t_0, t_1] = \{ z[t_0, t_1], u[t_0, t_1], f(\cdot), Q, R, \ldots \}

• Measurements may be directly useful, e.g.,
  • Displays
  • Simple feedback control
• ... or they may require processing, e.g.,
  • Transformation
  • Estimation

A Derived Information Set, I_D

• Example of a derived information set
  • History of mean and covariance from a state estimator

I_D[t_0, t_1] = \{ \hat{x}[t_0, t_1], P[t_0, t_1], u[t_0, t_1] \}

Additional Derived Information Sets

• Markov derived information set (a code sketch follows this slide)
  • Most current mean and covariance from a state estimator

I_{MD}(t_1) = \{ \hat{x}(t_1), P(t_1), u(t_1) \}

• Multiple-model derived information set
  • Parallel estimates of current mean, covariance, and hypothesis probability mass function

I_{MM}(t_1) = \{ [\hat{x}_A(t_1), P_A(t_1), u(t_1), Pr(H_A)], [\hat{x}_B(t_1), P_B(t_1), u(t_1), Pr(H_B)], \ldots \}
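In software, a derived information set is simply a container for the estimator outputs. A minimal sketch of I_MD (the class and field names are assumptions for illustration, not from the slides):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class MarkovDerivedInfoSet:
    """I_MD(t1): most current mean and covariance from a state estimator."""
    x_hat: np.ndarray   # conditional mean estimate, x_hat(t1)
    P: np.ndarray       # conditional covariance, P(t1)
    u: np.ndarray       # most recent control command, u(t1)

I_MD = MarkovDerivedInfoSet(x_hat=np.zeros(2), P=np.eye(2), u=np.zeros(1))
```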


Required and Available Information Sets for Optimal Control

• Optimal control requires propagation of information back from the final time
• Hence, it requires the entire information set, extending from t_0 to t_f:

I[t_0, t_f]

• Separate the information set into knowable and predictable parts:

I[t_0, t_f] = I[t_0, t_1] + I[t_1, t_f]

• Knowable information has been received
• Predictable information is to come

Expected Values of State and Control

Expected values of the state and control are conditioned on the information set:

E[x(t) | I_D] = \hat{x}(t)

E\{ [x(t) - \hat{x}(t)][x(t) - \hat{x}(t)]^T | I_D \} = P(t)

... where the conditional expected values are estimates from an optimal filter.

Dependence of the Stochastic Cost Function on the Information Set

J = \frac{1}{2} E\left\{ E\left[ Tr\left( S(t_f) x(t_f) x^T(t_f) \right) | I_D \right] + \int_0^{t_f} E\left[ Tr\left( Q x(t) x^T(t) \right) \right] dt + \int_0^{t_f} E\left[ Tr\left( R u(t) u^T(t) \right) \right] dt \right\}

Expand the state covariance:

P(t) = E\{ [x(t) - \hat{x}(t)][x(t) - \hat{x}(t)]^T | I_D \}
     = E\{ x(t) x^T(t) - \hat{x}(t) x^T(t) - x(t) \hat{x}^T(t) + \hat{x}(t) \hat{x}^T(t) | I_D \}

E\{ \hat{x}(t) x^T(t) | I_D \} = E\{ x(t) \hat{x}^T(t) | I_D \} = \hat{x}(t) \hat{x}^T(t)

P(t) = E\{ x(t) x^T(t) | I_D \} - \hat{x}(t) \hat{x}^T(t)

or

E\{ x(t) x^T(t) | I_D \} = P(t) + \hat{x}(t) \hat{x}^T(t)

... where the conditional expected values are obtained from an optimal filter.
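The identity E[x xᵀ | I_D] = P + x̂ x̂ᵀ can be checked by sampling; a sketch with an assumed conditional mean and covariance:

```python
import numpy as np

rng = np.random.default_rng(3)
x_hat = np.array([1.0, -2.0])                  # conditional mean, x_hat(t)
P = np.array([[0.5, 0.1], [0.1, 0.3]])         # conditional covariance, P(t)

x = rng.multivariate_normal(x_hat, P, size=200_000)
second_moment = x.T @ x / len(x)               # Monte Carlo estimate of E[x x^T]

# E[x x^T | I_D] = P + x_hat x_hat^T
assert np.allclose(second_moment, P + np.outer(x_hat, x_hat), atol=2e-2)
```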

Certainty-Equivalent and Stochastic Incremental Costs

J = \frac{1}{2} E\left\{ Tr\left[ S(t_f) \left( P(t_f) + \hat{x}(t_f) \hat{x}^T(t_f) \right) \right] + \int_0^{t_f} Tr\left[ Q \left( P(t) + \hat{x}(t) \hat{x}^T(t) \right) \right] dt + \int_0^{t_f} Tr\left[ R u(t) u^T(t) \right] dt \right\} \triangleq J_{CE} + J_S

The cost function has two parts, the certainty-equivalent cost and the stochastic increment cost:

J_{CE} = \frac{1}{2} E\left\{ Tr\left[ S(t_f) \hat{x}(t_f) \hat{x}^T(t_f) \right] + \int_0^{t_f} Tr\left[ Q \hat{x}(t) \hat{x}^T(t) \right] dt + \int_0^{t_f} Tr\left[ R u(t) u^T(t) \right] dt \right\}

J_S = \frac{1}{2} E\left\{ Tr\left[ S(t_f) P(t_f) \right] + \int_0^{t_f} Tr\left[ Q P(t) \right] dt \right\}


Expected Cost of the Trajectory

Optimized cost function:

V^*(t_0) \triangleq J^*(t_f) = E\left\{ \phi[x^*(t_f)] + \int_{t_0}^{t_f} L[x^*(\tau), u^*(\tau)] \, d\tau \right\}

Law of total expectation:

E(\cdot) = E\left( \cdot \, | \, I[t_0, t_1] \right) Pr\{ I[t_0, t_1] \} + E\left( \cdot \, | \, I[t_1, t_f] \right) Pr\{ I[t_1, t_f] \} = E[ E(\cdot | I) ]

Because the past is established at t_1:

E(J^*) = E\left( J^* | I[t_0, t_1] \right) \cdot (1) + E\left( J^* | I[t_1, t_f] \right) Pr\{ I[t_1, t_f] \}
       = E\left( J^* | I[t_0, t_1] \right) + E\left( J^* | I[t_1, t_f] \right) Pr\{ I[t_1, t_f] \}

• For planning or post-trajectory analysis, one can assume that the entire information set is available
• For real-time control, t_1 → t_f, and the future information set can only be predicted

Separation Property and Certainty Equivalence

Separation Property
• The Optimal Control Law and the Optimal Estimation Law can be derived separately
• Their derivations are strictly independent

Certainty Equivalence Property
• The separation property holds, plus ...
• The Stochastic Optimal Control Law and the Deterministic Optimal Control Law are the same
• The Optimal Estimation Law can be derived separately

Linear-quadratic-Gaussian (LQG) control is certainty-equivalent.

Stochastic Linear-Quadratic Optimal Control


Stochastic Principle of Optimality Applied to the Linear-Quadratic (LQ) Problem

Quadratic value function:

V(t_0) = E\left\{ \phi[x(t_f)] + \int_{t_0}^{t_f} L[x(\tau), u(\tau)] \, d\tau \right\}

 = \frac{1}{2} E\left\{ x^T(t_f) S(t_f) x(t_f) + \int_{t_0}^{t_f} \begin{bmatrix} x^T(t) & u^T(t) \end{bmatrix} \begin{bmatrix} Q(t) & M(t) \\ M^T(t) & R(t) \end{bmatrix} \begin{bmatrix} x(t) \\ u(t) \end{bmatrix} dt \right\}

Linear dynamic constraint:

\dot{x}(t) = F(t) x(t) + G(t) u(t) + L(t) w(t)

Components of the LQ Value Function

The quadratic value function has two parts:

V(t) = \frac{1}{2} x^T(t) S(t) x(t) + v(t)

Certainty-equivalent value function:

V_{CE}(t) \triangleq \frac{1}{2} x^T(t) S(t) x(t)

Stochastic value function increment:

v(t) = \frac{1}{2} \int_t^{t_f} Tr\left[ S(\tau) L(\tau) W(\tau) L^T(\tau) \right] d\tau

Value Function Gradient and Hessian

Certainty-equivalent value function:

V_{CE}(t) \triangleq \frac{1}{2} x^T(t) S(t) x(t)

Gradient with respect to the state:

\frac{\partial V}{\partial x}(t) = x^T(t) S(t)

Hessian with respect to the state:

\frac{\partial^2 V}{\partial x^2}(t) = S(t)

Linear-Quadratic Stochastic Hamilton-Jacobi-Bellman Equation (Perfect Measurements)

Certainty-equivalent plus stochastic terms:

\frac{\partial V^*}{\partial t} = -\min_u E\left\{ \frac{1}{2} \left( x^{*T} Q x^* + 2 x^{*T} M u + u^T R u \right) + x^{*T} S \left( F x^* + G u \right) + \frac{1}{2} Tr\left( S L W L^T \right) \right\}

 = -\min_u \left\{ \frac{1}{2} \left( x^{*T} Q x^* + 2 x^{*T} M u + u^T R u \right) + x^{*T} S \left( F x^* + G u \right) + \frac{1}{2} Tr\left( S L W L^T \right) \right\}

Terminal condition:

V(t_f) = \frac{1}{2} x^T(t_f) S(t_f) x(t_f)


Optimal Control Law

Differentiate the right side of the HJB equation with respect to u and set it equal to zero:

\frac{\partial (\partial V / \partial t)}{\partial u} = 0 = \left( x^T M + u^T R \right) + x^T S G

Solve for u, obtaining the LQ feedback control law:

u(t) = -R^{-1}(t) \left[ G^T(t) S(t) + M^T(t) \right] x(t) \triangleq -C(t) x(t)

The zero-mean, white-noise disturbance has no effect on the structure and gains of the LQ feedback control law.

Matrix Riccati Equation

• Substitute the optimal control law in the HJB equation:

\frac{1}{2} x^T \dot{S} x + \dot{v} = \frac{1}{2} x^T \left[ -\left( Q - M R^{-1} M^T \right) - \left( F - G R^{-1} M^T \right)^T S - S \left( F - G R^{-1} M^T \right) + S G R^{-1} G^T S \right] x - \frac{1}{2} Tr\left( S L W L^T \right)

with u(t) = -R^{-1}(t) \left[ G^T(t) S(t) + M^T(t) \right] x(t)

• The matrix Riccati equation provides S(t):

\dot{S}(t) = -\left[ Q(t) - M(t) R^{-1}(t) M^T(t) \right] - \left[ F(t) - G(t) R^{-1}(t) M^T(t) \right]^T S(t) - S(t) \left[ F(t) - G(t) R^{-1}(t) M^T(t) \right] + S(t) G(t) R^{-1}(t) G^T(t) S(t), \quad S(t_f) = \phi_{xx}(t_f)

• The stochastic value function increases the cost due to the disturbance
• However, its calculation is independent of the Riccati equation:

\dot{v} = -\frac{1}{2} Tr\left( S L W L^T \right)
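A minimal numerical sketch of this machinery (time-invariant example matrices assumed; simple Euler integration): S(t) is integrated backward from S(t_f), the gain C(t) = R⁻¹(GᵀS + Mᵀ) follows from S alone, and the stochastic increment v(t) is accumulated separately, never feeding back into S.

```python
import numpy as np

# Assumed example data (2 states, 1 control), constant in time for simplicity
F = np.array([[0.0, 1.0], [-1.0, -0.4]])
G = np.array([[0.0], [1.0]])
L = np.array([[0.0], [1.0]])
Q, R, M = np.eye(2), np.array([[1.0]]), np.zeros((2, 1))
W = np.array([[0.1]])
dt, tf = 0.001, 5.0

S = np.eye(2)                  # terminal condition, S(tf)
v = 0.0                        # stochastic increment, v(tf) = 0
Rinv = np.linalg.inv(R)
for _ in range(int(tf / dt)):  # backward in time: S(t - dt) = S(t) - S_dot * dt
    Fbar = F - G @ Rinv @ M.T
    S_dot = -(Q - M @ Rinv @ M.T) - Fbar.T @ S - S @ Fbar + S @ G @ Rinv @ G.T @ S
    v_dot = -0.5 * np.trace(S @ L @ W @ L.T)
    S, v = S - S_dot * dt, v - v_dot * dt

C0 = Rinv @ (G.T @ S + M.T)    # gain C(t0); note that W never entered the gain
print("S(t0) =\n", S, "\nC(t0) =", C0, "\nv(t0) =", v)
```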

Evaluation of the Total Cost (Imperfect Measurements)

• Stochastic quadratic cost function, neglecting cross terms:

J = \frac{1}{2} Tr\left\{ E\left[ x^T(t_f) S(t_f) x(t_f) \right] + \int_{t_0}^{t_f} E\left[ \begin{bmatrix} x^T(t) & u^T(t) \end{bmatrix} \begin{bmatrix} Q(t) & 0 \\ 0 & R(t) \end{bmatrix} \begin{bmatrix} x(t) \\ u(t) \end{bmatrix} \right] dt \right\}

 = \frac{1}{2} Tr\left\{ S(t_f) E\left[ x(t_f) x^T(t_f) \right] + \int_{t_0}^{t_f} \left( Q(t) E\left[ x(t) x^T(t) \right] + R(t) E\left[ u(t) u^T(t) \right] \right) dt \right\}

or

J = \frac{1}{2} Tr\left\{ S(t_f) P(t_f) + \int_{t_0}^{t_f} \left[ Q(t) P(t) + R(t) U(t) \right] dt \right\}

where P(t) \triangleq E[x(t) x^T(t)], \quad U(t) \triangleq E[u(t) u^T(t)]


Optimal Control Covariance

Optimal control vector:

u(t) = -C(t) x(t)

Optimal control covariance:

U(t) = C(t) P(t) C^T(t) = R^{-1}(t) G^T(t) S(t) P(t) S(t) G(t) R^{-1}(t)

Revise Cost to Reflect State and Adjoint Covariance Dynamics

• Integration by parts:

S(t) P(t) \Big|_{t_0}^{t_f} = \int_{t_0}^{t_f} \left[ \dot{S}(t) P(t) + S(t) \dot{P}(t) \right] dt

S(t_f) P(t_f) = S(t_0) P(t_0) + \int_{t_0}^{t_f} \left[ \dot{S}(t) P(t) + S(t) \dot{P}(t) \right] dt

• Rewrite the cost function to incorporate the initial cost:

J = \frac{1}{2} Tr\left\{ S(t_0) P(t_0) + \int_{t_0}^{t_f} \left[ Q(t) P(t) + R(t) U(t) + \dot{S}(t) P(t) + S(t) \dot{P}(t) \right] dt \right\}

Evolution of State and Adjoint Covariance Matrices (No Control)

u(t) = 0; \quad U(t) = 0

• State covariance response to the random disturbance:

\dot{P}(t) = F(t) P(t) + P(t) F^T(t) + L(t) W(t) L^T(t), \quad P(t_0) \ given

• Adjoint covariance response to the terminal cost:

\dot{S}(t) = -F^T(t) S(t) - S(t) F(t) - Q(t), \quad S(t_f) \ given

Evolution of State and Adjoint Covariance Matrices (Optimal Control)

• State covariance response to the random disturbance (dependent on S(t) through C(t)):

\dot{P}(t) = \left[ F(t) - G(t) C(t) \right] P(t) + P(t) \left[ F(t) - G(t) C(t) \right]^T + L(t) W(t) L^T(t)

• Adjoint covariance response to the terminal cost (independent of P(t)):

\dot{S}(t) = -F^T(t) S(t) - S(t) F(t) - Q(t) + S(t) G(t) R^{-1}(t) G^T(t) S(t)
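A sketch of the forward covariance propagation (Euler integration; matrices as assumed in the Riccati sketch above; under optimal control the gain C(t) would come from the backward Riccati pass, so a fixed placeholder gain stands in here):

```python
import numpy as np

# Assumed example data, matching the Riccati sketch above
F = np.array([[0.0, 1.0], [-1.0, -0.4]])
G = np.array([[0.0], [1.0]])
L = np.array([[0.0], [1.0]])
W = np.array([[0.1]])
dt, tf = 0.001, 5.0

def propagate(P, C):
    """P_dot = (F - G C) P + P (F - G C)^T + L W L^T; C = 0 gives the no-control case."""
    Fcl = F - G @ C
    for _ in range(int(tf / dt)):
        P = P + (Fcl @ P + P @ Fcl.T + L @ W @ L.T) * dt
    return P

P0 = 0.5 * np.eye(2)                              # P(t0) given
P_open = propagate(P0, np.zeros((1, 2)))          # no control
P_closed = propagate(P0, np.array([[1.0, 1.2]]))  # placeholder feedback gain
```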


Total Cost With and Without Control

• With no control:

J_{no \ control} = \frac{1}{2} Tr\left[ S(t_0) P(t_0) + \int_{t_0}^{t_f} S(t) L(t) W(t) L^T(t) \, dt \right]

• With optimal control, the equation for the cost is the same:

J_{optimal \ control} = \frac{1}{2} Tr\left[ S(t_0) P(t_0) + \int_{t_0}^{t_f} S(t) L(t) W(t) L^T(t) \, dt \right]

• ... but the evolutions of S(t) and S(t_0) are different in each case
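Given an S(t) history (from the Lyapunov equation without control, or the Riccati equation with it), the total-cost formula is a simple quadrature. A sketch (the constant S history here is only a stand-in for a stored solution):

```python
import numpy as np

def total_cost(S_hist, P0, L, W, dt):
    """J = (1/2) Tr[ S(t0) P(t0) + integral of S(t) L W L^T dt ]."""
    J = np.trace(S_hist[0] @ P0)
    for S in S_hist:
        J += np.trace(S @ L @ W @ L.T) * dt
    return 0.5 * J

# Stand-in history; a real run would store S(t) during the backward integration
L = np.array([[0.0], [1.0]]); W = np.array([[0.1]])
J = total_cost([np.eye(2)] * 5000, 0.5 * np.eye(2), L, W, dt=0.001)
```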

Next Time: Linear-Quadratic-Gaussian Regulators

Supplemental Material

Neighboring-Optimal Control with Uncertain Disturbance, Measurement, and Initial Condition


Immune Response Example

• Optimal open-loop drug therapy (control); assumptions:
  • Initial condition known without error
  • No disturbance
• Optimal closed-loop therapy; assumptions:
  • Small error in initial condition
  • Small disturbance
  • Perfect measurement of state
• Stochastic optimal closed-loop therapy; assumptions:
  • Small error in initial condition
  • Small disturbance
  • Imperfect measurement
  • Certainty equivalence applies to perturbation control

[Figure slides: "Immune Response Example with Optimal Feedback Control"; "Open-Loop Optimal Control for Lethal Initial Condition"; "Open- and Closed-Loop Optimal Control for 150% Lethal Initial Condition"; "Immune Response with Full-State Stochastic Optimal Feedback Control (Random Disturbance and Measurement Error Not Simulated)"]

Stochastic-Optimal Control (u_1) with Two Measurements (x_1, x_3) (w/ Ghigliazza, 2004)

• Low-bandwidth estimator (|W| < |N|): initial control too sluggish to prevent divergence
• High-bandwidth estimator (|W| > |N|): quick initial control prevents divergence

W = I_4, \quad N = I_2 / 20


Immune Response to Random Disturbance with Two-Measurement Stochastic Neighboring-Optimal Control

• Disturbance due to re-infection: sequestered pockets of pathogen
• Noisy measurements
• Closed-loop therapy is robust ...
• ... but not robust enough: organ death occurs in one case
• Probability of satisfactory therapy can be maximized by stochastic redesign of the controller

Dual Control (Feldbaum, 1965)

• Nonlinear system
• Uncertain system parameters to be estimated
• Parameter estimation can be aided by test inputs
• Approach: minimize the value function with three increments
  • Nominal control
  • Cautious control
  • Probing control

\min_u V^* = \min_u \left( V^*_{nominal} + V^*_{cautious} + V^*_{probing} \right)

Estimation and control calculations are coupled and necessarily recursive.

Adaptive Critic Controller

The nonlinear control law, c, takes the general form

u(t) = c[x(t), a, y^*(t)]

x(t): state
a: parameters of operating point
y^*(t): command input

On-line adaptive critic controller:
• Nonlinear control law (action network)
• Criticizes non-optimal performance via a critic network
• Adapts control gains to improve performance
• Adapts the cost model to improve the estimate

Algebraic Initialization of Neural Networks (Ferrari and Stengel)

Initially, c[x, a, y*] is unknown. Design PI-LQ controllers with integral compensation that satisfy requirements at n operating points, scheduled by the variable a (a sketch of the scheduled law follows):

u(t) = C_F(a) y^* + C_B(a) \Delta x + C_I(a) \int \Delta y(t) \, dt \triangleq c[x(t), a, y^*(t)]
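A sketch of such a scheduled PI-LQ law (the gain tables, design points, and linear interpolation in a are assumptions for illustration; actual gains would come from LQ synthesis at each operating point):

```python
import numpy as np

a_design = np.array([0.0, 0.5, 1.0])     # scheduling-variable design points
CF = np.array([[[1.0]], [[1.2]], [[1.5]]])                 # feedforward gains
CB = np.array([[[0.8, 0.3]], [[0.9, 0.35]], [[1.1, 0.4]]]) # feedback gains
CI = np.array([[[0.2]], [[0.25]], [[0.3]]])                # integral gains

def gain(table, a):
    """Interpolate each gain-matrix entry linearly between design points."""
    out = np.empty(table.shape[1:])
    for idx in np.ndindex(out.shape):
        out[idx] = np.interp(a, a_design, table[(slice(None),) + idx])
    return out

def control(y_star, dx, int_dy, a):
    """u = CF(a) y* + CB(a) dx + CI(a) * integral(dy dt)."""
    return gain(CF, a) @ y_star + gain(CB, a) @ dx + gain(CI, a) @ int_dy

u = control(np.array([1.0]), np.array([0.1, -0.05]), np.array([0.02]), a=0.3)
```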


Replace Gain Matrices by Neural Networks

Replace the control gain matrices by sigmoidal neural networks:

u(t) = NN_F[y^*(t), a(t)] + NN_B[x(t), a(t)] + NN_I\left[ \int \Delta y(t) \, dt, \ a(t) \right] = c[x(t), a, y^*(t)]

Initial Neural Control Law

• Algebraic training of the neural networks produces an exact fit of the linear control gains and trim conditions at n operating points
• Interpolation and gain scheduling via neural networks
• One node per operating point in each neural network

On-line Optimization of Adaptive Critic Neural Network Controller

• Critic adapts neural network weights to improve performance using approximate dynamic programming
  • Heuristic Dynamic Programming Adaptive Critic
  • Dual Heuristic Programming Adaptive Critic for the receding-horizon optimization problem
• Critic and Action (i.e., Control) networks adapted concurrently
• LQ-PI cost function applied to the nonlinear problem
• Modified resilient backpropagation for neural network training

V[x(t_k)] = L[x(t_k), u(t_k)] + V[x(t_{k+1})]

\frac{\partial V}{\partial u} = \frac{\partial L}{\partial u} + \frac{\partial V}{\partial x} \frac{\partial x}{\partial u} = 0

\frac{\partial V[x_a(t)]}{\partial x_a(t)} = NN_C[x_a(t), a(t)]
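A skeletal HDP-style target computation for the critic (everything here is an assumed stand-in: a generic one-step model, utility, and critic approximator, not the aircraft application itself):

```python
def critic_target(x_k, u_k, step, utility, V):
    """Bellman recursion V[x(tk)] = L[x(tk), u(tk)] + V[x(tk+1)] as a training target."""
    x_next = step(x_k, u_k)                  # one-step model prediction of x(tk+1)
    return utility(x_k, u_k) + V(x_next)

step = lambda x, u: 0.9 * x + 0.1 * u        # assumed discrete model
utility = lambda x, u: 0.5 * (x**2 + u**2)   # assumed utility L[x, u]
V = lambda x: 1.3 * x**2                     # current critic estimate

target = critic_target(1.0, -0.4, step, utility, V)  # train the critic toward this
```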


Action Network On-line Training

Train the action network at time t, holding the critic parameters fixed.

[Block diagram: x_a(t) and a(t) feed NN_A; the aircraft model, transition matrices, and state prediction feed the utility function derivatives and the fixed critic NN_C; the optimality condition generates the NN_A training target.]

Critic Network On-line Training

Train the critic network at time t, holding the action parameters fixed.

[Block diagram: x_a(t) and a(t) feed NN_A and NN_C(old); the aircraft model, transition matrices, and state prediction feed the utility function derivatives; the target cost gradient generates the NN_C training target.]