8/13/2019 Stochastic Control Princeton
Stochastic Optimal Control
Robert Stengel
Optimal Control and Estimation, MAE 546, Princeton University, 2013
Copyright 2013 by Robert Stengel. All rights reserved. For educational use only.
http://www.princeton.edu/~stengel/MAE546.html
http://www.princeton.edu/~stengel/OptConEst.html
! Nonlinear systems with random inputs and perfect measurements
! Nonlinear systems with random inputs and imperfect measurements
! Certainty equivalence and separation
! Stochastic neighboring-optimal control
! Linear-quadratic-Gaussian (LQG) control
Nonlinear Systems with Random Inputs and Perfect Measurements

Inputs and initial conditions are uncertain, but the state can be measured without error:

\dot{x}(t) = f[x(t), u(t), w(t), t]
z(t) = x(t)
E[x(0)] = \bar{x}(0)
E{ [x(0) - \bar{x}(0)][x(0) - \bar{x}(0)]^T } = P(0)

Assume that random disturbance effects are small and additive:

\dot{x}(t) = f[x(t), u(t), t] + L(t) w(t)
E[w(t)] = 0
E[w(t) w^T(\tau)] = W(t) \delta(t - \tau)
Cost Must Be an Expected Value

A deterministic cost function cannot be minimized, because the disturbance effect on the state cannot be predicted; the state and control are random variables:

min_{u(t)} J = \phi[x(t_f)] + \int_{t_o}^{t_f} L[x(t), u(t)] dt

However, the expected value of a deterministic cost function can be minimized:

min_{u(t)} J = E{ \phi[x(t_f)] + \int_{t_o}^{t_f} L[x(t), u(t)] dt }
Stochastic Euler-Lagrange Equations?

There is no single optimal trajectory, and expected values of the Euler-Lagrange necessary conditions may not be well defined:

1) E[\lambda(t_f)] = E{ [ \partial\phi[x(t_f)] / \partial x ]^T }

2) E[\dot{\lambda}(t)] = -E{ [ \partial H[x(t), u(t), \lambda(t), t] / \partial x ]^T }

3) E{ \partial H[x(t), u(t), \lambda(t), t] / \partial u } = 0
Stochastic Value Function for a Nonlinear System

However, a Hamilton-Jacobi-Bellman (HJB) equation based on expectations can be solved. Base the optimization on the Principle of Optimality. The optimal expected value function at t_1 is

V^*(t_1) = E{ \phi[x^*(t_f)] + \int_{t_1}^{t_f} L[x^*(\tau), u^*(\tau)] d\tau }
 = min_u E{ \phi[x^*(t_f)] + \int_{t_1}^{t_f} L[x^*(\tau), u(\tau)] d\tau }
Rate of Change of the Value Function

dV^*/dt |_{t = t_1} = -E{ L[x^*(t_1), u^*(t_1)] }

x(t) and u(t) can be known precisely; therefore, the total time-derivative of V^* is

dV^*/dt |_{t = t_1} = -L[x^*(t_1), u^*(t_1)]
Incremental Change in the Value Function

Apply the chain rule to the total derivative:

dV^*/dt = E[ \partial V^*/\partial t + (\partial V^*/\partial x) \dot{x} ]

Expand the incremental change in the value function, \Delta V^*, to second degree:

\Delta V^* = (dV^*/dt) \Delta t
 = E[ (\partial V^*/\partial t) \Delta t + (\partial V^*/\partial x) \Delta x + (1/2) \Delta x^T (\partial^2 V^*/\partial x^2) \Delta x + ... ]
 = E[ (\partial V^*/\partial t) \Delta t + (\partial V^*/\partial x)( f(\cdot) + L w(\cdot) ) \Delta t + (1/2) ( f(\cdot) + L w(\cdot) )^T (\partial^2 V^*/\partial x^2) ( f(\cdot) + L w(\cdot) ) \Delta t^2 + ... ]

Cancel \Delta t.
Introduction of the Trace

The trace of a matrix product is a scalar and is invariant under cyclic permutation:

Tr(ABC) = Tr(CAB) = Tr(BCA)
Tr(x^T Q x) = Tr(x x^T Q) = Tr(Q x x^T),   dim[Tr(\cdot)] = 1 x 1

dV^*/dt \approx E[ \partial V^*/\partial t + (\partial V^*/\partial x)( f(\cdot) + L w(\cdot) ) + (1/2) Tr( ( f(\cdot) + L w(\cdot) )^T (\partial^2 V^*/\partial x^2) ( f(\cdot) + L w(\cdot) ) ) \Delta t ]
 = E[ \partial V^*/\partial t + (\partial V^*/\partial x)( f(\cdot) + L w(\cdot) ) + (1/2) Tr( (\partial^2 V^*/\partial x^2) ( f(\cdot) + L w(\cdot) )( f(\cdot) + L w(\cdot) )^T ) \Delta t ]
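The cyclic-permutation identities above are easy to confirm numerically. The following is a minimal check with arbitrary random matrices (the values are illustrative, not from the lecture):

```python
import numpy as np

# Numerical check of the cyclic-permutation trace identities.
# Dimensions are chosen so each product is square.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))
C = rng.standard_normal((5, 3))

t_abc = np.trace(A @ B @ C)
t_cab = np.trace(C @ A @ B)
t_bca = np.trace(B @ C @ A)

# Quadratic form rewritten as a trace: x^T Q x = Tr(Q x x^T)
x = rng.standard_normal(3)
Q = rng.standard_normal((3, 3))
quad = x @ Q @ x
quad_as_trace = np.trace(Q @ np.outer(x, x))
```

The second identity is what allows the expectation of a quadratic form to be written in terms of a covariance matrix later in the derivation.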
Toward the Stochastic HJB Equation

dV^*/dt = E[ \partial V^*/\partial t + (\partial V^*/\partial x)( f(\cdot) + L w(\cdot) ) + (1/2) Tr( (\partial^2 V^*/\partial x^2)( f(\cdot) + L w(\cdot) )( f(\cdot) + L w(\cdot) )^T ) \Delta t ]

Because x(t) and u(t) can be measured, they can be taken outside the expectation:

dV^*/dt = \partial V^*/\partial t + (\partial V^*/\partial x) f(\cdot) + E[ (\partial V^*/\partial x) L w(\cdot) + (1/2) Tr( (\partial^2 V^*/\partial x^2)( f(\cdot) + L w(\cdot) )( f(\cdot) + L w(\cdot) )^T ) \Delta t ]
Toward the Stochastic HJB Equation

The disturbance is assumed to be zero-mean white noise:

E[w(t)] = 0
E[w(t) w^T(\tau)] = W(t) \delta(t - \tau)

dV^*/dt = \partial V^*/\partial t + (\partial V^*/\partial x) f(\cdot) + (1/2) lim_{\Delta t \to 0} Tr{ (\partial^2 V^*/\partial x^2) [ E( f(\cdot) f^T(\cdot) ) \Delta t + L E( w(\cdot) w^T(\cdot) ) L^T ] \Delta t }
 = \partial V^*/\partial t (t) + (\partial V^*/\partial x)(t) f(\cdot) + (1/2) Tr[ (\partial^2 V^*/\partial x^2)(t) L(t) W(t) L^T(t) ]

The uncertain disturbance input can only increase the value function's rate of change.
Stochastic Principle of Optimality
(Perfect Measurements)

dV^*/dt = \partial V^*/\partial t (t) + (\partial V^*/\partial x)(t) f(\cdot) + (1/2) Tr[ (\partial^2 V^*/\partial x^2)(t) L(t) W(t) L^T(t) ]

! Substitute for the total derivative, dV^*/dt = -L(x^*, u^*)
! Solve for the partial derivative, \partial V^*/\partial t: the Stochastic HJB Equation

\partial V^*/\partial t (t) = -min_u E{ L[x^*(t), u(t), t] + (\partial V^*/\partial x)(t) f[x^*(t), u(t), t] + (1/2) Tr[ (\partial^2 V^*/\partial x^2)(t) L(t) W(t) L^T(t) ] }

Boundary (terminal) condition: V^*(t_f) = E{ \phi[x(t_f)] }
Observations of the Stochastic Principle of Optimality
(Perfect Measurements)

\partial V^*/\partial t (t) = -min_u E{ L[x^*(t), u(t), t] + (\partial V^*/\partial x)(t) f[x^*(t), u(t), t] + (1/2) Tr[ (\partial^2 V^*/\partial x^2)(t) L(t) W(t) L^T(t) ] }

! Control has no effect on the disturbance input
! The criterion for optimality is the same as for the deterministic case
! Disturbance uncertainty increases the magnitude of the total optimal value function, V^*(0)
Information Sets and Expected Cost

! Sigma algebra (Wikipedia definitions)
  ! The collection of sets over which a measure is defined
  ! The collection of events that can be assigned probabilities
  ! A measurable space

The Information Set, I

! Information available at current time, t_1
  ! All measurements from initial time, t_o
  ! All control commands from initial time

I[t_o, t_1] = { z[t_o, t_1], u[t_o, t_1] }

! Plus available model structure, parameters, and statistics:

I[t_o, t_1] = { z[t_o, t_1], u[t_o, t_1], f(\cdot), Q, R, ... }

A Derived Information Set, I_D

! Measurements may be directly useful, e.g.,
  ! Displays
  ! Simple feedback control
! ... or they may require processing, e.g.,
  ! Transformation
  ! Estimation
! Example of a derived information set
  ! History of mean and covariance from a state estimator

I_D[t_o, t_1] = { \hat{x}[t_o, t_1], P[t_o, t_1], u[t_o, t_1] }

Additional Derived Information Sets

! Markov derived information set
  ! Most current mean and covariance from a state estimator

I_MD(t_1) = { \hat{x}(t_1), P(t_1), u(t_1) }

! Multiple-model derived information set
  ! Parallel estimates of current mean, covariance, and hypothesis probability mass function

I_MM(t_1) = { [\hat{x}_A(t_1), P_A(t_1), u(t_1), Pr(H_A)], [\hat{x}_B(t_1), P_B(t_1), u(t_1), Pr(H_B)], ... }
Required and Available Information Sets for Optimal Control

! Optimal control requires propagation of information back from the final time
! Hence, it requires the entire information set, extending from t_o to t_f:

I[t_o, t_f]

! Separate the information set into knowable and predictable parts:

I[t_o, t_f] = I[t_o, t_1] + I[t_1, t_f]

! Knowable information has been received
! Predictable information is to come

Expected Values of State and Control

Expected values of the state and control are conditioned on the information set:

E[x(t) | I_D] = \hat{x}(t)
E{ [x(t) - \hat{x}(t)][x(t) - \hat{x}(t)]^T | I_D } = P(t)

... where the conditional expected values are estimates from an optimal filter.
Dependence of the Stochastic Cost Function on the Information Set

J = (1/2) E{ E[ Tr( S(t_f) x(t_f) x^T(t_f) ) | I_D ] + \int_0^{t_f} E[ Tr( Q x(t) x^T(t) ) ] dt + \int_0^{t_f} E[ Tr( R u(t) u^T(t) ) ] dt }

Expand the state covariance:

P(t) = E{ [x(t) - \hat{x}(t)][x(t) - \hat{x}(t)]^T | I_D }
 = E{ x(t) x^T(t) - \hat{x}(t) x^T(t) - x(t) \hat{x}^T(t) + \hat{x}(t) \hat{x}^T(t) | I_D }

E[ x(t) \hat{x}^T(t) | I_D ] = E[ \hat{x}(t) x^T(t) | I_D ] = \hat{x}(t) \hat{x}^T(t)

P(t) = E[ x(t) x^T(t) | I_D ] - \hat{x}(t) \hat{x}^T(t),  or
E[ x(t) x^T(t) | I_D ] = P(t) + \hat{x}(t) \hat{x}^T(t)

... where the conditional expected values are obtained from an optimal filter.
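The second-moment identity E[x x^T | I_D] = P + \hat{x} \hat{x}^T can be checked by Monte Carlo sampling. The mean and covariance below are illustrative, not from the lecture:

```python
import numpy as np

# Monte Carlo check of E[x x^T] = P + xhat xhat^T for Gaussian samples.
rng = np.random.default_rng(1)
xhat = np.array([1.0, -2.0])
P = np.array([[2.0, 0.5],
              [0.5, 1.0]])

X = rng.multivariate_normal(xhat, P, size=200_000)
second_moment = X.T @ X / len(X)        # sample estimate of E[x x^T]
predicted = P + np.outer(xhat, xhat)    # P + xhat xhat^T
```

The sample second moment converges to the predicted matrix as the number of samples grows.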
Certainty-Equivalent and Stochastic Incremental Costs

J = (1/2) E{ Tr[ S(t_f)( P(t_f) + \hat{x}(t_f) \hat{x}^T(t_f) ) ] + \int_0^{t_f} Tr[ Q( P(t) + \hat{x}(t) \hat{x}^T(t) ) ] dt + \int_0^{t_f} Tr[ R u(t) u^T(t) ] dt }
 \triangleq J_CE + J_S

! The cost function has two parts:
  ! Certainty-equivalent cost
  ! Stochastic increment cost

J_CE = (1/2) E{ Tr[ S(t_f) \hat{x}(t_f) \hat{x}^T(t_f) ] + \int_0^{t_f} Tr[ Q \hat{x}(t) \hat{x}^T(t) ] dt + \int_0^{t_f} Tr[ R u(t) u^T(t) ] dt }

J_S = (1/2) E{ Tr[ S(t_f) P(t_f) ] + \int_0^{t_f} Tr[ Q P(t) ] dt }
Expected Cost of the Trajectory

V^*(t_o) \triangleq J^*(t_f) = E{ \phi[x^*(t_f)] + \int_{t_o}^{t_f} L[x^*(\tau), u^*(\tau)] d\tau }

Law of total expectation:

E(\cdot) = E( \cdot | I[t_o, t_1] ) Pr{ I[t_o, t_1] } + E( \cdot | I[t_1, t_f] ) Pr{ I[t_1, t_f] } = E[ E( \cdot | I ) ]

Because the past is established at t_1, Pr{ I[t_o, t_1] } = 1, and the optimized cost function is

E(J^*) = E( J^* | I[t_o, t_1] ) (1) + E( J^* | I[t_1, t_f] ) Pr{ I[t_1, t_f] }
 = E( J^* | I[t_o, t_1] ) + E( J^* | I[t_1, t_f] ) Pr{ I[t_1, t_f] }

! For planning or post-trajectory analysis, one can assume that the entire information set is available
! For real-time control, t_1 -> t_f, and the future information set can only be predicted
Separation Property and Certainty Equivalence

! Separation Property
  ! The Optimal Control Law and the Optimal Estimation Law can be derived separately
  ! Their derivations are strictly independent
! Certainty Equivalence Property
  ! Separation property, plus ...
  ! The Stochastic Optimal Control Law and the Deterministic Optimal Control Law are the same
  ! The Optimal Estimation Law can be derived separately

Linear-quadratic-Gaussian (LQG) control is certainty-equivalent.
Stochastic Linear-Quadratic Optimal Control
Stochastic Principle of Optimality Applied to the Linear-Quadratic (LQ) Problem

Linear dynamic constraint:

\dot{x}(t) = F(t) x(t) + G(t) u(t) + L(t) w(t)

Quadratic value function:

V(t_o) = E{ \phi[x(t_f)] + \int_{t_o}^{t_f} L[x(\tau), u(\tau)] d\tau }
 = (1/2) E{ x^T(t_f) S(t_f) x(t_f) + \int_{t_o}^{t_f} [ x^T(t)  u^T(t) ] [ Q(t)  M(t) ; M^T(t)  R(t) ] [ x(t) ; u(t) ] dt }
Components of the LQ Value Function

The quadratic value function has two parts:

V(t) = (1/2) x^T(t) S(t) x(t) + v(t)

Certainty-equivalent value function:

V_CE(t) \triangleq (1/2) x^T(t) S(t) x(t)

Stochastic value function increment:

v(t) = (1/2) \int_t^{t_f} Tr[ S(\tau) L(\tau) W(\tau) L^T(\tau) ] d\tau
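The stochastic increment is just a scalar quadrature. A minimal sketch, holding S, L, and W constant for illustration (in general S(\tau) comes from the Riccati equation; all numbers here are illustrative):

```python
import numpy as np

# Sketch of v(t) = (1/2) * integral from t to tf of Tr[S L W L^T] dtau,
# with constant matrices so the integral is just (integrand) * (tf - t).
S = np.array([[2.0, 0.5],
              [0.5, 1.0]])
L = np.eye(2)
W = 0.1 * np.eye(2)
t, tf = 0.0, 5.0

integrand = 0.5 * np.trace(S @ L @ W @ L.T)  # constant in this sketch
v = integrand * (tf - t)
```

With time-varying S(\tau), the same computation would be carried out with numerical quadrature along the Riccati solution.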
Value Function Gradient and Hessian

Certainty-equivalent value function:

V_CE(t) \triangleq (1/2) x^T(t) S(t) x(t)

Gradient with respect to the state:

\partial V/\partial x (t) = x^T(t) S(t)

Hessian with respect to the state:

\partial^2 V/\partial x^2 (t) = S(t)
Linear-Quadratic Stochastic Hamilton-Jacobi-Bellman Equation
(Perfect Measurements)

Certainty-equivalent plus stochastic terms:

\partial V^*/\partial t = -min_u E[ (1/2)( x^{*T} Q x^* + 2 x^{*T} M u + u^T R u ) + x^{*T} S ( F x^* + G u ) + (1/2) Tr( S L W L^T ) ]
 = -min_u [ (1/2)( x^{*T} Q x^* + 2 x^{*T} M u + u^T R u ) + x^{*T} S ( F x^* + G u ) + (1/2) Tr( S L W L^T ) ]

Terminal condition:

V(t_f) = (1/2) x^T(t_f) S(t_f) x(t_f)
Optimal Control Law

Differentiate the right side of the HJB equation with respect to u and set it equal to zero:

\partial(\partial V/\partial t)/\partial u = 0 = ( x^T M + u^T R ) + x^T S G

Solve for u, obtaining the feedback control law:

u(t) = -R^{-1}(t) [ G^T(t) S(t) + M^T(t) ] x(t) \triangleq -C(t) x(t)

LQ Optimal Control Law

u(t) = -R^{-1}(t) [ G^T(t) S(t) + M^T(t) ] x(t) \triangleq -C(t) x(t)

Zero-mean, white-noise disturbance has no effect on the structure and gains of the LQ feedback control law.
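As a concrete illustration, the gain C(t) = R^{-1}(G^T S + M^T) can be evaluated numerically at one instant. The matrices below are stand-ins, not values from the lecture; in practice S(t) comes from the Riccati equation:

```python
import numpy as np

# Evaluating the LQ feedback gain and control at one instant.
S = np.array([[2.0, 0.5],
              [0.5, 1.0]])   # stand-in Riccati solution
G = np.array([[0.0],
              [1.0]])
M = np.array([[0.1],
              [0.0]])
R = np.array([[2.0]])

C = np.linalg.solve(R, G.T @ S + M.T)  # solve R C = G^T S + M^T
x = np.array([1.0, -1.0])
u = -C @ x                             # feedback control u = -C x
```

Using `np.linalg.solve` avoids forming R^{-1} explicitly, which is the usual numerically preferred choice.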
Matrix Riccati Equation

! Substitute the optimal control law, u(t) = -R^{-1}(t) [ G^T(t) S(t) + M^T(t) ] x(t), into the HJB equation:

(1/2) x^T \dot{S} x + \dot{v} = (1/2) x^T [ -Q + M R^{-1} M^T - ( F - G R^{-1} M^T )^T S - S ( F - G R^{-1} M^T ) + S G R^{-1} G^T S ] x - (1/2) Tr( S L W L^T )

! The matrix Riccati equation provides S(t):

\dot{S}(t) = -Q(t) + M(t) R^{-1}(t) M^T(t) - [ F(t) - G(t) R^{-1}(t) M^T(t) ]^T S(t) - S(t) [ F(t) - G(t) R^{-1}(t) M^T(t) ] + S(t) G(t) R^{-1}(t) G^T(t) S(t),   S(t_f) = \phi_{xx}(t_f)

! The stochastic value function increases the cost due to the disturbance
! However, its calculation does not alter the Riccati equation:

\dot{v} = -(1/2) Tr( S L W L^T )
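The Riccati equation is solved by sweeping backward from the terminal condition. A minimal backward-Euler sketch for the M = 0 case, with illustrative system matrices (not from the lecture); over a long horizon S(0) approaches the steady-state (algebraic) Riccati solution:

```python
import numpy as np

# Backward Euler sweep of the matrix Riccati equation (M = 0):
#   Sdot = -Q - F^T S - S F + S G R^{-1} G^T S,   S(tf) given.
F = np.array([[0.0, 1.0],
              [-1.0, -0.5]])
G = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
tf, dt = 30.0, 1e-3

S = np.zeros((2, 2))   # terminal condition S(tf) = 0
t = tf
while t > 0.0:
    Sdot = -Q - F.T @ S - S @ F + S @ G @ np.linalg.solve(R, G.T @ S)
    S = S - Sdot * dt   # step backward in time
    t -= dt
```

Because Sdot is symmetric whenever S is, the Euler sweep preserves the symmetry of S exactly.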
Evaluation of the Total Cost
(Imperfect Measurements)

! Stochastic quadratic cost function, neglecting cross terms:

J = (1/2) Tr E{ x^T(t_f) S(t_f) x(t_f) + \int_{t_o}^{t_f} [ x^T(t)  u^T(t) ] [ Q(t)  0 ; 0  R(t) ] [ x(t) ; u(t) ] dt }
 = (1/2) Tr{ S(t_f) E[ x(t_f) x^T(t_f) ] + \int_{t_o}^{t_f} ( Q(t) E[ x(t) x^T(t) ] + R(t) E[ u(t) u^T(t) ] ) dt }

or

J = (1/2) Tr{ S(t_f) P(t_f) + \int_{t_o}^{t_f} [ Q(t) P(t) + R(t) U(t) ] dt }

where P(t) \triangleq E[ x(t) x^T(t) ],  U(t) \triangleq E[ u(t) u^T(t) ]
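The trace form of the cost is straightforward to evaluate numerically. A minimal sketch with constant, illustrative matrices (in practice P(t) and U(t) vary in time and the integral is computed by quadrature):

```python
import numpy as np

# Evaluating J = (1/2) Tr{ S(tf) P(tf) + integral of [Q P + R U] dt }
# with all matrices held constant for the sketch.
S_tf = np.diag([2.0, 1.0])
Q = np.eye(2)
R = np.array([[1.0]])
P = np.diag([0.5, 0.25])     # stand-in state covariance
U = np.array([[0.1]])        # stand-in control covariance
t0, tf = 0.0, 5.0

integrand = np.trace(Q @ P) + np.trace(R @ U)   # constant here
J = 0.5 * (np.trace(S_tf @ P) + integrand * (tf - t0))
```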
Optimal Control Covariance

Optimal control vector:

u(t) = -C(t) x(t)

Optimal control covariance (with M = 0, so that C = R^{-1} G^T S):

U(t) = C(t) P(t) C^T(t) = R^{-1}(t) G^T(t) S(t) P(t) S(t) G(t) R^{-1}(t)
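A quick numerical check of U = C P C^T, with an illustrative gain and state covariance:

```python
import numpy as np

# Control covariance induced by linear feedback u = -C x:
#   U = C P C^T
C = np.array([[0.3, 0.5]])
P = np.array([[2.0, 0.0],
              [0.0, 1.0]])
U = C @ P @ C.T
# For diagonal P this reduces to a weighted sum of gain-squared terms.
```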
Revise Cost to Reflect State and Adjoint Covariance Dynamics

! Integration by parts:

S(t) P(t) |_{t_o}^{t_f} = \int_{t_o}^{t_f} [ \dot{S}(t) P(t) + S(t) \dot{P}(t) ] dt

S(t_f) P(t_f) = S(t_o) P(t_o) + \int_{t_o}^{t_f} [ \dot{S}(t) P(t) + S(t) \dot{P}(t) ] dt

! Rewrite the cost function to incorporate the initial cost:

J = (1/2) Tr{ S(t_o) P(t_o) + \int_{t_o}^{t_f} [ Q(t) P(t) + R(t) U(t) + \dot{S}(t) P(t) + S(t) \dot{P}(t) ] dt }
Evolution of State and Adjoint Covariance Matrices
(No Control)

u(t) = 0;  U(t) = 0

! State covariance response to random disturbance:

\dot{P}(t) = F(t) P(t) + P(t) F^T(t) + L(t) W(t) L^T(t),   P(t_o) given

! Adjoint covariance response to terminal cost:

\dot{S}(t) = -F^T(t) S(t) - S(t) F(t) - Q(t),   S(t_f) given

Evolution of State and Adjoint Covariance Matrices
(Optimal Control)

! State covariance response to random disturbance (dependent on S(t) through the gain C(t)):

\dot{P}(t) = [ F(t) - G(t) C(t) ] P(t) + P(t) [ F(t) - G(t) C(t) ]^T + L(t) W(t) L^T(t)

! Adjoint covariance response to terminal cost (independent of P(t)):

\dot{S}(t) = -F^T(t) S(t) - S(t) F(t) - Q(t) + S(t) G(t) R^{-1}(t) G^T(t) S(t)
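The closed-loop state covariance equation can be propagated forward with a simple Euler sketch. The matrices below are illustrative; in steady state P satisfies the Lyapunov equation F_cl P + P F_cl^T + L W L^T = 0:

```python
import numpy as np

# Forward Euler propagation of the closed-loop state covariance:
#   Pdot = (F - G C) P + P (F - G C)^T + L W L^T,   P(t0) given.
F = np.array([[0.0, 1.0],
              [-1.0, -0.5]])
G = np.array([[0.0],
              [1.0]])
C = np.array([[0.3, 0.5]])   # stand-in feedback gain
L = np.eye(2)
W = 0.1 * np.eye(2)

Fcl = F - G @ C              # stable closed-loop dynamics matrix
P = np.zeros((2, 2))         # P(t0) = 0
dt = 1e-3
for _ in range(int(30.0 / dt)):
    P = P + dt * (Fcl @ P + P @ Fcl.T + L @ W @ L.T)
```

Symmetry of P is preserved exactly by the update, since the right-hand side is symmetric whenever P is.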
Total Cost With and Without Control

! With no control:

J_no control = (1/2) Tr[ S(t_o) P(t_o) + \int_{t_o}^{t_f} S(t) L(t) W(t) L^T(t) dt ]

! With optimal control, the equation for the cost is the same:

J_optimal control = (1/2) Tr[ S(t_o) P(t_o) + \int_{t_o}^{t_f} S(t) L(t) W(t) L^T(t) dt ]

! ... but the evolution of S(t), and hence S(t_o), is different in each case
Next Time:
Linear-Quadratic-Gaussian Regulators

Supplemental Material

Neighboring-Optimal Control with Uncertain Disturbance, Measurement, and Initial Condition
Immune Response Example

! Optimal open-loop drug therapy (control). Assumptions:
  ! Initial condition known without error
  ! No disturbance
! Optimal closed-loop therapy. Assumptions:
  ! Small error in initial condition
  ! Small disturbance
  ! Perfect measurement of state
! Stochastic optimal closed-loop therapy. Assumptions:
  ! Small error in initial condition
  ! Small disturbance
  ! Imperfect measurement
  ! Certainty-equivalence applies to perturbation control

Immune Response Example with Optimal Feedback Control

Open-Loop Optimal Control for Lethal Initial Condition

Open- and Closed-Loop Optimal Control for 150% Lethal Initial Condition

Immune Response with Full-State Stochastic Optimal Feedback Control
(Random Disturbance and Measurement Error Not Simulated)

W = I_4;  N = I_2 / 20

! Low-bandwidth estimator (|W| < |N|): initial control too sluggish to prevent divergence
! High-bandwidth estimator (|W| > |N|): quick initial control prevents divergence

Stochastic-Optimal Control (u_1) with Two Measurements (x_1, x_3)
(w/ Ghigliazza, 2004)
Immune Response to Random Disturbance with Two-Measurement Stochastic Neighboring-Optimal Control

! Disturbance due to re-infection
  ! Sequestered pockets of pathogen
! Noisy measurements
! Closed-loop therapy is robust
! ... but not robust enough: organ death occurs in one case
! Probability of satisfactory therapy can be maximized by stochastic redesign of the controller
Dual Control
(Feldbaum, 1965)

! Nonlinear system
! Uncertain system parameters to be estimated
! Parameter estimation can be aided by test inputs
! Approach: minimize a value function with three increments
  ! Nominal control
  ! Cautious control
  ! Probing control

min_u V^* = min_u ( V^*_nominal + V^*_cautious + V^*_probing )

Estimation and control calculations are coupled and necessarily recursive.
Adaptive Critic Controller

The nonlinear control law, c, takes the general form

u(t) = c[ x(t), a, y^*(t) ]

x(t): state
a: parameters of operating point
y^*(t): command input

! On-line adaptive critic controller
  ! Nonlinear control law (action network)
  ! Criticizes non-optimal performance via critic network
  ! Adapts control gains to improve performance
  ! Adapts cost model to improve estimate

Algebraic Initialization of Neural Networks
(Ferrari and Stengel)

! Initially, c[x, a, y*] is unknown
! Design PI-LQ controllers with integral compensation that satisfy requirements at n operating points
! Scheduling variable, a

u(t) = C_F(a) y^* + C_B(a) \Delta x + C_I(a) \int \Delta y(t) dt \triangleq c[ x(t), a, y^*(t) ]
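The scheduled PI-LQ law above can be sketched with simple linear interpolation of the gains between two design points. All gains, signals, and the helper `sched` are hypothetical stand-ins for illustration:

```python
import numpy as np

# Sketch of a gain-scheduled PI control law
#   u = C_F(a) y* + C_B(a) dx + C_I(a) * integral(dy) dt
# with each gain interpolated linearly in the scheduling variable a.
a_points = np.array([0.0, 1.0])               # two design operating points
CF_points = np.array([[[1.0]], [[2.0]]])      # forward gains at each point
CB_points = np.array([[[0.5, 0.1]], [[0.8, 0.2]]])
CI_points = np.array([[[0.3]], [[0.4]]])

def sched(gains, a):
    # Linear interpolation of each gain entry between the design points
    w = (a - a_points[0]) / (a_points[1] - a_points[0])
    return (1 - w) * gains[0] + w * gains[1]

a = 0.5                       # current operating point
y_star = np.array([1.0])      # command input
dx = np.array([0.2, -0.1])    # state perturbation
int_dy = np.array([0.05])     # integrated output error

u = sched(CF_points, a) @ y_star + sched(CB_points, a) @ dx + sched(CI_points, a) @ int_dy
```

In the Ferrari-Stengel scheme the interpolation role is played by the neural networks described next, which reproduce the design-point gains exactly.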
Replace Gain Matrices by Neural Networks

Replace the control gain matrices by sigmoidal neural networks:

u(t) = NN_F[ y^*(t), a(t) ] + NN_B[ x(t), a(t) ] + NN_I[ \int \Delta y(t) dt, a(t) ] = c[ x(t), a, y^*(t) ]

Initial Neural Control Law

! Algebraic training of the neural networks produces an exact fit of linear control gains and trim conditions at n operating points
! Interpolation and gain scheduling via neural networks
! One node per operating point in each neural network

On-line Optimization of Adaptive Critic Neural Network Controller

! The critic adapts neural network weights to improve performance using approximate dynamic programming
! Critic and Action (i.e., Control) networks adapted concurrently
! LQ-PI cost function applied to nonlinear problem
! Modified resilient backpropagation for neural network training

Heuristic Dynamic Programming Adaptive Critic

V[ x(t_k) ] = L[ x(t_k), u(t_k) ] + V[ x(t_{k+1}) ]

\partial V/\partial u = \partial L/\partial u + (\partial V/\partial x)(\partial x/\partial u) = 0

Dual Heuristic Programming Adaptive Critic, for the receding-horizon optimization problem:

\partial V[ x_a(t) ] / \partial x_a(t) = NN_C[ x_a(t), a(t) ]
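The recursion V[x(k)] = L[x(k), u(k)] + V[x(k+1)] can be made concrete in the scalar linear-quadratic case, where the critic's model is simply V(x) = s x^2 / 2 and the update is a backward recursion rather than a neural network. Dynamics and weights below are illustrative:

```python
# Scalar sketch of the adaptive-critic recursion for an LQ problem:
#   x(k+1) = f x + g u,   L = (q x^2 + r u^2) / 2,   V(x) = s x^2 / 2.
f, g, q, r = 0.9, 0.5, 1.0, 1.0

s = 0.0
for _ in range(200):
    # Setting dV/du = 0 gives the feedback u = -k x with this gain:
    k = g * s * f / (r + g**2 * s)
    # Substitute u = -k x into L + V(x(k+1)) to update the critic model:
    s = q + r * k**2 + s * (f - g * k)**2
# s converges to the discrete algebraic Riccati solution
```

The neural-network critic generalizes this idea: instead of a single scalar s, the cost model is a nonlinear function fit across the state and operating-point space.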
Action Network On-line Training

Train the action network, at time t, holding the critic parameters fixed.

[Block diagram: x_a(t) and a(t) drive the aircraft model (transition matrices, state prediction); utility-function derivatives and NN_C feed the optimality condition, which generates the NN_A training target.]

Critic Network On-line Training

Train the critic network, at time t, holding the action parameters fixed.

[Block diagram: x_a(t) and a(t) drive the aircraft model and NN_A; utility-function derivatives and NN_C(old) feed target generation, which produces the NN_C target cost gradient.]