Stochastic Optimal Control
Robert Stengel

Optimal Control and Estimation, MAE 546, Princeton University, 2013

Copyright 2013 by Robert Stengel. All rights reserved. For educational use only.
http://www.princeton.edu/~stengel/MAE546.html
http://www.princeton.edu/~stengel/OptConEst.html

• Nonlinear systems with random inputs and perfect measurements
• Nonlinear systems with random inputs and imperfect measurements
• Certainty equivalence and separation
• Stochastic neighboring-optimal control
• Linear-quadratic-Gaussian (LQG) control

Nonlinear Systems with Random Inputs and Perfect Measurements

Inputs and initial conditions are uncertain, but the state can be measured without error:

\dot{x}(t) = f[x(t), u(t), w(t), t]

z(t) = x(t)

E[x(0)] = x(0), \quad E\{[x(0) - x(0)][x(0) - x(0)]^T\} = 0

Assume that random disturbance effects are small and additive:

\dot{x}(t) = f[x(t), u(t), t] + L(t) w(t)

E[w(t)] = 0, \quad E[w(t) w^T(\tau)] = W(t) \, \delta(t - \tau)
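For intuition, the additive-disturbance model can be simulated directly. The sketch below is illustrative only: the dynamics f, gain L, and spectral density W are assumptions, not from the slides. It discretizes the white noise by drawing, over each step Δt, a sample with variance W/Δt, which approximates E[w(t)wᵀ(τ)] = W δ(t − τ).

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, u, t):
    """Assumed example dynamics: damped scalar system with control."""
    return -0.5 * x + u

W = 0.04           # white-noise spectral density
L = 1.0            # disturbance input gain (scalar here)
dt, tf = 0.01, 5.0
x = 1.0            # initial state, known without error (perfect measurements)

for k in range(int(tf / dt)):
    u = -0.2 * x                            # any control law; state measured exactly
    w = rng.normal(0.0, np.sqrt(W / dt))    # discrete-time sample of white noise w(t)
    x += (f(x, u, k * dt) + L * w) * dt     # Euler-Maruyama step
```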

Cost Must Be an Expected Value

A deterministic cost function cannot be minimized because:
• the disturbance effect on the state cannot be predicted
• the state and control are random variables

\min_{u(t)} J = \phi[x(t_f)] + \int_{t_0}^{t_f} L[x(t), u(t)] \, dt

However, the expected value of a deterministic cost function can be minimized:

\min_{u(t)} J = E\left\{ \phi[x(t_f)] + \int_{t_0}^{t_f} L[x(t), u(t)] \, dt \right\}
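Since each disturbance realization yields a different cost, E[J] must be estimated by averaging. A minimal Monte Carlo sketch, reusing the assumed scalar dynamics above (the quadratic φ and L here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
W, dt, tf = 0.04, 0.01, 5.0

def run_cost(gain):
    """Cost of one noisy run: J = phi[x(tf)] + integral of L[x(t), u(t)] dt."""
    x, J = 1.0, 0.0
    for _ in range(int(tf / dt)):
        u = -gain * x
        J += 0.5 * (x**2 + u**2) * dt               # running cost L[x, u]
        x += (-0.5 * x + u + rng.normal(0.0, np.sqrt(W / dt))) * dt
    return J + 0.5 * x**2                           # terminal cost phi[x(tf)]

# The expected cost is the average over many disturbance realizations
EJ = np.mean([run_cost(0.2) for _ in range(2000)])
print(f"estimated E[J] = {EJ:.3f}")
```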

Stochastic Euler-Lagrange Equations?

There is no single optimal trajectory, and expected values of the Euler-Lagrange necessary conditions may not be well defined:

1) E[\lambda(t_f)] = E\left\{ \left( \frac{\partial \phi[x(t_f)]}{\partial x} \right)^T \right\}

2) E[\dot{\lambda}(t)] = -E\left\{ \left( \frac{\partial H[x(t), u(t), \lambda(t), t]}{\partial x} \right)^T \right\}

3) E\left\{ \frac{\partial H[x(t), u(t), \lambda(t), t]}{\partial u} \right\} = 0


Stochastic Value Function for a Nonlinear System

However, a Hamilton-Jacobi-Bellman (HJB) equation based on expectations can be solved. Base the optimization on the Principle of Optimality. The optimal expected value function at t_1 is

V^*(t_1) = E\left\{ \phi[x^*(t_f)] + \int_{t_1}^{t_f} L[x^*(\tau), u^*(\tau)] \, d\tau \right\}
         = \min_u E\left\{ \phi[x^*(t_f)] + \int_{t_1}^{t_f} L[x^*(\tau), u(\tau)] \, d\tau \right\}

Rate of Change of the Value Function

\left. \frac{dV^*}{dt} \right|_{t = t_1} = -E\{ L[x^*(t_1), u^*(t_1)] \}

x(t) and u(t) can be known precisely; therefore, the total time derivative of V^* is

\left. \frac{dV^*}{dt} \right|_{t = t_1} = -L[x^*(t_1), u^*(t_1)]

Incremental Change in the Value Function

Apply the chain rule to the total derivative:

\frac{dV^*}{dt} = E\left\{ \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x} \dot{x} \right\}

Expand the incremental change in the value function, \Delta V^*, to second degree:

\Delta V^* = \frac{dV^*}{dt} \Delta t = E\left\{ \frac{\partial V^*}{\partial t} \Delta t + \frac{\partial V^*}{\partial x} \dot{x} \, \Delta t + \frac{1}{2} \dot{x}^T \frac{\partial^2 V^*}{\partial x^2} \dot{x} \, \Delta t^2 + \cdots \right\}

 = E\left\{ \frac{\partial V^*}{\partial t} \Delta t + \frac{\partial V^*}{\partial x} [f(\cdot) + L w(\cdot)] \Delta t + \frac{1}{2} [f(\cdot) + L w(\cdot)]^T \frac{\partial^2 V^*}{\partial x^2} [f(\cdot) + L w(\cdot)] \Delta t^2 + \cdots \right\}

Cancel \Delta t.

Introduction of the Trace

The trace of a matrix product is a scalar, invariant under cyclic permutation:

Tr(ABC) = Tr(CAB) = Tr(BCA)

Tr(x^T Q x) = Tr(x x^T Q) = Tr(Q x x^T), \quad \dim[Tr(\cdot)] = 1 \times 1
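These identities are easy to confirm numerically; a quick check with random matrices:

```python
import numpy as np

rng = np.random.default_rng(2)
A, B, C = rng.normal(size=(3, 4)), rng.normal(size=(4, 5)), rng.normal(size=(5, 3))
x, Q = rng.normal(size=(3, 1)), rng.normal(size=(3, 3))

# Cyclic permutation leaves the trace of a product unchanged
assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))
assert np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A))

# A quadratic form is a 1x1 matrix, equal to the trace of its cyclic rearrangements
assert np.isclose((x.T @ Q @ x).item(), np.trace(x @ x.T @ Q))
assert np.isclose((x.T @ Q @ x).item(), np.trace(Q @ x @ x.T))
```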

Dividing by \Delta t,

\frac{dV^*}{dt} \approx E\left\{ \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x} [f(\cdot) + L w(\cdot)] + \frac{1}{2} Tr\left( [f(\cdot) + L w(\cdot)]^T \frac{\partial^2 V^*}{\partial x^2} [f(\cdot) + L w(\cdot)] \right) \Delta t \right\}

 = E\left\{ \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x} [f(\cdot) + L w(\cdot)] + \frac{1}{2} Tr\left( \frac{\partial^2 V^*}{\partial x^2} [f(\cdot) + L w(\cdot)][f(\cdot) + L w(\cdot)]^T \right) \Delta t \right\}


Toward the Stochastic HJB Equation

\frac{dV^*}{dt} = E\left\{ \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x} [f(\cdot) + L w(\cdot)] + \frac{1}{2} Tr\left( \frac{\partial^2 V^*}{\partial x^2} [f(\cdot) + L w(\cdot)][f(\cdot) + L w(\cdot)]^T \right) \Delta t \right\}

 = \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x} f(\cdot) + E\left\{ \frac{\partial V^*}{\partial x} L w(\cdot) + \frac{1}{2} Tr\left( \frac{\partial^2 V^*}{\partial x^2} [f(\cdot) + L w(\cdot)][f(\cdot) + L w(\cdot)]^T \right) \Delta t \right\}

Because x(t) and u(t) can be measured, they can be taken outside the expectation.

Toward the Stochastic HJB Equation (continued)

The disturbance is assumed to be zero-mean white noise:

E[w(t)] = 0, \quad E[w(t) w^T(\tau)] = W(t) \, \delta(t - \tau)

\frac{dV^*}{dt} = \frac{\partial V^*}{\partial t} + \frac{\partial V^*}{\partial x} f(\cdot) + \frac{1}{2} \lim_{\Delta t \to 0} Tr\left\{ \frac{\partial^2 V^*}{\partial x^2} \left[ E\left( f(\cdot) f^T(\cdot) \right) \Delta t + L \, E\left( w(\cdot) w^T(\cdot) \right) L^T \Delta t \right] \right\}

 = \frac{\partial V^*}{\partial t}(t) + \frac{\partial V^*}{\partial x}(t) f(\cdot) + \frac{1}{2} Tr\left[ \frac{\partial^2 V^*}{\partial x^2}(t) L(t) W(t) L^T(t) \right]

The uncertain disturbance input can only increase the value function's rate of change.

Stochastic Principle of Optimality (Perfect Measurements)

\frac{\partial V^*}{\partial t}(t) = -\min_u E\left\{ L[x^*(t), u(t), t] + \frac{\partial V^*}{\partial x}(t) \, f[x^*(t), u(t), t] + \frac{1}{2} Tr\left[ \frac{\partial^2 V^*}{\partial x^2}(t) L(t) W(t) L^T(t) \right] \right\}

Boundary (terminal) condition: V^*(t_f) = E\{\phi(t_f)\}

\frac{dV^*}{dt} = \frac{\partial V^*}{\partial t}(t) + \frac{\partial V^*}{\partial x}(t) \, f(\cdot) + \frac{1}{2} Tr\left[ \frac{\partial^2 V^*}{\partial x^2}(t) L(t) W(t) L^T(t) \right]

• Substitute for the total derivative, dV^*/dt = -L(x^*, u^*)
• Solve for the partial derivative, \partial V^*/\partial t
• The result is the stochastic HJB equation above

Observations of the Stochastic Principle of Optimality (Perfect Measurements)

\frac{\partial V^*}{\partial t}(t) = -\min_u E\left\{ L[x^*(t), u(t), t] + \frac{\partial V^*}{\partial x}(t) \, f[x^*(t), u(t), t] + \frac{1}{2} Tr\left[ \frac{\partial^2 V^*}{\partial x^2}(t) L(t) W(t) L^T(t) \right] \right\}

• Control has no effect on the disturbance input
• The criterion for optimality is the same as for the deterministic case
• Disturbance uncertainty increases the magnitude of the total optimal value function, V^*(0)


Information Sets and Expected Cost

• Sigma algebra (Wikipedia definitions)
  • The collection of sets over which a measure is defined
  • The collection of events that can be assigned probabilities
  • A measurable space
• Information available at the current time, t_1
  • All measurements from the initial time, t_0
  • All control commands from the initial time

I[t_0, t_1] = \{ z[t_0, t_1], u[t_0, t_1] \}

The Information Set, I

• Plus available model structure, parameters, and statistics:

I[t_0, t_1] = \{ z[t_0, t_1], u[t_0, t_1], f(\cdot), Q, R, \ldots \}

• Measurements may be directly useful, e.g.,
  • Displays
  • Simple feedback control
• ... or they may require processing, e.g.,
  • Transformation
  • Estimation

A Derived Information Set, I_D

• Example of a derived information set
  • History of mean and covariance from a state estimator

I_D[t_0, t_1] = \{ \hat{x}[t_0, t_1], P[t_0, t_1], u[t_0, t_1] \}

Additional Derived Information Sets

• Markov derived information set (a code sketch follows this slide)
  • Most current mean and covariance from a state estimator

I_{MD}(t_1) = \{ \hat{x}(t_1), P(t_1), u(t_1) \}

• Multiple-model derived information set
  • Parallel estimates of current mean, covariance, and hypothesis probability mass function

I_{MM}(t_1) = \{ [\hat{x}_A(t_1), P_A(t_1), u(t_1), Pr(H_A)], [\hat{x}_B(t_1), P_B(t_1), u(t_1), Pr(H_B)], \ldots \}
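In software, a derived information set is simply a container for the estimator outputs. A minimal sketch of I_MD (the class and field names are assumptions for illustration, not from the slides):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class MarkovDerivedInfoSet:
    """I_MD(t1): most current mean and covariance from a state estimator."""
    x_hat: np.ndarray   # conditional mean estimate, x_hat(t1)
    P: np.ndarray       # conditional covariance, P(t1)
    u: np.ndarray       # most recent control command, u(t1)

I_MD = MarkovDerivedInfoSet(x_hat=np.zeros(2), P=np.eye(2), u=np.zeros(1))
```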


Required and Available Information Sets for Optimal Control

• Optimal control requires propagation of information back from the final time
• Hence, it requires the entire information set, extending from t_0 to t_f:

I[t_0, t_f]

• Separate the information set into knowable and predictable parts:

I[t_0, t_f] = I[t_0, t_1] + I[t_1, t_f]

• Knowable information has been received
• Predictable information is to come

Expected Values of State and Control

Expected values of the state and control are conditioned on the information set:

E[x(t) | I_D] = \hat{x}(t)

E\{ [x(t) - \hat{x}(t)][x(t) - \hat{x}(t)]^T | I_D \} = P(t)

... where the conditional expected values are estimates from an optimal filter.

Dependence of the Stochastic Cost Function on the Information Set

J = \frac{1}{2} E\left\{ E\left[ Tr\left( S(t_f) x(t_f) x^T(t_f) \right) | I_D \right] + \int_0^{t_f} E\left[ Tr\left( Q x(t) x^T(t) \right) \right] dt + \int_0^{t_f} E\left[ Tr\left( R u(t) u^T(t) \right) \right] dt \right\}

Expand the state covariance:

P(t) = E\{ [x(t) - \hat{x}(t)][x(t) - \hat{x}(t)]^T | I_D \}
     = E\{ x(t) x^T(t) - \hat{x}(t) x^T(t) - x(t) \hat{x}^T(t) + \hat{x}(t) \hat{x}^T(t) | I_D \}

E\{ \hat{x}(t) x^T(t) | I_D \} = E\{ x(t) \hat{x}^T(t) | I_D \} = \hat{x}(t) \hat{x}^T(t)

P(t) = E\{ x(t) x^T(t) | I_D \} - \hat{x}(t) \hat{x}^T(t)

or

E\{ x(t) x^T(t) | I_D \} = P(t) + \hat{x}(t) \hat{x}^T(t)

... where the conditional expected values are obtained from an optimal filter.
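The identity E[x xᵀ | I_D] = P + x̂ x̂ᵀ can be checked by sampling; a sketch with an assumed conditional mean and covariance:

```python
import numpy as np

rng = np.random.default_rng(3)
x_hat = np.array([1.0, -2.0])                  # conditional mean, x_hat(t)
P = np.array([[0.5, 0.1], [0.1, 0.3]])         # conditional covariance, P(t)

x = rng.multivariate_normal(x_hat, P, size=200_000)
second_moment = x.T @ x / len(x)               # Monte Carlo estimate of E[x x^T]

# E[x x^T | I_D] = P + x_hat x_hat^T
assert np.allclose(second_moment, P + np.outer(x_hat, x_hat), atol=2e-2)
```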

Certainty-Equivalent and Stochastic Incremental Costs

J = \frac{1}{2} E\left\{ Tr\left[ S(t_f) \left( P(t_f) + \hat{x}(t_f) \hat{x}^T(t_f) \right) \right] + \int_0^{t_f} Tr\left[ Q \left( P(t) + \hat{x}(t) \hat{x}^T(t) \right) \right] dt + \int_0^{t_f} Tr\left[ R u(t) u^T(t) \right] dt \right\} \triangleq J_{CE} + J_S

The cost function has two parts, the certainty-equivalent cost and the stochastic increment cost:

J_{CE} = \frac{1}{2} E\left\{ Tr\left[ S(t_f) \hat{x}(t_f) \hat{x}^T(t_f) \right] + \int_0^{t_f} Tr\left[ Q \hat{x}(t) \hat{x}^T(t) \right] dt + \int_0^{t_f} Tr\left[ R u(t) u^T(t) \right] dt \right\}

J_S = \frac{1}{2} E\left\{ Tr\left[ S(t_f) P(t_f) \right] + \int_0^{t_f} Tr\left[ Q P(t) \right] dt \right\}


Expected Cost of the Trajectory

Optimized cost function:

V^*(t_0) \triangleq J^*(t_f) = E\left\{ \phi[x^*(t_f)] + \int_{t_0}^{t_f} L[x^*(\tau), u^*(\tau)] \, d\tau \right\}

Law of total expectation:

E(\cdot) = E\left( \cdot \, | \, I[t_0, t_1] \right) Pr\{ I[t_0, t_1] \} + E\left( \cdot \, | \, I[t_1, t_f] \right) Pr\{ I[t_1, t_f] \} = E[ E(\cdot | I) ]

Because the past is established at t_1:

E(J^*) = E\left( J^* | I[t_0, t_1] \right) \cdot (1) + E\left( J^* | I[t_1, t_f] \right) Pr\{ I[t_1, t_f] \}
       = E\left( J^* | I[t_0, t_1] \right) + E\left( J^* | I[t_1, t_f] \right) Pr\{ I[t_1, t_f] \}

• For planning or post-trajectory analysis, one can assume that the entire information set is available
• For real-time control, t_1 → t_f, and the future information set can only be predicted

Separation Property and Certainty Equivalence

Separation Property
• The Optimal Control Law and the Optimal Estimation Law can be derived separately
• Their derivations are strictly independent

Certainty Equivalence Property
• The separation property holds, plus ...
• The Stochastic Optimal Control Law and the Deterministic Optimal Control Law are the same
• The Optimal Estimation Law can be derived separately

Linear-quadratic-Gaussian (LQG) control is certainty-equivalent.

Stochastic Linear-Quadratic Optimal Control


Stochastic Principle of Optimality Applied to the Linear-Quadratic (LQ) Problem

Quadratic value function:

V(t_0) = E\left\{ \phi[x(t_f)] + \int_{t_0}^{t_f} L[x(\tau), u(\tau)] \, d\tau \right\}

 = \frac{1}{2} E\left\{ x^T(t_f) S(t_f) x(t_f) + \int_{t_0}^{t_f} \begin{bmatrix} x^T(t) & u^T(t) \end{bmatrix} \begin{bmatrix} Q(t) & M(t) \\ M^T(t) & R(t) \end{bmatrix} \begin{bmatrix} x(t) \\ u(t) \end{bmatrix} dt \right\}

Linear dynamic constraint:

\dot{x}(t) = F(t) x(t) + G(t) u(t) + L(t) w(t)

Components of the LQ Value Function

The quadratic value function has two parts:

V(t) = \frac{1}{2} x^T(t) S(t) x(t) + v(t)

Certainty-equivalent value function:

V_{CE}(t) \triangleq \frac{1}{2} x^T(t) S(t) x(t)

Stochastic value function increment:

v(t) = \frac{1}{2} \int_t^{t_f} Tr\left[ S(\tau) L(\tau) W(\tau) L^T(\tau) \right] d\tau

Value Function Gradient and Hessian

Certainty-equivalent value function:

V_{CE}(t) \triangleq \frac{1}{2} x^T(t) S(t) x(t)

Gradient with respect to the state:

\frac{\partial V}{\partial x}(t) = x^T(t) S(t)

Hessian with respect to the state:

\frac{\partial^2 V}{\partial x^2}(t) = S(t)

Linear-Quadratic Stochastic Hamilton-Jacobi-Bellman Equation (Perfect Measurements)

Certainty-equivalent plus stochastic terms:

\frac{\partial V^*}{\partial t} = -\min_u E\left\{ \frac{1}{2} \left( x^{*T} Q x^* + 2 x^{*T} M u + u^T R u \right) + x^{*T} S \left( F x^* + G u \right) + \frac{1}{2} Tr\left( S L W L^T \right) \right\}

 = -\min_u \left\{ \frac{1}{2} \left( x^{*T} Q x^* + 2 x^{*T} M u + u^T R u \right) + x^{*T} S \left( F x^* + G u \right) + \frac{1}{2} Tr\left( S L W L^T \right) \right\}

Terminal condition:

V(t_f) = \frac{1}{2} x^T(t_f) S(t_f) x(t_f)


Optimal Control Law

Differentiate the right side of the HJB equation with respect to u and set it equal to zero:

\frac{\partial (\partial V / \partial t)}{\partial u} = 0 = \left( x^T M + u^T R \right) + x^T S G

Solve for u, obtaining the LQ feedback control law:

u(t) = -R^{-1}(t) \left[ G^T(t) S(t) + M^T(t) \right] x(t) \triangleq -C(t) x(t)

The zero-mean, white-noise disturbance has no effect on the structure and gains of the LQ feedback control law.

Matrix Riccati Equation

• Substitute the optimal control law in the HJB equation:

\frac{1}{2} x^T \dot{S} x + \dot{v} = \frac{1}{2} x^T \left[ -\left( Q - M R^{-1} M^T \right) - \left( F - G R^{-1} M^T \right)^T S - S \left( F - G R^{-1} M^T \right) + S G R^{-1} G^T S \right] x - \frac{1}{2} Tr\left( S L W L^T \right)

with u(t) = -R^{-1}(t) \left[ G^T(t) S(t) + M^T(t) \right] x(t)

• The matrix Riccati equation provides S(t):

\dot{S}(t) = -\left[ Q(t) - M(t) R^{-1}(t) M^T(t) \right] - \left[ F(t) - G(t) R^{-1}(t) M^T(t) \right]^T S(t) - S(t) \left[ F(t) - G(t) R^{-1}(t) M^T(t) \right] + S(t) G(t) R^{-1}(t) G^T(t) S(t), \quad S(t_f) = \phi_{xx}(t_f)

• The stochastic value function increases the cost due to the disturbance
• However, its calculation is independent of the Riccati equation:

\dot{v} = -\frac{1}{2} Tr\left( S L W L^T \right)
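A minimal numerical sketch of this machinery (time-invariant example matrices assumed; simple Euler integration): S(t) is integrated backward from S(t_f), the gain C(t) = R⁻¹(GᵀS + Mᵀ) follows from S alone, and the stochastic increment v(t) is accumulated separately, never feeding back into S.

```python
import numpy as np

# Assumed example data (2 states, 1 control), constant in time for simplicity
F = np.array([[0.0, 1.0], [-1.0, -0.4]])
G = np.array([[0.0], [1.0]])
L = np.array([[0.0], [1.0]])
Q, R, M = np.eye(2), np.array([[1.0]]), np.zeros((2, 1))
W = np.array([[0.1]])
dt, tf = 0.001, 5.0

S = np.eye(2)                  # terminal condition, S(tf)
v = 0.0                        # stochastic increment, v(tf) = 0
Rinv = np.linalg.inv(R)
for _ in range(int(tf / dt)):  # backward in time: S(t - dt) = S(t) - S_dot * dt
    Fbar = F - G @ Rinv @ M.T
    S_dot = -(Q - M @ Rinv @ M.T) - Fbar.T @ S - S @ Fbar + S @ G @ Rinv @ G.T @ S
    v_dot = -0.5 * np.trace(S @ L @ W @ L.T)
    S, v = S - S_dot * dt, v - v_dot * dt

C0 = Rinv @ (G.T @ S + M.T)    # gain C(t0); note that W never entered the gain
print("S(t0) =\n", S, "\nC(t0) =", C0, "\nv(t0) =", v)
```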

Evaluation of the Total Cost (Imperfect Measurements)

• Stochastic quadratic cost function, neglecting cross terms:

J = \frac{1}{2} Tr\left\{ E\left[ x^T(t_f) S(t_f) x(t_f) \right] + \int_{t_0}^{t_f} E\left[ \begin{bmatrix} x^T(t) & u^T(t) \end{bmatrix} \begin{bmatrix} Q(t) & 0 \\ 0 & R(t) \end{bmatrix} \begin{bmatrix} x(t) \\ u(t) \end{bmatrix} \right] dt \right\}

 = \frac{1}{2} Tr\left\{ S(t_f) E\left[ x(t_f) x^T(t_f) \right] + \int_{t_0}^{t_f} \left( Q(t) E\left[ x(t) x^T(t) \right] + R(t) E\left[ u(t) u^T(t) \right] \right) dt \right\}

or

J = \frac{1}{2} Tr\left\{ S(t_f) P(t_f) + \int_{t_0}^{t_f} \left[ Q(t) P(t) + R(t) U(t) \right] dt \right\}

where P(t) \triangleq E[x(t) x^T(t)], \quad U(t) \triangleq E[u(t) u^T(t)]


Optimal Control Covariance

Optimal control vector:

u(t) = -C(t) x(t)

Optimal control covariance:

U(t) = C(t) P(t) C^T(t) = R^{-1}(t) G^T(t) S(t) P(t) S(t) G(t) R^{-1}(t)

Revise Cost to Reflect State and Adjoint Covariance Dynamics

• Integration by parts:

S(t) P(t) \Big|_{t_0}^{t_f} = \int_{t_0}^{t_f} \left[ \dot{S}(t) P(t) + S(t) \dot{P}(t) \right] dt

S(t_f) P(t_f) = S(t_0) P(t_0) + \int_{t_0}^{t_f} \left[ \dot{S}(t) P(t) + S(t) \dot{P}(t) \right] dt

• Rewrite the cost function to incorporate the initial cost:

J = \frac{1}{2} Tr\left\{ S(t_0) P(t_0) + \int_{t_0}^{t_f} \left[ Q(t) P(t) + R(t) U(t) + \dot{S}(t) P(t) + S(t) \dot{P}(t) \right] dt \right\}

Evolution of State and Adjoint Covariance Matrices (No Control)

u(t) = 0; \quad U(t) = 0

• State covariance response to the random disturbance:

\dot{P}(t) = F(t) P(t) + P(t) F^T(t) + L(t) W(t) L^T(t), \quad P(t_0) \ given

• Adjoint covariance response to the terminal cost:

\dot{S}(t) = -F^T(t) S(t) - S(t) F(t) - Q(t), \quad S(t_f) \ given

Evolution of State and Adjoint Covariance Matrices (Optimal Control)

• State covariance response to the random disturbance (dependent on S(t) through C(t)):

\dot{P}(t) = \left[ F(t) - G(t) C(t) \right] P(t) + P(t) \left[ F(t) - G(t) C(t) \right]^T + L(t) W(t) L^T(t)

• Adjoint covariance response to the terminal cost (independent of P(t)):

\dot{S}(t) = -F^T(t) S(t) - S(t) F(t) - Q(t) + S(t) G(t) R^{-1}(t) G^T(t) S(t)
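A sketch of the forward covariance propagation (Euler integration; matrices as assumed in the Riccati sketch above; under optimal control the gain C(t) would come from the backward Riccati pass, so a fixed placeholder gain stands in here):

```python
import numpy as np

# Assumed example data, matching the Riccati sketch above
F = np.array([[0.0, 1.0], [-1.0, -0.4]])
G = np.array([[0.0], [1.0]])
L = np.array([[0.0], [1.0]])
W = np.array([[0.1]])
dt, tf = 0.001, 5.0

def propagate(P, C):
    """P_dot = (F - G C) P + P (F - G C)^T + L W L^T; C = 0 gives the no-control case."""
    Fcl = F - G @ C
    for _ in range(int(tf / dt)):
        P = P + (Fcl @ P + P @ Fcl.T + L @ W @ L.T) * dt
    return P

P0 = 0.5 * np.eye(2)                              # P(t0) given
P_open = propagate(P0, np.zeros((1, 2)))          # no control
P_closed = propagate(P0, np.array([[1.0, 1.2]]))  # placeholder feedback gain
```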


Total Cost With and Without Control

• With no control:

J_{no \ control} = \frac{1}{2} Tr\left[ S(t_0) P(t_0) + \int_{t_0}^{t_f} S(t) L(t) W(t) L^T(t) \, dt \right]

• With optimal control, the equation for the cost is the same:

J_{optimal \ control} = \frac{1}{2} Tr\left[ S(t_0) P(t_0) + \int_{t_0}^{t_f} S(t) L(t) W(t) L^T(t) \, dt \right]

• ... but the evolutions of S(t) and S(t_0) are different in each case
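Given an S(t) history (from the Lyapunov equation without control, or the Riccati equation with it), the total-cost formula is a simple quadrature. A sketch (the constant S history here is only a stand-in for a stored solution):

```python
import numpy as np

def total_cost(S_hist, P0, L, W, dt):
    """J = (1/2) Tr[ S(t0) P(t0) + integral of S(t) L W L^T dt ]."""
    J = np.trace(S_hist[0] @ P0)
    for S in S_hist:
        J += np.trace(S @ L @ W @ L.T) * dt
    return 0.5 * J

# Stand-in history; a real run would store S(t) during the backward integration
L = np.array([[0.0], [1.0]]); W = np.array([[0.1]])
J = total_cost([np.eye(2)] * 5000, 0.5 * np.eye(2), L, W, dt=0.001)
```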

Next Time: Linear-Quadratic-Gaussian Regulators

Supplemental Material

Neighboring-Optimal Control with Uncertain Disturbance, Measurement, and Initial Condition


Immune Response Example

• Optimal open-loop drug therapy (control); assumptions:
  • Initial condition known without error
  • No disturbance
• Optimal closed-loop therapy; assumptions:
  • Small error in initial condition
  • Small disturbance
  • Perfect measurement of state
• Stochastic optimal closed-loop therapy; assumptions:
  • Small error in initial condition
  • Small disturbance
  • Imperfect measurement
  • Certainty equivalence applies to perturbation control

[Figure slides: "Immune Response Example with Optimal Feedback Control"; "Open-Loop Optimal Control for Lethal Initial Condition"; "Open- and Closed-Loop Optimal Control for 150% Lethal Initial Condition"; "Immune Response with Full-State Stochastic Optimal Feedback Control (Random Disturbance and Measurement Error Not Simulated)"]

Stochastic-Optimal Control (u_1) with Two Measurements (x_1, x_3) (w/ Ghigliazza, 2004)

• Low-bandwidth estimator (|W| < |N|): initial control too sluggish to prevent divergence
• High-bandwidth estimator (|W| > |N|): quick initial control prevents divergence

W = I_4, \quad N = I_2 / 20


Immune Response to Random Disturbance with Two-Measurement Stochastic Neighboring-Optimal Control

• Disturbance due to re-infection: sequestered pockets of pathogen
• Noisy measurements
• Closed-loop therapy is robust ...
• ... but not robust enough: organ death occurs in one case
• Probability of satisfactory therapy can be maximized by stochastic redesign of the controller

Dual Control (Feldbaum, 1965)

• Nonlinear system
• Uncertain system parameters to be estimated
• Parameter estimation can be aided by test inputs
• Approach: minimize the value function with three increments
  • Nominal control
  • Cautious control
  • Probing control

\min_u V^* = \min_u \left( V^*_{nominal} + V^*_{cautious} + V^*_{probing} \right)

Estimation and control calculations are coupled and necessarily recursive.

Adaptive Critic Controller

The nonlinear control law, c, takes the general form

u(t) = c[x(t), a, y^*(t)]

x(t): state
a: parameters of operating point
y^*(t): command input

On-line adaptive critic controller:
• Nonlinear control law (action network)
• Criticizes non-optimal performance via a critic network
• Adapts control gains to improve performance
• Adapts the cost model to improve the estimate

Algebraic Initialization of Neural Networks (Ferrari and Stengel)

Initially, c[x, a, y*] is unknown. Design PI-LQ controllers with integral compensation that satisfy requirements at n operating points, scheduled by the variable a (a sketch of the scheduled law follows):

u(t) = C_F(a) y^* + C_B(a) \Delta x + C_I(a) \int \Delta y(t) \, dt \triangleq c[x(t), a, y^*(t)]
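A sketch of such a scheduled PI-LQ law (the gain tables, design points, and linear interpolation in a are assumptions for illustration; actual gains would come from LQ synthesis at each operating point):

```python
import numpy as np

a_design = np.array([0.0, 0.5, 1.0])     # scheduling-variable design points
CF = np.array([[[1.0]], [[1.2]], [[1.5]]])                 # feedforward gains
CB = np.array([[[0.8, 0.3]], [[0.9, 0.35]], [[1.1, 0.4]]]) # feedback gains
CI = np.array([[[0.2]], [[0.25]], [[0.3]]])                # integral gains

def gain(table, a):
    """Interpolate each gain-matrix entry linearly between design points."""
    out = np.empty(table.shape[1:])
    for idx in np.ndindex(out.shape):
        out[idx] = np.interp(a, a_design, table[(slice(None),) + idx])
    return out

def control(y_star, dx, int_dy, a):
    """u = CF(a) y* + CB(a) dx + CI(a) * integral(dy dt)."""
    return gain(CF, a) @ y_star + gain(CB, a) @ dx + gain(CI, a) @ int_dy

u = control(np.array([1.0]), np.array([0.1, -0.05]), np.array([0.02]), a=0.3)
```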


Replace Gain Matrices by Neural Networks

Replace the control gain matrices by sigmoidal neural networks:

u(t) = NN_F[y^*(t), a(t)] + NN_B[x(t), a(t)] + NN_I\left[ \int \Delta y(t) \, dt, \ a(t) \right] = c[x(t), a, y^*(t)]

Initial Neural Control Law

• Algebraic training of the neural networks produces an exact fit of the linear control gains and trim conditions at n operating points
• Interpolation and gain scheduling via neural networks
• One node per operating point in each neural network

On-line Optimization of Adaptive Critic Neural Network Controller

• Critic adapts neural network weights to improve performance using approximate dynamic programming
  • Heuristic Dynamic Programming Adaptive Critic
  • Dual Heuristic Programming Adaptive Critic for the receding-horizon optimization problem
• Critic and Action (i.e., Control) networks adapted concurrently
• LQ-PI cost function applied to the nonlinear problem
• Modified resilient backpropagation for neural network training

V[x(t_k)] = L[x(t_k), u(t_k)] + V[x(t_{k+1})]

\frac{\partial V}{\partial u} = \frac{\partial L}{\partial u} + \frac{\partial V}{\partial x} \frac{\partial x}{\partial u} = 0

\frac{\partial V[x_a(t)]}{\partial x_a(t)} = NN_C[x_a(t), a(t)]
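A skeletal HDP-style target computation for the critic (everything here is an assumed stand-in: a generic one-step model, utility, and critic approximator, not the aircraft application itself):

```python
def critic_target(x_k, u_k, step, utility, V):
    """Bellman recursion V[x(tk)] = L[x(tk), u(tk)] + V[x(tk+1)] as a training target."""
    x_next = step(x_k, u_k)                  # one-step model prediction of x(tk+1)
    return utility(x_k, u_k) + V(x_next)

step = lambda x, u: 0.9 * x + 0.1 * u        # assumed discrete model
utility = lambda x, u: 0.5 * (x**2 + u**2)   # assumed utility L[x, u]
V = lambda x: 1.3 * x**2                     # current critic estimate

target = critic_target(1.0, -0.4, step, utility, V)  # train the critic toward this
```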


Action Network On-line Training

Train the action network at time t, holding the critic parameters fixed.

[Block diagram: x_a(t) and a(t) feed NN_A; the aircraft model, transition matrices, and state prediction feed the utility function derivatives and the fixed critic NN_C; the optimality condition generates the NN_A training target.]

Critic Network On-line Training

Train the critic network at time t, holding the action parameters fixed.

[Block diagram: x_a(t) and a(t) feed NN_A and NN_C(old); the aircraft model, transition matrices, and state prediction feed the utility function derivatives; the target cost gradient generates the NN_C training target.]