INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING
Int. J. Adapt. Control Signal Process. 2015; 00:1–28
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/acs

Optimal Adaptive Control for Weakly Coupled Nonlinear Systems: A Neuro-Inspired Approach†

Luis Rodolfo García Carrillo 1, Kyriakos G. Vamvoudakis 2,*, João Pedro Hespanha 2

1 School of Engineering and Computing Sciences, Texas A&M University - Corpus Christi, TX, 78412-5797 USA
2 Center for Control, Dynamical-systems and Computation (CCDC), University of California, Santa Barbara, CA 93106-9560 USA

SUMMARY

This paper proposes a new approximate dynamic programming algorithm to solve the infinite-horizon optimal control problem for weakly coupled nonlinear systems. The algorithm is implemented as a three-critic/four-actor approximator structure, where the critic approximators are used to learn the optimal costs, while the actor approximators are used to learn the optimal control policies. Simultaneous continuous-time adaptation of both critic and actor approximators is implemented, a method commonly known as synchronous policy iteration. The adaptive control nature of the algorithm requires a persistence of excitation condition to be guaranteed a priori, but this can be relaxed by using previously stored data concurrently with current data in the update of the critic approximators. Appropriate robustifying terms are added to the controllers to eliminate the effects of the residual errors, leading to asymptotic stability of the equilibrium point of the closed-loop system. Simulation results show the effectiveness of the proposed approach for a sixth-order dynamical example. Copyright © 2015 John Wiley & Sons, Ltd.

Received . . .

KEY WORDS: Weakly coupled systems; large scale systems; adaptive control; approximate dynamic programming; asymptotic stability; optimal control; reinforcement learning.

* Correspondence to: K. G. Vamvoudakis, Center for Control, Dynamical-systems and Computation (CCDC), University of California, Santa Barbara, CA 93106-9560 USA. E-mail: [email protected]
† This material is based upon work supported in part by ARO MURI Grant number W911NF0910553, ARO grant W911NF-09-D-0001 (Inst. for Collaborative Biotechnologies), University of California, Santa Barbara, CA 93106-9560 USA

1. INTRODUCTION

Large-scale systems represent a challenging problem in optimal control [1], because their complexity can make numerical computations infeasible. A common approach for dealing with this kind of system consists of splitting the large-scale design problem into a set of simpler problems or subsystems. As an example, the subsystems for the regulation of temperature, pressure, and flow are designed separately in spite of their connection through a chemical plant. Similarly, such connected systems can be found in power systems, aircraft, cars, and communication networks. They are generally characterized by the presence of weak coupling between their subsystems. Practical knowledge may provide some guidance on how to split a large-scale problem into a set of simpler problems. However, all these approaches completely neglect the coupling effect, and most of the time the obtained results do not have a guaranteed performance level.
Weakly coupled linear systems have been studied extensively since their introduction to the control systems community by Kokotovic et al. [2] (see also for example [3], [4], and the references therein). Those systems have also been studied in mathematics [5], [6], economics [7], power system
engineering [8], [9], [10], and in nearly completely decomposable continuous-time and discrete-time Markov chains [11], [12], [13].
Due to the curse of dimensionality and the intractable form of the Hamilton-Jacobi-Bellman (HJB) equations that arise in optimal control, obtaining closed-form optimal control solutions for weakly coupled nonlinear systems is practically impossible. A first attempt at the optimization of coupled nonlinear systems was reported in [14], where the authors proposed a coupling perturbation method for near-optimum design. Approximate solutions of independent reduced-order HJB equations using Successive Galerkin Approximation (SGA) [15], [16], [17] have been proposed as alternative methods for solving the weakly coupled nonlinear optimal control problem. Unfortunately, the SGA method suffers from a computational complexity that increases with the dimension of the system under consideration.
Adaptive dynamic programming (ADP) techniques were proposed by Werbos [18], [19]. ADP brings together the advantages of adaptive and optimal control to obtain approximate, forward-in-time solutions to difficult optimization problems [20], [21], [22]. However, all the existing algorithms, such as the ones developed in [23], [24], [25], [26], [27] and the references therein, can only guarantee uniform ultimate boundedness of the closed-loop signals, i.e., a milder form of stability [28], and require a persistence of excitation condition to be satisfied for all time.
The need for adaptive controllers with the ability to learn optimal solutions for weakly coupled nonlinear systems, while also guaranteeing asymptotic stability of the equilibrium point of the closed-loop system, motivates our research. The proposed algorithm is motivated by a reinforcement learning algorithm called Policy Iteration (PI) [29], which is inspired by behaviorist psychology. To the best of our knowledge, there are no asymptotically stable online solutions to the continuous-time HJB equation for weakly coupled nonlinear systems, since couplings add nonlinearities to the HJB equation and make the problem more difficult.
1.1. Related work
A decoupling transformation that exactly decomposes weakly coupled linear systems composed of two subsystems into independent subsystems was introduced in [30]. These results were extended in [31] and in the book [32] to the general case of linear weakly coupled systems composed of N subsystems, and conditions under which such a transformation is feasible were established. The proposed optimal control algorithm is obtained in the form of a feedback law, where feedback gains are calculated from two independent reduced-order optimal control problems. In a similar way, the optimal control problem for weakly coupled bilinear systems was studied in [35], [33], and [34]. These results were based on a recursive reduced-order scheme for solving the algebraic Riccati equation. Following this reduced-order scheme, the authors in [36] proposed a nonlinear optimal control for a weakly coupled nonlinear system based on the solution of two independent reduced-order HJB equations, using successive Galerkin approximation (SGA) [15], [16], [17]. The main drawbacks of this method are the offline design and the fact that the computational complexity increases with the dimension of the system.
Moreover, in most adaptive control algorithms [47], there is a need for a guaranteed persistence of excitation (PE) condition, which is equivalent to space exploration in reinforcement learning [29], [37], [38]. This condition is restrictive in nonlinear systems and often difficult to guarantee in practice. Hence, convergence cannot be guaranteed. The work of [39] from the adaptive control side, and the works of [40] and [41] from the reinforcement learning side, propose some alternatives that rely on concurrently using current and recorded data for adaptation to obviate the difficulty of guaranteeing convergence with PE. Recently, the authors in [42] have used concurrent learning in optimal adaptive control, but they only prove a milder form of stability, namely uniform ultimate boundedness of the closed-loop signals, by using an approach that is based on integral reinforcement learning.
1.2. Contributions
The contributions of the paper lie in the development of an adaptive learning algorithm to solve the continuous-time optimal control problem with infinite-horizon cost for weakly coupled nonlinear systems. The online adaptive algorithm is implemented as a three-critic/four-actor approximator structure, which involves continuous-time adaptation of both critic and actor approximators. The proposed algorithm is an appropriate combination of ideas from adaptive control, optimal control, and reinforcement learning. Finally, we prove asymptotic stability of the equilibrium point of the closed-loop system.
Structure
The paper is structured as follows. In Section 2 we formulate the optimal control with saturated inputs problem. The approximate solution for the HJB equation is presented in Section 3. The Lyapunov proof that guarantees asymptotic stability of the closed-loop system is presented in Section 5. Simulation results demonstrating the performance of the online algorithm acting on a weakly coupled system are given in Section 4. Finally, Section 5 concludes and discusses future work.
Notation
The notation used here is standard. $\mathbb{R}^+$ is the set of positive real numbers and $\mathbb{Z}^+$ is the set of positive integers. The superscript $\star$ is used to denote the optimal solution, $\lambda_{\min}(A)$ is the minimum eigenvalue of a matrix $A$, $\lambda_{\max}(A)$ is the maximum eigenvalue of a matrix $A$, and $\mathbf{1}_m$ is the column vector with $m$ ones. The gradient of a scalar-valued function with respect to a vector-valued variable $x$ is defined as a column vector and is denoted by $\nabla := \partial/\partial x$. $V_x$ denotes the partial derivative of a given function $V(x)$ with respect to $x$. A function $\alpha : \mathbb{R}^+ \to \mathbb{R}^+$ is said to belong to class $\mathcal{K}$ ($\alpha \in \mathcal{K}$) if it is strictly increasing and $\alpha(0) = 0$.
2. PROBLEM FORMULATION
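Consider the following weakly coupled nonlinear continuous-time system,
$$\dot{x} = \begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \end{bmatrix} = \begin{bmatrix} f_{11}(x_1) + \varepsilon f_{12}(x) \\ \varepsilon f_{21}(x) + f_{22}(x_2) \end{bmatrix} + \begin{bmatrix} g_{11}(x_1) & \varepsilon g_{12}(x) \\ \varepsilon g_{21}(x) & g_{22}(x_2) \end{bmatrix} \begin{bmatrix} u_{11}(t) + \varepsilon u_{12}(t) \\ \varepsilon u_{21}(t) + u_{22}(t) \end{bmatrix}, \tag{1}$$
with an initial condition,
$$\begin{bmatrix} x_1(0) \\ x_2(0) \end{bmatrix} = \begin{bmatrix} x_{10} \\ x_{20} \end{bmatrix},$$
where $x_1 \in \mathbb{R}^{n_1}$, $x_2 \in \mathbb{R}^{n_2}$, with $n_1 + n_2 = n$, are the states that can be measured, $u_{1i} \in \mathbb{R}^{m_1}$, $u_{2i} \in \mathbb{R}^{m_2}$, with $m_1 + m_2 = m$, $i \in \{1, 2\}$, are the control inputs, and $\varepsilon \in \mathbb{R}$ is a small coupling parameter. Moreover, $x = [x_1^T \; x_2^T]^T$ is the full state variable and $u = [u_{11}^T + \varepsilon u_{12}^T \;\; \varepsilon u_{21}^T + u_{22}^T]^T \in U \subseteq \mathbb{R}^m$ is the total control input. We assume that $f_{1i}(\cdot) \in \mathbb{R}^{n_1}$, $f_{2i}(\cdot) \in \mathbb{R}^{n_2}$ and $g_{ij}(\cdot) \in \mathbb{R}^{n_i \times m_j}$ are known functions. We also assume that $f_{1i}(0) = 0$ and $f_{2i}(0) = 0$ for $i \in \{1, 2\}$.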
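To make the block structure of (1) concrete, the following minimal sketch evaluates its right-hand side for scalar blocks ($n_1 = n_2 = m_1 = m_2 = 1$). The particular functions $f_{ij}$, $g_{ij}$ and the name `weakly_coupled_rhs` are hypothetical choices made only for illustration; they are not the example used in the paper.

```python
def weakly_coupled_rhs(x1, x2, u11, u12, u21, u22, eps):
    """Right-hand side of (1) for scalar blocks; returns (x1_dot, x2_dot)."""
    # Hypothetical drift terms, chosen so that f_1i(0) = f_2i(0) = 0.
    f11 = -x1
    f12 = x1 * x2
    f21 = x1 ** 2
    f22 = -2.0 * x2
    # Hypothetical (constant) input gains.
    g11, g12, g21, g22 = 1.0, 0.5, 0.5, 1.0
    # Total inputs of each subsystem: u1 = u11 + eps*u12, u2 = eps*u21 + u22.
    u1 = u11 + eps * u12
    u2 = eps * u21 + u22
    x1_dot = f11 + eps * f12 + g11 * u1 + eps * g12 * u2
    x2_dot = eps * f21 + f22 + eps * g21 * u1 + g22 * u2
    return x1_dot, x2_dot
```

Setting `eps = 0.0` removes every cross term, so the two subsystems evolve independently; a small nonzero `eps` reintroduces the weak coupling.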
It is desired to minimize the following infinite horizon cost functional associated with (1),
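The Hamiltonian of the system (1) associated with the cost function (2)-(3), after setting $\varepsilon^2 = 0$, is expressed by the following $O(\varepsilon^2)$ approximation,
$$H = H_1 + \varepsilon H_2 + H_3, \tag{8}$$
where,
$$H_1 = \frac{1}{2} x_1^T Q_1 x_1 + V_{1x_1}^T f_{11}(x_1) + V_{1x_1}^T g_{11}(x_1) u_{11} + \frac{1}{2} u_{11}^T R_1 u_{11}, \quad \forall x_1, u_{11}, \tag{9}$$
$$\begin{aligned} H_2 = {} & x_1^T Q_2 x_2 + u_{11}^T R_1 u_{12} + u_{22}^T R_2 u_{21} + V_{1x_1}^T f_{12}(x) + V_{3x_2}^T f_{21}(x) \\ & + V_{1x_1}^T g_{11}(x_1) u_{12} + V_{1x_1}^T g_{12}(x) u_{22} + V_{2x_1}^T g_{11}(x_1) u_{11} + V_{2x_2}^T g_{22}(x_2) u_{22} \\ & + V_{2x_1}^T f_{11}(x_1) + V_{2x_2}^T f_{22}(x_2) + V_{3x_2}^T g_{21}(x) u_{11} + V_{3x_2}^T g_{22}(x_2) u_{21}, \quad \forall x, u_{11}, u_{12}, u_{21}, u_{22}, \end{aligned} \tag{10}$$
$$H_3 = \frac{1}{2} x_2^T Q_3 x_2 + V_{3x_2}^T f_{22}(x_2) + V_{3x_2}^T g_{22}(x_2) u_{22} + \frac{1}{2} u_{22}^T R_2 u_{22}, \quad \forall x_2, u_{22}. \tag{11}$$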
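The $O(\varepsilon^2)$ claim can be checked numerically. The sketch below, for scalar blocks at one fixed point, compares the full Hamiltonian $\frac{1}{2}x^TQx + \frac{1}{2}u^TRu + V_x^T(f + gu)$ with $H_1 + \varepsilon H_2 + H_3$ from (9)-(11). It assumes the state weight has the block form $Q = \begin{bmatrix} Q_1 & \varepsilon Q_2 \\ \varepsilon Q_2 & Q_3 \end{bmatrix}$ and $R = \mathrm{diag}(R_1, R_2)$, which is inferred from the terms in (9)-(11) and not stated in this section; all numerical values and the function name are hypothetical.

```python
def hamiltonian_split_gap(eps):
    """Full Hamiltonian minus (H1 + eps*H2 + H3) at one fixed scalar test point."""
    # Hypothetical values of the state, weights, dynamics, gradients and inputs.
    x1, x2 = 0.7, -0.4
    Q1, Q2, Q3, R1, R2 = 1.0, 0.3, 2.0, 1.0, 0.5
    f11, f12, f21, f22 = -0.7, 0.2, 0.1, 0.8
    g11, g12, g21, g22 = 1.0, 0.5, -0.3, 2.0
    V1, V2a, V2b, V3 = 0.9, 0.2, -0.1, 1.1  # V_{1x1}, V_{2x1}, V_{2x2}, V_{3x2}
    u11, u12, u21, u22 = -0.5, 0.3, 0.2, -1.0

    # Full Hamiltonian built from the total inputs and the split gradient.
    u1 = u11 + eps * u12
    u2 = eps * u21 + u22
    Vx1 = V1 + eps * V2a
    Vx2 = eps * V2b + V3
    cost = (0.5 * Q1 * x1 ** 2 + eps * Q2 * x1 * x2 + 0.5 * Q3 * x2 ** 2
            + 0.5 * R1 * u1 ** 2 + 0.5 * R2 * u2 ** 2)
    x1_dot = f11 + eps * f12 + g11 * u1 + eps * g12 * u2
    x2_dot = eps * f21 + f22 + eps * g21 * u1 + g22 * u2
    H_full = cost + Vx1 * x1_dot + Vx2 * x2_dot

    # The three pieces (9), (10), (11).
    H1 = 0.5 * Q1 * x1 ** 2 + V1 * f11 + V1 * g11 * u11 + 0.5 * R1 * u11 ** 2
    H2 = (Q2 * x1 * x2 + u11 * R1 * u12 + u22 * R2 * u21 + V1 * f12 + V3 * f21
          + V1 * g11 * u12 + V1 * g12 * u22 + V2a * g11 * u11 + V2b * g22 * u22
          + V2a * f11 + V2b * f22 + V3 * g21 * u11 + V3 * g22 * u21)
    H3 = 0.5 * Q3 * x2 ** 2 + V3 * f22 + V3 * g22 * u22 + 0.5 * R2 * u22 ** 2
    return H_full - (H1 + eps * H2 + H3)
```

Halving `eps` should shrink the returned gap by roughly a factor of four, which is the signature of a remainder of order $\varepsilon^2$.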
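Hence, the ultimate goal is to find the following optimal value function,
$$V^\star = \min_{u \in U} \int_t^\infty r(x, u)\, d\tau, \quad t \ge 0, \tag{12}$$
subject to the state dynamics in (1). The optimal value $V^\star$ satisfies the following HJB equation (see [1] for an existence theorem),
$$\frac{1}{2} x^T Q x + \frac{1}{2} u^{\star T} R u^\star + V_x^{\star T} f(x) + V_x^{\star T} g(x) u^\star = 0, \tag{13}$$
where,
$$f(x) = \begin{bmatrix} f_{11} + \varepsilon f_{12} & \varepsilon f_{21} + f_{22} \end{bmatrix}^T; \quad g(x) = \begin{bmatrix} g_{11} & \varepsilon g_{12} \\ \varepsilon g_{21} & g_{22} \end{bmatrix},$$
and $u^\star$ is the optimal control that will be found later.
Assumption 1 (Smoothness of solution)
The solution to (13) is smooth, i.e. $V^\star \in C^1$, and positive definite with $V^\star(0) = 0$. □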
Remark 1
Hamilton-Jacobi equations are nonlinear partial differential equations, and it is well known that in general such equations do not admit global classical solutions, and if they do, the solutions may not be smooth. However, they may have so-called viscosity solutions [43]. Under certain local reachability and observability assumptions, they have local smooth solutions [44]. Various other assumptions guarantee existence of smooth solutions, such as that the dynamics not be bilinear and the cost function not contain cross-terms in the state and control input. The latter two assumptions are satisfied for the system (1) and the cost (2) under consideration. □
We shall now split the solution $V_x^{\star T}$ as follows,
$$V_x^{\star T} = \begin{bmatrix} V_{1x_1}^{\star T} + \varepsilon V_{2x_1}^{\star T} & \varepsilon V_{2x_2}^{\star T} + V_{3x_2}^{\star T} \end{bmatrix}. \tag{14}$$
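The optimal control input for the system (1) with the optimal value function (12) can be obtained using the stationarity condition in the Hamiltonian (8),
$$\frac{\partial H}{\partial u} = 0 \;\Rightarrow\; u^\star = -R^{-1} g^T(x) V_x^\star(x), \tag{15}$$
which can be split into the following control inputs,
$$u_{11}^\star(x_1) = -R_1^{-1} g_{11}^T(x_1) V_{1x_1}^\star, \quad \forall x_1, \tag{16}$$
$$u_{12}^\star(x) = -R_1^{-1} \left( g_{11}^T(x_1) V_{2x_1}^\star + g_{21}^T(x) V_{3x_2}^\star \right), \quad \forall x, \tag{17}$$
$$u_{21}^\star(x) = -R_2^{-1} \left( g_{22}^T(x_2) V_{2x_2}^\star + g_{12}^T(x) V_{1x_1}^\star \right), \quad \forall x, \tag{18}$$
$$u_{22}^\star(x_2) = -R_2^{-1} g_{22}^T(x_2) V_{3x_2}^\star, \quad \forall x_2. \tag{19}$$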
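As a sanity check on the split (16)-(19), the sketch below evaluates the full stationarity condition $u^\star = -R^{-1} g^T V_x^\star$ for scalar blocks and compares it with $u_{11}^\star + \varepsilon u_{12}^\star$ and $\varepsilon u_{21}^\star + u_{22}^\star$. The numerical values and the function name are hypothetical; $R = \mathrm{diag}(R_1, R_2)$ is assumed.

```python
def control_split_gap(eps):
    """Difference between the full optimal control and its O(eps) split, scalar blocks."""
    # Hypothetical weights, input gains and value-function gradients.
    R1, R2 = 1.0, 0.5
    g11, g12, g21, g22 = 1.0, 0.5, -0.3, 2.0
    V1, V2a, V2b, V3 = 0.9, 0.2, -0.1, 1.1  # V_{1x1}, V_{2x1}, V_{2x2}, V_{3x2}

    # Full gradient as in (14) and full control u = -R^{-1} g^T V_x.
    Vx1 = V1 + eps * V2a
    Vx2 = eps * V2b + V3
    u1_full = -(g11 * Vx1 + eps * g21 * Vx2) / R1  # first row of g^T is [g11, eps*g21]
    u2_full = -(eps * g12 * Vx1 + g22 * Vx2) / R2  # second row of g^T is [eps*g12, g22]

    # Split controls (16)-(19).
    u11 = -g11 * V1 / R1
    u12 = -(g11 * V2a + g21 * V3) / R1
    u21 = -(g22 * V2b + g12 * V1) / R2
    u22 = -g22 * V3 / R2
    return u1_full - (u11 + eps * u12), u2_full - (eps * u21 + u22)
```

In this scalar setting the two gaps are exactly $-\varepsilon^2 g_{21} V_{2x_2} / R_1$ and $-\varepsilon^2 g_{12} V_{2x_1} / R_2$, so the split agrees with the full control up to terms of order $\varepsilon^2$.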
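After substituting the optimal controls (16)-(19) into the Hamiltonians (9)-(11), one has the following three HJB equations $H_i^\star = 0$, $\forall i \in \{1, 2, 3\}$,
$$0 = \frac{1}{2} x_1^T Q_1 x_1 + V_{1x_1}^{\star T} f_{11}(x_1) + V_{1x_1}^{\star T} g_{11}(x_1) u_{11}^\star(x_1) + \frac{1}{2} u_{11}^{\star T}(x_1) R_1 u_{11}^\star(x_1), \tag{20}$$
$$\begin{aligned} 0 = {} & x_1^T Q_2 x_2 + V_{1x_1}^{\star T} f_{12}(x) + V_{3x_2}^{\star T} f_{21}(x) \\ & + V_{1x_1}^{\star T} g_{12}(x) u_{22}^\star(x_2) + V_{2x_1}^{\star T} g_{11}(x_1) u_{11}^\star(x_1) + V_{2x_2}^{\star T} g_{22}(x_2) u_{22}^\star(x_2) \\ & + V_{2x_1}^{\star T} f_{11}(x_1) + V_{2x_2}^{\star T} f_{22}(x_2) + V_{3x_2}^{\star T} g_{21}(x) u_{11}^\star(x_1), \end{aligned} \tag{21}$$
$$0 = \frac{1}{2} x_2^T Q_3 x_2 + V_{3x_2}^{\star T} f_{22}(x_2) + V_{3x_2}^{\star T} g_{22}(x_2) u_{22}^\star(x_2) + \frac{1}{2} u_{22}^{\star T}(x_2) R_2 u_{22}^\star(x_2). \tag{22}$$
Due to the nonlinear nature of these three weakly coupled HJB equations, finding their solution is generally difficult or impossible.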
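For intuition about (20), consider the special case of a scalar linear subsystem $\dot{x}_1 = a x_1 + b u_{11}$ with $V_1^\star = \frac{1}{2} p x_1^2$, for which (20) collapses to a scalar algebraic Riccati equation that can be solved in closed form. This linear special case is an illustrative assumption, not the nonlinear problem treated in the paper; the names `scalar_hjb_solution` and `hjb_residual` are hypothetical.

```python
import math


def scalar_hjb_solution(a, b, q, r):
    """Positive root p of 0.5*q + a*p - 0.5*(b**2 / r)*p**2 = 0."""
    return r * (a + math.sqrt(a * a + q * b * b / r)) / (b * b)


def hjb_residual(x1, a, b, q, r):
    """Right-hand side of (20) for f11 = a*x1, g11 = b, Q1 = q, R1 = r."""
    p = scalar_hjb_solution(a, b, q, r)
    V_x = p * x1                # gradient of V1 = 0.5 * p * x1**2
    u = -(1.0 / r) * b * V_x    # optimal control (16)
    return 0.5 * q * x1 ** 2 + V_x * (a * x1) + V_x * b * u + 0.5 * r * u ** 2
```

The residual vanishes for every `x1`, confirming that the quadratic value function and the control (16) satisfy (20) in this special case. For general nonlinear $f_{11}$ and $g_{11}$ no such closed form is available, which motivates the approximation scheme that follows.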
The following section shall provide approximate solutions to equations (20), (21), and (22).
3. APPROXIMATE SOLUTION
The next subsections will lay the foundation for updating the optimal value function and the optimal control input simultaneously by using data collected along the closed-loop trajectory.
3.1. Critic approximators and recorded past data
The first step to solve the HJB equations (20), (21), and (22) is to approximate the value function $V^\star(x)$ in equation (12) on any given compact set $\Omega \subseteq \mathbb{R}^n$ with a critic approximator as follows,
$$V^\star(x) = W^{\star T} \phi(x) + \varepsilon(x), \quad \forall x, \tag{23}$$
where $W^\star \in \mathbb{R}^{N_{tot}}$ are the ideal weights satisfying $\|W^\star\| \le W_{\max}$; $\phi(x) : \Omega \to \mathbb{R}^{N_{tot}}$, $\phi(x) = [\varphi_1(x) \; \varphi_2(x) \; \ldots \; \varphi_{N_{tot}}(x)]^T$ are the basis functions, such that $\varphi_i(0) = 0$ and $\nabla \varphi_i(0) = 0$, $\forall i = 1, \ldots, N_{tot}$; $N_{tot}$ is the number of neurons in the hidden layer and $\varepsilon(x)$ is the approximation error. It has been shown in [46] that NNs with a single hidden layer and an appropriately smooth hidden layer activation function are capable of arbitrarily accurate approximation to an arbitrary function and its derivatives.
One should pick the basis functions $\varphi_i(x)$, $\forall i \in \{1, 2, \ldots, N_{tot}\}$ as polynomial, radial basis, or sigmoidal functions. In this case, $V^\star$ and its derivatives can be uniformly approximated on any given compact set $\Omega$. According to the Weierstrass higher-order approximation theorem [45], a polynomial suffices to approximate $V^\star$ as well as its derivatives when they exist. Moreover, as the number of basis sets $N_{tot}$ increases, the approximation error on a compact set $\Omega$ goes to zero, i.e., $\varepsilon(x) \to 0$ as $N_{tot} \to \infty$.
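A minimal sketch of one admissible polynomial basis is given below: the quadratic monomials $x_i x_j$, $i \le j$, which satisfy $\varphi_i(0) = 0$ and $\nabla \varphi_i(0) = 0$ as required after (23). The choice of quadratic monomials and the function names `phi`, `grad_phi`, and `value_estimate` are illustrative assumptions.

```python
from typing import List


def phi(x: List[float]) -> List[float]:
    """Quadratic monomials x_i * x_j for i <= j."""
    n = len(x)
    return [x[i] * x[j] for i in range(n) for j in range(i, n)]


def grad_phi(x: List[float]) -> List[List[float]]:
    """Jacobian of phi: one row per basis function, one column per state."""
    n = len(x)
    rows = []
    for i in range(n):
        for j in range(i, n):
            row = [0.0] * n
            row[i] += x[j]  # d(x_i x_j)/dx_i
            row[j] += x[i]  # d(x_i x_j)/dx_j (adds up to 2*x_i when i == j)
            rows.append(row)
    return rows


def value_estimate(W: List[float], x: List[float]) -> float:
    """Critic output W^T phi(x)."""
    return sum(w * p for w, p in zip(W, phi(x)))
```

For a two-dimensional state this gives $N_{tot} = 3$ basis functions, and `value_estimate([1.0, 0.0, 1.0], x)` reproduces the quadratic form $x_1^2 + x_2^2$.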
We shall require a form of uniformity in this approximation result that is common in neuro-adaptive control and other approximation techniques [46, 47]. We shall now write the approximate Hamiltonian as,
$$\bar{H} := \frac{1}{2} x^T Q x + \frac{1}{2} u^T R u + W^{\star T} \nabla\phi \left( f(x) + g u \right), \quad \forall x, u, \tag{25}$$
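with a residual error given as,
$$\varepsilon_H := \bar{H} - H^\star = -\nabla\varepsilon^T \left( f(x) + g u \right), \quad \forall x, u, \tag{26}$$
with,
$$H^\star = H_1^\star + \varepsilon H_2^\star + H_3^\star,$$
and,
$$\varepsilon_H = \varepsilon_{H_1} + \varepsilon\, \varepsilon_{H_2} + \varepsilon_{H_3}.$$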
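Assumption 2 (Critic Uniform Approximation)
The critic activation functions $\phi$, the value function approximation error $\varepsilon$, their derivatives, and the Hamiltonian residual error $\varepsilon_H$ are all uniformly bounded on a set $\Omega \subseteq \mathbb{R}^n$, in the sense that there exist known finite constants $\phi_m, \phi_{dm}, \varepsilon_m, \varepsilon_{dm}, \varepsilon_{Hm} \in \mathbb{R}^+$ such that $|\phi(x)| \le \phi_m$, $|\nabla\phi(x)| \le \phi_{dm}$, $|\varepsilon(x)| \le \varepsilon_m$, $|\nabla\varepsilon(x)| \le \varepsilon_{dm}$, $|\varepsilon_H(x)| \le \varepsilon_{Hm}$, $\forall x \in \Omega$. □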
Since the ideal weights for the value function $V^\star(x)$ that appear in (23) are unknown, one must consider the actual critic weight estimates $\hat{W} \in \mathbb{R}^{N_{tot}}$, associated with,
$$\hat{V}(x) = \hat{W}^T \phi(x), \quad \forall x. \tag{27}$$
The approximate solution (27) can be split to obtain the approximate solutions of equations (5), (6), and (7), and for that reason we shall use vectors of polynomials $\phi_1 \in \mathbb{R}^{N_1}$, $\phi_2 \in \mathbb{R}^{N_2}$, and $\phi_3 \in \mathbb{R}^{N_3}$, respectively. Hence, these approximations can be expressed as,
$$\hat{V}_1(x_1) = \hat{W}_1^T \phi_1(x_1), \quad \forall x_1, \tag{28}$$
$$\hat{V}_2(x) = \hat{W}_2^T \phi_2(x), \quad \forall x, \tag{29}$$
$$\hat{V}_3(x_2) = \hat{W}_3^T \phi_3(x_2), \quad \forall x_2. \tag{30}$$
Our objective is to find update laws for the weight estimates $\hat{W}_1 \in \mathbb{R}^{N_1}$, $\hat{W}_2 \in \mathbb{R}^{N_2}$, and $\hat{W}_3 \in \mathbb{R}^{N_3}$, where $N_j$, $j \in \{1, 2, 3\}$ are the numbers of neurons in the hidden layer of each critic approximator, such that the actual weight estimates converge to the ideal values in the sense that $\hat{W}_1 \to W_1^\star$, $\hat{W}_2 \to W_2^\star$, and $\hat{W}_3 \to W_3^\star$. Now, we can write the approximate Hamiltonians (25) with current weight estimates as,
It is obvious that, when we have convergence of the actual weight estimates to the ideal weight values and $u_{11} = u_{11}^\star$, $u_{12} = u_{12}^\star$, $u_{21} = u_{21}^\star$, $u_{22} = u_{22}^\star$, then the approximate Hamiltonians also converge to the HJB equations in the sense that $\hat{H}_1 \to H_1^\star$, $\hat{H}_2 \to H_2^\star$, and $\hat{H}_3 \to H_3^\star$, as $t \to \infty$.
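Definition 1 ([47])
A vector signal $\Phi(t)$ is exciting over the interval $[t, t + T_{PE}]$, with $T_{PE} \in \mathbb{R}^+$, if there exist $\beta_1, \beta_2 \in \mathbb{R}^+$ such that $\beta_1 I \le \int_t^{t + T_{PE}} \Phi(\tau) \Phi^T(\tau)\, d\tau \le \beta_2 I$, with $I$ an identity matrix of appropriate dimensions. □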
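Definition 1 can be checked numerically for a given signal. The sketch below approximates $\int_t^{t+T_{PE}} \Phi \Phi^T d\tau$ with a midpoint rule for a two-dimensional signal and returns the smallest and largest eigenvalues of the result, which play the roles of $\beta_1$ and $\beta_2$. Restricting to two dimensions (so that the eigenvalues have a closed form) and the name `excitation_bounds` are simplifying assumptions.

```python
import math


def excitation_bounds(signal, t0, T, steps=2000):
    """Min and max eigenvalue of the integral of Phi Phi^T over [t0, t0 + T].

    `signal(tau)` must return a pair (Phi_1, Phi_2).
    """
    h = T / steps
    a = b = c = 0.0  # entries of the symmetric 2x2 matrix [[a, b], [b, c]]
    for k in range(steps):
        tau = t0 + (k + 0.5) * h  # midpoint rule
        p1, p2 = signal(tau)
        a += p1 * p1 * h
        b += p1 * p2 * h
        c += p2 * p2 * h
    mean = 0.5 * (a + c)
    radius = math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    return mean - radius, mean + radius
```

For `signal = lambda t: (math.sin(t), math.cos(t))` over one period both eigenvalues are close to $\pi$, so the signal is exciting. For `(math.sin(t), math.sin(t))` the smallest eigenvalue is zero, so no positive $\beta_1$ exists.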
To achieve convergence of (31), (32), (33) to the (approximate) Hamiltonian (25) along the closed-loop trajectories, one would typically need persistence of excitation $\forall t \ge 0$ (see Definition 1) for the vectors $\omega_1(t)$, $\omega_2(t)$, $\omega_3(t)$ defined by
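To weaken the need to guarantee a persistence of excitation condition in the sense of Definition 1 for infinite time, we follow the approach proposed in [48], which uses past recorded data concurrently with current data. To this effect, we define the Hamiltonian errors corresponding to the data collected at the current time $t$,
$$e_1(t) := \hat{H}_1 - H_1^\star = \hat{H}_1\left( x_1(t), u_{11}(t), \hat{W}_1(t)^T \nabla\phi_1(x_1(t)) \right), \tag{37}$$
$$e_2(t) := \hat{H}_2 - H_2^\star = \hat{H}_2\left( x(t), u_{11}(t), u_{22}(t), u_{12}(t), u_{21}(t), \hat{W}_1(t)^T \nabla\phi_1(x_1(t)), \hat{W}_2(t)^T \nabla\phi_2(x(t)), \hat{W}_3(t)^T \nabla\phi_3(x_2(t)) \right), \tag{38}$$
$$e_3(t) := \hat{H}_3 - H_3^\star = \hat{H}_3\left( x_2(t), u_{22}(t), \hat{W}_3(t)^T \nabla\phi_3(x_2(t)) \right), \tag{39}$$
where the latter equalities in equations (37), (38), and (39) are due to (20), (21), and (22), respectively. Similarly, the errors corresponding to data previously collected at times $t_0, t_1, \ldots, t_{k_j} < t$, $\forall j \in \{1, 2, 3\}$ can be defined as,
$$e_{1\mathrm{buff}_i}(t_i, t) := \hat{H}_1\left( x_1(t_i), u_{11}(t_i), \hat{W}_1(t)^T \nabla\phi_1(x_1(t_i)) \right),$$
$$e_{2\mathrm{buff}_i}(t_i, t) := \hat{H}_2\left( x(t_i), u_{11}(t_i), u_{22}(t_i), u_{12}(t_i), u_{21}(t_i), \hat{W}_1(t)^T \nabla\phi_1(x_1(t_i)), \hat{W}_2(t)^T \nabla\phi_2(x(t_i)), \hat{W}_3(t)^T \nabla\phi_3(x_2(t_i)) \right),$$
$$e_{3\mathrm{buff}_i}(t_i, t) := \hat{H}_3\left( x_2(t_i), u_{22}(t_i), \hat{W}_3(t)^T \nabla\phi_3(x_2(t_i)) \right).$$
Note that, while the errors $e_{1\mathrm{buff}_i}(t_i, t)$, $e_{2\mathrm{buff}_i}(t_i, t)$, $e_{3\mathrm{buff}_i}(t_i, t)$ use past state and input data $x_1(t_i)$, $x(t_i)$, $x_2(t_i)$ and $u_{11}(t_i)$, $u_{12}(t_i)$, $u_{21}(t_i)$, $u_{22}(t_i)$, respectively, they are defined based on the current weight estimates $\hat{W}_1(t)$, $\hat{W}_2(t)$, $\hat{W}_3(t)$.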
The current and previous errors defined above can be combined into the following (normalized) quadratic errors,
where $\mathrm{Nom}_i$, $\forall i \in \{1, 2, 3\}$ are the nominal systems and $\mathrm{per}_i$, $\forall i \in \{1, 2, 3\}$ are the perturbations due to the errors $\varepsilon_{H_i}$, $\forall i \in \{1, 2, 3\}$.
Note that, in order to derive the expressions for the components of $\dot{\tilde{W}} = -\dot{\hat{W}}$ we used (40), (41), (42) together with the fact that $\frac{1}{2} x^T Q x + \frac{1}{2} u^T R u = -W^{\star T} \omega(t) + \varepsilon_H(t)$, which is a consequence of (9), (10), (11), and (26).
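Theorem 1
Suppose that $\{\omega_j(t_1), \ldots, \omega_j(t_{k_j})\}$ contains $N_j$, $\forall j \in \{1, 2, 3\}$ linearly independent vectors and that the critic tuning laws are given by (40), (41), (42). Then, for any given control signal $u(t)$, for the nominal systems (i.e. $\varepsilon_{H_i} = 0$, $\forall i \in \{1, 2, 3\}$) we have that,
$$\frac{d}{dt} \|\tilde{W}_1(t)\|^2 \le -2 \alpha_1 \lambda_{\min}\left( \sum_{i=1}^{k_1} \frac{\omega_1(t_i) \omega_1(t_i)^T}{(\omega_1(t_i)^T \omega_1(t_i) + 1)^2} \right) \|\tilde{W}_1\|^2, \tag{49}$$
$$\frac{d}{dt} \|\tilde{W}_2(t)\|^2 \le -2 \alpha_2 \lambda_{\min}\left( \sum_{i=1}^{k_2} \frac{\omega_2(t_i) \omega_2(t_i)^T}{(\omega_2(t_i)^T \omega_2(t_i) + 1)^2} \right) \|\tilde{W}_2\|^2, \tag{50}$$
$$\frac{d}{dt} \|\tilde{W}_3(t)\|^2 \le -2 \alpha_3 \lambda_{\min}\left( \sum_{i=1}^{k_3} \frac{\omega_3(t_i) \omega_3(t_i)^T}{(\omega_3(t_i)^T \omega_3(t_i) + 1)^2} \right) \|\tilde{W}_3\|^2, \tag{51}$$
and for bounded $\varepsilon_{H_j}$, $\forall j \in \{1, 2, 3\}$, the $\tilde{W}_j$, $\forall j \in \{1, 2, 3\}$ converge exponentially to the residual sets,
$$R_{s_j} = \left\{ \tilde{W}_j \;\middle|\; \|\tilde{W}_j\| \le \frac{\alpha_j \left\| \dfrac{\omega_j(t) \omega_j(t)^T}{(\omega_j(t)^T \omega_j(t) + 1)^2} \varepsilon_{H_j}(t) + \sum_{i=1}^{k_j} \dfrac{\omega_j(t_i) \omega_j(t_i)^T}{(\omega_j(t_i)^T \omega_j(t_i) + 1)^2} \varepsilon_{H_j}(t_i) \right\|}{\lambda_{\min}\left( \sum_{i=1}^{k_j} \dfrac{\omega_j(t_i) \omega_j(t_i)^T}{(\omega_j(t_i)^T \omega_j(t_i) + 1)^2} \right)} \right\}, \quad \forall j \in \{1, 2, 3\}. \; □$$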
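The decay rate in (49)-(51) can be illustrated with a small simulation. The tuning laws (40)-(42) are not reproduced in this section, so the sketch below does not implement them; it assumes nominal error dynamics of the form $\dot{\tilde{W}} = -\alpha \Lambda \tilde{W}$ driven only by the stored-data matrix $\Lambda = \sum_i \omega(t_i)\omega(t_i)^T / (\omega(t_i)^T\omega(t_i) + 1)^2$, integrates them with forward Euler, and compares the result with the exponential bound implied by the theorem. The function names and the two-dimensional setting are hypothetical.

```python
import math


def normalized_outer_sum(omegas):
    """Lambda = sum_i w_i w_i^T / (w_i^T w_i + 1)^2 for a list of recorded vectors."""
    n = len(omegas[0])
    lam = [[0.0] * n for _ in range(n)]
    for w in omegas:
        scale = (sum(v * v for v in w) + 1.0) ** 2
        for i in range(n):
            for j in range(n):
                lam[i][j] += w[i] * w[j] / scale
    return lam


def min_eig_sym_2x2(m):
    """Smallest eigenvalue of a symmetric 2x2 matrix."""
    a, b, c = m[0][0], m[0][1], m[1][1]
    return 0.5 * (a + c) - math.sqrt((0.5 * (a - c)) ** 2 + b * b)


def simulate_nominal_error(w_tilde0, omegas, alpha, T, h=1e-3):
    """Forward-Euler integration of the assumed dynamics W~' = -alpha * Lambda * W~."""
    lam = normalized_outer_sum(omegas)
    n = len(w_tilde0)
    w = list(w_tilde0)
    for _ in range(int(round(T / h))):
        dw = [-alpha * sum(lam[i][j] * w[j] for j in range(n)) for i in range(n)]
        w = [w[i] + h * dw[i] for i in range(n)]
    return w
```

With two linearly independent stored vectors, $\Lambda$ is positive definite and $\|\tilde{W}(T)\|^2$ stays below $\|\tilde{W}(0)\|^2 e^{-2 \alpha \lambda_{\min}(\Lambda) T}$, regardless of whether any new excitation arrives.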
Remark 2
Ordinary adaptive optimal control algorithms, e.g. [49], do not have the extra past-data term $\sum_{i=1}^{k_j} \frac{\omega_j(t_i) \omega_j(t_i)^T}{(\omega_j(t_i)^T \omega_j(t_i) + 1)^2}$, $\forall j \in \{1, 2, 3\}$ in the error dynamics and thus need a persistence of excitation condition on $\frac{\omega_j(t)}{\omega_j(t)^T \omega_j(t) + 1}$ (typically of the form $\beta_1 I \le \int_t^{t+T} \frac{\omega_j(\tau) \omega_j(\tau)^T}{(\omega_j(\tau)^T \omega_j(\tau) + 1)^2}\, d\tau \le \beta_2 I$ with constants $\beta_1, \beta_2, T \in \mathbb{R}^+$) that holds for every $t$ from $t = 0$ to $t = \infty$. This condition cannot be verified during learning. In Theorem 1, the persistence of excitation condition comes through the requirement that at least $N_j$, $\forall j \in \{1, 2, 3\}$ of the vectors $\{\omega_j(t_1), \ldots, \omega_j(t_{k_j})\}$, $\forall j \in \{1, 2, 3\}$ must be linearly independent, which is equivalent to the matrix $\Lambda_j = \sum_{i=1}^{k_j} \frac{\omega_j(t_i) \omega_j(t_i)^T}{(\omega_j(t_i)^T \omega_j(t_i) + 1)^2}$, $\forall j \in \{1, 2, 3\}$ being positive definite. In practice, as one collects each additional vector $\omega_j(t_i)$, one adds a new term to the matrix $\sum_{i=1}^{k_j} \frac{\omega_j(t_i) \omega_j(t_i)^T}{(\omega_j(t_i)^T \omega_j(t_i) + 1)^2}$, $\forall j \in \{1, 2, 3\}$, and one can stop recording points as soon as this matrix becomes full-rank (i.e. the time $t_{k_j}$, $\forall j \in \{1, 2, 3\}$ has been reached). From that point forward, one does not need to record new data and the assumption of Theorem 1 holds, regardless of whether or not future data provides additional excitation. Although, for theoretical purposes, our theorem requires a very large number of basis sets (i.e. $N_j \to \infty$, $\forall j \in \{1, 2, 3\}$), in our numerical simulations it suffices to pick a small number of quadratic or radial basis functions. The selection of the times $t_i$ is somewhat arbitrary, but in our numerical simulations we typically select these values equally spaced in time. □
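The stopping rule described in Remark 2 can be sketched as follows: samples of $\omega_j$ are offered one at a time, and recording stops once the stack contains $N_j$ linearly independent vectors. As a design choice made here for illustration (the remark itself allows arbitrary data points), only samples that increase the rank are stored, which keeps the stack as small as possible. The helper names `rank` and `fill_history_stack` are hypothetical.

```python
def rank(vectors, tol=1e-9):
    """Rank of a list of equal-length vectors via Gaussian elimination with pivoting."""
    rows = [list(v) for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        pivot = max(range(r, len(rows)), key=lambda i: abs(rows[i][c]), default=None)
        if pivot is None or abs(rows[pivot][c]) < tol:
            continue  # no usable pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def fill_history_stack(samples, n_basis):
    """Store samples until the stack holds n_basis linearly independent vectors."""
    stack = []
    for omega in samples:
        if rank(stack + [omega]) > rank(stack):  # sample adds a new direction
            stack.append(omega)
        if len(stack) == n_basis:
            break  # stack is full rank; no further recording is needed
    return stack
```

Once `fill_history_stack` returns a list of length `n_basis`, the corresponding matrix $\Lambda_j$ is positive definite and the rank condition of Theorem 1 is met.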
Remark 3
It is assumed that the maximum number of data points to be stored in the history (i.e., $t_0, t_1, \ldots, t_{k_j} < t$, $\forall j \in \{1, 2, 3\}$) is limited due to memory/bandwidth limitations. □
3.2. Actor approximators
One could use a single set of weights with a sliding-mode controller as in [51] to approximate both the optimal value functions $V_1^\star$, $V_2^\star$, $V_3^\star$ and their gradients $\nabla V_1^\star$, $\nabla V_2^\star$, $\nabla V_3^\star$, but instead we independently adjust two sets of weights: the critic weights introduced in (28), (29), (30) to approximate $V_1^\star$, $V_2^\star$, $V_3^\star$, respectively, and the actor weights introduced below to approximate $u_{11}^\star$, $u_{12}^\star$, $u_{21}^\star$, $u_{22}^\star$ from (16), (17), (18), (19). While this carries additional computational burden, the flexibility introduced by this "over-parameterization" will enable us to establish convergence to the optimal solution and guaranteed Lyapunov-based stability, which seems difficult using only one set of weights.
The optimal control policies (16), (17), (18), (19) can be approximated, respectively, by four actor approximators as follows
$$u_{11}^\star(x_1) = W_{u11}^{\star T} \phi_{u11}(x_1) + \varepsilon_{u11}(x_1), \quad \forall x_1, \tag{52}$$
$$u_{12}^\star(x) = W_{u12}^{\star T} \phi_{u12}(x) + \varepsilon_{u12}(x), \quad \forall x, \tag{53}$$
$$u_{21}^\star(x) = W_{u21}^{\star T} \phi_{u21}(x) + \varepsilon_{u21}(x), \quad \forall x, \tag{54}$$
$$u_{22}^\star(x_2) = W_{u22}^{\star T} \phi_{u22}(x_2) + \varepsilon_{u22}(x_2), \quad \forall x_2, \tag{55}$$
where $W_{u11}^\star \in \mathbb{R}^{N_4 \times m}$, $W_{u12}^\star \in \mathbb{R}^{N_5 \times m}$, $W_{u21}^\star \in \mathbb{R}^{N_6 \times m}$, and $W_{u22}^\star \in \mathbb{R}^{N_7 \times m}$ are the ideal weight matrices, $\phi_{u11}(x_1)$, $\phi_{u21}(x)$, $\phi_{u12}(x)$, and $\phi_{u22}(x_2)$ are the basis functions, defined in a way similar to that used for the critic approximators, $N_4, N_5, N_6, N_7$ are the numbers of neurons in the hidden layer of each actor approximator, and $\varepsilon_{u11}$, $\varepsilon_{u12}$, $\varepsilon_{u21}$, $\varepsilon_{u22}$ are the four approximation errors. As before, $u_{11}^\star(x_1)$, $u_{12}^\star(x)$, $u_{21}^\star(x)$, $u_{22}^\star(x_2)$ can be uniformly approximated, as expressed by the following assumption. According to the Weierstrass higher-order approximation theorem [46], a polynomial basis set suffices for proper approximation, and moreover, as the numbers of basis sets $N_4, N_5, N_6, N_7$ increase, the approximation errors go to zero, i.e., $\varepsilon_{u11} \to 0$, $\varepsilon_{u12} \to 0$, $\varepsilon_{u21} \to 0$, and $\varepsilon_{u22} \to 0$, as $N_4, N_5, N_6, N_7 \to \infty$.
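Assumption 3 (Actor Uniform Approximation)
The actor activation functions in $\phi_{u11} \in \mathbb{R}^{N_4}$, $\phi_{u12} \in \mathbb{R}^{N_5}$, $\phi_{u21} \in \mathbb{R}^{N_6}$, $\phi_{u22} \in \mathbb{R}^{N_7}$ and the actor residual errors $\varepsilon_{u11}$, $\varepsilon_{u12}$, $\varepsilon_{u21}$, $\varepsilon_{u22}$ are all uniformly bounded on any given compact set $\Omega$, in the sense that there exist known finite constants $\{\phi_{u11\max}, \phi_{u12\max}, \phi_{u21\max}, \phi_{u22\max}\} \in \mathbb{R}^+$ and $\{\varepsilon_{u11\max}, \varepsilon_{u12\max}, \varepsilon_{u21\max}, \varepsilon_{u22\max}\} \in \mathbb{R}^+$ such that $|\phi_{u11}(x_1)| \le \phi_{u11\max}$, $|\phi_{u12}(x)| \le \phi_{u12\max}$, $|\phi_{u21}(x)| \le \phi_{u21\max}$, $|\phi_{u22}(x_2)| \le \phi_{u22\max}$, and $|\varepsilon_{u11}(x_1)| \le \varepsilon_{u11\max}$, $|\varepsilon_{u12}(x)| \le \varepsilon_{u12\max}$, $|\varepsilon_{u21}(x)| \le \varepsilon_{u21\max}$, $|\varepsilon_{u22}(x_2)| \le \varepsilon_{u22\max}$, $\forall x \in \Omega$. □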
Since the ideal weights $W_{u11}^\star \in \mathbb{R}^{N_4 \times m}$, $W_{u12}^\star \in \mathbb{R}^{N_5 \times m}$, $W_{u21}^\star \in \mathbb{R}^{N_6 \times m}$, and $W_{u22}^\star \in \mathbb{R}^{N_7 \times m}$ are not known, we introduce the current actor estimate weights $\hat{W}_{u11}$, $\hat{W}_{u12}$, $\hat{W}_{u21}$, and $\hat{W}_{u22}$ to approximate the optimal controls (52), (53), (54), (55), respectively, by the following actor approximators,
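Our goal is then to appropriately tune $\hat{W}_{u11}$, $\hat{W}_{u12}$, $\hat{W}_{u21}$, $\hat{W}_{u22}$, such that the following quadratic error terms are minimized,
$$E_{u11} = \frac{1}{2} e_{u11}^T(t) e_{u11}(t), \quad \forall t, \tag{60}$$
$$E_{u12} = \frac{1}{2} e_{u12}^T(t) e_{u12}(t), \quad \forall t, \tag{61}$$
$$E_{u21} = \frac{1}{2} e_{u21}^T(t) e_{u21}(t), \quad \forall t, \tag{62}$$
$$E_{u22} = \frac{1}{2} e_{u22}^T(t) e_{u22}(t), \quad \forall t, \tag{63}$$
where,
$$e_{u11} := \hat{W}_{u11}^T \phi_{u11} + R_1^{-1} g_{11}^T(x_1) \nabla\phi_1^T \hat{W}_1,$$
$$e_{u12} := \hat{W}_{u12}^T \phi_{u12} + R_1^{-1} \left( g_{11}^T(x_1) \nabla\phi_2^T \hat{W}_2 + g_{21}^T(x) \nabla\phi_3^T \hat{W}_3 \right),$$
$$e_{u21} := \hat{W}_{u21}^T \phi_{u21} + R_2^{-1} \left( g_{22}^T(x_2) \nabla\phi_2^T \hat{W}_2 + g_{12}^T(x) \nabla\phi_1^T \hat{W}_1 \right),$$
$$e_{u22} := \hat{W}_{u22}^T \phi_{u22} + R_2^{-1} g_{22}^T(x_2) \nabla\phi_3^T \hat{W}_3,$$
are the errors between the estimates (56), (57), (58), (59) and versions of (16), (17), (18), (19) in which $V^\star$ is approximated by the estimates of the critic approximators (28), (29), (30).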
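The tuning laws for the actor approximators are obtained by a gradient descent-like rule as follows
$$\dot{\hat{W}}_{u11} = -\alpha_{u11} \frac{\partial E_{u11}}{\partial \hat{W}_{u11}} = -\alpha_{u11} \phi_{u11} e_{u11}^T = -\alpha_{u11} \phi_{u11} \left( \hat{W}_{u11}^T \phi_{u11} + R_1^{-1} g_{11}^T(x_1) \nabla\phi_1^T \hat{W}_1 \right)^T, \tag{64}$$
$$\dot{\hat{W}}_{u12} = -\alpha_{u12} \frac{\partial E_{u12}}{\partial \hat{W}_{u12}} = -\alpha_{u12} \phi_{u12} e_{u12}^T = -\alpha_{u12} \phi_{u12} \left( \hat{W}_{u12}^T \phi_{u12} + R_1^{-1} \left( g_{11}^T(x_1) \nabla\phi_2^T \hat{W}_2 + g_{21}^T(x) \nabla\phi_3^T \hat{W}_3 \right) \right)^T, \tag{65}$$
$$\dot{\hat{W}}_{u21} = -\alpha_{u21} \frac{\partial E_{u21}}{\partial \hat{W}_{u21}} = -\alpha_{u21} \phi_{u21} e_{u21}^T = -\alpha_{u21} \phi_{u21} \left( \hat{W}_{u21}^T \phi_{u21} + R_2^{-1} \left( g_{22}^T(x_2) \nabla\phi_2^T \hat{W}_2 + g_{12}^T(x) \nabla\phi_1^T \hat{W}_1 \right) \right)^T, \tag{66}$$
$$\dot{\hat{W}}_{u22} = -\alpha_{u22} \frac{\partial E_{u22}}{\partial \hat{W}_{u22}} = -\alpha_{u22} \phi_{u22} e_{u22}^T = -\alpha_{u22} \phi_{u22} \left( \hat{W}_{u22}^T \phi_{u22} + R_2^{-1} g_{22}^T(x_2) \nabla\phi_3^T \hat{W}_3 \right)^T, \tag{67}$$
where $\alpha_{u11} \in \mathbb{R}^+$, $\alpha_{u12} \in \mathbb{R}^+$, $\alpha_{u21} \in \mathbb{R}^+$, and $\alpha_{u22} \in \mathbb{R}^+$ are constant gains that determine the speed of convergence. Defining the weight estimation errors for each one of the actors by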
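and after taking into consideration that (16), (17), (18), (19) with (23) are approximated by (56), (57), (58), (59), respectively, the actor approximator error dynamics can be written as
$$\dot{\tilde{W}}_{u11} = -\alpha_{u11} \phi_{u11} \phi_{u11}^T \tilde{W}_{u11} - \alpha_{u11} \phi_{u11} \left( R_1^{-1} g_{11}^T(x_1) \nabla\phi_1^T \tilde{W}_1 \right)^T - \alpha_{u11} \phi_{u11} \varepsilon_{u11}^T - \alpha_{u11} \phi_{u11} \left( R_1^{-1} g_{11}^T(x_1) \nabla\varepsilon_1 \right)^T, \tag{72}$$
$$\dot{\tilde{W}}_{u12} = -\alpha_{u12} \phi_{u12} \phi_{u12}^T \tilde{W}_{u12} - \alpha_{u12} \phi_{u12} \left( R_1^{-1} g_{11}^T(x_1) \nabla\phi_2^T \tilde{W}_2 + R_1^{-1} g_{21}^T(x) \nabla\phi_3^T \tilde{W}_3 \right)^T - \alpha_{u12} \phi_{u12} \varepsilon_{u12}^T - \alpha_{u12} \phi_{u12} \left( R_1^{-1} g_{11}^T(x_1) \nabla\varepsilon_2 + R_1^{-1} g_{21}^T(x) \nabla\varepsilon_3 \right)^T, \tag{73}$$
$$\dot{\tilde{W}}_{u21} = -\alpha_{u21} \phi_{u21} \phi_{u21}^T \tilde{W}_{u21} - \alpha_{u21} \phi_{u21} \left( R_2^{-1} g_{22}^T(x_2) \nabla\phi_2^T \tilde{W}_2 + R_2^{-1} g_{12}^T(x) \nabla\phi_1^T \tilde{W}_1 \right)^T - \alpha_{u21} \phi_{u21} \varepsilon_{u21}^T - \alpha_{u21} \phi_{u21} \left( R_2^{-1} g_{22}^T(x_2) \nabla\varepsilon_2 + R_2^{-1} g_{12}^T(x) \nabla\varepsilon_1 \right)^T, \tag{74}$$
$$\dot{\tilde{W}}_{u22} = -\alpha_{u22} \phi_{u22} \phi_{u22}^T \tilde{W}_{u22} - \alpha_{u22} \phi_{u22} \left( R_2^{-1} g_{22}^T(x_2) \nabla\phi_3^T \tilde{W}_3 \right)^T - \alpha_{u22} \phi_{u22} \varepsilon_{u22}^T - \alpha_{u22} \phi_{u22} \left( R_2^{-1} g_{22}^T(x_2) \nabla\varepsilon_3 \right)^T. \tag{75}$$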
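The gradient-descent structure of the actor tuning laws (64)-(67) can be sketched with a single forward-Euler step. In the code below, `target` stands for the critic-based control $-R^{-1} g^T \nabla\phi^T \hat{W}$, so that the actor error is $e_u = \hat{W}_u^T \phi_u - \text{target}$; the Euler discretization, the plain nested-list representation of the weight matrix, and the function names are assumptions made for illustration.

```python
def actor_error(W_u, phi_u, target):
    """e_u = W_u^T phi_u - target, with W_u stored as a list of N rows of length m."""
    m = len(W_u[0])
    return [sum(W_u[k][j] * phi_u[k] for k in range(len(phi_u))) - target[j]
            for j in range(m)]


def actor_step(W_u, phi_u, target, alpha, h):
    """One Euler step of W_u' = -alpha * phi_u * e_u^T, the form used in (64)-(67)."""
    e = actor_error(W_u, phi_u, target)
    return [[W_u[k][j] - h * alpha * phi_u[k] * e[j] for j in range(len(e))]
            for k in range(len(phi_u))]
```

Repeated calls to `actor_step` with a fixed `phi_u` and `target` drive the actor error toward zero, provided `h * alpha * sum(p * p for p in phi_u)` is smaller than 2.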
A pseudocode (with inline comments to provide guidance following after the symbol ▷) that describes the proposed adaptive-optimal control algorithm has the following form,
Algorithm 1: Adaptive-Optimal Control Algorithm for Weakly-Coupled Nonlinear Systems
1: Start with initial state $x(0)$, random initial weights $\hat{W}_{u11}(0)$, $\hat{W}_{u12}(0)$, $\hat{W}_{u21}(0)$, $\hat{W}_{u22}(0)$, $\hat{W}_1(0)$, $\hat{W}_2(0)$, $\hat{W}_3(0)$ and $i = 1$
2: procedure
3:   Propagate $t$, $x(t)$ using (1)
4:   Propagate $\hat{W}_{u11}$, $\hat{W}_{u12}$, $\hat{W}_{u21}$, $\hat{W}_{u22}$, $\hat{W}_1$, $\hat{W}_2$, $\hat{W}_3$ ▷ integrate $\dot{\hat{W}}_{u11}$, $\dot{\hat{W}}_{u12}$, $\dot{\hat{W}}_{u21}$, $\dot{\hat{W}}_{u22}$ as in (64)-(67) and $\dot{\hat{W}}_1$, $\dot{\hat{W}}_2$, $\dot{\hat{W}}_3$ as in (40)-(42) using any ODE solver (e.g. Runge-Kutta)
5:   Compute $\hat{V}_1 = \hat{W}_1^T \phi_1(x_1)$, $\hat{V}_2 = \hat{W}_2^T \phi_2(x)$, $\hat{V}_3 = \hat{W}_3^T \phi_3(x_2)$
6:   Compute $\hat{u}_{11} = \hat{W}_{u11}^T \phi_{u11}(x_1)$, $\hat{u}_{12} = \hat{W}_{u12}^T \phi_{u12}(x)$, $\hat{u}_{21} = \hat{W}_{u21}^T \phi_{u21}(x)$, $\hat{u}_{22} = \hat{W}_{u22}^T \phi_{u22}(x_2)$
7:   if $i \ne k$ then ▷ $\{\omega_j(t_1), \omega_j(t_2), \ldots, \omega_j(t_i)\}$, $\forall j \in \{1, 2, 3\}$ has $N_1$, $N_2$ and $N_3$ linearly independent elements respectively and $t_k$ is the time instant that this happens
8:     Select an arbitrary data point to be included in each history stack (c.f. Remarks 2-3)
9:     $i = i + 1$
10:   end if
11: end procedure
Remark 4
Note that the algorithm runs in real time in a plug-n-play framework and we do not have any iterations within the algorithm. The computational complexity is similar to that of an adaptive control architecture [47] and increases with the number of states. □
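Step 4 of Algorithm 1 leaves the choice of ODE solver open and mentions Runge-Kutta as one option. A minimal sketch of a classical fourth-order Runge-Kutta step is given below; the fixed step size and the function names `rk4_step` and `integrate` are illustrative assumptions, and in the algorithm the state vector `y` would stack $x$ together with all critic and actor weights.

```python
import math


def rk4_step(f, t, y, h):
    """One classical RK4 step for y' = f(t, y), with y a list of floats."""
    k1 = f(t, y)
    k2 = f(t + h / 2.0, [yi + h / 2.0 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2.0, [yi + h / 2.0 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def integrate(f, y0, t_end, h):
    """Integrate from t = 0 to t_end with fixed step h; returns the final state."""
    t, y = 0.0, list(y0)
    for _ in range(int(round(t_end / h))):
        y = rk4_step(f, t, y, h)
        t += h
    return y
```

For example, `integrate(lambda t, y: [-y[0]], [1.0], 1.0, 0.01)` returns a value close to `math.exp(-1.0)`.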
3.3. Stability analysis
The following regularity assumption is needed for the stability analysis presented below.
Assumption 4
The process input functions $g_{11}(\cdot)$, $g_{12}(\cdot)$, $g_{21}(\cdot)$ and $g_{22}(\cdot)$ are uniformly bounded on $\Omega$, i.e., $\sup_{x_1 \in \Omega} \|g_{11}(x_1)\| < g_{11\max}$, $\sup_{x \in \Omega} \|g_{12}(x)\| < g_{12\max}$, $\sup_{x \in \Omega} \|g_{21}(x)\| < g_{21\max}$, $\sup_{x_2 \in \Omega} \|g_{22}(x_2)\| < g_{22\max}$. □
To remove the effect of the approximation errors $\varepsilon_1$, $\varepsilon_{u11}$, $\varepsilon_2$, $\varepsilon_{u12}$, $\varepsilon_{u21}$, $\varepsilon_3$, $\varepsilon_{u22}$ (and their partial derivatives) and obtain a closed-loop system with an asymptotically stable equilibrium point, one
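The following theorem is the main result of the paper and proves asymptotic stability of the equilibrium point of the closed-loop system dynamics (1), (76)-(79). The closed-loop system dynamics can be written as,
$$\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \end{bmatrix} = \begin{bmatrix} f_{11}(x_1) + \varepsilon f_{12}(x) \\ \varepsilon f_{21}(x) + f_{22}(x_2) \end{bmatrix} + \begin{bmatrix} g_{11}(x_1) & \varepsilon g_{12}(x) \\ \varepsilon g_{21}(x) & g_{22}(x_2) \end{bmatrix} \left( \begin{bmatrix} \hat{W}_{u11}^T \phi_{u11} + \varepsilon \hat{W}_{u12}^T \phi_{u12} \\ \varepsilon \hat{W}_{u21}^T \phi_{u21} + \hat{W}_{u22}^T \phi_{u22} \end{bmatrix} - \begin{bmatrix} \dfrac{x_1^T x_1}{A + x_1^T x_1} B_{11} \mathbf{1}_{m_1} + \varepsilon \dfrac{x^T x}{A + x^T x} B_{12} \mathbf{1}_{m_1} \\ \varepsilon \dfrac{x^T x}{A + x^T x} B_{21} \mathbf{1}_{m_2} + \dfrac{x_2^T x_2}{A + x_2^T x_2} B_{22} \mathbf{1}_{m_2} \end{bmatrix} \right). \tag{84}$$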
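The robustifying terms that appear in (84) share the form $B\, \frac{x^T x}{A + x^T x}\, \mathbf{1}_m$. The controller definitions (76)-(79) are not reproduced in this section, so the sketch below only illustrates the shape of this term as read from (84), treating $A$ and $B$ as positive constants (the paper refers to $B_{11}, \ldots, B_{22}$ as functions); the function name is hypothetical.

```python
def robust_term(x, A, B, m):
    """B * x^T x / (A + x^T x) * 1_m, returned as a list of length m."""
    s = sum(v * v for v in x)
    return [B * s / (A + s)] * m
```

The term vanishes at the origin, so it does not move the equilibrium point, and its magnitude saturates at `B` as the norm of `x` grows.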
Theorem 2
Consider the closed-loop dynamics given by (84) together with the tuning laws for the critic and the actor approximators given by (40)-(42) and (64)-(67), respectively. Suppose that the HJB equations (20)-(22) have a positive definite, smooth solution, that Assumptions 1, 2, 3, and 4 hold, and that $\{\omega_1(t_1), \omega_1(t_2), \ldots, \omega_1(t_{k_1})\}$, $\{\omega_2(t_1), \omega_2(t_2), \ldots, \omega_2(t_{k_2})\}$, and $\{\omega_3(t_1), \omega_3(t_2), \ldots, \omega_3(t_{k_3})\}$ have $N_1$, $N_2$, and $N_3$ linearly independent elements, respectively. Then, there exists a triple $\Omega_x \times \Omega_W \times \Omega_{W_u} \subset \Omega$, with $\Omega$ compact, such that the solution
$$
Z := \begin{bmatrix} x(t)^{T} & \tilde{W}_1(t)^{T} & \tilde{W}_2(t)^{T} & \tilde{W}_3(t)^{T} & \tilde{W}_{u11}(t)^{T} & \tilde{W}_{u12}(t)^{T} & \tilde{W}_{u21}(t)^{T} & \tilde{W}_{u22}(t)^{T} \end{bmatrix}^{T} \in (\Omega_x \times \Omega_W \times \Omega_{W_u})
$$
converges asymptotically to zero for all initial approximator weights $(\hat{W}_1(0), \hat{W}_2(0), \hat{W}_3(0))$ inside $\Omega_W$, $(\hat{W}_{u11}(0), \hat{W}_{u12}(0), \hat{W}_{u21}(0), \hat{W}_{u22}(0))$ inside $\Omega_{W_u}$ and state $x(0)$ inside $\Omega_x$, provided that the following inequalities are satisfied,
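$$
\frac{1}{\alpha_1}\left( 8\alpha_1^{2}\,\lambda_{\min}\!\left( \sum_{i=1}^{k_1} \frac{\omega_1(t_i)\omega_1(t_i)^{T}}{(\omega_1(t_i)^{T}\omega_1(t_i) + 1)^{2}} \right) - 1 \right) > \phi_{u11\max}\lambda_{\max}(R_1^{-1})g_{11\max}\nabla\phi_{1\max} + \phi_{u21\max}\lambda_{\max}(R_2^{-1})g_{12\max}\nabla\phi_{1\max}, \tag{85}
$$
$$
\frac{1}{\phi_{u11\max}}\left( 2\phi_{u11\max}^{2} - 1 \right) > \lambda_{\max}(R_1^{-1})g_{11\max}\nabla\phi_{1\max} + g_{11\max}, \tag{86}
$$
$$
\frac{1}{\alpha_2}\left( 8\alpha_2^{2}\,\lambda_{\min}\!\left( \sum_{i=1}^{k_2} \frac{\omega_2(t_i)\omega_2(t_i)^{T}}{(\omega_2(t_i)^{T}\omega_2(t_i) + 1)^{2}} \right) - 1 \right) > \phi_{u12\max}\lambda_{\max}(R_1^{-1})g_{11\max}\nabla\phi_{2\max} + \phi_{u21\max}\lambda_{\max}(R_2^{-1})g_{22\max}\nabla\phi_{2\max}, \tag{87}
$$
$$
\frac{1}{\phi_{u12\max}}\left( 2\phi_{u12\max}^{2} - 1 \right) > \lambda_{\max}(R_1^{-1})g_{11\max}\nabla\phi_{2\max} + \lambda_{\max}(R_1^{-1})g_{21\max}\nabla\phi_{3\max} + g_{22\max}, \tag{88}
$$
$$
\frac{1}{\phi_{u21\max}}\left( 2\phi_{u21\max}^{2} - 1 \right) > \lambda_{\max}(R_2^{-1})g_{22\max}\nabla\phi_{2\max} + \lambda_{\max}(R_2^{-1})g_{12\max}\nabla\phi_{1\max} + g_{21\max}, \tag{89}
$$
$$
\frac{1}{\alpha_3}\left( 8\alpha_3^{2}\,\lambda_{\min}\!\left( \sum_{i=1}^{k_3} \frac{\omega_3(t_i)\omega_3(t_i)^{T}}{(\omega_3(t_i)^{T}\omega_3(t_i) + 1)^{2}} \right) - 1 \right) > \phi_{u22\max}\lambda_{\max}(R_2^{-1})g_{22\max}\nabla\phi_{3\max} + \phi_{u12\max}\lambda_{\max}(R_1^{-1})g_{21\max}\nabla\phi_{3\max}, \tag{90}
$$
$$
\frac{1}{\phi_{u22\max}}\left( 2\phi_{u22\max}^{2} - 1 \right) > \lambda_{\max}(R_2^{-1})g_{22\max}\nabla\phi_{3\max} + g_{22\max}. \tag{91}
$$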
When the set $\Omega$ that appears in Assumptions 2, 3, and 4 is the whole $\mathbb{R}^n$, then the triple $\Omega_W \times \Omega_{W_u}$
Proof. See Appendix.
Remark 5
For the inequalities (85), (87), and (90) to hold, one needs to pick the tuning gains $\alpha_1$, $\alpha_2$, $\alpha_3$ of the critic approximators sufficiently large, since the left-hand sides of these inequalities increase monotonically to $\infty$ in $\alpha_1$, $\alpha_2$ and $\alpha_3$, respectively. But as noted in adaptive control [47], large adaptive gains can cause high-frequency oscillations in the control signal and reduced tolerance to time delays that will destabilize the system. Regarding (86), (88), (89) and (91), since $\phi_{u11\max}$, $\phi_{u12\max}$, $\phi_{u21\max}$, $\phi_{u22\max}$ are simply the upper bounds that appear in Assumption 3, one can select them as large as needed, since the left-hand sides of these inequalities increase monotonically to $\infty$ in $\phi_{u11\max}$, $\phi_{u12\max}$, $\phi_{u21\max}$ and $\phi_{u22\max}$, respectively. However, one must keep in mind that large values for these upper bounds require appropriately large values for the functions $B_{11}$, $B_{12}$, $B_{21}$, $B_{22}$ in the robustness terms in (76)-(79). It is possible to pick $B_{11}$, $B_{12}$, $B_{21}$, $B_{22}$ high enough to ensure the convergence of the state to an arbitrarily small neighborhood of the equilibrium point. Choosing an increasing or time-varying robustifying term can lead to asymptotic stability provided that the inequalities (80)-(83) hold $\forall x$. □
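The gain selection discussed in Remark 5 can be illustrated numerically. The sketch below reads the left-hand side of (85), (87) and (90) as $(8\alpha^2\lambda - 1)/\alpha$, where $\lambda$ stands for the minimum eigenvalue of the recorded-data matrix and is simply passed in as a number; the function names and the sample values are assumptions made for illustration.

```python
import math


def critic_lhs(alpha, lam_min):
    """Left-hand side (8*alpha^2*lam_min - 1) / alpha of the critic inequalities."""
    return (8.0 * alpha ** 2 * lam_min - 1.0) / alpha


def smallest_gain(rhs, lam_min):
    """Gain at which critic_lhs equals rhs.

    Positive root of 8*lam_min*alpha^2 - rhs*alpha - 1 = 0. Because the
    left-hand side is increasing in alpha, every larger gain satisfies
    the strict inequality.
    """
    return (rhs + math.sqrt(rhs ** 2 + 32.0 * lam_min)) / (16.0 * lam_min)


# Hypothetical numbers: lam_min = 0.5 and a right-hand side equal to 3.
alpha_star = smallest_gain(3.0, 0.5)
margins = [critic_lhs(a, 0.5) - 3.0 for a in (0.5, 1.0, 2.0, 10.0)]
```

A smaller $\lambda$, i.e. poorer recorded data, pushes the threshold gain up, which is the trade-off against the high-gain oscillations mentioned in the remark.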
Remark 6
From the conclusion of Theorem 2, we have that $\|Z\| \to 0$, which implies $\|x\| \to 0$. It follows that, as $t \to \infty$, the controllers (76)-(79) reduce to (56)-(59), which are $\varepsilon_{u11}$, $\varepsilon_{u12}$, $\varepsilon_{u21}$, $\varepsilon_{u22}$, respectively, away from the optimal. □
Remark 7
In order to get $\varepsilon$ small, we assume that we have a large number of basis sets, i.e., $N_1 \to \infty$, $N_2 \to \infty$ and $N_3 \to \infty$. Moreover, in order to get $\varepsilon_{u11}$, $\varepsilon_{u12}$, $\varepsilon_{u21}$, $\varepsilon_{u22}$ small, we also assume that we have a large number of basis sets, i.e., $N_4 \to \infty$, $N_5 \to \infty$, $N_6 \to \infty$, $N_7 \to \infty$. Note, however, that this is a requirement for theoretical purposes. We have observed in our numerical and simulation examples that picking quadratic basis functions can achieve the required result. □
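A quadratic basis of the kind mentioned in Remark 7 can be generated as follows. This is a minimal sketch that assumes the basis consists of all monomials $x_i x_j$ with $i \le j$; the exact basis used in the paper's example is not recoverable from the text, and the function names are hypothetical. The gradient is included because the critic regressors depend on $\nabla\phi$.

```python
from itertools import combinations_with_replacement


def quadratic_basis(x):
    """All monomials x_i * x_j with i <= j, in lexicographic index order."""
    idx = combinations_with_replacement(range(len(x)), 2)
    return [x[i] * x[j] for i, j in idx]


def quadratic_basis_gradient(x):
    """One row per basis function: the gradient of x_i * x_j with respect to x."""
    n = len(x)
    rows = []
    for i, j in combinations_with_replacement(range(n), 2):
        row = [0.0] * n
        row[i] += x[j]  # d(x_i x_j)/dx_i
        row[j] += x[i]  # d(x_i x_j)/dx_j; for i == j the two add up to 2*x_i
        rows.append(row)
    return rows


# A three-dimensional subsystem state yields n(n+1)/2 = 6 basis functions.
phi = quadratic_basis([1.0, 2.0, 3.0])
grad_phi = quadratic_basis_gradient([1.0, 2.0, 3.0])
```

For an $n$-dimensional argument this gives $n(n+1)/2$ weights per approximator, so the number of weights grows quadratically with the state dimension.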
Remark 8
In case the approximation holds over the entire space, i.e., $\Omega = \mathbb{R}^n$, one can conclude global existence of solution provided that the HJB solution $V^\star$ is norm coercive (i.e., $V^\star \to 0 \Rightarrow \|x\| \to 0$), as this suffices to guarantee that the Lyapunov function $\mathcal{V}$ that we use in the proof of Theorem 1 is also norm coercive (see [28]). □
4. NUMERICAL EXAMPLE
This section presents a sixth-order numerical example to illustrate the effectiveness of the proposed optimal adaptive control algorithm for weakly coupled nonlinear systems like the one described by equation (1). The state variables are taken as $x_1 = [x_{11}\ x_{12}\ x_{13}]^{T}$ and $x_2 = [x_{21}\ x_{22}\ x_{23}]^{T}$. The small perturbation parameter is chosen as $\varepsilon = 0.1$. The matrices of the system under consideration are chosen as
223s, and the actor activation functions are picked in a similar way.
The initial states are chosen as $x(0) = [3\ 1\ 4.3\ 1.2\ 1.5\ 1]^{T}$, and the tuning gains were set to $\alpha_1 = \alpha_2 = \alpha_3 = 10$ and $\alpha_{u11} = \alpha_{u12} = \alpha_{u21} = \alpha_{u22} = 2$.
Figure 1 shows the time evolution of the states in the weakly coupled nonlinear system. The convergence of the critic parameters $W_c$ to the optimal cost (12) is shown in Figure 2. The evolution of the actor parameters $W_u$ is shown in Figure 3. The optimal control inputs, i.e., $u_1 = u_{11} + \varepsilon u_{12}$ and $u_2 = \varepsilon u_{21} + u_{22}$, are shown in Figure 4.
Figure 1. Trajectory of the closed-loop system states.
5. CONCLUSIONS
This paper proposed a new approximate dynamic programming algorithm for controlling weakly coupled nonlinear systems, which also relaxes the persistence of excitation condition by using previously stored data concurrently with current data. The algorithm is implemented as a three-critic/four-actor approximator structure. To suppress the effects of the approximation errors of the three critics and four actors, robustifying terms have been added to the controllers. Finally, we proved asymptotic stability of the equilibrium point of the overall closed-loop system. Simulation results illustrate the effectiveness of the proposed approach. Future work will concentrate on extending the results to completely unknown systems and to multiple decision makers.
Then $\dot{L}$ is negative definite (see Section 4.9 in [28], where one can prove Input to State Stability (ISS) by treating (46)-(48) as dynamical systems with $\varepsilon_{H_j}$, $j \in \{1, 2, 3\}$, as input), as long as
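$$
\|\tilde{W}_j\| > \frac{\left\| \alpha_j \left( \dfrac{\omega_j(t)\omega_j(t)^{T}}{(\omega_j(t)^{T}\omega_j(t) + 1)^{2}}\,\varepsilon_{H_j}(t) + \displaystyle\sum_{i=1}^{k_j} \dfrac{\omega_j(t_i)\omega_j(t_i)^{T}}{(\omega_j(t_i)^{T}\omega_j(t_i) + 1)^{2}}\,\varepsilon_{H_j}(t_i) \right) \right\|}{\lambda_{\min}\!\left( \displaystyle\sum_{i=1}^{k_j} \dfrac{\omega_j(t_i)\omega_j(t_i)^{T}}{(\omega_j(t_i)^{T}\omega_j(t_i) + 1)^{2}} \right)}, \quad \forall j \in \{1, 2, 3\}. \tag{99}
$$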
Equations (49), (50), (51) follow from this and the fact that $\frac{\omega_j(t)\omega_j(t)^{T}}{(\omega_j(t)^{T}\omega_j(t) + 1)^{2}} > 0$, $\forall t$ and $\forall j \in \{1, 2, 3\}$. Since $\{\omega_j(t_1), \ldots, \omega_j(t_{k_j})\}$ has $N_j$, $\forall j \in \{1, 2, 3\}$, linearly independent vectors, the matrices $\Lambda_j$, $\forall j \in \{1, 2, 3\}$, are positive definite, from which the exponential stability of the nominal system follows.
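The positive definiteness claim can be spelled out in one line. The following is a sketch under the assumption, not confirmed by this part of the text, that $\Lambda_j$ denotes the normalized outer-product sum over the recorded data, i.e. the matrix whose minimum eigenvalue appears in (85), (87), (90) and (99).

```latex
% Assumption: \Lambda_j := \sum_{i=1}^{k_j}
%   \omega_j(t_i)\omega_j(t_i)^T / (\omega_j(t_i)^T\omega_j(t_i) + 1)^2.
\[
  v^{T}\Lambda_j v
  = \sum_{i=1}^{k_j}
    \frac{\bigl(\omega_j(t_i)^{T} v\bigr)^{2}}
         {\bigl(\omega_j(t_i)^{T}\omega_j(t_i) + 1\bigr)^{2}}
  \;\ge\; 0
  \qquad \text{for all } v \in \mathbb{R}^{N_j}.
\]
% Equality forces \omega_j(t_i)^T v = 0 for every i. Since the recorded
% vectors contain N_j linearly independent elements, they span R^{N_j},
% so v = 0. Hence v^T \Lambda_j v > 0 for all v \neq 0.
```

Under that reading, the rank condition on the history stack is exactly what makes $\lambda_{\min}(\Lambda_j)$ strictly positive.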
Proof of Theorem 2
Consider the following Lyapunov function,
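$$
\mathcal{V} := V^\star + V_c + V_u, \tag{100}
$$
with
$$
V^\star := V_1^\star + V_2^\star + V_3^\star,
$$
$$
V_c := V_{c1} + V_{c2} + V_{c3} := \tilde{W}_1^{T}\tilde{W}_1 + \tilde{W}_2^{T}\tilde{W}_2 + \tilde{W}_3^{T}\tilde{W}_3,
$$
$$
V_u := \mathrm{trace}\{\tilde{W}_{u11}^{T}\tilde{W}_{u11}\} + \mathrm{trace}\{\tilde{W}_{u12}^{T}\tilde{W}_{u12}\} + \mathrm{trace}\{\tilde{W}_{u21}^{T}\tilde{W}_{u21}\} + \mathrm{trace}\{\tilde{W}_{u22}^{T}\tilde{W}_{u22}\},
$$
where $V_j^\star$, $j \in \{1, 2, 3\}$, are the optimal value functions in (12), that is, the positive definite and smooth solution of (20)-(22). Since $\mathcal{V}$ is positive definite, there exist class-$\mathcal{K}$ functions $\gamma_1(\cdot)$ and $\gamma_2(\cdot)$ to write
$$
\gamma_1(\|Z\|) \le \mathcal{V} \le \gamma_2(\|Z\|),
$$
for all $Z = \begin{bmatrix} x^{T}(t) & \tilde{W}_1^{T}(t) & \tilde{W}_2^{T}(t) & \tilde{W}_3^{T}(t) & \tilde{W}_{u11}^{T}(t) & \tilde{W}_{u12}^{T}(t) & \tilde{W}_{u21}^{T}(t) & \tilde{W}_{u22}^{T}(t) \end{bmatrix}^{T} \in B_r$, where $B_r \subset \Omega$ is a ball of radius $r \in \mathbb{R}$. By taking the time derivative of the first term with respect to the state trajectories with $u(t)$ (see (84)), and the second term with respect to the perturbed critic estimation error dynamics (46), (47), (48), using (49), (50), (51), substituting the update for the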
Finally, after taking into account the bounds on $B_{11}\|x_1\|$, $B_{12}\|x\|$, $B_{21}\|x\|$, and $B_{22}\|x_2\|$ from (80), (81), (82), and (83), respectively, we can upper bound (101) as
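$$
\begin{aligned}
\dot{\mathcal{V}} \le
&-\left( 2\alpha_1\lambda_{\min}\!\left( \sum_{i=1}^{k_1} \frac{\omega_1(t_i)\omega_1(t_i)^{T}}{(\omega_1(t_i)^{T}\omega_1(t_i) + 1)^{2}} \right) - \frac{1}{4\alpha_1} - \frac{\phi_{u11\max}\lambda_{\max}(R_1^{-1})g_{11\max}\nabla\phi_{1\max}}{2} - \frac{\phi_{u21\max}\lambda_{\max}(R_2^{-1})g_{12\max}\nabla\phi_{1\max}}{2} \right) \|\tilde{W}_1\|^{2} \\
&-\left( \phi_{u11\max}^{2} - \frac{\phi_{u11\max}\lambda_{\max}(R_1^{-1})g_{11\max}\nabla\phi_{1\max}}{2} - \frac{1}{2} - \frac{g_{11\max}\phi_{u11\max}}{2} \right) \|\tilde{W}_{u11}\|^{2} \\
&-\frac{1}{2}\left( x_1^{T}Q_1x_1 + u_{11}^{T}R_1u_{11} \right) - x_1^{T}Q_2x_2 - \frac{1}{2}\left( x_2^{T}Q_3x_2 + u_{22}^{T}R_2u_{22} \right) \\
&-\left( 2\alpha_2\lambda_{\min}\!\left( \sum_{i=1}^{k_2} \frac{\omega_2(t_i)\omega_2(t_i)^{T}}{(\omega_2(t_i)^{T}\omega_2(t_i) + 1)^{2}} \right) - \frac{1}{4\alpha_2} - \frac{\phi_{u12\max}\lambda_{\max}(R_1^{-1})g_{11\max}\nabla\phi_{2\max}}{2} - \frac{\phi_{u21\max}\lambda_{\max}(R_2^{-1})g_{22\max}\nabla\phi_{2\max}}{2} \right) \|\tilde{W}_2\|^{2} \\
&-\left( \phi_{u12\max}^{2} - \frac{\phi_{u12\max}\lambda_{\max}(R_1^{-1})g_{11\max}\nabla\phi_{2\max}}{2} - \frac{\phi_{u12\max}\lambda_{\max}(R_1^{-1})g_{21\max}\nabla\phi_{3\max}}{2} - \frac{1}{2} - \frac{g_{12\max}\phi_{u12\max}}{2} \right) \|\tilde{W}_{u12}\|^{2} \\
&-\left( \phi_{u21\max}^{2} - \frac{\phi_{u21\max}\lambda_{\max}(R_2^{-1})g_{22\max}\nabla\phi_{2\max}}{2} - \frac{\phi_{u21\max}\lambda_{\max}(R_2^{-1})g_{12\max}\nabla\phi_{1\max}}{2} - \frac{1}{2} - \frac{g_{21\max}\phi_{u21\max}}{2} \right) \|\tilde{W}_{u21}\|^{2} \\
&-\left( 2\alpha_3\lambda_{\min}\!\left( \sum_{i=1}^{k_3} \frac{\omega_3(t_i)\omega_3(t_i)^{T}}{(\omega_3(t_i)^{T}\omega_3(t_i) + 1)^{2}} \right) - \frac{1}{4\alpha_3} - \frac{\phi_{u22\max}\lambda_{\max}(R_2^{-1})g_{22\max}\nabla\phi_{3\max}}{2} - \frac{\phi_{u12\max}\lambda_{\max}(R_1^{-1})g_{21\max}\nabla\phi_{3\max}}{2} \right) \|\tilde{W}_3\|^{2} \\
&-\left( \phi_{u22\max}^{2} - \frac{\phi_{u22\max}\lambda_{\max}(R_2^{-1})g_{22\max}\nabla\phi_{3\max}}{2} - \frac{1}{2} - \frac{g_{22\max}\phi_{u22\max}}{2} \right) \|\tilde{W}_{u22}\|^{2}, \quad t \ge 0.
\end{aligned}
$$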
Then, by taking into account the inequalities (85)-(91) (which are the parentheses above), one has $\dot{\mathcal{V}} \le 0$, $t \ge 0$. From Barbalat's lemma [50] it follows that as $t \to \infty$, then $\|Z\| \to 0$. The result holds as long as we can show that the state $x(t)$ remains in the set $\Omega \subseteq \mathbb{R}^n$ for all times. To this effect, define the following compact set
$$
M := \{ x \in \mathbb{R}^n \mid \mathcal{V}(t) \le m \} \subseteq \mathbb{R}^n,
$$
where $m$ is chosen as the largest constant so that $M \subseteq \Omega$. Since by assumption $x_0 \in \Omega_x$, and $\Omega_x \subseteq \Omega$, we can conclude that $x_0 \in \Omega$. While $x(t)$ remains inside $\Omega$, we have seen that $\dot{\mathcal{V}} \le 0$ and therefore $x(t)$ must remain inside $M \subseteq \Omega$. The fact that $x(t)$ remains inside a compact set also excludes the possibility of finite escape time and therefore one has global existence of solution.
REFERENCES
1. Lewis F.L., Vrabie D., Syrmos V.L. Optimal Control, John Wiley & Sons, January 2012.
2. Kokotovic P., Perkins W., Cruz J.B., D’Ans G. ε-coupling for near optimum design of large scale linear systems, Inst. Elect. Eng. Proc. Part D, May 1969; 116(5):889-892.
3. Gajic Z., Petkovski D., Shen X. Singularly Perturbed and Weakly Coupled Linear Control Systems: A Recursive
4. Gajic Z., Shen X. Parallel Algorithms for Optimal Control of Large Scale Linear Systems, Springer, London, U.K., 1992.
5. Kaszkurewicz E., Bhaya A., Siljak D. On the convergence of parallel asynchronous block-iterative computations, Linear Algebra and its Applications, 1990; 131:139-160.
6. Zecevic A., Siljak D. A block-parallel Newton method via overlapping epsilon decomposition, SIAM Journal on Matrix Analysis and Applications, 1994; 15:824-844.
7. Okuguchi K. Matrices with dominant diagonal blocks and economic theory, Journal of Mathematical Economics, 1978; 5:43-52.
8. Medanic J., Avramovic B. Solution of load-flow problems in power systems by ε-coupling method, Proceedings of the Institution of Electrical Engineers, August 1975; 122(8):801-805.
9. Ilic-Spong M., Katz N., Dai H., Zaborsky J. Block diagonal dominance for systems of nonlinear equations with applications to load flow calculations in power systems, Mathematical Modelling, 1984; 5(5):275-297.
10. Crow M., Ilic M. The parallel implementation of the waveform relaxation method for transient stability simulations, IEEE Transactions on Power Systems, August 1990; 5:922-932.
11. Phillips R., Kokotovic P. A singular perturbation approach to modeling and control of Markov chains, IEEE Transactions on Automatic Control, October 1981; 26:1087-1094.
12. Aldhaheri R., Khalil H. Aggregation method for nearly completely decomposable Markov chains, IEEE Transactions on Automatic Control, February 1991; 36:178-187.
13. Stewart W. Introduction to Numerical Solution of Markov Chains, Princeton Univ. Press, Princeton, NJ, 1994.
14. Kokotovic P., Singh G. Optimization of Coupled Nonlinear Systems, International Journal of Control, 1971; 14:51-64.
15. Beard R., McLain T. Successive Galerkin approximation algorithms for nonlinear optimal and robust control, International Journal of Control, 1998; 71(5):717-743.
16. Kim Y.J., Kim B.S., Lim M.T. Composite control for singularly perturbed nonlinear systems via successive Galerkin approximation, IEE Proceedings on Control Theory and Applications, September 2003; 150(5):483-488.
17. Kim Y.J., Kim B.S., Lim M.T. Finite-time composite control for a class of singularly perturbed nonlinear systems via successive Galerkin approximation, IEE Proceedings on Control Theory and Applications, 2005; 152(5):507-512.
18. Werbos P.J. Brain-like intelligent control: from neural nets to larger-scale systems, IEEE Conference on Decision and Control, San Diego, CA, USA, December 1997; 3902-3904.
19. Werbos P.J. Intelligence in the brain: A theory of how it works and how to build it, Neural Networks, April 2009; 22(3):200-212.
20. Murray J.J., Cox C.J., Lendaris G.G., Saeks R. Adaptive dynamic programming, IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews, May 2002; 32(2):140-153.
21. Bertsekas D.P. Dynamic Programming and Optimal Control, Vol. II, Athena Scientific, 3rd edition, 2007.
22. Powell W.B. Approximate Dynamic Programming: Solving the Curses of Dimensionality, Wiley Series in Probability and Statistics. Wiley, 2007.
23. Bhasin S., Kamalapurkar R., Johnson M., Vamvoudakis K., Lewis F., Dixon W. A novel actor critic identifier architecture for approximate optimal control of uncertain nonlinear systems, Automatica, 2013; 49(1):82-92.
24. Lewis F., Liu D. Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. IEEE Press Series on Computational Intelligence, Wiley, 2013.
25. Lewis F., Vrabie D., Vamvoudakis K. Reinforcement learning and feedback control: Using natural decision methods to design optimal adaptive controllers, IEEE Control Systems Magazine, 2012; 32(6):76-105.
26. Vamvoudakis K.G., Lewis F.L. Online actor critic algorithm to solve the continuous-time infinite horizon optimal control problem, Automatica, 2010; 46(5):878-888.
27. Wei Q.L., Zhang H.G., Cui L.L. Data-based optimal control for discrete-time zero-sum games of 2-d systems using adaptive critic designs, Acta Automatica Sinica, 2009; 35(6):682-692.
28. Khalil H.K. Nonlinear systems, Macmillan Pub. Co., 1992.
29. Sutton R.S., Barto A.G. Reinforcement learning: An introduction, MIT Press, Cambridge, MA, 1998.
30. Gajic Z., Shen X. Decoupling transformation for weakly coupled linear systems, International Journal of Control, 1989; 50:1515-1521.
31. Gajic Z., Borno I. General Transformation for Block Diagonalization of Weakly Coupled Linear Systems Composed of N-Subsystems, Transactions on Circuits and Systems, Fundamental Theory and Applications, June 2000; 47(6):909-912.
32. Gajic Z., Lim M.-T., Skataric D., Su W.-C., Kecman V. Optimal control: weakly coupled systems and applications, CRC Press, 2008.
33. Aganovic Z., Gajic Z. Optimal control of weakly coupled bilinear systems, Automatica, November 1993; 29(6):1591-1593.
34. Aganovic Z., Gajic Z. Linear optimal control of bilinear systems: With applications to singular perturbations and weak coupling, Springer, London, U.K., 1995.
35. Cebuhar W., Costanza V. Approximation procedures for the optimal control of bilinear and nonlinear systems, Journal of Optimization Theory and Applications, 1984; 43(4):615-627.
36. Kim Y.J., Lim M.T. Parallel Optimal Control for Weakly Coupled Nonlinear Systems Using Successive Galerkin Approximation, IEEE Transactions on Automatic Control, July 2008; 53(6):1542-1547.
37. Chen Z., Jagannathan S. Generalized Hamilton-Jacobi-Bellman formulation: Based neural network control of affine nonlinear discrete-time systems, IEEE Transactions on Neural Networks, January 2008; 19(1):90-106.
38. Lewis F.L., Vrabie D., Vamvoudakis K.G. Reinforcement learning and feedback control: Using natural decision methods to design optimal adaptive controllers, IEEE Control Systems Magazine, December 2012; 32(6):76-105.
39. Chowdhary G., Johnson E. Concurrent learning for convergence in adaptive control without persistency of excitation, in IEEE Conference on Decision and Control, Atlanta, GA, December 2010; 3674-3679.
41. Heydari A., Balakrishnan S.N. Fixed-final-time optimal control of nonlinear systems with terminal constraints, Neural Networks, 2013; 48:61-71.
42. Modares H., Lewis F.L., Naghibi-Sistani M.B. Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems, Automatica, 2014; 50(1):193-202.
43. Bardi M., Capuzzo-Dolcetta I. Optimal control and viscosity solutions of Hamilton-Jacobi-Bellman equations, Springer, 2008.
44. Van der Schaft A.J. L2-gain analysis of nonlinear systems and nonlinear state feedback H∞ control, IEEE Transactions on Automatic Control, 1992; 37(6):770-784.
45. Abu-Khalaf M., Lewis F.L. Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network approach, Automatica, 2005; 41(5):779-791.
46. Hornik K., Stinchcombe M.B., White H. Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks, Neural Networks, 1990; 3(5):551-560.
47. Ioannou P., Fidan B. Adaptive Control Tutorial, Advances in Design and Control, Society for Industrial and Applied Mathematics, 2006.
48. Chowdhary G., Yucelen T., Mühlegg M., Johnson E.N. Concurrent learning adaptive control of linear systems with exponentially convergent bounds, International Journal of Adaptive Control and Signal Processing, 2012.
49. Vamvoudakis K.G., Lewis F.L. Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem, Automatica, 2010; 46(5):878-888.
50. Haddad W.M., Chellaboina V.S. Nonlinear Dynamical Systems and Control: A Lyapunov-Based Approach, Princeton University Press, 2008.
51. Dierks T., Jagannathan S. Optimal control of affine nonlinear continuous-time systems, in American Control Conference, 2010; 1568-1573.
52. Polycarpou M., Farrell J., Sharma M. On-line approximation control of uncertain nonlinear systems: issues with control input saturation, in Proc. American Control Conference, 2003; 543-548.