Research Article

Dynamic Programming and Hamilton–Jacobi–Bellman Equations on Time Scales
Yingjun Zhu and Guangyan Jia
Zhongtai Securities Institute for Financial Studies, Shandong
University, Jinan 250100, China
Correspondence should be addressed to Guangyan Jia;
[email protected]
Received 7 June 2020; Revised 16 October 2020; Accepted 24 October 2020; Published 19 November 2020
Hindawi Complexity, Volume 2020, Article ID 7683082, 11 pages. https://doi.org/10.1155/2020/7683082
Academic Editor: Guang Li
Copyright © 2020 Yingjun Zhu and Guangyan Jia. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The Bellman optimality principle for stochastic dynamic systems on time scales is derived, which includes the continuous time and discrete time cases as special cases. At the same time, the Hamilton–Jacobi–Bellman (HJB) equation on time scales is obtained. Finally, an example is employed to illustrate our main results.
1. Introduction
The stochastic control problem is to find an optimal control such that a cost functional associated with a stochastic system reaches its minimum value. The method of dynamic programming is a powerful approach to solving stochastic optimal control problems. Dynamic programming is a well-established subject [1–4] for dealing with continuous and discrete optimal control problems, respectively, and it has great practical applications in various fields [5, 6]. It is generally assumed that time is continuous or discrete in dynamic systems. However, this cannot always be guaranteed: in reality, the time scale could be neither continuous nor discrete. There are many processes which are a mixture of continuous and discrete time, nonuniform discrete time, or a union of disjoint time intervals, such as production and storage processes in economics, investment processes in finance, and the populations of seasonal insects. When the time structure is more complex, the control problem becomes more difficult. How can we deal with this problem?
Time scales were first introduced by Hilger [7] in 1988 in order to unify differential and difference equations into a general framework. This allows us to treat continuous and discrete analysis from a common point of view. Recently, time scale theory has been extensively studied in many works [8–13]. It is well known that optimal control problems on time scales are an important field for both theory and applications. Since the calculus of variations on time scales was studied by Bohner [14], results on optimal control problems in the time scale setting and their applications have been growing rapidly. The existence of optimal controls for dynamic systems on time scales was discussed in [15–17]. Subsequently, the Pontryagin maximum principle on time scales was studied in several works [18, 19], which specifies the necessary conditions for optimality.
Dynamic programming for dynamic systems on time scales is not a simple task of uniting the continuous time and discrete time cases, because time scales contain more complex time structures. Seiffertt et al. [20] studied approximate dynamic programming for dynamic systems in the isolated time scale setting. In addition, Bellman dynamic programming on general time scales for deterministic optimal control problems was considered in [21, 22]. However, only limited work [23, 24] has been done on the linear quadratic stochastic optimal control problem in the time scale setting. That is to say, the general setting of stochastic optimal control problems on time scales is completely open.
Motivated by all these significant works, the purpose of this paper is to study the method of dynamic programming for stochastic optimal control problems on time scales. As we know, the stochastic dynamic programming principle is different from the deterministic one, which reflects the stochastic nature of the optimal control problem. So the method used in the deterministic case on time scales cannot be applied to the stochastic case directly. In order to overcome this difficulty, we first give a new form of the chain rule on time scales. Based on this idea, we obtain Itô's formula for stochastic processes on time scales. Second, we consider a family of optimal control problems with different initial times and states to establish the Bellman optimality principle in the time scale setting. Third, using Itô's formula and the Bellman optimality principle obtained on time scales, we also derive the associated Hamilton–Jacobi–Bellman (HJB for short) equation on time scales, which is a nonlinear second-order partial differential equation involving an expectation. If the HJB equation is solvable, then we can obtain an optimal feedback control. Our work enriches the dynamic programming problem by providing a more general time framework and makes dynamic programming theory a powerful tool for tackling optimal control problems on complex time scales.
The organization of this paper is as follows. In Section 2, we present some preliminaries on time scale theory. Section 3 focuses on the Bellman optimality principle and the HJB equations on time scales. By introducing a new symbol, we present Itô's formula in a new form; on this basis, we obtain the main results. Finally, an illustrative example is given to show the effectiveness of the proposed results.
2. Preliminaries
A time scale $\mathbb{T}$ is a nonempty closed subset of the real number set $\mathbb{R}$, and we denote $[0,T]_{\mathbb{T}} = [0,T]\cap\mathbb{T}$. In this paper, we always suppose $\mathbb{T}$ is bounded. The forward jump operator $\sigma$ and the backward jump operator $\rho$ are, respectively, defined by

$$\sigma(t) = \inf\{s\in\mathbb{T} : s>t\},\qquad \rho(t) = \sup\{s\in\mathbb{T} : s<t\}, \quad (1)$$

supplemented by $\inf\emptyset := \sup\mathbb{T}$ and $\sup\emptyset := \inf\mathbb{T}$, where $\emptyset$ denotes the empty set. If $\sigma(t)=t$ and $t<\sup\mathbb{T}$, the point $t$ is called right-dense, while if $\sigma(t)>t$, the point $t$ is called right-scattered. Similarly, if $\rho(t)=t$ and $t>\inf\mathbb{T}$, the point $t$ is called left-dense, while if $\rho(t)<t$, the point $t$ is called left-scattered. Moreover, a point is called isolated if it is both left-scattered and right-scattered. For a function $f$, we denote $f^\sigma(t) = f(\sigma(t))$ to represent the composition of the functions $f$ and $\sigma$. Similarly, we denote $f^\rho(t) = f(\rho(t))$. The graininess function $\mu$ is defined by

$$\mu(t) = \sigma(t) - t. \quad (2)$$
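For an isolated time scale, the operators $\sigma$, $\rho$, and $\mu$ can be computed directly from the sorted list of points. A minimal Python sketch (the helper name and the list representation are illustrative choices, not from the paper):

```python
import bisect

def jump_operators(ts):
    """Build sigma, rho, mu for an isolated (finite) time scale.

    ts: sorted list of the points of T.
    By the conventions inf(empty) = sup T and sup(empty) = inf T,
    sigma(max T) = max T and rho(min T) = min T.
    """
    def sigma(t):
        i = bisect.bisect_right(ts, t)           # first index with ts[i] > t
        return ts[i] if i < len(ts) else ts[-1]  # empty set -> sup T

    def rho(t):
        i = bisect.bisect_left(ts, t)            # first index with ts[i] >= t
        return ts[i - 1] if i > 0 else ts[0]     # empty set -> inf T

    def mu(t):                                   # graininess mu(t) = sigma(t) - t
        return sigma(t) - t

    return sigma, rho, mu
```

For example, on $\mathbb{T}=\{0,1,1.5,3\}$ the point $1$ is right-scattered with $\sigma(1)=1.5$ and $\mu(1)=0.5$.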
We now present some basic concepts and properties about time scales (see [10, 11]).

Definition 1 (see [10]). Let $f$ be a function on $\mathbb{T}$. $f$ is called a right-dense continuous function if $f$ is continuous at every right-dense point and has finite left-sided limits at every left-dense point. Similarly, $f$ is called a left-dense continuous function if $f$ is continuous at every left-dense point and has finite right-sided limits at every right-dense point. If $f$ is both right-dense continuous and left-dense continuous, then $f$ is called a continuous function.
Define the set $\mathbb{T}^\kappa$ as

$$\mathbb{T}^\kappa = \begin{cases} \mathbb{T}\setminus(\rho(\sup\mathbb{T}), \sup\mathbb{T}], & \text{if } \sup\mathbb{T} < \infty,\\ \mathbb{T}, & \text{if } \sup\mathbb{T} = \infty. \end{cases} \quad (3)$$

Definition 2 (see [10]). Let $f:\mathbb{T}\to\mathbb{R}$ and $t\in\mathbb{T}^\kappa$. If there exists a number $f^\Delta(t)$ such that, for any $\varepsilon>0$, there exists a neighborhood $U$ of $t$ such that

$$\big|f(\sigma(t)) - f(s) - f^\Delta(t)(\sigma(t)-s)\big| \le \varepsilon|\sigma(t)-s|, \quad \text{for all } s\in U, \quad (4)$$

we call $f^\Delta(t)$ the $\Delta$-derivative of $f$ at $t$.

We denote by $C^1([0,T]_{\mathbb{T}};\mathbb{R})$ the space of $\mathbb{R}$-valued continuously $\Delta$-differentiable functions on $[0,T]_{\mathbb{T}}$, and by $C^{1,2}(\mathbb{T}\times\mathbb{R}^n;\mathbb{R})$ the family of all functions $f(t,x)$ defined on $\mathbb{T}\times\mathbb{R}^n$ that are continuously $\Delta$-differentiable in $t$ and twice continuously differentiable in $x$.
Furthermore, we give the differentiation rule for compound functions.

Lemma 1 (see [25]). Let $g:\mathbb{T}\to\mathbb{R}$ be $\Delta$-differentiable and $f:\mathbb{R}\to\mathbb{R}$ be continuously differentiable. Then $f(g(t))$ is $\Delta$-differentiable, and

$$f^\Delta(g(t)) = g^\Delta(t)\int_0^1 f'\big(g(t) + h\mu(t)g^\Delta(t)\big)\,dh. \quad (5)$$
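Formula (5) can be checked numerically at a right-scattered point, where $f^\Delta(g(t)) = [f(g(\sigma(t)))-f(g(t))]/\mu(t)$. A small sketch on $\mathbb{T}=\mathbb{Z}$ (so $\sigma(t)=t+1$, $\mu(t)=1$), with the example functions $f(x)=x^2$ and $g(t)=t^2$ chosen for illustration:

```python
def chain_rule_check(t, n=100000):
    """Compare both sides of (5) on T = Z with f(x) = x**2, g(t) = t**2."""
    f = lambda x: x ** 2
    fp = lambda x: 2 * x                      # f'
    g = lambda t: t ** 2
    gD = g(t + 1) - g(t)                      # Delta-derivative of g on Z
    mu = 1.0                                  # graininess on Z

    # Left-hand side of (5): Delta-derivative of f(g(.)) at the scattered point t.
    lhs = f(g(t + 1)) - f(g(t))

    # Right-hand side of (5): g^Delta(t) * int_0^1 f'(g(t) + h*mu*g^Delta(t)) dh,
    # with the integral approximated by a midpoint Riemann sum.
    hs = [(k + 0.5) / n for k in range(n)]
    integral = sum(fp(g(t) + h * mu * gD) for h in hs) / n
    rhs = gD * integral
    return lhs, rhs
```

At $t=3$, both sides equal $f(16)-f(9)=175$.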
In this paper, we adopt the stochastic integral defined by Bohner et al. [26]. Let $(\Omega,\mathcal{F},\{\mathcal{F}_t\}_{t\in[0,T]_{\mathbb{T}}},P)$ be a complete probability space with an increasing and continuous filtration $\{\mathcal{F}_t\}_{t\in[0,T]_{\mathbb{T}}}$. We define $L^2_{\mathcal{F}}([0,T]_{\mathbb{T}};\mathbb{R})$ as the set of all $\mathcal{F}_t$-adapted, $\mathbb{R}$-valued measurable processes $X(t)$ such that $E\big[\int_0^T |X(t)|^2\Delta t\big] < \infty$.
We also have the following properties. Let $X(t), Y(t)\in L^2_{\mathcal{F}}([0,T]_{\mathbb{T}};\mathbb{R})$ and $\alpha,\beta\in\mathbb{R}$. Then

$$\begin{aligned}
&\text{(i)}\ \int_0^T\big(\alpha X(t)+\beta Y(t)\big)\Delta W(t) = \alpha\int_0^T X(t)\Delta W(t) + \beta\int_0^T Y(t)\Delta W(t),\\
&\text{(ii)}\ E\left[\int_0^T X(t)\Delta W(t)\right] = 0,\\
&\text{(iii)}\ E\left[\Big|\int_0^T X(t)\Delta W(t)\Big|^2\right] = E\left[\int_0^T |X(t)|^2\Delta\langle W\rangle_t\right] = E\left[\int_0^T |X(t)|^2\Delta t\right],
\end{aligned} \quad (8)$$

where the integral of $X$ with respect to the quadratic variation $\langle W\rangle_t$ of Brownian motion is defined as the Stieltjes integral $\int_0^T X_t(\omega)\Delta\langle W\rangle_t(\omega)$.

Let $X$ be an $n$-dimensional stochastic process defined by

$$\Delta X(s) = a(s,X(s))\Delta s + b(s,X(s))\Delta W(s), \quad (9)$$

and we have the following Itô's formula.
Lemma 2 (see [28]). Let $f\in C^{1,2}(\mathbb{T}\times\mathbb{R}^n;\mathbb{R})$ and let $X$ satisfy (9). Then the following relation holds:

$$\begin{aligned}
f(t,X(t)) ={}& f(0,X(0)) + \int_0^t f^\Delta_s(s,X(s))\Delta s + \int_0^t \frac{\partial f}{\partial x}(s,X(s))^{\mathsf T} a(s,X(s))\Delta s\\
&+ \int_0^t \frac{\partial f}{\partial x}(s,X(s))^{\mathsf T} b(s,X(s))\Delta W(s) + \frac12\int_0^t b(s,X(s))^{\mathsf T}\frac{\partial^2 f}{\partial x\,\partial x}(s,X(s))\,b(s,X(s))\Delta s\\
&+ \sum_{s\in[0,t]_{\mathbb{T}}}\Big( f\big(\sigma(s),X^\sigma(s)\big) - f\big(\sigma(s),X(s)\big)\Big) - \sum_{s\in[0,t]_{\mathbb{T}}}\frac{\partial f}{\partial x}(s,X(s))^{\mathsf T} a(s,X(s))(\sigma(s)-s)\\
&- \sum_{s\in[0,t]_{\mathbb{T}}}\left[\frac{\partial f}{\partial x}(s,X(s))^{\mathsf T} b(s,X(s))\big(W^\sigma(s)-W(s)\big) + \frac12\, b(s,X(s))^{\mathsf T}\frac{\partial^2 f}{\partial x\,\partial x}(s,X(s))\,b(s,X(s))(\sigma(s)-s)\right].
\end{aligned} \quad (10)$$
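On an isolated time scale, equation (9) reduces at each right-scattered point $s$ to the exact recursion $X(\sigma(s)) = X(s) + a(s,X(s))\mu(s) + b(s,X(s))\big(W^\sigma(s)-W(s)\big)$, where the Brownian increment has variance $\mu(s)$ since $\langle W\rangle_t = t$. A minimal simulation sketch (the function name and array representation are illustrative, not from the paper):

```python
import numpy as np

def simulate_sde_on_time_scale(ts, a, b, x0, rng=None):
    """Simulate Delta X = a(t, X) Delta t + b(t, X) Delta W on an isolated
    time scale given by the sorted array ts.

    At each right-scattered point t with graininess mu(t) = sigma(t) - t:
        X(sigma(t)) = X(t) + a(t, X(t)) * mu(t) + b(t, X(t)) * dW,
    where dW = W(sigma(t)) - W(t) ~ N(0, mu(t)).
    """
    rng = rng or np.random.default_rng(0)
    xs = [x0]
    for t, t_next in zip(ts[:-1], ts[1:]):
        mu = t_next - t                       # graininess at t
        dW = rng.normal(0.0, np.sqrt(mu))     # Brownian increment over the jump
        xs.append(xs[-1] + a(t, xs[-1]) * mu + b(t, xs[-1]) * dW)
    return np.array(xs)
```

With $b\equiv 0$ and $a(t,x)=x$ on $\mathbb{T}=\{0,1,2,3\}$, the recursion doubles the state at each step, so $X(3)=8$ from $X(0)=1$.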
3. Problem Statement and Main Results

Let $(\Omega,\mathcal{F},\{\mathcal{F}_t\}_{t\in\mathbb{T}},P)$ be a given filtered probability space satisfying the usual conditions. Consider the stochastic control system

$$\begin{cases} \Delta X(s) = a(s,X(s),u(s))\Delta s + b(s,X(s),u(s))\Delta W(s), & s\in[0,\rho(T)]_{\mathbb{T}},\\ X(0) = x_0\in\mathbb{R}^n. \end{cases} \quad (11)$$

The control $u(\cdot)$ belongs to

$$\mathcal{U}[0,T]_{\mathbb{T}} = \big\{u:[0,T]_{\mathbb{T}}\times\Omega\to U \mid u \text{ is measurable and } \{\mathcal{F}_t\}_{t\in\mathbb{T}}\text{-adapted}\big\}, \quad (12)$$
where $U$ is a convex subset of $\mathbb{R}^m$, and the functions $a(t,x,u)$ and $b(t,x,u)$ satisfy the Lipschitz condition and the linear growth condition in $x$. Obviously, equation (11) admits a unique solution (see Bohner et al. [26]). The cost functional associated with (11) is

$$J(u(\cdot)) = E\left[\int_0^T r(s,X(s),u(s))\Delta s + h(X(T))\right], \quad (13)$$

where the maps $r:[0,T]_{\mathbb{T}}\times\mathbb{R}^n\times U\to\mathbb{R}$ and $h:\mathbb{R}^n\to\mathbb{R}$ are continuous.

The optimal control problem is to find $u^*(\cdot)\in\mathcal{U}[0,T]_{\mathbb{T}}$ such that

$$J(u^*(\cdot)) = \inf_{u(\cdot)\in\mathcal{U}[0,T]_{\mathbb{T}}} J(u(\cdot)). \quad (14)$$

$u^*(\cdot)$ is called the stochastic optimal control of the problem, and the corresponding $X(\cdot;x_0,u^*(\cdot))$ is called an optimal state process.
Now, we consider a family of optimal control problems with different initial times and states. For $(t,x)\in[0,T]_{\mathbb{T}}\times\mathbb{R}^n$, consider the state equation

$$\begin{cases} \Delta X(s) = a(s,X(s),u(s))\Delta s + b(s,X(s),u(s))\Delta W(s), & s\in[t,T]_{\mathbb{T}},\\ X(t) = x, \end{cases} \quad (15)$$

along with the cost functional

$$J(t,x;u(\cdot)) = E\left[\int_t^T r(s,X(s),u(s))\Delta s + h(X(T))\right]. \quad (16)$$

For any $(t,x)\in[0,T]_{\mathbb{T}}\times\mathbb{R}^n$, minimize (16) subject to (15) over $\mathcal{U}[t,T]_{\mathbb{T}}$. The value function of the optimal control problem is defined as

$$V(t,x) = \inf_{u(\cdot)\in\mathcal{U}[t,T]_{\mathbb{T}}} J(t,x;u(\cdot)). \quad (17)$$
We first introduce a symbol which will be useful in the sequel. Let $g:\mathbb{T}\to\mathbb{R}$ be $\Delta$-differentiable and $f:\mathbb{R}\to\mathbb{R}$ be continuously differentiable. For any $t\in\mathbb{T}^\kappa$, define $f^{\Delta_g}(g(t))$ as follows:

$$f^{\Delta_g}(g(t)) = \begin{cases} \displaystyle\lim_{s\to t}\frac{f(g(t))-f(g(s))}{g(t)-g(s)}, & t \text{ is right-dense and } g^\Delta\neq 0,\\[2ex] \displaystyle\frac{f\big(g^\sigma(t)\big)-f(g(t))}{g^\sigma(t)-g(t)}, & t \text{ is right-scattered and } g^\Delta\neq 0,\\[2ex] 0, & g^\Delta = 0. \end{cases} \quad (18)$$
Remark 1. Note that $f^{\Delta_g}(g(t))$ depends not only on the functions $f$ and $g$ but also on the time scale $\mathbb{T}$. If $t$ is a right-dense point of $\mathbb{T}$, then $f^{\Delta_g}(g(t)) = f'(g(t))$. On the other hand, if $t$ is a right-scattered point and $g^\sigma(t)=\sigma(g(t))$, then $f^{\Delta_g}(g(t))$ coincides with the $\Delta$-derivative of $f$ on the image time scale $g(\mathbb{T})$.
With the help of this new symbol, we have the following lemma.

Lemma 3. Let $f:\mathbb{R}\times\mathbb{R}\to\mathbb{R}$ be continuously differentiable and $x:\mathbb{T}\to\mathbb{R}$, $y:\mathbb{T}\to\mathbb{R}$ be $\Delta$-differentiable. Then

$$f^\Delta(x(t),y(t)) = f^{\Delta_x}\big(x(t),y^\sigma(t)\big)x^\Delta(t) + f^{\Delta_y}(x(t),y(t))y^\Delta(t),\quad t\in\mathbb{T}^\kappa. \quad (19)$$
Proof. If $x^\Delta(t)=0$ or $y^\Delta(t)=0$, it is easy to verify that (19) holds, so we give the proof only under the conditions $x^\Delta(t)\neq 0$ and $y^\Delta(t)\neq 0$. If $t$ is right-dense, one has

$$\begin{aligned}
f^\Delta(x(t),y(t)) &= \lim_{s\to t}\frac{f(x(t),y(t))-f(x(s),y(s))}{t-s}\\
&= \lim_{s\to t}\frac{f(x(t),y(t))-f(x(s),y(t)) + f(x(s),y(t))-f(x(s),y(s))}{t-s}\\
&= \lim_{s\to t}\left[\frac{f(x(t),y(t))-f(x(s),y(t))}{x(t)-x(s)}\times\frac{x(t)-x(s)}{t-s} + \frac{f(x(s),y(t))-f(x(s),y(s))}{y(t)-y(s)}\times\frac{y(t)-y(s)}{t-s}\right]\\
&= f^{\Delta_x}(x(t),y(t))x^\Delta(t) + f^{\Delta_y}(x(t),y(t))y^\Delta(t)\\
&= f^{\Delta_x}\big(x(t),y^\sigma(t)\big)x^\Delta(t) + f^{\Delta_y}(x(t),y(t))y^\Delta(t).
\end{aligned} \quad (20)$$
When $t$ is right-scattered, then

$$\begin{aligned}
f^\Delta(x(t),y(t)) &= \frac{f\big(x^\sigma(t),y^\sigma(t)\big)-f(x(t),y(t))}{\sigma(t)-t}\\
&= \frac{f\big(x^\sigma(t),y^\sigma(t)\big)-f\big(x(t),y^\sigma(t)\big)+f\big(x(t),y^\sigma(t)\big)-f(x(t),y(t))}{\sigma(t)-t}\\
&= \frac{f\big(x^\sigma(t),y^\sigma(t)\big)-f\big(x(t),y^\sigma(t)\big)}{x^\sigma(t)-x(t)}\times\frac{x^\sigma(t)-x(t)}{\sigma(t)-t} + \frac{f\big(x(t),y^\sigma(t)\big)-f(x(t),y(t))}{y^\sigma(t)-y(t)}\times\frac{y^\sigma(t)-y(t)}{\sigma(t)-t}\\
&= f^{\Delta_x}\big(x(t),y^\sigma(t)\big)x^\Delta(t) + f^{\Delta_y}(x(t),y(t))y^\Delta(t).
\end{aligned} \quad (21)$$

This completes the proof. □
Remark 2. Similarly, another form can be expressed as

$$f^\Delta(x(t),y(t)) = f^{\Delta_x}(x(t),y(t))x^\Delta(t) + f^{\Delta_y}\big(x^\sigma(t),y(t)\big)y^\Delta(t). \quad (22)$$

Remark 3. In particular, let $g:\mathbb{T}\to\mathbb{R}$ be $\Delta$-differentiable and $f:\mathbb{R}\to\mathbb{R}$ be continuously differentiable. Then $f^\Delta(g(t)) = f^{\Delta_g}(g(t))\,g^\Delta(t)$. It is easy to see that this equality is equivalent to (5).
Remark 4. It is not hard to obtain the following multidimensional version:

$$F^\Delta(x(t),y(t)) = F^{\Delta_x}\big(x(t),y^\sigma(t)\big)^{\mathsf T} x^\Delta(t) + F^{\Delta_y}(x(t),y(t))^{\mathsf T} y^\Delta(t), \quad (23)$$

where $F:\mathbb{R}^n\times\mathbb{R}^m\to\mathbb{R}$ is continuously differentiable and $x:\mathbb{T}\to\mathbb{R}^n$ and $y:\mathbb{T}\to\mathbb{R}^m$ are $\Delta$-differentiable.

Next, we show Itô's formula in a new form on time scales.
Proposition 1. Let $X$ satisfy (9) and $f\in C^{1,2}(\mathbb{T}\times\mathbb{R}^n;\mathbb{R})$. Then we have

$$\begin{aligned}
f(t,X(t)) ={}& f(0,X(0)) + \int_0^t f^\Delta_s(s,X(s))\Delta s + \int_0^t f^{\Delta_x}(\sigma(s),X(s))^{\mathsf T} a(s,X(s))\Delta s\\
&+ \int_0^t f^{\Delta_x}(\sigma(s),X(s))^{\mathsf T} b(s,X(s))\Delta W(s) + \frac12\int_0^t I_D(s)\,b(s,X(s))^{\mathsf T} f^{\Delta^2_{xx}}(\sigma(s),X(s))\,b(s,X(s))\Delta s,
\end{aligned} \quad (24)$$

where $I_D(\cdot)$ is the indicator function of $D$, the set of all right-dense points.
Proof. By Lemma 2, it is enough to show that

$$\begin{aligned}
&\int_0^t f^\Delta_s(s,X(s))\Delta s + \int_0^t \frac{\partial f}{\partial x}(s,X(s))^{\mathsf T} a(s,X(s))\Delta s + \int_0^t \frac{\partial f}{\partial x}(s,X(s))^{\mathsf T} b(s,X(s))\Delta W(s)\\
&\quad+ \frac12\int_0^t b(s,X(s))^{\mathsf T}\frac{\partial^2 f}{\partial x\,\partial x}(s,X(s))\,b(s,X(s))\Delta s + \sum_{s\in[0,t]_{\mathbb{T}}}\Big(f\big(\sigma(s),X^\sigma(s)\big) - f\big(\sigma(s),X(s)\big)\Big)\\
&\quad- \sum_{s\in[0,t]_{\mathbb{T}}}\frac{\partial f}{\partial x}(s,X(s))^{\mathsf T} a(s,X(s))(\sigma(s)-s)\\
&\quad- \sum_{s\in[0,t]_{\mathbb{T}}}\left[\frac{\partial f}{\partial x}(s,X(s))^{\mathsf T} b(s,X(s))\big(W^\sigma(s)-W(s)\big) + \frac12\, b(s,X(s))^{\mathsf T}\frac{\partial^2 f}{\partial x\,\partial x}(s,X(s))\,b(s,X(s))(\sigma(s)-s)\right]\\
&= \int_0^t f^\Delta_s(s,X(s))\Delta s + \int_0^t f^{\Delta_x}(\sigma(s),X(s))^{\mathsf T} a(s,X(s))\Delta s\\
&\quad+ \int_0^t f^{\Delta_x}(\sigma(s),X(s))^{\mathsf T} b(s,X(s))\Delta W(s) + \frac12\int_0^t I_D(s)\,b(s,X(s))^{\mathsf T} f^{\Delta^2_{xx}}(\sigma(s),X(s))\,b(s,X(s))\Delta s.
\end{aligned} \quad (25)$$

By some manipulation, namely,

$$\sum_{s\in[0,t]_{\mathbb{T}}}\Big(f\big(\sigma(s),X^\sigma(s)\big) - f\big(\sigma(s),X(s)\big)\Big) = \sum_{s\in[0,t]_{\mathbb{T}}} f^{\Delta_x}(\sigma(s),X(s))^{\mathsf T} a(s,X(s))(\sigma(s)-s) + \sum_{s\in[0,t]_{\mathbb{T}}} f^{\Delta_x}(\sigma(s),X(s))^{\mathsf T} b(s,X(s))\big(W^\sigma(s)-W(s)\big), \quad (26)$$

it is straightforward to show that the above equation is true. Now, we state the Bellman optimality principle on time scales. □
Theorem 1 (optimality principle). Let $(t,x)\in[0,T]_{\mathbb{T}}\times\mathbb{R}^n$. Then, for any $\hat t\in[t,T]_{\mathbb{T}}$, we have

$$V(t,x) = \inf_{u(\cdot)\in\mathcal{U}[t,\hat t]_{\mathbb{T}}} E\left[\int_t^{\hat t} r(s,X(s),u(s))\Delta s + V\big(\hat t,X(\hat t)\big)\right]. \quad (27)$$
Proof. For any $\varepsilon>0$, there exists a control $u(\cdot)\in\mathcal{U}[t,T]_{\mathbb{T}}$ such that

$$\begin{aligned}
V(t,x)+\varepsilon &\ge J(t,x;u(\cdot))\\
&= E\left[\int_t^T r(s,X(s),u(s))\Delta s + h(X(T))\right]\\
&= E\left[\int_t^{\hat t} r(s,X(s),u(s))\Delta s + E\left[\int_{\hat t}^T r(s,X(s),u(s))\Delta s + h(X(T))\,\Big|\,\mathcal{F}^t_{\hat t}\right]\right]\\
&= E\left[\int_t^{\hat t} r(s,X(s),u(s))\Delta s + J\big(\hat t,X(\hat t);u(\cdot)\big)\right]\\
&\ge E\left[\int_t^{\hat t} r(s,X(s),u(s))\Delta s + V\big(\hat t,X(\hat t)\big)\right]\\
&\ge \inf_{u(\cdot)\in\mathcal{U}[t,\hat t]_{\mathbb{T}}} E\left[\int_t^{\hat t} r(s,X(s),u(s))\Delta s + V\big(\hat t,X(\hat t)\big)\right],
\end{aligned} \quad (28)$$

where $\mathcal{F}^t_{\hat t}$ is the sigma field generated by $\{W(r)-W(t): r\in[t,\hat t]_{\mathbb{T}}\}$.

On the other hand, by the definition of the value function (17), we obtain

$$V(t,x)\le J(t,x;u(\cdot)) = E\left[\int_t^{\hat t} r(s,X(s),u(s))\Delta s + J\big(\hat t,X(\hat t);u(\cdot)\big)\right]. \quad (29)$$

Thus, taking the infimum over $u(\cdot)\in\mathcal{U}[\hat t,T]_{\mathbb{T}}$, one has

$$V(t,x)\le E\left[\int_t^{\hat t} r(s,X(s),u(s))\Delta s + V\big(\hat t,X(\hat t)\big)\right]. \quad (30)$$

It follows that

$$V(t,x)\le \inf_{u(\cdot)\in\mathcal{U}[t,\hat t]_{\mathbb{T}}} E\left[\int_t^{\hat t} r(s,X(s),u(s))\Delta s + V\big(\hat t,X(\hat t)\big)\right]. \quad (31)$$

Combining (28) and (31), we get the result. Furthermore, we give the HJB equation on time scales, which is similar to the continuous and discrete cases. □
Theorem 2. Let the value function $V\in C^{1,2}([0,T]_{\mathbb{T}}\times\mathbb{R}^n)$. Then $V$ satisfies the following HJB equation:

$$\begin{cases}
V^\Delta_t(t,x) + \displaystyle\inf_{u\in U} E_{xt}\Big[H\big(t,x,u,V^{\Delta_x}(\sigma(t),X(t)),V^{\Delta^2_{xx}}(\sigma(t),x)\big)\Big] = 0, & (t,x)\in[0,\rho(T)]_{\mathbb{T}}\times\mathbb{R}^n,\\
V(T,x) = h(x),
\end{cases} \quad (32)$$

where

$$H(t,x,u,p,P) = r(t,x,u) + p^{\mathsf T} a(t,x,u) + p^{\mathsf T} b(t,x,u)W^\Delta(t) + \frac12\, I_D(t)\,b(t,x,u)^{\mathsf T} P\, b(t,x,u), \quad (33)$$

and we use the notation

$$E_{xt}\big[f^{\Delta_x}(X(t))\big] = E\big[f^{\Delta_x}(X(t))\,\big|\,X(t)=x\big],\qquad
W^\Delta(t) = \begin{cases} \dfrac{W^\sigma(t)-W(t)}{\mu(t)}, & t \text{ is right-scattered},\\[1.5ex] 0, & t \text{ is right-dense}. \end{cases} \quad (34)$$
Proof. By the definition of the value function, $V(T,x)=h(x)$ is satisfied. Fix $u\in U$, and let $X(\cdot)$ be the state process corresponding to the control $u(\cdot)\equiv u$. By the optimality principle and Itô's formula, for any $\hat t>t$ with $t,\hat t\in[0,T]_{\mathbb{T}}$, we have

$$\begin{aligned}
0 &\le \frac{E\big[V(\hat t,X(\hat t)) - V(t,x)\big]}{\hat t - t} + \frac{1}{\hat t - t}\,E\left[\int_t^{\hat t} r(s,X(s),u)\Delta s\right]\\
&= \frac{1}{\hat t - t}\,E\left[\int_t^{\hat t} V^\Delta_s(s,X(s)) + H\big(s,X(s),u,V^{\Delta_x}(\sigma(s),X(s)),V^{\Delta^2_{xx}}(\sigma(s),X(s))\big)\Delta s\right].
\end{aligned} \quad (35)$$

If $t$ is right-dense, let $\hat t\to t$, while if $t$ is right-scattered, let $\hat t = \sigma(t)$. This leads to

$$0\le V^\Delta_t(t,x) + E_{xt}\Big[H\big(t,x,u,V^{\Delta_x}(\sigma(t),X(t)),V^{\Delta^2_{xx}}(\sigma(t),x)\big)\Big]. \quad (36)$$

It follows that

$$0\le V^\Delta_t(t,x) + \inf_{u\in U} E_{xt}\Big[H\big(t,x,u,V^{\Delta_x}(\sigma(t),X(t)),V^{\Delta^2_{xx}}(\sigma(t),x)\big)\Big]. \quad (37)$$
Conversely, for any $\varepsilon>0$ and $\hat t>t$ with $t,\hat t\in[0,T]_{\mathbb{T}}$, we can find a control $u(\cdot)$ such that

$$\begin{aligned}
\varepsilon &\ge \frac{E\big[V(\hat t,X(\hat t)) - V(t,x)\big]}{\hat t - t} + \frac{1}{\hat t - t}\,E\left[\int_t^{\hat t} r(s,X(s),u(s))\Delta s\right]\\
&= \frac{1}{\hat t - t}\,E\left[\int_t^{\hat t} V^\Delta_s(s,X(s)) + H\big(s,X(s),u(s),V^{\Delta_x}(\sigma(s),X(s)),V^{\Delta^2_{xx}}(\sigma(s),X(s))\big)\Delta s\right]\\
&\ge \frac{1}{\hat t - t}\,E\left[\int_t^{\hat t} V^\Delta_s(s,X(s)) + \inf_{u\in U} H\big(s,X(s),u,V^{\Delta_x}(\sigma(s),X(s)),V^{\Delta^2_{xx}}(\sigma(s),X(s))\big)\Delta s\right].
\end{aligned} \quad (38)$$

By the same argument as that used in the above proof, we get

$$\varepsilon\ge V^\Delta_t(t,x) + \inf_{u\in U} E_{xt}\Big[H\big(t,x,u,V^{\Delta_x}(\sigma(t),X(t)),V^{\Delta^2_{xx}}(\sigma(t),x)\big)\Big]. \quad (39)$$
Therefore, (32) follows. The proof is completed. □

Remark 5. It is not surprising that HJB equation (32) on time scales is very similar to the classical HJB equation in continuous and discrete time (see [29, 30]). An intriguing feature of the HJB equation on time scales is that an expectation is involved. When $\mathbb{T}=\mathbb{R}^+$ or $\mathbb{T}=\mathbb{Z}^+$, equation (32) reduces to the classical ones.
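To see the reduction concretely: when $\mathbb{T}=\mathbb{R}^+$, every point is right-dense, so $\sigma(t)=t$, $W^\Delta(t)=0$, and $I_D(t)=1$, and (32)–(33) collapse to the classical second-order HJB equation

$$V_t(t,x) + \inf_{u\in U}\left\{r(t,x,u) + V_x(t,x)^{\mathsf T}a(t,x,u) + \frac12\, b(t,x,u)^{\mathsf T}V_{xx}(t,x)\,b(t,x,u)\right\} = 0,\qquad V(T,x)=h(x).$$

When $\mathbb{T}=\mathbb{Z}^+$, every point is right-scattered with $\mu(t)=1$, so $V^\Delta_t(t,x)=V(t+1,x)-V(t,x)$, $W^\Delta(t)=W(t+1)-W(t)$, and $I_D(t)=0$; substituting into (32) and evaluating the conditional expectation $E_{xt}$ yields the discrete-time dynamic programming recursion

$$V(t,x) = \inf_{u\in U} E_{xt}\big[r(t,x,u) + V(t+1,X(t+1))\big].$$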
Remark 6. Suppose $b\equiv 0$. Then HJB equation (32) becomes

$$V^\Delta_t(t,x) + \inf_{u\in U}\Big\{V^{\Delta_x}(\sigma(t),X(t))^{\mathsf T} a(t,x,u) + r(t,x,u)\Big\}\Big|_{X(t)=x} = 0. \quad (40)$$

In this case, it is equivalent to the result in [22].

Remark 7. In particular, in the setting of Remark 6, if we further let $X^\sigma(t)=\sigma(X(t))$, HJB equation (32) degenerates into

$$V^\Delta_t(t,x) + \inf_{u\in U}\Big\{V^{\Delta_x}(\sigma(t),x)^{\mathsf T} a(t,x,u) + r(t,x,u)\Big\} = 0, \quad (41)$$

which is just the one given by Seiffertt et al. [20]. From the above, we end up with the following verification theorem.
Theorem 3 (verification theorem). Let $F\in C^{1,2}([0,T]_{\mathbb{T}}\times\mathbb{R}^n;\mathbb{R})$ be the solution of HJB equation (32), and suppose there exists a function $\phi(t,X(t))$ such that

$$E\Big[H\big(t,X(t),\phi(t,X(t)),F^{\Delta_x}(\sigma(t),X(t)),F^{\Delta^2_{xx}}(\sigma(t),X(t))\big)\Big] = \inf_{u\in U} E\Big[H\big(t,X(t),u,F^{\Delta_x}(\sigma(t),X(t)),F^{\Delta^2_{xx}}(\sigma(t),X(t))\big)\Big]. \quad (42)$$

Then $F(t,x) = V(t,x)$, and $u(t)=\phi(t,X(t))$ is an optimal control.
Proof. Let $u(t)=\phi(t,X(t))$. By Proposition 1, and since $F$ solves (32) with the infimum attained at $\phi$ by (42), we have

$$\begin{aligned}
E[h(X(T))] - F(t,x) ={}& E\left[\int_t^T F^\Delta_s(s,X(s)) + F^{\Delta_x}(\sigma(s),X(s))^{\mathsf T} a(s,X(s),u(s)) + F^{\Delta_x}(\sigma(s),X(s))^{\mathsf T} b(s,X(s),u(s))W^\Delta(s)\right.\\
&\left.\quad+ \frac12\, I_D(s)\,b(s,X(s),u(s))^{\mathsf T} F^{\Delta^2_{xx}}(\sigma(s),X(s))\,b(s,X(s),u(s))\,\Delta s\right]\\
={}& -E\left[\int_t^T r(s,X(s),u(s))\Delta s\right].
\end{aligned} \quad (43)$$

This yields

$$F(t,x) = E\left[\int_t^T r(s,X(s),u(s))\Delta s + h(X(T))\right] \ge V(t,x). \quad (44)$$

In addition, for any admissible pair $(X(\cdot),v(\cdot))$, we have

$$\begin{aligned}
E[h(X(T))] - F(t,x) ={}& E\left[\int_t^T F^\Delta_s(s,X(s)) + F^{\Delta_x}(\sigma(s),X(s))^{\mathsf T} a(s,X(s),v(s)) + F^{\Delta_x}(\sigma(s),X(s))^{\mathsf T} b(s,X(s),v(s))W^\Delta(s)\right.\\
&\left.\quad+ \frac12\, I_D(s)\,b(s,X(s),v(s))^{\mathsf T} F^{\Delta^2_{xx}}(\sigma(s),X(s))\,b(s,X(s),v(s))\,\Delta s\right]\\
={}& E\left[\int_t^T F^\Delta_s(s,X(s)) + H\big(s,X(s),v(s),F^{\Delta_x}(\sigma(s),X(s)),F^{\Delta^2_{xx}}(\sigma(s),X(s))\big) - r(s,X(s),v(s))\,\Delta s\right]\\
\ge{}& E\left[\int_t^T F^\Delta_s(s,X(s)) + \inf_{u\in U} H\big(s,X(s),u,F^{\Delta_x}(\sigma(s),X(s)),F^{\Delta^2_{xx}}(\sigma(s),X(s))\big) - r(s,X(s),v(s))\,\Delta s\right]\\
={}& -E\left[\int_t^T r(s,X(s),v(s))\Delta s\right].
\end{aligned} \quad (45)$$

Namely,

$$F(t,x)\le E\left[\int_t^T r(s,X(s),v(s))\Delta s + h(X(T))\right]. \quad (46)$$

Since $(X(\cdot),v(\cdot))$ is arbitrary, it follows that

$$F(t,x)\le V(t,x). \quad (47)$$

Hence, by (44) and (47), one has

$$F(t,x) = V(t,x). \quad (48)$$

Finally, inequality (44) together with (48) proves the optimality of $u(\cdot)$. □
4. Example

Dynamic programming on time scales covers not only the continuous and discrete cases but also other, more general cases. In order to illustrate our result, we give an example. Consider the quantum time scale $\mathbb{T} = \{t = q^k : k\in\mathbb{N}\}$, $q>1$. The state equation on the time scale $\mathbb{T}$ is as follows:

$$\begin{cases} \Delta X(s) = X(s)u(s)\Delta s + u(s)\Delta W(s), & s\in\mathbb{T},\\ X(1) = x_0. \end{cases} \quad (49)$$

The problem is to find the sequence of optimal control policies $\{u(q^k)\}$, $k=0,1,\dots,N$, that

$$\text{minimizes } J(u(\cdot)) = E\left[\int_1^{q^N} \big(X^2(s) + u^2(s)\big)\Delta s\right]. \quad (50)$$

In this example, we have $\mu(t) = (q-1)t$. By Theorem 2, the value function $V$ satisfies the following equation:

$$\begin{cases} \dfrac{V(t,x)}{(q-1)t} = \displaystyle\inf_{u\in U} E_{xt}\left[x^2 + u^2 + \frac{V\big(\sigma(t),X^\sigma(t)\big)}{(q-1)t}\right], & t\in\mathbb{T},\\[2ex] V\big(q^N,x\big) = 0. \end{cases} \quad (51)$$
We can see that the graininess function $\mu$ affects the solution of HJB equation (51). Suppose $\phi(t,X(t))$ attains the infimum, that is,

$$E\left[X^2(t) + \phi^2(t,X(t)) + \frac{V\big(\sigma(t),X^\sigma(t)\big)}{(q-1)t}\right] = \inf_{u\in U} E\left[X^2(t) + u^2 + \frac{V\big(\sigma(t),X^\sigma(t)\big)}{(q-1)t}\right],\quad t\in\mathbb{T}. \quad (52)$$

Then, by applying Theorem 3, we can find the optimal control $u(t)=\phi(t,X(t))$ through the above equation. Furthermore, this implies that the optimal strategy also depends on the structure of the time scale.
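Recursion (51) can be solved numerically by backward induction from $V(q^N,\cdot)=0$: multiplying (51) by $\mu(t)=(q-1)t$ gives $V(t,x)=\inf_u E\big[\mu(t)(x^2+u^2) + V(\sigma(t),X^\sigma(t))\big]$, with $X^\sigma(t) = x + \mu(t)xu + u\,(W^\sigma(t)-W(t))$. A minimal sketch (the state and control grids, quadrature order, and state-space truncation are illustrative choices, not from the paper):

```python
import numpy as np

def solve_quantum_hjb(q=1.5, N=5, x_grid=None, u_grid=None, n_quad=15):
    """Backward induction for (51) on T = {q**k : k = 0..N}.

    At the right-scattered point t, mu = (q-1)*t and
        X_sigma = x + mu*x*u + u*(W_sigma - W),  W_sigma - W ~ N(0, mu).
    Terminal condition: V(q**N, x) = 0.
    """
    if x_grid is None:
        x_grid = np.linspace(-2.0, 2.0, 81)
    if u_grid is None:
        u_grid = np.linspace(-1.0, 1.0, 41)
    # Probabilists' Gauss-Hermite nodes/weights to approximate E[f(Z)], Z ~ N(0,1).
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_quad)
    weights = weights / weights.sum()

    V = np.zeros_like(x_grid)                    # V(q**N, .) = 0
    for k in range(N - 1, -1, -1):
        t = q ** k
        mu = (q - 1.0) * t
        V_new = np.empty_like(x_grid)
        for i, x in enumerate(x_grid):
            best = np.inf
            for u in u_grid:
                x_next = x + mu * x * u + u * np.sqrt(mu) * nodes
                # Linear interpolation of V(sigma(t), .); values outside the
                # grid are clamped to the boundary values (truncation error).
                cont = np.interp(x_next, x_grid, V)
                cost = mu * (x * x + u * u) + weights @ cont
                best = min(best, cost)
            V_new[i] = best
        V = V_new
    return x_grid, V                             # V(1, .) on the grid
```

From $x=0$, the control $u=0$ keeps the state at $0$ at zero running cost, so the computed $V(1,0)$ is zero, while $V(1,x)>0$ away from the origin, and the backward pass makes the dependence on the graininess $\mu(t)=(q-1)t$ explicit.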
5. Conclusions
In this paper, we developed the dynamic programming principle for stochastic systems on time scales, for which we presented Itô's formula on time scales in a new form by introducing a new symbol. Similar to the classical cases, we constructed the HJB equation on time scales and proved the verification theorem. The results of this paper are more general: the continuous and discrete analogues of dynamic programming are special cases. An example has been given to demonstrate the effectiveness of our results.
Data Availability
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work was supported by the National Key R&D Program of China (Grant no. 2018YFA0703900) and the Major Project of the National Social Science Foundation of China (Grant no. 19ZDA091).
References
[1] R. Bellman, "Dynamic programming," Science, vol. 153, no. 3731, pp. 34–37, 1966.
[2] D. Blackwell, "Discrete dynamic programming," The Annals of Mathematical Statistics, vol. 33, no. 2, pp. 719–726, 1962.
[3] E. Bandini, A. Cosso, M. Fuhrman, and H. Pham, "Randomized filtering and Bellman equation in Wasserstein space for partial observation control problem," Stochastic Processes and Their Applications, vol. 129, no. 2, pp. 674–711, 2019.
[4] H. Pham and X. Wei, "Dynamic programming for optimal control of stochastic McKean–Vlasov dynamics," SIAM Journal on Control and Optimization, vol. 55, no. 2, pp. 1069–1101, 2017.
[5] C. Mu, Y. Zhang, H. Jia, and H. He, "Energy-storage-based intelligent frequency control of microgrid with stochastic model uncertainties," IEEE Transactions on Smart Grid, vol. 11, no. 2, pp. 1748–1758, 2020.
[6] C. Mu and Y. Zhang, "Learning-based robust tracking control of quadrotor with time-varying and coupling uncertainties," IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 1, pp. 259–273, 2020.
[7] S. Hilger, "Analysis on measure chains—a unified approach to continuous and discrete calculus," Results in Mathematics, vol. 18, no. 1-2, pp. 18–56, 1990.
[8] G. S. Guseinov, "Integration on time scales," Journal of Mathematical Analysis and Applications, vol. 285, no. 1, pp. 107–127, 2003.
[9] M. Bohner and G. S. Guseinov, "Partial differentiation on time scales," Dynamic Systems and Applications, vol. 13, no. 3-4, pp. 351–379, 2004.
[10] M. Bohner and A. Peterson, Dynamic Equations on Time Scales: An Introduction with Applications, Birkhäuser Boston, Boston, MA, USA, 2001.
[11] M. Bohner and A. Peterson, Advances in Dynamic Equations on Time Scales, Birkhäuser Boston, Boston, MA, USA, 2002.
[12] N. H. Du and N. T. Dieu, "The first attempt on the stochastic calculus on time scale," Stochastic Analysis and Applications, vol. 29, no. 6, pp. 1057–1080, 2011.
[13] D. Grow and S. Sanyal, "Brownian motion indexed by a time scale," Stochastic Analysis and Applications, vol. 29, no. 3, pp. 457–472, 2011.
[14] M. Bohner, "Calculus of variations on time scales," Dynamic Systems and Applications, vol. 13, no. 3-4, pp. 339–349, 2004.
[15] Z. Zhan and W. Wei, "On existence of optimal control governed by a class of the first-order linear dynamic systems on time scales," Applied Mathematics and Computation, vol. 215, no. 6, pp. 2070–2081, 2009.
[16] Y. Gong and X. Xiang, "A class of optimal control problems of systems governed by the first order linear dynamic equations on time scales," Journal of Industrial and Management Optimization, vol. 5, no. 1, pp. 1–10, 2009.
[17] Y. Peng, X. Xiang, and Y. Jiang, "Nonlinear dynamic systems and optimal control problems on time scales," ESAIM: Control, Optimisation and Calculus of Variations, vol. 17, no. 3, pp. 654–681, 2011.
[18] Z. Zhan, S. Chen, and W. Wei, "A unified theory of maximum principle for continuous and discrete time optimal control problems," Mathematical Control and Related Fields, vol. 2, no. 2, pp. 195–215, 2012.
[19] M. Bohner, K. Kenzhebaev, O. Lavrova, and O. Stanzhytskyi, "Pontryagin's maximum principle for dynamic systems on time scales," Journal of Difference Equations and Applications, vol. 23, no. 7, pp. 1161–1189, 2017.
[20] J. Seiffertt, S. Sanyal, and D. C. Wunsch, "Hamilton-Jacobi-Bellman equations and approximate dynamic programming on time scales," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 38, no. 4, pp. 918–923, 2008.
[21] Z. Zhan, W. Wei, and H. Xu, "Hamilton–Jacobi–Bellman equations on time scales," Mathematical and Computer Modelling, vol. 49, no. 9-10, pp. 2019–2028, 2009.
[22] R. Š. Hilscher and V. Zeidan, "Hamilton–Jacobi theory over time scales and applications to linear-quadratic problems," Nonlinear Analysis: Theory, Methods and Applications, vol. 75, no. 2, pp. 932–950, 2012.
[23] Y. Zhu and G. Jia, "Stochastic linear quadratic control problem on time scales," submitted.
[24] Y. Zhu and G. Jia, "Linear feedback of mean-field stochastic linear quadratic optimal control problems on time scales," Mathematical Problems in Engineering, vol. 2020, Article ID 8051918, 11 pages, 2020.
[25] C. Pötzsche, "Chain rule and invariance principle on measure chains," Journal of Computational and Applied Mathematics, vol. 141, no. 1-2, pp. 249–254, 2002.
[26] M. Bohner, O. M. Stanzhytskyi, and A. O. Bratochkina, "Stochastic dynamic equations on general time scales," Electronic Journal of Differential Equations, vol. 2013, no. 57, pp. 1–15, 2013.
[27] D. Grow and S. Sanyal, "The quadratic variation of Brownian motion on a time scale," Statistics and Probability Letters, vol. 82, no. 9, pp. 1677–1680, 2012.
[28] W. Hu, "Itô's formula, the stochastic exponential, and change of measure on general time scales," Abstract and Applied Analysis, vol. 2017, Article ID 9140138, 13 pages, 2017.
[29] M. Bardi and I. Capuzzo-Dolcetta, Optimal Control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations, Springer Science & Business Media, Berlin, Germany, 2008.
[30] L. Grüne, "Error estimation and adaptive discretization for the discrete stochastic Hamilton–Jacobi–Bellman equation," Numerische Mathematik, vol. 99, no. 1, pp. 85–112, 2004.