Research Article

Dynamic Programming and Hamilton–Jacobi–Bellman Equations on Time Scales
Yingjun Zhu and Guangyan Jia
Zhongtai Securities Institute for Financial Studies, Shandong
University, Jinan 250100, China
Correspondence should be addressed to Guangyan Jia;
[email protected]
Received 7 June 2020; Revised 16 October 2020; Accepted 24 October 2020; Published 19 November 2020
Hindawi Complexity, Volume 2020, Article ID 7683082, 11 pages. https://doi.org/10.1155/2020/7683082
Academic Editor: Guang Li
Copyright © 2020 Yingjun Zhu and Guangyan Jia. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The Bellman optimality principle for stochastic dynamic systems on time scales is derived, which includes the continuous time and discrete time cases as special cases. At the same time, the Hamilton–Jacobi–Bellman (HJB) equation on time scales is obtained. Finally, an example is employed to illustrate our main results.
1. Introduction
The stochastic control problem is to find an optimal control such that a cost functional associated with a stochastic system reaches its minimum value. The method of dynamic programming is a powerful approach to solving stochastic optimal control problems. Dynamic programming is a well-established subject [1–4] for dealing with continuous and discrete optimal control problems, respectively, and it has great practical applications in various fields [5, 6]. It is generally assumed that time is continuous or discrete in dynamic systems. However, this cannot always be guaranteed: in reality, the time scale could be neither continuous nor discrete. There are many processes which are a mixture of continuous and discrete time, nonuniform discrete time, or a union of disjoint time intervals, such as production and storage processes in economics, investment processes in finance, and the populations of seasonal insects. When the time structure is more complex, the control problem becomes more difficult. How can we deal with this problem?
Time scales were first introduced by Hilger [7] in 1988 in order to unify differential and difference equations into a general framework. This allows us to treat continuous and discrete analysis from a common point of view. Recently, time scale theory has been extensively studied in many works [8–13]. It is well known that optimal control problems on time scales are an important field for both theory and applications. Since the calculus of variations on time scales was studied by Bohner [14], results on optimal control problems in the time scale setting and their applications have been growing rapidly. The existence of optimal controls for dynamic systems on time scales was discussed in [15–17]. Subsequently, the Pontryagin maximum principle on time scales was studied in several works [18, 19], which specifies the necessary conditions for optimality.
Dynamic programming for dynamic systems on time scales is not a simple task of uniting the continuous time and discrete time cases, because time scales contain more complex time structures. Seiffertt et al. [20] studied approximate dynamic programming for dynamic systems in the isolated time scale setting. In addition, Bellman dynamic programming on general time scales for deterministic optimal control problems was considered in [21, 22]. However, only limited work [23, 24] has been done on the linear quadratic stochastic optimal control problem in the time scale setting. That is to say, the general setting of stochastic optimal control problems on time scales is completely open.
Motivated by all these significant works, the purpose of this paper is to study the method of dynamic programming for stochastic optimal control problems on time scales. As we know, the stochastic dynamic programming principle is different from the deterministic one, which reflects the stochastic nature of the optimal control problem. So the method used in the deterministic case on time scales cannot be applied to the stochastic case directly. In order to overcome this difficulty, we first give a new form of the chain rule on time scales. Based on this idea, we obtain Itô's formula for stochastic processes on time scales. Second, we consider a family of optimal control problems with different initial times and states to establish the Bellman optimality principle in the time scale setting. Third, using Itô's formula and the Bellman optimality principle obtained on time scales, we also derive the associated Hamilton–Jacobi–Bellman (HJB for short) equation on time scales, which is a nonlinear second-order partial differential equation involving an expectation. If the HJB equation is solvable, then we can obtain an optimal feedback control. Our work enriches the dynamic programming problem by providing a more general time framework and makes dynamic programming theory a powerful tool for tackling optimal control problems on complex time scales.
The organization of this paper is as follows. In Section 2, we present some preliminaries on time scale theory. Section 3 focuses on the Bellman optimality principle and the HJB equations on time scales. By introducing a new symbol, we present Itô's formula in a new form; on this basis, we obtain the main results. Finally, an illustrative example is given to show the effectiveness of the proposed results.
2. Preliminaries
A time scale $\mathbb{T}$ is a nonempty closed subset of the real number set $\mathbb{R}$, and we denote $[0,T]_{\mathbb{T}} = [0,T]\cap\mathbb{T}$. In this paper, we always suppose $\mathbb{T}$ is bounded. The forward jump operator $\sigma$ and the backward jump operator $\rho$ are, respectively, defined by

$$\sigma(t) = \inf\{s\in\mathbb{T} : s>t\},\qquad \rho(t) = \sup\{s\in\mathbb{T} : s<t\}, \quad (1)$$

supplemented by $\inf\emptyset := \sup\mathbb{T}$ and $\sup\emptyset := \inf\mathbb{T}$, where $\emptyset$ denotes the empty set. If $\sigma(t)=t$ and $t<\sup\mathbb{T}$, the point $t$ is called right-dense, while if $\sigma(t)>t$, the point $t$ is called right-scattered. Similarly, if $\rho(t)=t$ and $t>\inf\mathbb{T}$, the point $t$ is called left-dense, while if $\rho(t)<t$, the point $t$ is called left-scattered. Moreover, a point is called isolated if it is both left-scattered and right-scattered. For a function $f$, we denote $f^\sigma(t) = f(\sigma(t))$ to represent the composition of the functions $f$ and $\sigma$. Similarly, we denote $f^\rho(t) = f(\rho(t))$. The graininess function $\mu$ is defined by

$$\mu(t) = \sigma(t) - t. \quad (2)$$
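For an isolated time scale, the operators $\sigma$, $\rho$, and $\mu$ can be computed directly from the sorted list of points. A minimal Python sketch (the helper name and the list representation are illustrative choices, not from the paper):

```python
import bisect

def jump_operators(ts):
    """Build sigma, rho, mu for an isolated (finite) time scale.

    ts: sorted list of the points of T.
    By the conventions inf(empty) = sup T and sup(empty) = inf T,
    sigma(max T) = max T and rho(min T) = min T.
    """
    def sigma(t):
        i = bisect.bisect_right(ts, t)           # first index with ts[i] > t
        return ts[i] if i < len(ts) else ts[-1]  # empty set -> sup T

    def rho(t):
        i = bisect.bisect_left(ts, t)            # first index with ts[i] >= t
        return ts[i - 1] if i > 0 else ts[0]     # empty set -> inf T

    def mu(t):                                   # graininess mu(t) = sigma(t) - t
        return sigma(t) - t

    return sigma, rho, mu
```

For example, on $\mathbb{T}=\{0,1,1.5,3\}$ the point $1$ is right-scattered with $\sigma(1)=1.5$ and $\mu(1)=0.5$.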
We now present some basic concepts and properties about time scales (see [10, 11]).

Definition 1 (see [10]). Let $f$ be a function on $\mathbb{T}$. $f$ is called a right-dense continuous function if $f$ is continuous at every right-dense point and has finite left-sided limits at every left-dense point. Similarly, $f$ is called a left-dense continuous function if $f$ is continuous at every left-dense point and has finite right-sided limits at every right-dense point. If $f$ is both right-dense continuous and left-dense continuous, then $f$ is called a continuous function.
Define the set $\mathbb{T}^\kappa$ as

$$\mathbb{T}^\kappa = \begin{cases} \mathbb{T}\setminus(\rho(\sup\mathbb{T}), \sup\mathbb{T}], & \text{if } \sup\mathbb{T} < \infty,\\ \mathbb{T}, & \text{if } \sup\mathbb{T} = \infty. \end{cases} \quad (3)$$

Definition 2 (see [10]). Let $f:\mathbb{T}\to\mathbb{R}$ and $t\in\mathbb{T}^\kappa$. If there exists a number $f^\Delta(t)$ such that, for any $\varepsilon>0$, there exists a neighborhood $U$ of $t$ such that

$$\big|f(\sigma(t)) - f(s) - f^\Delta(t)(\sigma(t)-s)\big| \le \varepsilon|\sigma(t)-s|, \quad \text{for all } s\in U, \quad (4)$$

we call $f^\Delta(t)$ the $\Delta$-derivative of $f$ at $t$.

We denote by $C^1([0,T]_{\mathbb{T}};\mathbb{R})$ the space of $\mathbb{R}$-valued continuously $\Delta$-differentiable functions on $[0,T]_{\mathbb{T}}$, and by $C^{1,2}(\mathbb{T}\times\mathbb{R}^n;\mathbb{R})$ the family of all functions $f(t,x)$ defined on $\mathbb{T}\times\mathbb{R}^n$ that are continuously $\Delta$-differentiable in $t$ and twice continuously differentiable in $x$.
Furthermore, we give the differentiation rule for compound functions.

Lemma 1 (see [25]). Let $g:\mathbb{T}\to\mathbb{R}$ be $\Delta$-differentiable and $f:\mathbb{R}\to\mathbb{R}$ be continuously differentiable. Then $f(g(t))$ is $\Delta$-differentiable, and

$$f^\Delta(g(t)) = g^\Delta(t)\int_0^1 f'\big(g(t) + h\mu(t)g^\Delta(t)\big)\,dh. \quad (5)$$
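Formula (5) can be checked numerically at a right-scattered point, where $f^\Delta(g(t)) = [f(g(\sigma(t)))-f(g(t))]/\mu(t)$. A small sketch on $\mathbb{T}=\mathbb{Z}$ (so $\sigma(t)=t+1$, $\mu(t)=1$), with the example functions $f(x)=x^2$ and $g(t)=t^2$ chosen for illustration:

```python
def chain_rule_check(t, n=100000):
    """Compare both sides of (5) on T = Z with f(x) = x**2, g(t) = t**2."""
    f = lambda x: x ** 2
    fp = lambda x: 2 * x                      # f'
    g = lambda t: t ** 2
    gD = g(t + 1) - g(t)                      # Delta-derivative of g on Z
    mu = 1.0                                  # graininess on Z

    # Left-hand side of (5): Delta-derivative of f(g(.)) at the scattered point t.
    lhs = f(g(t + 1)) - f(g(t))

    # Right-hand side of (5): g^Delta(t) * int_0^1 f'(g(t) + h*mu*g^Delta(t)) dh,
    # with the integral approximated by a midpoint Riemann sum.
    hs = [(k + 0.5) / n for k in range(n)]
    integral = sum(fp(g(t) + h * mu * gD) for h in hs) / n
    rhs = gD * integral
    return lhs, rhs
```

At $t=3$, both sides equal $f(16)-f(9)=175$.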
In this paper, we adopt the stochastic integral defined by Bohner et al. [26]. Let $(\Omega,\mathcal{F},\{\mathcal{F}_t\}_{t\in[0,T]_{\mathbb{T}}},P)$ be a complete probability space with an increasing and continuous filtration $\{\mathcal{F}_t\}_{t\in[0,T]_{\mathbb{T}}}$. We define $L^2_{\mathcal{F}}([0,T]_{\mathbb{T}};\mathbb{R})$ as the set of all $\mathcal{F}_t$-adapted, $\mathbb{R}$-valued measurable processes $X(t)$ such that $E\big[\int_0^T |X(t)|^2\Delta t\big] < \infty$.
We also have the following properties. Let $X(t), Y(t)\in L^2_{\mathcal{F}}([0,T]_{\mathbb{T}};\mathbb{R})$ and $\alpha,\beta\in\mathbb{R}$. Then

$$\begin{aligned}
&\text{(i)}\ \int_0^T\big(\alpha X(t)+\beta Y(t)\big)\Delta W(t) = \alpha\int_0^T X(t)\Delta W(t) + \beta\int_0^T Y(t)\Delta W(t),\\
&\text{(ii)}\ E\left[\int_0^T X(t)\Delta W(t)\right] = 0,\\
&\text{(iii)}\ E\left[\Big|\int_0^T X(t)\Delta W(t)\Big|^2\right] = E\left[\int_0^T |X(t)|^2\Delta\langle W\rangle_t\right] = E\left[\int_0^T |X(t)|^2\Delta t\right],
\end{aligned} \quad (8)$$

where the integral of $X$ with respect to the quadratic variation $\langle W\rangle_t$ of Brownian motion is defined as the Stieltjes integral $\int_0^T X_t(\omega)\Delta\langle W\rangle_t(\omega)$.

Let $X$ be an $n$-dimensional stochastic process defined by

$$\Delta X(s) = a(s,X(s))\Delta s + b(s,X(s))\Delta W(s), \quad (9)$$

and we have the following Itô's formula.
Lemma 2 (see [28]). Let $f\in C^{1,2}(\mathbb{T}\times\mathbb{R}^n;\mathbb{R})$ and let $X$ satisfy (9). Then the following relation holds:

$$\begin{aligned}
f(t,X(t)) ={}& f(0,X(0)) + \int_0^t f^\Delta_s(s,X(s))\Delta s + \int_0^t \frac{\partial f}{\partial x}(s,X(s))^{\mathsf T} a(s,X(s))\Delta s\\
&+ \int_0^t \frac{\partial f}{\partial x}(s,X(s))^{\mathsf T} b(s,X(s))\Delta W(s) + \frac12\int_0^t b(s,X(s))^{\mathsf T}\frac{\partial^2 f}{\partial x\,\partial x}(s,X(s))\,b(s,X(s))\Delta s\\
&+ \sum_{s\in[0,t]_{\mathbb{T}}}\Big( f\big(\sigma(s),X^\sigma(s)\big) - f\big(\sigma(s),X(s)\big)\Big) - \sum_{s\in[0,t]_{\mathbb{T}}}\frac{\partial f}{\partial x}(s,X(s))^{\mathsf T} a(s,X(s))(\sigma(s)-s)\\
&- \sum_{s\in[0,t]_{\mathbb{T}}}\left[\frac{\partial f}{\partial x}(s,X(s))^{\mathsf T} b(s,X(s))\big(W^\sigma(s)-W(s)\big) + \frac12\, b(s,X(s))^{\mathsf T}\frac{\partial^2 f}{\partial x\,\partial x}(s,X(s))\,b(s,X(s))(\sigma(s)-s)\right].
\end{aligned} \quad (10)$$
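On an isolated time scale, equation (9) reduces at each right-scattered point $s$ to the exact recursion $X(\sigma(s)) = X(s) + a(s,X(s))\mu(s) + b(s,X(s))\big(W^\sigma(s)-W(s)\big)$, where the Brownian increment has variance $\mu(s)$ since $\langle W\rangle_t = t$. A minimal simulation sketch (the function name and array representation are illustrative, not from the paper):

```python
import numpy as np

def simulate_sde_on_time_scale(ts, a, b, x0, rng=None):
    """Simulate Delta X = a(t, X) Delta t + b(t, X) Delta W on an isolated
    time scale given by the sorted array ts.

    At each right-scattered point t with graininess mu(t) = sigma(t) - t:
        X(sigma(t)) = X(t) + a(t, X(t)) * mu(t) + b(t, X(t)) * dW,
    where dW = W(sigma(t)) - W(t) ~ N(0, mu(t)).
    """
    rng = rng or np.random.default_rng(0)
    xs = [x0]
    for t, t_next in zip(ts[:-1], ts[1:]):
        mu = t_next - t                       # graininess at t
        dW = rng.normal(0.0, np.sqrt(mu))     # Brownian increment over the jump
        xs.append(xs[-1] + a(t, xs[-1]) * mu + b(t, xs[-1]) * dW)
    return np.array(xs)
```

With $b\equiv 0$ and $a(t,x)=x$ on $\mathbb{T}=\{0,1,2,3\}$, the recursion doubles the state at each step, so $X(3)=8$ from $X(0)=1$.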
3. Problem Statement and Main Results

Let $(\Omega,\mathcal{F},\{\mathcal{F}_t\}_{t\in\mathbb{T}},P)$ be a given filtered probability space satisfying the usual conditions. Consider the stochastic control system

$$\begin{cases} \Delta X(s) = a(s,X(s),u(s))\Delta s + b(s,X(s),u(s))\Delta W(s), & s\in[0,\rho(T)]_{\mathbb{T}},\\ X(0) = x_0\in\mathbb{R}^n. \end{cases} \quad (11)$$

The control $u(\cdot)$ belongs to

$$\mathcal{U}[0,T]_{\mathbb{T}} = \big\{u:[0,T]_{\mathbb{T}}\times\Omega\to U \mid u \text{ is measurable and } \{\mathcal{F}_t\}_{t\in\mathbb{T}}\text{-adapted}\big\}, \quad (12)$$
where $U$ is a convex subset of $\mathbb{R}^m$, and the functions $a(t,x,u)$ and $b(t,x,u)$ satisfy the Lipschitz condition and the linear growth condition in $x$. Obviously, equation (11) admits a unique solution (see Bohner et al. [26]). The cost functional associated with (11) is

$$J(u(\cdot)) = E\left[\int_0^T r(s,X(s),u(s))\Delta s + h(X(T))\right], \quad (13)$$

where the maps $r:[0,T]_{\mathbb{T}}\times\mathbb{R}^n\times U\to\mathbb{R}$ and $h:\mathbb{R}^n\to\mathbb{R}$ are continuous.

The optimal control problem is to find $u^*(\cdot)\in\mathcal{U}[0,T]_{\mathbb{T}}$ such that

$$J(u^*(\cdot)) = \inf_{u(\cdot)\in\mathcal{U}[0,T]_{\mathbb{T}}} J(u(\cdot)). \quad (14)$$

$u^*(\cdot)$ is called the stochastic optimal control of the problem, and the corresponding $X(\cdot;x_0,u^*(\cdot))$ is called an optimal state process.
Now, we consider a family of optimal control problems with different initial times and states. For $(t,x)\in[0,T]_{\mathbb{T}}\times\mathbb{R}^n$, consider the state equation

$$\begin{cases} \Delta X(s) = a(s,X(s),u(s))\Delta s + b(s,X(s),u(s))\Delta W(s), & s\in[t,T]_{\mathbb{T}},\\ X(t) = x, \end{cases} \quad (15)$$

along with the cost functional

$$J(t,x;u(\cdot)) = E\left[\int_t^T r(s,X(s),u(s))\Delta s + h(X(T))\right]. \quad (16)$$

For any $(t,x)\in[0,T]_{\mathbb{T}}\times\mathbb{R}^n$, minimize (16) subject to (15) over $\mathcal{U}[t,T]_{\mathbb{T}}$. The value function of the optimal control problem is defined as

$$V(t,x) = \inf_{u(\cdot)\in\mathcal{U}[t,T]_{\mathbb{T}}} J(t,x;u(\cdot)). \quad (17)$$
We first introduce a symbol which will be useful in the sequel. Let $g:\mathbb{T}\to\mathbb{R}$ be $\Delta$-differentiable and $f:\mathbb{R}\to\mathbb{R}$ be continuously differentiable. For any $t\in\mathbb{T}^\kappa$, define $f^{\Delta_g}(g(t))$ as follows:

$$f^{\Delta_g}(g(t)) = \begin{cases} \displaystyle\lim_{s\to t}\frac{f(g(t))-f(g(s))}{g(t)-g(s)}, & t \text{ is right-dense and } g^\Delta\neq 0,\\[2ex] \displaystyle\frac{f\big(g^\sigma(t)\big)-f(g(t))}{g^\sigma(t)-g(t)}, & t \text{ is right-scattered and } g^\Delta\neq 0,\\[2ex] 0, & g^\Delta = 0. \end{cases} \quad (18)$$
Remark 1. Note that $f^{\Delta_g}(g(t))$ depends not only on the functions $f$ and $g$ but also on the time scale $\mathbb{T}$. If $t$ is a right-dense point of $\mathbb{T}$, then $f^{\Delta_g}(g(t)) = f'(g(t))$. On the other hand, if $t$ is a right-scattered point and $g^\sigma(t)=\sigma(g(t))$, then $f^{\Delta_g}(g(t))$ coincides with the $\Delta$-derivative of $f$ on the image time scale $g(\mathbb{T})$.
With the help of this new symbol, we have the following lemma.

Lemma 3. Let $f:\mathbb{R}\times\mathbb{R}\to\mathbb{R}$ be continuously differentiable and $x:\mathbb{T}\to\mathbb{R}$, $y:\mathbb{T}\to\mathbb{R}$ be $\Delta$-differentiable. Then

$$f^\Delta(x(t),y(t)) = f^{\Delta_x}\big(x(t),y^\sigma(t)\big)x^\Delta(t) + f^{\Delta_y}(x(t),y(t))y^\Delta(t),\quad t\in\mathbb{T}^\kappa. \quad (19)$$
Proof. If $x^\Delta(t)=0$ or $y^\Delta(t)=0$, it is easy to verify that (19) holds, so we give the proof only under the conditions $x^\Delta(t)\neq 0$ and $y^\Delta(t)\neq 0$. If $t$ is right-dense, one has

$$\begin{aligned}
f^\Delta(x(t),y(t)) &= \lim_{s\to t}\frac{f(x(t),y(t))-f(x(s),y(s))}{t-s}\\
&= \lim_{s\to t}\frac{f(x(t),y(t))-f(x(s),y(t)) + f(x(s),y(t))-f(x(s),y(s))}{t-s}\\
&= \lim_{s\to t}\left[\frac{f(x(t),y(t))-f(x(s),y(t))}{x(t)-x(s)}\times\frac{x(t)-x(s)}{t-s} + \frac{f(x(s),y(t))-f(x(s),y(s))}{y(t)-y(s)}\times\frac{y(t)-y(s)}{t-s}\right]\\
&= f^{\Delta_x}(x(t),y(t))x^\Delta(t) + f^{\Delta_y}(x(t),y(t))y^\Delta(t)\\
&= f^{\Delta_x}\big(x(t),y^\sigma(t)\big)x^\Delta(t) + f^{\Delta_y}(x(t),y(t))y^\Delta(t).
\end{aligned} \quad (20)$$
When $t$ is right-scattered, then

$$\begin{aligned}
f^\Delta(x(t),y(t)) &= \frac{f\big(x^\sigma(t),y^\sigma(t)\big)-f(x(t),y(t))}{\sigma(t)-t}\\
&= \frac{f\big(x^\sigma(t),y^\sigma(t)\big)-f\big(x(t),y^\sigma(t)\big)+f\big(x(t),y^\sigma(t)\big)-f(x(t),y(t))}{\sigma(t)-t}\\
&= \frac{f\big(x^\sigma(t),y^\sigma(t)\big)-f\big(x(t),y^\sigma(t)\big)}{x^\sigma(t)-x(t)}\times\frac{x^\sigma(t)-x(t)}{\sigma(t)-t} + \frac{f\big(x(t),y^\sigma(t)\big)-f(x(t),y(t))}{y^\sigma(t)-y(t)}\times\frac{y^\sigma(t)-y(t)}{\sigma(t)-t}\\
&= f^{\Delta_x}\big(x(t),y^\sigma(t)\big)x^\Delta(t) + f^{\Delta_y}(x(t),y(t))y^\Delta(t).
\end{aligned} \quad (21)$$

This completes the proof. □
Remark 2. Similarly, another form can be expressed as

$$f^\Delta(x(t),y(t)) = f^{\Delta_x}(x(t),y(t))x^\Delta(t) + f^{\Delta_y}\big(x^\sigma(t),y(t)\big)y^\Delta(t). \quad (22)$$

Remark 3. In particular, let $g:\mathbb{T}\to\mathbb{R}$ be $\Delta$-differentiable and $f:\mathbb{R}\to\mathbb{R}$ be continuously differentiable. Then $f^\Delta(g(t)) = f^{\Delta_g}(g(t))\,g^\Delta(t)$. It is easy to see that this equality is equivalent to (5).
Remark 4. It is not hard to obtain the following multidimensional version:

$$F^\Delta(x(t),y(t)) = F^{\Delta_x}\big(x(t),y^\sigma(t)\big)^{\mathsf T} x^\Delta(t) + F^{\Delta_y}(x(t),y(t))^{\mathsf T} y^\Delta(t), \quad (23)$$

where $F:\mathbb{R}^n\times\mathbb{R}^m\to\mathbb{R}$ is continuously differentiable and $x:\mathbb{T}\to\mathbb{R}^n$ and $y:\mathbb{T}\to\mathbb{R}^m$ are $\Delta$-differentiable.

Next, we show Itô's formula in a new form on time scales.
Proposition 1. Let $X$ satisfy (9) and $f\in C^{1,2}(\mathbb{T}\times\mathbb{R}^n;\mathbb{R})$. Then we have

$$\begin{aligned}
f(t,X(t)) ={}& f(0,X(0)) + \int_0^t f^\Delta_s(s,X(s))\Delta s + \int_0^t f^{\Delta_x}(\sigma(s),X(s))^{\mathsf T} a(s,X(s))\Delta s\\
&+ \int_0^t f^{\Delta_x}(\sigma(s),X(s))^{\mathsf T} b(s,X(s))\Delta W(s) + \frac12\int_0^t I_D(s)\,b(s,X(s))^{\mathsf T} f^{\Delta^2_{xx}}(\sigma(s),X(s))\,b(s,X(s))\Delta s,
\end{aligned} \quad (24)$$

where $I_D(\cdot)$ is the indicator function of $D$, the set of all right-dense points.
Proof. By Lemma 2, it is enough to show that

$$\begin{aligned}
&\int_0^t f^\Delta_s(s,X(s))\Delta s + \int_0^t \frac{\partial f}{\partial x}(s,X(s))^{\mathsf T} a(s,X(s))\Delta s + \int_0^t \frac{\partial f}{\partial x}(s,X(s))^{\mathsf T} b(s,X(s))\Delta W(s)\\
&\quad+ \frac12\int_0^t b(s,X(s))^{\mathsf T}\frac{\partial^2 f}{\partial x\,\partial x}(s,X(s))\,b(s,X(s))\Delta s + \sum_{s\in[0,t]_{\mathbb{T}}}\Big(f\big(\sigma(s),X^\sigma(s)\big) - f\big(\sigma(s),X(s)\big)\Big)\\
&\quad- \sum_{s\in[0,t]_{\mathbb{T}}}\frac{\partial f}{\partial x}(s,X(s))^{\mathsf T} a(s,X(s))(\sigma(s)-s)\\
&\quad- \sum_{s\in[0,t]_{\mathbb{T}}}\left[\frac{\partial f}{\partial x}(s,X(s))^{\mathsf T} b(s,X(s))\big(W^\sigma(s)-W(s)\big) + \frac12\, b(s,X(s))^{\mathsf T}\frac{\partial^2 f}{\partial x\,\partial x}(s,X(s))\,b(s,X(s))(\sigma(s)-s)\right]\\
&= \int_0^t f^\Delta_s(s,X(s))\Delta s + \int_0^t f^{\Delta_x}(\sigma(s),X(s))^{\mathsf T} a(s,X(s))\Delta s\\
&\quad+ \int_0^t f^{\Delta_x}(\sigma(s),X(s))^{\mathsf T} b(s,X(s))\Delta W(s) + \frac12\int_0^t I_D(s)\,b(s,X(s))^{\mathsf T} f^{\Delta^2_{xx}}(\sigma(s),X(s))\,b(s,X(s))\Delta s.
\end{aligned} \quad (25)$$

By some manipulation, namely,

$$\sum_{s\in[0,t]_{\mathbb{T}}}\Big(f\big(\sigma(s),X^\sigma(s)\big) - f\big(\sigma(s),X(s)\big)\Big) = \sum_{s\in[0,t]_{\mathbb{T}}} f^{\Delta_x}(\sigma(s),X(s))^{\mathsf T} a(s,X(s))(\sigma(s)-s) + \sum_{s\in[0,t]_{\mathbb{T}}} f^{\Delta_x}(\sigma(s),X(s))^{\mathsf T} b(s,X(s))\big(W^\sigma(s)-W(s)\big), \quad (26)$$

it is straightforward to show that the above equation is true. Now, we state the Bellman optimality principle on time scales. □
Theorem 1 (optimality principle). Let $(t,x)\in[0,T]_{\mathbb{T}}\times\mathbb{R}^n$. Then, for any $\hat t\in[t,T]_{\mathbb{T}}$, we have

$$V(t,x) = \inf_{u(\cdot)\in\mathcal{U}[t,\hat t]_{\mathbb{T}}} E\left[\int_t^{\hat t} r(s,X(s),u(s))\Delta s + V\big(\hat t,X(\hat t)\big)\right]. \quad (27)$$
Proof. For any $\varepsilon>0$, there exists a control $u(\cdot)\in\mathcal{U}[t,T]_{\mathbb{T}}$ such that

$$\begin{aligned}
V(t,x)+\varepsilon &\ge J(t,x;u(\cdot))\\
&= E\left[\int_t^T r(s,X(s),u(s))\Delta s + h(X(T))\right]\\
&= E\left[\int_t^{\hat t} r(s,X(s),u(s))\Delta s + E\left[\int_{\hat t}^T r(s,X(s),u(s))\Delta s + h(X(T))\,\Big|\,\mathcal{F}^t_{\hat t}\right]\right]\\
&= E\left[\int_t^{\hat t} r(s,X(s),u(s))\Delta s + J\big(\hat t,X(\hat t);u(\cdot)\big)\right]\\
&\ge E\left[\int_t^{\hat t} r(s,X(s),u(s))\Delta s + V\big(\hat t,X(\hat t)\big)\right]\\
&\ge \inf_{u(\cdot)\in\mathcal{U}[t,\hat t]_{\mathbb{T}}} E\left[\int_t^{\hat t} r(s,X(s),u(s))\Delta s + V\big(\hat t,X(\hat t)\big)\right],
\end{aligned} \quad (28)$$

where $\mathcal{F}^t_{\hat t}$ is the sigma field generated by $\{W(r)-W(t): r\in[t,\hat t]_{\mathbb{T}}\}$.

On the other hand, by the definition of the value function (17), we obtain

$$V(t,x)\le J(t,x;u(\cdot)) = E\left[\int_t^{\hat t} r(s,X(s),u(s))\Delta s + J\big(\hat t,X(\hat t);u(\cdot)\big)\right]. \quad (29)$$

Thus, taking the infimum over $u(\cdot)\in\mathcal{U}[\hat t,T]_{\mathbb{T}}$, one has

$$V(t,x)\le E\left[\int_t^{\hat t} r(s,X(s),u(s))\Delta s + V\big(\hat t,X(\hat t)\big)\right]. \quad (30)$$

It follows that

$$V(t,x)\le \inf_{u(\cdot)\in\mathcal{U}[t,\hat t]_{\mathbb{T}}} E\left[\int_t^{\hat t} r(s,X(s),u(s))\Delta s + V\big(\hat t,X(\hat t)\big)\right]. \quad (31)$$

Combining (28) and (31), we get the result. Furthermore, we give the HJB equation on time scales, which is similar to the continuous and discrete cases. □
Theorem 2. Let the value function $V\in C^{1,2}([0,T]_{\mathbb{T}}\times\mathbb{R}^n)$. Then $V$ satisfies the following HJB equation:

$$\begin{cases}
V^\Delta_t(t,x) + \displaystyle\inf_{u\in U} E_{xt}\Big[H\big(t,x,u,V^{\Delta_x}(\sigma(t),X(t)),V^{\Delta^2_{xx}}(\sigma(t),x)\big)\Big] = 0, & (t,x)\in[0,\rho(T)]_{\mathbb{T}}\times\mathbb{R}^n,\\
V(T,x) = h(x),
\end{cases} \quad (32)$$

where

$$H(t,x,u,p,P) = r(t,x,u) + p^{\mathsf T} a(t,x,u) + p^{\mathsf T} b(t,x,u)W^\Delta(t) + \frac12\, I_D(t)\,b(t,x,u)^{\mathsf T} P\, b(t,x,u), \quad (33)$$

and we use the notation

$$E_{xt}\big[f^{\Delta_x}(X(t))\big] = E\big[f^{\Delta_x}(X(t))\,\big|\,X(t)=x\big],\qquad
W^\Delta(t) = \begin{cases} \dfrac{W^\sigma(t)-W(t)}{\mu(t)}, & t \text{ is right-scattered},\\[1.5ex] 0, & t \text{ is right-dense}. \end{cases} \quad (34)$$
Proof. By the definition of the value function, $V(T,x)=h(x)$ is satisfied. Fix $u\in U$, and let $X(\cdot)$ be the state process corresponding to the control $u(\cdot)\equiv u$. By the optimality principle and Itô's formula, for any $\hat t>t$ with $t,\hat t\in[0,T]_{\mathbb{T}}$, we have

$$\begin{aligned}
0 &\le \frac{E\big[V(\hat t,X(\hat t)) - V(t,x)\big]}{\hat t - t} + \frac{1}{\hat t - t}\,E\left[\int_t^{\hat t} r(s,X(s),u)\Delta s\right]\\
&= \frac{1}{\hat t - t}\,E\left[\int_t^{\hat t} V^\Delta_s(s,X(s)) + H\big(s,X(s),u,V^{\Delta_x}(\sigma(s),X(s)),V^{\Delta^2_{xx}}(\sigma(s),X(s))\big)\Delta s\right].
\end{aligned} \quad (35)$$

If $t$ is right-dense, let $\hat t\to t$, while if $t$ is right-scattered, let $\hat t = \sigma(t)$. This leads to

$$0\le V^\Delta_t(t,x) + E_{xt}\Big[H\big(t,x,u,V^{\Delta_x}(\sigma(t),X(t)),V^{\Delta^2_{xx}}(\sigma(t),x)\big)\Big]. \quad (36)$$

It follows that

$$0\le V^\Delta_t(t,x) + \inf_{u\in U} E_{xt}\Big[H\big(t,x,u,V^{\Delta_x}(\sigma(t),X(t)),V^{\Delta^2_{xx}}(\sigma(t),x)\big)\Big]. \quad (37)$$
Conversely, for any $\varepsilon>0$ and $\hat t>t$ with $t,\hat t\in[0,T]_{\mathbb{T}}$, we can find a control $u(\cdot)$ such that

$$\begin{aligned}
\varepsilon &\ge \frac{E\big[V(\hat t,X(\hat t)) - V(t,x)\big]}{\hat t - t} + \frac{1}{\hat t - t}\,E\left[\int_t^{\hat t} r(s,X(s),u(s))\Delta s\right]\\
&= \frac{1}{\hat t - t}\,E\left[\int_t^{\hat t} V^\Delta_s(s,X(s)) + H\big(s,X(s),u(s),V^{\Delta_x}(\sigma(s),X(s)),V^{\Delta^2_{xx}}(\sigma(s),X(s))\big)\Delta s\right]\\
&\ge \frac{1}{\hat t - t}\,E\left[\int_t^{\hat t} V^\Delta_s(s,X(s)) + \inf_{u\in U} H\big(s,X(s),u,V^{\Delta_x}(\sigma(s),X(s)),V^{\Delta^2_{xx}}(\sigma(s),X(s))\big)\Delta s\right].
\end{aligned} \quad (38)$$

By the same argument as that used in the above proof, we get

$$\varepsilon\ge V^\Delta_t(t,x) + \inf_{u\in U} E_{xt}\Big[H\big(t,x,u,V^{\Delta_x}(\sigma(t),X(t)),V^{\Delta^2_{xx}}(\sigma(t),x)\big)\Big]. \quad (39)$$
Therefore, (32) follows. The proof is completed. □

Remark 5. It is not surprising that HJB equation (32) on time scales is very similar to the classical HJB equation in continuous and discrete time (see [29, 30]). An intriguing feature of the HJB equation on time scales is that an expectation is involved. When $\mathbb{T}=\mathbb{R}^+$ or $\mathbb{T}=\mathbb{Z}^+$, equation (32) reduces to the classical ones.
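To see the reduction concretely: when $\mathbb{T}=\mathbb{R}^+$, every point is right-dense, so $\sigma(t)=t$, $W^\Delta(t)=0$, and $I_D(t)=1$, and (32)–(33) collapse to the classical second-order HJB equation

$$V_t(t,x) + \inf_{u\in U}\left\{r(t,x,u) + V_x(t,x)^{\mathsf T}a(t,x,u) + \frac12\, b(t,x,u)^{\mathsf T}V_{xx}(t,x)\,b(t,x,u)\right\} = 0,\qquad V(T,x)=h(x).$$

When $\mathbb{T}=\mathbb{Z}^+$, every point is right-scattered with $\mu(t)=1$, so $V^\Delta_t(t,x)=V(t+1,x)-V(t,x)$, $W^\Delta(t)=W(t+1)-W(t)$, and $I_D(t)=0$; substituting into (32) and evaluating the conditional expectation $E_{xt}$ yields the discrete-time dynamic programming recursion

$$V(t,x) = \inf_{u\in U} E_{xt}\big[r(t,x,u) + V(t+1,X(t+1))\big].$$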
Remark 6. Suppose $b\equiv 0$. Then HJB equation (32) becomes

$$V^\Delta_t(t,x) + \inf_{u\in U}\Big\{V^{\Delta_x}(\sigma(t),X(t))^{\mathsf T} a(t,x,u) + r(t,x,u)\Big\}\Big|_{X(t)=x} = 0. \quad (40)$$

In this case, it is equivalent to the result in [22].

Remark 7. In particular, in the setting of Remark 6, if we further let $X^\sigma(t)=\sigma(X(t))$, HJB equation (32) degenerates into

$$V^\Delta_t(t,x) + \inf_{u\in U}\Big\{V^{\Delta_x}(\sigma(t),x)^{\mathsf T} a(t,x,u) + r(t,x,u)\Big\} = 0, \quad (41)$$

which is just the one given by Seiffertt et al. [20]. From the above, we end up with the following verification theorem.
Theorem 3 (verification theorem). Let $F\in C^{1,2}([0,T]_{\mathbb{T}}\times\mathbb{R}^n;\mathbb{R})$ be the solution of HJB equation (32), and suppose there exists a function $\phi(t,X(t))$ such that

$$E\Big[H\big(t,X(t),\phi(t,X(t)),F^{\Delta_x}(\sigma(t),X(t)),F^{\Delta^2_{xx}}(\sigma(t),X(t))\big)\Big] = \inf_{u\in U} E\Big[H\big(t,X(t),u,F^{\Delta_x}(\sigma(t),X(t)),F^{\Delta^2_{xx}}(\sigma(t),X(t))\big)\Big]. \quad (42)$$

Then $F(t,x) = V(t,x)$, and $u(t)=\phi(t,X(t))$ is an optimal control.
Proof. Let $u(t)=\phi(t,X(t))$. By Proposition 1, and since $F$ solves (32) with the infimum attained at $\phi$ by (42), we have

$$\begin{aligned}
E[h(X(T))] - F(t,x) ={}& E\left[\int_t^T F^\Delta_s(s,X(s)) + F^{\Delta_x}(\sigma(s),X(s))^{\mathsf T} a(s,X(s),u(s)) + F^{\Delta_x}(\sigma(s),X(s))^{\mathsf T} b(s,X(s),u(s))W^\Delta(s)\right.\\
&\left.\quad+ \frac12\, I_D(s)\,b(s,X(s),u(s))^{\mathsf T} F^{\Delta^2_{xx}}(\sigma(s),X(s))\,b(s,X(s),u(s))\,\Delta s\right]\\
={}& -E\left[\int_t^T r(s,X(s),u(s))\Delta s\right].
\end{aligned} \quad (43)$$

This yields

$$F(t,x) = E\left[\int_t^T r(s,X(s),u(s))\Delta s + h(X(T))\right] \ge V(t,x). \quad (44)$$

In addition, for any admissible pair $(X(\cdot),v(\cdot))$, we have

$$\begin{aligned}
E[h(X(T))] - F(t,x) ={}& E\left[\int_t^T F^\Delta_s(s,X(s)) + F^{\Delta_x}(\sigma(s),X(s))^{\mathsf T} a(s,X(s),v(s)) + F^{\Delta_x}(\sigma(s),X(s))^{\mathsf T} b(s,X(s),v(s))W^\Delta(s)\right.\\
&\left.\quad+ \frac12\, I_D(s)\,b(s,X(s),v(s))^{\mathsf T} F^{\Delta^2_{xx}}(\sigma(s),X(s))\,b(s,X(s),v(s))\,\Delta s\right]\\
={}& E\left[\int_t^T F^\Delta_s(s,X(s)) + H\big(s,X(s),v(s),F^{\Delta_x}(\sigma(s),X(s)),F^{\Delta^2_{xx}}(\sigma(s),X(s))\big) - r(s,X(s),v(s))\,\Delta s\right]\\
\ge{}& E\left[\int_t^T F^\Delta_s(s,X(s)) + \inf_{u\in U} H\big(s,X(s),u,F^{\Delta_x}(\sigma(s),X(s)),F^{\Delta^2_{xx}}(\sigma(s),X(s))\big) - r(s,X(s),v(s))\,\Delta s\right]\\
={}& -E\left[\int_t^T r(s,X(s),v(s))\Delta s\right].
\end{aligned} \quad (45)$$

Namely,

$$F(t,x)\le E\left[\int_t^T r(s,X(s),v(s))\Delta s + h(X(T))\right]. \quad (46)$$

Since $(X(\cdot),v(\cdot))$ is arbitrary, it follows that

$$F(t,x)\le V(t,x). \quad (47)$$

Hence, by (44) and (47), one has

$$F(t,x) = V(t,x). \quad (48)$$

Finally, inequality (44) together with (48) proves the optimality of $u(\cdot)$. □
4. Example

Dynamic programming on time scales covers not only the continuous and discrete cases but also other, more general cases. In order to illustrate our result, we give an example. Consider the quantum time scale $\mathbb{T} = \{t = q^k : k\in\mathbb{N}\}$, $q>1$. The state equation on the time scale $\mathbb{T}$ is as follows:

$$\begin{cases} \Delta X(s) = X(s)u(s)\Delta s + u(s)\Delta W(s), & s\in\mathbb{T},\\ X(1) = x_0. \end{cases} \quad (49)$$

The problem is to find the sequence of optimal control policies $\{u(q^k)\}$, $k=0,1,\dots,N$, that

$$\text{minimizes } J(u(\cdot)) = E\left[\int_1^{q^N} \big(X^2(s) + u^2(s)\big)\Delta s\right]. \quad (50)$$

In this example, we have $\mu(t) = (q-1)t$. By Theorem 2, the value function $V$ satisfies the following equation:

$$\begin{cases} \dfrac{V(t,x)}{(q-1)t} = \displaystyle\inf_{u\in U} E_{xt}\left[x^2 + u^2 + \frac{V\big(\sigma(t),X^\sigma(t)\big)}{(q-1)t}\right], & t\in\mathbb{T},\\[2ex] V\big(q^N,x\big) = 0. \end{cases} \quad (51)$$
We can see that the graininess function $\mu$ affects the solution of HJB equation (51). Suppose $\phi(t,X(t))$ attains the infimum, that is,

$$E\left[X^2(t) + \phi^2(t,X(t)) + \frac{V\big(\sigma(t),X^\sigma(t)\big)}{(q-1)t}\right] = \inf_{u\in U} E\left[X^2(t) + u^2 + \frac{V\big(\sigma(t),X^\sigma(t)\big)}{(q-1)t}\right],\quad t\in\mathbb{T}. \quad (52)$$

Then, by applying Theorem 3, we can find the optimal control $u(t)=\phi(t,X(t))$ through the above equation. Furthermore, this implies that the optimal strategy also depends on the structure of the time scale.
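Recursion (51) can be solved numerically by backward induction from $V(q^N,\cdot)=0$: multiplying (51) by $\mu(t)=(q-1)t$ gives $V(t,x)=\inf_u E\big[\mu(t)(x^2+u^2) + V(\sigma(t),X^\sigma(t))\big]$, with $X^\sigma(t) = x + \mu(t)xu + u\,(W^\sigma(t)-W(t))$. A minimal sketch (the state and control grids, quadrature order, and state-space truncation are illustrative choices, not from the paper):

```python
import numpy as np

def solve_quantum_hjb(q=1.5, N=5, x_grid=None, u_grid=None, n_quad=15):
    """Backward induction for (51) on T = {q**k : k = 0..N}.

    At the right-scattered point t, mu = (q-1)*t and
        X_sigma = x + mu*x*u + u*(W_sigma - W),  W_sigma - W ~ N(0, mu).
    Terminal condition: V(q**N, x) = 0.
    """
    if x_grid is None:
        x_grid = np.linspace(-2.0, 2.0, 81)
    if u_grid is None:
        u_grid = np.linspace(-1.0, 1.0, 41)
    # Probabilists' Gauss-Hermite nodes/weights to approximate E[f(Z)], Z ~ N(0,1).
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_quad)
    weights = weights / weights.sum()

    V = np.zeros_like(x_grid)                    # V(q**N, .) = 0
    for k in range(N - 1, -1, -1):
        t = q ** k
        mu = (q - 1.0) * t
        V_new = np.empty_like(x_grid)
        for i, x in enumerate(x_grid):
            best = np.inf
            for u in u_grid:
                x_next = x + mu * x * u + u * np.sqrt(mu) * nodes
                # Linear interpolation of V(sigma(t), .); values outside the
                # grid are clamped to the boundary values (truncation error).
                cont = np.interp(x_next, x_grid, V)
                cost = mu * (x * x + u * u) + weights @ cont
                best = min(best, cost)
            V_new[i] = best
        V = V_new
    return x_grid, V                             # V(1, .) on the grid
```

From $x=0$, the control $u=0$ keeps the state at $0$ at zero running cost, so the computed $V(1,0)$ is zero, while $V(1,x)>0$ away from the origin, and the backward pass makes the dependence on the graininess $\mu(t)=(q-1)t$ explicit.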
5. Conclusions
In this paper, we developed the dynamic programming principle for stochastic systems on time scales, for which we presented Itô's formula on time scales in a new form by introducing a new symbol. Similar to the classical cases, we constructed the HJB equation on time scales and proved the verification theorem. The results of this paper are more general: the continuous and discrete analogues of dynamic programming are special cases. An example has been given to demonstrate the effectiveness of our results.
Data Availability
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work was supported by the National Key R&D Program of China (Grant no. 2018YFA0703900) and the Major Project of the National Social Science Foundation of China (Grant no. 19ZDA091).
References
[1] R. Bellman, "Dynamic programming," Science, vol. 153, no. 3731, pp. 34–37, 1966.
[2] D. Blackwell, "Discrete dynamic programming," The Annals of Mathematical Statistics, vol. 33, no. 2, pp. 719–726, 1962.
[3] E. Bandini, A. Cosso, M. Fuhrman, and H. Pham, "Randomized filtering and Bellman equation in Wasserstein space for partial observation control problem," Stochastic Processes and Their Applications, vol. 129, no. 2, pp. 674–711, 2019.
[4] H. Pham and X. Wei, "Dynamic programming for optimal control of stochastic McKean–Vlasov dynamics," SIAM Journal on Control and Optimization, vol. 55, no. 2, pp. 1069–1101, 2017.
[5] C. Mu, Y. Zhang, H. Jia, and H. He, "Energy-storage-based intelligent frequency control of microgrid with stochastic model uncertainties," IEEE Transactions on Smart Grid, vol. 11, no. 2, pp. 1748–1758, 2020.
[6] C. Mu and Y. Zhang, "Learning-based robust tracking control of quadrotor with time-varying and coupling uncertainties," IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 1, pp. 259–273, 2020.
[7] S. Hilger, "Analysis on measure chains—a unified approach to continuous and discrete calculus," Results in Mathematics, vol. 18, no. 1-2, pp. 18–56, 1990.
[8] G. S. Guseinov, "Integration on time scales," Journal of Mathematical Analysis and Applications, vol. 285, no. 1, pp. 107–127, 2003.
[9] M. Bohner and G. S. Guseinov, "Partial differentiation on time scales," Dynamic Systems and Applications, vol. 13, no. 3-4, pp. 351–379, 2004.
[10] M. Bohner and A. Peterson, Dynamic Equations on Time Scales: An Introduction with Applications, Birkhäuser Boston, Boston, MA, USA, 2001.
[11] M. Bohner and A. Peterson, Advances in Dynamic Equations on Time Scales, Birkhäuser Boston, Boston, MA, USA, 2002.
[12] N. H. Du and N. T. Dieu, "The first attempt on the stochastic calculus on time scale," Stochastic Analysis and Applications, vol. 29, no. 6, pp. 1057–1080, 2011.
[13] D. Grow and S. Sanyal, "Brownian motion indexed by a time scale," Stochastic Analysis and Applications, vol. 29, no. 3, pp. 457–472, 2011.
[14] M. Bohner, "Calculus of variations on time scales," Dynamic Systems and Applications, vol. 13, no. 3-4, pp. 339–349, 2004.
[15] Z. Zhan and W. Wei, "On existence of optimal control governed by a class of the first-order linear dynamic systems on time scales," Applied Mathematics and Computation, vol. 215, no. 6, pp. 2070–2081, 2009.
[16] Y. Gong and X. Xiang, "A class of optimal control problems of systems governed by the first order linear dynamic equations on time scales," Journal of Industrial and Management Optimization, vol. 5, no. 1, pp. 1–10, 2009.
[17] Y. Peng, X. Xiang, and Y. Jiang, "Nonlinear dynamic systems and optimal control problems on time scales," ESAIM: Control, Optimisation and Calculus of Variations, vol. 17, no. 3, pp. 654–681, 2011.
[18] Z. Zhan, S. Chen, and W. Wei, "A unified theory of maximum principle for continuous and discrete time optimal control problems," Mathematical Control and Related Fields, vol. 2, no. 2, pp. 195–215, 2012.
[19] M. Bohner, K. Kenzhebaev, O. Lavrova, and O. Stanzhytskyi, "Pontryagin's maximum principle for dynamic systems on time scales," Journal of Difference Equations and Applications, vol. 23, no. 7, pp. 1161–1189, 2017.
[20] J. Seiffertt, S. Sanyal, and D. C. Wunsch, "Hamilton-Jacobi-Bellman equations and approximate dynamic programming on time scales," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 38, no. 4, pp. 918–923, 2008.
[21] Z. Zhan, W. Wei, and H. Xu, "Hamilton–Jacobi–Bellman equations on time scales," Mathematical and Computer Modelling, vol. 49, no. 9-10, pp. 2019–2028, 2009.
[22] R. Š. Hilscher and V. Zeidan, "Hamilton–Jacobi theory over time scales and applications to linear-quadratic problems," Nonlinear Analysis: Theory, Methods and Applications, vol. 75, no. 2, pp. 932–950, 2012.
[23] Y. Zhu and G. Jia, "Stochastic linear quadratic control problem on time scales," submitted.
[24] Y. Zhu and G. Jia, "Linear feedback of mean-field stochastic linear quadratic optimal control problems on time scales," Mathematical Problems in Engineering, vol. 2020, Article ID 8051918, 11 pages, 2020.
[25] C. Pötzsche, "Chain rule and invariance principle on measure chains," Journal of Computational and Applied Mathematics, vol. 141, no. 1-2, pp. 249–254, 2002.
[26] M. Bohner, O. M. Stanzhytskyi, and A. O. Bratochkina, "Stochastic dynamic equations on general time scales," Electronic Journal of Differential Equations, vol. 2013, no. 57, pp. 1–15, 2013.
[27] D. Grow and S. Sanyal, "The quadratic variation of Brownian motion on a time scale," Statistics and Probability Letters, vol. 82, no. 9, pp. 1677–1680, 2012.
[28] W. Hu, "Itô's formula, the stochastic exponential, and change of measure on general time scales," Abstract and Applied Analysis, vol. 2017, Article ID 9140138, 13 pages, 2017.
[29] M. Bardi and I. Capuzzo-Dolcetta, Optimal Control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations, Springer Science & Business Media, Berlin, Germany, 2008.
[30] L. Grüne, "Error estimation and adaptive discretization for the discrete stochastic Hamilton–Jacobi–Bellman equation," Numerische Mathematik, vol. 99, no. 1, pp. 85–112, 2004.