A SOLUTION TO A COUNTABLE SYSTEM OF EQUATIONS ARISING IN MARKOVIAN DECISION PROCESSES
by
Cyrus Derman and Arthur F. Veinott, Jr.
TECHNICAL REPORT NO. 89
July 7, 1966
Supported by the Army, Navy, Air Force, and NASA under Contract Nonr-225(53) (NR-042-002) with the Office of Naval Research
Gerald J. Lieberman, Project Director
This research was partially supported by the Office of Naval Research under Contract Nonr-225(77) (NR-347-010).
Reproduction in Whole or in Part is Permitted for any Purpose of the United States Government
DEPARTMENT OF STATISTICS
STANFORD UNIVERSITY
STANFORD, CALIFORNIA
https://ntrs.nasa.gov/search.jsp?R=19660025515 2020-06-03T19:07:40+00:00Z
Let $X_0, X_1, \ldots$ be a sequence of non-negative integer valued random variables with the property that

$$\Pr(X_{n+1} = j \mid X_0 = x_0, \ldots, X_{n-1} = x_{n-1}, X_n = i) = p_{ij}$$

for all $i, j, x_0, \ldots, x_{n-1}, n$. The collection of random variables $\{X_n\}$ is called a Markov chain and the $p_{ij}$ are called transition probabilities. We refer to $X_n$ as the state of the process at time $n$.
Let $w_i$ be the cost incurred at time $n$ if the process is in state $i$ at that time. Consider the system of equations

$$g + v_i = w_i + \sum_{j=0}^{\infty} p_{ij} v_j, \qquad i = 0, 1, \ldots, \tag{1}$$

in the unknown variables $g, v_0, v_1, \ldots$. Such a system arises in connection with constructing optimal rules for controlling Markovian decision processes. Also the numbers $g, v_0, v_1, \ldots$ are of interest in their own right. Often $g$ is the long run expected average cost and $v_i - v_j$ is the limit, as $n \to \infty$, of the difference between expected total cost during times $0, 1, \ldots, n$ given that the process starts in states $i$ and $j$ respectively.
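The interpretation of $g$ and $v_i - v_j$ can be illustrated numerically. The following is a minimal sketch, not taken from the report: it assumes a hypothetical two-state chain (the parameters `a`, `b` and the costs `w` are made up for illustration), computes expected total costs over times $0, \ldots, n$ by the recursion $V_n(i) = w_i + \sum_j p_{ij} V_{n-1}(j)$, and compares the time average and the difference $V_n(1) - V_n(0)$ with the values obtained by solving (1) by hand for this particular chain.

```python
def expected_total_costs(P, w, horizon):
    """Return V with V[i] = expected total cost over times 0..horizon from state i."""
    n = len(w)
    V = list(w)  # horizon 0: only the cost incurred at time 0
    for _ in range(horizon):
        V = [w[i] + sum(P[i][j] * V[j] for j in range(n)) for i in range(n)]
    return V


# Hypothetical two-state chain and costs (assumptions, not from the report).
a, b = 0.3, 0.5
P = [[1 - a, a], [b, 1 - b]]
w = [4.0, 1.0]

n = 200
V = expected_total_costs(P, w, n)
avg_cost = V[0] / (n + 1)   # average over the n + 1 time points 0..n
difference = V[1] - V[0]    # approximates v_1 - v_0

# Solving (1) by hand for a two-state chain gives these closed forms.
g = (b * w[0] + a * w[1]) / (a + b)
v1_minus_v0 = (w[1] - w[0]) / (a + b)

print("average cost:", avg_cost, "vs g =", g)
print("difference  :", difference, "vs v_1 - v_0 =", v1_minus_v0)
```

The average approaches $g$ only at rate $1/n$, while the difference settles geometrically fast for this aperiodic chain, which is why a loose tolerance suits the first comparison and a tight one the second.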
We show in this paper that one solution to the system (1) is given by

$$g = \frac{c_{00}}{m_{00}}, \qquad v_i = c_{i0} - g\,m_{i0}, \qquad i = 0, 1, \ldots, \tag{2}$$

provided that the expected time $m_{i0}$ required to go from state $i$ to state 0 is finite and that the expected cost $c_{i0}$ incurred during that time is also finite, $i = 0, 1, \ldots$. Notice that $v_0 = 0$.
As an illustration of the above ideas, consider a single item inventory model in which the demands in periods $1, 2, \ldots$ are independent. A demand of size one occurs with probability $p$, $0 < p < 1$, and a demand of size zero occurs with probability $1 - p$. Let $X_n$ denote the stock on hand at the beginning of period $n$. An order for one unit is placed in period $n$ with immediate delivery if $X_n = 0$; otherwise, no order is placed in period $n$. There is a unit cost $h$ for each unit of stock on hand after ordering in a period. There is a cost $K$ for placing an order in a period. Under these assumptions the nonzero transition probabilities are $p_{00} = p$, $p_{01} = 1 - p$, $p_{ii} = 1 - p$, and $p_{i,i-1} = p$, $i = 1, 2, \ldots$. Also $w_0 = K + h$ and $w_i = hi$, $i = 1, 2, \ldots$. Thus the system (1) becomes

$$g + v_0 = K + h + p\,v_0 + (1 - p)\,v_1$$

$$g + v_i = ih + p\,v_{i-1} + (1 - p)\,v_i, \qquad i = 1, 2, \ldots.$$
The solution given in (2) is

$$g = pK + h, \qquad v_i = \frac{h\,i\,(i-1)}{2p} - Ki, \qquad i = 0, 1, \ldots.$$

Thus $g$ is here the long run expected average cost under the indicated ordering policy. Also $v_i$ is the limit, as $n \to \infty$, of the amount by which the expected cost in periods $0, 1, \ldots, n$ starting with $i$ units of stock on hand exceeds that starting with no stock on hand.
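The inventory solution can be checked by direct substitution. The sketch below is illustrative only: the parameter values are hypothetical, and the closed form $v_i = h\,i\,(i-1)/(2p) - Ki$ used in the code is our own evaluation of (2) for this model (with $m_{i0} = i/p$ and $c_{i0} = h\,i\,(i+1)/(2p)$ for $i \ge 1$), stated here as an assumption. Exact rational arithmetic is used so that the residuals of the system are exactly zero rather than merely small.

```python
from fractions import Fraction


def inventory_v(p, K, h, i):
    """Candidate v_i = h*i*(i-1)/(2p) - K*i (our evaluation of (2), an assumption)."""
    return h * i * (i - 1) / (2 * p) - K * i


def residuals(p, K, h, n_states):
    """Left side minus right side of the inventory system for i = 0..n_states."""
    g = p * K + h
    v = [inventory_v(p, K, h, i) for i in range(n_states + 1)]
    # Equation for state 0: an order is placed, so the cost is K + h.
    res = [g + v[0] - (K + h + p * v[0] + (1 - p) * v[1])]
    # Equations for states i >= 1: holding cost i*h, no order.
    for i in range(1, n_states + 1):
        res.append(g + v[i] - (i * h + p * v[i - 1] + (1 - p) * v[i]))
    return res


# Hypothetical parameter values chosen only for illustration.
p, K, h = Fraction(1, 3), Fraction(5), Fraction(2)
print(residuals(p, K, h, 10))
```

Because equation $i$ involves only $v_{i-1}$ and $v_i$, checking the first few equations does not require truncating the infinite state space.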
A SOLUTION TO A COUNTABLE SYSTEM OF EQUATIONS
ARISING IN MARKOVIAN DECISION PROCESSES
by
Cyrus Derman, Columbia University and Stanford University
and
Arthur F. Veinott, Jr., Stanford University
Let $\{X_n\}$, $n = 0, 1, \ldots$, be a Markov chain having a state space consisting of the non-negative integers and having stationary transition probabilities $\{p_{ij}\}$. Let $\{w_i\}$, $i = 0, 1, \ldots$, be a sequence of real numbers. Consider the system of equations

$$g + v_i = w_i + \sum_{j=0}^{\infty} p_{ij} v_j, \qquad i = 0, 1, \ldots, \tag{1}$$

in the unknown variables $\{g, v_0, v_1, \ldots\}$. In [2], the system (1) arises in connection with conditions for the existence and construction of optimal rules for controlling a Markovian decision process. For a finite state space existence of solutions to (1) is guaranteed by the condition that the Markov chain have at most one ergodic class of states. (See [3].) In this note we give conditions ensuring the existence (Theorem 1) and uniqueness (Theorem 2) of solutions to (1).
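Let

$$Z_n(j) = \begin{cases} 1 & \text{if } X_n = j \text{ and } X_m \neq 0 \text{ for } 1 \le m \le n, \\ 0 & \text{otherwise}, \end{cases} \qquad j, n = 0, 1, \ldots,$$

$${}_0p^*_{ij} = E\Big(\sum_{n=0}^{\infty} Z_n(j) \,\Big|\, X_0 = i\Big), \qquad i, j = 0, 1, \ldots,$$

and

$$m_{i0} = \sum_{j=0}^{\infty} {}_0p^*_{ij}.$$

If the last series converges absolutely, then $m_{i0}$ is the mean first passage time from $i$ to 0 and we say $m_{i0}$ is finite. If the $m_{i0}$ are all finite, as we assume throughout, then state 0 is positive recurrent and there is only one recurrent class.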
Let $y_n = \sum_{j=0}^{\infty} w_j Z_n(j)$ and $c_{i0} = E\big(\sum_{n=0}^{\infty} y_n \mid X_0 = i\big)$. By an obvious generalization of Theorem 5 in [1, p. 81] we get

$$c_{i0} = \sum_{j=0}^{\infty} {}_0p^*_{ij}\, w_j$$

provided the series is absolutely convergent. If the series is absolutely convergent we say $c_{i0}$ is finite. In applications $w_i$ is often the cost incurred when in state $i$, so $c_{i0}$ is then the expected cost during a first passage from $i$ to 0.
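For a finite state space these first passage quantities reduce to linear algebra, which may help make the definitions concrete. The sketch below rests on assumptions not made in the report: a hypothetical three-state chain `P` with costs `w`, and helper names `solve` and `first_passage` of our own. It uses the standard fact that, if `Q` denotes `P` with all transitions into state 0 removed, then ${}_0p^*_{ij}$ is the $(i,j)$ entry of $(I - Q)^{-1}$, so $m_{i0}$ and $c_{i0}$ solve $(I - Q)m = \mathbf{1}$ and $(I - Q)c = w$.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A square, nonsingular)."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def first_passage(P, w):
    """Return (m, c) with m[i] = m_i0 and c[i] = c_i0 for a finite chain."""
    n = len(P)
    # Q is P with transitions into state 0 removed (state 0 is "taboo").
    Q = [[Fraction(0) if j == 0 else Fraction(P[i][j]) for j in range(n)]
         for i in range(n)]
    A = [[(1 if i == j else 0) - Q[i][j] for j in range(n)] for i in range(n)]
    m = solve(A, [1] * n)   # row sums of (I - Q)^{-1}
    c = solve(A, list(w))   # (I - Q)^{-1} applied to the cost vector
    return m, c


# Hypothetical three-state chain and costs (assumptions for illustration).
F = Fraction
P = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 4), F(1, 2)],
     [F(1, 3), F(0), F(2, 3)]]
w = [3, 1, 2]
m, c = first_passage(P, w)
print("m_i0:", m)
print("c_i0:", c)
```

The values agree with the first-step equations $m_{i0} = 1 + \sum_{j \ne 0} p_{ij} m_{j0}$ solved by hand for this chain.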
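Theorem 1 (Existence)

If the numbers $m_{i0}$ and $c_{i0}$, $i = 0, 1, \ldots$, are finite, then the numbers

$$g = \frac{c_{00}}{m_{00}} \quad\text{and}\quad v_i = c_{i0} - g\,m_{i0}, \qquad i = 0, 1, \ldots, \tag{2}$$

satisfy (1) and $\sum_{j=0}^{\infty} p_{ij} v_j$ converges absolutely, $i = 0, 1, \ldots$.

Proof: Let $w_i^* = w_i - g$ and $y_n^* = \sum_{j=0}^{\infty} w_j^* Z_n(j)$, so that $v_i = E\big(\sum_{n=0}^{\infty} y_n^* \mid X_0 = i\big)$. Then

$$\begin{aligned}
v_i &= E\Big(\sum_{n=0}^{\infty} y_n^* \,\Big|\, X_0 = i\Big) \\
&= w_i^* + \sum_{n=1}^{\infty} \sum_{j=0}^{\infty} E(y_n^* \mid X_0 = i, X_1 = j)\, p_{ij} \\
&= w_i^* + \sum_{j=0}^{\infty} \sum_{n=1}^{\infty} E(y_n^* \mid X_0 = i, X_1 = j)\, p_{ij} \\
&= w_i^* + \sum_{j=0}^{\infty} p_{ij} v_j ,
\end{aligned}$$

so (1) holds. In the last step the inner sum equals $v_j$ for $j \ge 1$ by the Markov property, and equals $0 = v_0$ for $j = 0$ because the first passage has then ended. The interchange of expectation and summation is justified since the finiteness of the $m_{i0}$ and $c_{i0}$ implies that $\sum_{n=0}^{\infty} E(|y_n^*| \mid X_0 = i) < \infty$. This in turn implies that the series above are absolutely convergent, so the interchange of summations is also justified.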
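Theorem 1 can be checked numerically on a small example. The sketch below is only an illustration under assumptions of our own: a hypothetical three-state chain, and a successive-approximation scheme (not the report's method) that iterates the first-step equations $m_{i0} = 1 + \sum_{j \ne 0} p_{ij} m_{j0}$ and $c_{i0} = w_i + \sum_{j \ne 0} p_{ij} c_{j0}$ until they settle. It then forms $g$ and $v_i$ as in the theorem and reports the largest residual of system (1).

```python
def first_passage_by_iteration(P, w, sweeps=2000):
    """Approximate m_i0 and c_i0 by iterating the first-step equations."""
    n = len(P)
    m = [0.0] * n
    c = [0.0] * n
    for _ in range(sweeps):
        # Sums skip j = 0: the passage ends as soon as state 0 is entered.
        m = [1.0 + sum(P[i][j] * m[j] for j in range(1, n)) for i in range(n)]
        c = [w[i] + sum(P[i][j] * c[j] for j in range(1, n)) for i in range(n)]
    return m, c


def theorem1_solution(P, w):
    """Return g = c_00 / m_00 and v_i = c_i0 - g * m_i0."""
    m, c = first_passage_by_iteration(P, w)
    g = c[0] / m[0]
    v = [c[i] - g * m[i] for i in range(len(P))]
    return g, v


def max_residual(P, w, g, v):
    """Largest absolute violation of g + v_i = w_i + sum_j p_ij v_j."""
    n = len(P)
    return max(abs(g + v[i] - w[i] - sum(P[i][j] * v[j] for j in range(n)))
               for i in range(n))


# Hypothetical three-state chain and costs (assumptions for illustration).
P = [[0.5, 0.5, 0.0],
     [0.25, 0.25, 0.5],
     [1 / 3, 0.0, 2 / 3]]
w = [3.0, 1.0, 2.0]

g, v = theorem1_solution(P, w)
print("g =", g)
print("v =", v)
print("max residual of (1):", max_residual(P, w, g, v))
```

The iteration converges here because every state reaches state 0 with probability one; for chains where that fails, the $m_{i0}$ are not finite and the theorem does not apply.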
Theorem 2 (Uniqueness)

If the numbers $m_{i0}$ and $c_{i0}$, $i = 0, 1, \ldots$, are finite, if $\sum_{j=0}^{\infty}$ [summand illegible in the source], $i = 0, 1, \ldots$, converges absolutely, and if $\{g, v_0, v_1, \ldots\}$ is a sequence with $\sum_{j=0}^{\infty} {}_0p^*_{ij} v_j$, $i = 0, 1, \ldots$, converging absolutely, then $\{g, v_0, v_1, \ldots\}$ satisfies (1) if and only if there is a real number $r$ such that

$$g = \frac{c_{00}}{m_{00}} \quad\text{and}\quad v_i = c_{i0} - g\,m_{i0} + r, \qquad i = 0, 1, \ldots. \tag{3}$$
Proof:

It is immediate from the hypotheses and Theorem 1 that $\{g, v_0, v_1, \ldots\}$ defined in (3) satisfies (1) and that $\sum_{j=0}^{\infty} {}_0p^*_{ij} v_j$ converges absolutely, as well as $\sum_{j=0}^{\infty} p_{ij} v_j$. Let $\{g', v_0', v_1', \ldots\}$ be any other solution to (1) with $\sum_{j=0}^{\infty} {}_0p^*_{ij} v_j'$ converging absolutely for $i = 0, 1, \ldots$. Hence $\sum_{k=0}^{\infty} \pi_k v_k'$ is absolutely convergent, where $\pi_i = {}_0p^*_{0i}/m_{00}$. Now premultiplying both sides of (1) by $\pi_i$, summing over $i = 0, 1, \ldots$, using the relations $\sum_{i=0}^{\infty} \pi_i = 1$ and $\pi_j = \sum_{k=0}^{\infty} \pi_k p_{kj}$, $j = 0, 1, \ldots$, and the fact that the interchange of summations is justified, we get

$$g' = \sum_{i=0}^{\infty} \pi_i w_i ,$$

which is independent of $\{v_0', v_1', \ldots\}$. Thus since $\{g, v_0, v_1, \ldots\}$ satisfies (1) we must have $g = g'$.
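The identity $g' = \sum_i \pi_i w_i$ with $\pi_i = {}_0p^*_{0i}/m_{00}$ can be illustrated on a finite chain. The sketch below uses a hypothetical three-state chain and a helper name `taboo_row_from_zero` of our own; it accumulates the expected number of visits to each state during one excursion from state 0, normalizes by $m_{00}$, and then checks that the result is stationary and reproduces $c_{00}/m_{00}$.

```python
def taboo_row_from_zero(P, steps=500):
    """Approximate 0p*_{0j}: expected visits to j from time 0 until just
    before the first return to state 0, starting in state 0."""
    n = len(P)
    u = [1.0] + [0.0] * (n - 1)   # time 0: the chain sits in state 0
    total = u[:]
    for _ in range(steps):
        # Advance one step and discard the mass that returns to state 0.
        u = [0.0] + [sum(u[i] * P[i][j] for i in range(n)) for j in range(1, n)]
        total = [t + x for t, x in zip(total, u)]
    return total


# Hypothetical three-state chain and costs (assumptions for illustration).
P = [[0.5, 0.5, 0.0],
     [0.25, 0.25, 0.5],
     [1 / 3, 0.0, 2 / 3]]
w = [3.0, 1.0, 2.0]

row = taboo_row_from_zero(P)
m00 = sum(row)                                    # mean recurrence time of state 0
c00 = sum(x * wj for x, wj in zip(row, w))        # expected cost per excursion
pi = [x / m00 for x in row]                       # pi_i = 0p*_{0i} / m_00
pi_P = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]
g = sum(p_i * w_i for p_i, w_i in zip(pi, w))

print("pi   =", pi)
print("pi P =", pi_P)
print("g =", g, " c00/m00 =", c00 / m00)
```

The truncation at `steps` terms is harmless here because the probability of not yet having returned to state 0 decays geometrically for this chain.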
Letting $\Delta_i = v_i' - v_i$, $i = 0, 1, \ldots$, we get from (1), on subtracting one system from the other, that

$$\Delta_i = \sum_{j=0}^{\infty} p_{ij} \Delta_j, \qquad i = 0, 1, \ldots. \tag{4}$$

Let $p_{ij}^n = \Pr(X_n = j \mid X_0 = i)$. Evidently for $N = 1, 2, \ldots$,

[displayed bound illegible in the source]

so

(5) [displayed relation illegible in the source], $j = 0, 1, \ldots$.

Since the series on the right side of (5) converges absolutely by hypothesis, and $\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} p_{ij}^n = \pi_j$, we get from the dominated convergence theorem that

$$\lim_{N \to \infty} \sum_{j=0}^{\infty} \Big(\frac{1}{N} \sum_{n=1}^{N} p_{ij}^n\Big) \Delta_j = \sum_{j=0}^{\infty} \pi_j \Delta_j. \tag{6}$$

Since from (5) $\sum_{j=0}^{\infty} p_{ij}^n \Delta_j$ converges absolutely, we can iterate (4), yielding

$$\Delta_i = \sum_{j=0}^{\infty} p_{ij}^n \Delta_j, \qquad i = 0, 1, \ldots; \quad n = 1, 2, \ldots. \tag{7}$$

Hence on substituting (7) into (6),

$$\Delta_i = \sum_{j=0}^{\infty} \pi_j \Delta_j, \qquad i = 0, 1, \ldots.$$

Thus $\Delta_i$ is independent of $i$, which completes the proof.
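The proof works with the averages $\frac{1}{N}\sum_{n=1}^{N} p_{ij}^n$ rather than with $p_{ij}^n$ itself. A small sketch may suggest why: for a periodic chain the $n$-step probabilities need not converge at all, while their averages still tend to $\pi_j$. The two-state chain below is a hypothetical example chosen for illustration, and the helper names are our own.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def cesaro_average(P, N):
    """Return (average of P^1, ..., P^N, and the last power P^N)."""
    n = len(P)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [[0.0] * n for _ in range(n)]
    for _ in range(N):
        power = mat_mul(power, P)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return [[x / N for x in row] for row in total], power


# Hypothetical periodic chain: the state flips at every step.
P = [[0.0, 1.0],
     [1.0, 0.0]]

average, last_power = cesaro_average(P, 1000)
print("average of P^1..P^1000:", average)
print("P^1000:", last_power)
```

Here the powers of `P` alternate between the swap matrix and the identity, so they have no limit, yet the averages equal the stationary probabilities $(1/2, 1/2)$ in every row.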
Example:

If the sequences $\{m_{i0}\}$ and $\{w_i\}$, $i = 0, 1, \ldots$, are bounded, then so is the sequence $\{c_{i0}\}$, $i = 0, 1, \ldots$, since

$$|c_{i0}| \le \sup_{k,j}\, m_{k0}\,|w_j| .$$

Thus Theorem 1 applies and in addition the solution to (1) given in (2) is bounded. This result is used in [2].
We remark that since

[display illegible in the source]

where

[display illegible in the source],

$\sum_{j=0}^{\infty} {}_0p^*_{kj}\,|u_j|$ is absolutely convergent for every recurrent state $k$ provided that $\sum_{j=0}^{\infty} {}_0p^*_{0j}\,|u_j|$ is absolutely convergent. Thus the hypotheses of Theorems 1 and 2 could have been stated only for state 0 and the transient states.
References

[1] Chung, K. L. (1960), Markov Chains with Stationary Transition Probabilities, Springer, Berlin.

[2] Derman, Cyrus (1966), "Denumerable State Markovian Decision Processes - Average Cost Criterion," (To appear in Ann. Math. Stat.).

[3] Howard, Ronald (1960), Dynamic Programming and Markov Processes, John Wiley, New York.
UNCLASSIFIED
Security Classification

DOCUMENT CONTROL DATA - R&D
(Security classification of title, body of abstract and indexing annotation must be entered when the overall report is classified)

1. ORIGINATING ACTIVITY (Corporate author): Stanford University, Department of Statistics, Stanford, California
2a. REPORT SECURITY CLASSIFICATION: Unclassified
3. REPORT TITLE: A Solution to a Countable System of Equations Arising in Markovian Decision Processes
4. DESCRIPTIVE NOTES (Type of report and inclusive dates): Technical Report
5. AUTHOR(S) (Last name, first name, initial): Derman, Cyrus; Veinott, Arthur F., Jr.
6. REPORT DATE: July 7, 1966
7a. TOTAL NO. OF PAGES: 10
7b. NO. OF REFS: 3
8a. CONTRACT OR GRANT NO.: Contract Nonr-225(53)
8b. PROJECT NO.: NR-042-002
9a. ORIGINATOR'S REPORT NUMBER(S): Technical Report No. 89
9b. OTHER REPORT NO(S): None
10. AVAILABILITY/LIMITATION NOTICES: Distribution of this document is unlimited.
12. SPONSORING MILITARY ACTIVITY: Logistics and Mathematical Statistics Branch, Office of Naval Research, Washington, D. C. 20362
13. ABSTRACT: A countable system of equations arising in Markovian decision processes is studied. Conditions are given ensuring the existence and uniqueness of an explicit solution to the system.