
A SOLUTION TO A COUNTABLE SYSTEM OF EQUATIONS

ARISING IN MARKOVIAN DECISION PROCESSES

by

Cyrus Derman and Arthur F. Veinott, Jr.

TECHNICAL REPORT NO. 89

July 7, 1966

Supported by the Army, Navy, Air Force, and NASA under Contract Nonr-225(53) (NR-042-002) with the Office of Naval Research

Gerald J. Lieberman, Project Director

This research was partially supported by the Office of Naval Research under Contract Nonr-225(77) (NR-347-010).

Reproduction in Whole or in Part is Permitted for any Purpose of the United States Government

DEPARTMENT OF STATISTICS

STANFORD UNIVERSITY

STANFORD, CALIFORNIA

https://ntrs.nasa.gov/search.jsp?R=19660025515 2020-06-03T19:07:40+00:00Z

Nontechnical Summary

Let $X_0, X_1, \ldots$ be a sequence of non-negative integer valued random variables with the property that

$$\Pr(X_{n+1} = j \mid X_0 = x_0, \ldots, X_{n-1} = x_{n-1}, X_n = i) = p_{ij}$$

for all $i, j, x_0, \ldots, x_{n-1}, n$. The collection of random variables $\{X_n\}$ is called a Markov chain and the $p_{ij}$ are called transition probabilities. We refer to $X_n$ as the state of the process at time $n$. Let $w_i$ be the cost incurred at time $n$ if the process is in state $i$ at that time. Consider the system of equations

$$g + v_i = w_i + \sum_{j=0}^{\infty} p_{ij} v_j, \qquad i = 0, 1, \ldots, \tag{1}$$

in the unknown variables $g, v_0, v_1, \ldots$. Such a system arises in connection with constructing optimal rules for controlling Markovian decision processes. Also the numbers $g, v_0, v_1, \ldots$ are of interest in their own right. Often $g$ is the long run expected average cost and $v_i - v_j$ is the limit, as $n \to \infty$, of the difference between expected total cost during times $0, 1, \ldots, n$ given that the process starts in states $i$ and $j$ respectively.

We show in this paper that one solution to the system (1) is given by

$$g = \frac{c_{00}}{m_{00}} \quad \text{and} \quad v_i = c_{i0} - g\, m_{i0}, \qquad i = 0, 1, \ldots, \tag{2}$$

provided that the expected time $m_{i0}$ required to go from state $i$ to state 0 is finite and that the expected cost $c_{i0}$ incurred during that time is also finite, $i = 0, 1, \ldots$. Notice that $v_0 = 0$.
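The interpretation of $g = c_{00}/m_{00}$ as a long run average cost can be illustrated by simulation. The sketch below is only an illustration: the 3-state chain `P`, the costs `w`, the function name `simulate_cycles`, and the number of cycles are all assumptions made for this example and do not come from the report. For this particular chain, solving the first-step equations by hand gives $m_{00} = 4$ and $c_{00} = 9$, so the estimate of $g$ should be close to $9/4$.

```python
import random

# Hypothetical 3-state chain and costs, chosen only for illustration.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
w = [1.0, 2.0, 4.0]

def simulate_cycles(P, w, n_cycles, seed=0):
    """Estimate m_00 and c_00 from n_cycles excursions that start and end in state 0."""
    rng = random.Random(seed)
    states = range(len(P))
    total_time = 0
    total_cost = 0.0
    for _ in range(n_cycles):
        state = 0
        while True:
            total_cost += w[state]   # the cost w_i is incurred in the state occupied now
            total_time += 1
            state = rng.choices(states, weights=P[state])[0]
            if state == 0:           # the first return to state 0 closes the cycle
                break
    return total_time / n_cycles, total_cost / n_cycles

m00_hat, c00_hat = simulate_cycles(P, w, n_cycles=50_000)
g_hat = c00_hat / m00_hat            # cost per cycle divided by time per cycle
print(m00_hat, c00_hat, g_hat)
```

Because every simulated cycle starts and ends in state 0, the ratio of accumulated cost to accumulated time is exactly the average cost per period over the simulated horizon, which is the quantity $g$ is meant to describe.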

As an illustration of the above ideas, consider a single item inventory model in which the demands in periods $1, 2, \ldots$ are independent. A demand of size one occurs with probability $p$, $0 < p < 1$, and a demand of size zero occurs with probability $1 - p$. Let $X_n$ denote the stock on hand at the beginning of period $n$. An order for one unit is placed in period $n$ with immediate delivery if $X_n = 0$; otherwise, no order is placed in period $n$. There is a unit cost $h$ for each unit of stock on hand after ordering in a period. There is a cost $K$ for placing an order in a period. Under these assumptions the nonzero transition probabilities are $p_{00} = p$, $p_{01} = 1 - p$, $p_{ii} = 1 - p$, and $p_{i,i-1} = p$, $i = 1, 2, \ldots$. Also $w_0 = K + h$ and $w_i = hi$, $i = 1, 2, \ldots$. Thus the system (1) becomes

$$g + v_0 = K + h + p v_0 + (1 - p) v_1$$

$$g + v_i = ih + p v_{i-1} + (1 - p) v_i, \qquad i = 1, 2, \ldots$$

The solution given in (2) is

$$g = pK + h, \qquad v_i = \frac{h\, i(i-1)}{2p} - iK, \qquad i = 0, 1, \ldots$$

Thus $g$ is here the long run expected average cost under the indicated ordering policy. Also $v_i$ is the limit, as $n \to \infty$, of the amount by which the expected cost in periods $0, 1, \ldots, n$ starting with $i$ units of stock on hand exceeds that starting with no stock on hand.
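The inventory solution can be checked mechanically. The sketch below assumes the closed form $g = pK + h$, $v_i = h\,i(i-1)/(2p) - iK$, which follows from (2) with $m_{i0} = i/p$ and $c_{i0} = h\,i(i+1)/(2p)$ for $i \ge 1$, $m_{00} = 1/p$ and $c_{00} = K + h/p$. The function names and the numerical values of `p`, `K` and `h` are hypothetical, chosen only for illustration. Exact rational arithmetic is used so that the residuals of the two inventory equations are exactly zero rather than merely small.

```python
from fractions import Fraction

def inventory_solution(p, K, h, n_states):
    """Return g and [v_0, ..., v_{n_states-1}] from the closed form stated in the lead-in."""
    g = p * K + h
    v = [h * i * (i - 1) / (2 * p) - i * K for i in range(n_states)]
    return g, v

def residuals(p, K, h, g, v):
    """Left side minus right side of each inventory equation for i = 0, ..., len(v)-1."""
    # Equation for state 0: an order is placed, cost K + h.
    res = [g + v[0] - (K + h + p * v[0] + (1 - p) * v[1])]
    # Equations for states i >= 1: no order, holding cost i*h.
    for i in range(1, len(v)):
        res.append(g + v[i] - (i * h + p * v[i - 1] + (1 - p) * v[i]))
    return res

# Illustrative parameter values (assumptions, not taken from the report).
p, K, h = Fraction(1, 3), Fraction(5), Fraction(2)
g, v = inventory_solution(p, K, h, 8)
print(g, v[:4], set(residuals(p, K, h, g, v)))
```

Only finitely many equations are checked here; each equation for state $i$ involves just $v_{i-1}$ and $v_i$ (and $v_1$ for state 0), so no truncation error is introduced by stopping at a finite number of states.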

A SOLUTION TO A COUNTABLE SYSTEM OF EQUATIONS

ARISING IN MARKOVIAN DECISION PROCESSES

by

Cyrus Derman, Columbia University and Stanford University

and

Arthur F. Veinott, Jr., Stanford University

Let $\{X_n\}$, $n = 0, 1, \ldots$, be a Markov chain having a state space consisting of the non-negative integers and having stationary transition probabilities $\{p_{ij}\}$. Let $\{w_i\}$, $i = 0, 1, \ldots$, be a sequence of real numbers. Consider the system of equations

$$g + v_i = w_i + \sum_{j=0}^{\infty} p_{ij} v_j, \qquad i = 0, 1, \ldots, \tag{1}$$

in the unknown variables $\{g, v_0, v_1, \ldots\}$. In [2], the system (1) arises in connection with conditions for the existence and construction of optimal rules for controlling a Markovian decision process. For a finite state space existence of solutions to (1) is guaranteed by the condition that the Markov chain have at most one ergodic class of states. (See [3].) In this note we give conditions ensuring the existence (Theorem 1) and uniqueness (Theorem 2) of solutions to (1).

Let

$$Z_n(j) = \begin{cases} 1 & \text{if } X_n = j \text{ and } X_k \neq 0 \text{ for } k = 1, \ldots, n, \\ 0 & \text{otherwise,} \end{cases} \qquad j, n = 0, 1, \ldots,$$

$${}_0p^*_{ij} = E\Big(\sum_{n=0}^{\infty} Z_n(j) \,\Big|\, X_0 = i\Big), \qquad i, j = 0, 1, \ldots,$$

and

$$m_{i0} = \sum_{j=0}^{\infty} {}_0p^*_{ij}.$$

If the last series converges absolutely, then $m_{i0}$ is the mean first passage time from $i$ to 0 and we say $m_{i0}$ is finite. If the $m_{i0}$ are all finite, as we assume throughout, then state 0 is positive recurrent and there is only one recurrent class.

Let $y_n = \sum_{j=0}^{\infty} w_j Z_n(j)$ and $c_{i0} = E\big(\sum_{n=0}^{\infty} y_n \mid X_0 = i\big)$. By an obvious generalization of Theorem 5 in [1, p. 81] we get

$$c_{i0} = \sum_{j=0}^{\infty} {}_0p^*_{ij}\, w_j$$

provided the series is absolutely convergent. If the series is absolutely convergent we say $c_{i0}$ is finite. In applications $w_i$ is often the cost incurred when in state $i$ so $c_{i0}$ is then the expected cost during a first passage from $i$ to 0.
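For a finite chain the quantities ${}_0p^*_{ij}$, $m_{i0}$ and $c_{i0}$ can be approximated directly from their series definitions. The sketch below is a minimal illustration under stated assumptions: it takes $Z_n(j)$ to be the indicator that $X_n = j$ and the chain has not visited state 0 at times $1, \ldots, n$, it truncates the series after `n_terms` terms, and the chain `P`, the costs `w` and the function name `taboo_sums` are hypothetical. For this chain the exact values, obtained by hand from the first-step equations, are $m_{00}, m_{10}, m_{20} = 4, 6, 8$ and $c_{00}, c_{10}, c_{20} = 9, 16, 24$.

```python
def taboo_sums(P, n_terms=500):
    """Approximate 0p*_ij = sum over n >= 0 of Pr(X_n = j, X_1..X_n != 0 | X_0 = i)."""
    n = len(P)
    # The n = 0 term: Z_0(j) = 1 exactly when X_0 = j, i.e. the identity matrix.
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    # Q is P with column 0 zeroed: a single step that is not allowed to land in state 0.
    Q = [[0.0 if j == 0 else float(P[i][j]) for j in range(n)] for i in range(n)]
    for _ in range(n_terms):
        # Multiply by Q to advance the taboo probabilities by one more step.
        term = [[sum(term[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

# Hypothetical 3-state chain and costs, chosen only for illustration.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
w = [1.0, 2.0, 4.0]

p_star = taboo_sums(P)
m = [sum(row) for row in p_star]                              # m_i0 = sum_j 0p*_ij
c = [sum(x * wj for x, wj in zip(row, w)) for row in p_star]  # c_i0 = sum_j 0p*_ij w_j
print(m, c)
```

Truncating the series is adequate here because the taboo probabilities decay geometrically for a finite chain in which state 0 is reachable from every state; for a countable state space the truncation would need separate justification.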

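Theorem 1 (Existence)

If the numbers $m_{i0}$ and $c_{i0}$, $i = 0, 1, \ldots$, are finite, then the numbers

$$g = \frac{c_{00}}{m_{00}} \quad \text{and} \quad v_i = c_{i0} - g\, m_{i0}, \qquad i = 0, 1, \ldots, \tag{2}$$

satisfy (1) and $\sum_{j=0}^{\infty} p_{ij} v_j$ converges absolutely, $i = 0, 1, \ldots$.

Proof: Let $w'_i = w_i - g$ and $y'_n = \sum_{j=0}^{\infty} w'_j Z_n(j)$, so that $v_i = c_{i0} - g\, m_{i0} = E\big(\sum_{n=0}^{\infty} y'_n \mid X_0 = i\big)$. Then

$$\begin{aligned}
v_i &= E\Big(\sum_{n=0}^{\infty} y'_n \,\Big|\, X_0 = i\Big) \\
&= w'_i + \sum_{n=1}^{\infty} \sum_{j=0}^{\infty} E(y'_n \mid X_0 = i, X_1 = j)\, p_{ij} \\
&= w'_i + \sum_{j=0}^{\infty} \sum_{n=1}^{\infty} E(y'_n \mid X_0 = i, X_1 = j)\, p_{ij} \\
&= w'_i + \sum_{j=0}^{\infty} p_{ij} v_j
\end{aligned}$$

so (1) holds. The interchange of expectation and summation is justified since the finiteness of the $m_{i0}$ and $c_{i0}$ implies that $\sum_{n=0}^{\infty} E(|y'_n| \mid X_0 = i) < \infty$. This in turn implies that the series above are absolutely convergent so the interchange of summations is also justified.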
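Theorem 1 can be checked exactly on a small finite chain. The sketch below is an illustration under stated assumptions, not the authors' procedure: it obtains $m_{i0}$ and $c_{i0}$ for $i \ge 1$ from the first-step equations $m_{i0} = 1 + \sum_{j \ge 1} p_{ij} m_{j0}$ and $c_{i0} = w_i + \sum_{j \ge 1} p_{ij} c_{j0}$ (a standard consequence of the definitions, used here in place of the series), forms $g$ and $v_i$ as in (2), and evaluates the residual of (1). The helper names `solve` and `theorem1_solution`, the chain `P` and the costs `w` are hypothetical.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A is assumed nonsingular)."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        lead = M[col][col]
        M[col] = [x / lead for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n] for row in M]

def theorem1_solution(P, w):
    """Return (g, v, m, c) with g = c_00/m_00 and v_i = c_i0 - g*m_i0 for a finite chain."""
    n = len(P)
    # I - Q, where Q holds the transitions among states 1..n-1 (state 0 is taboo).
    A = [[(1 if i == j else 0) - P[i][j] for j in range(1, n)] for i in range(1, n)]
    m = [F(0)] + solve(A, [F(1)] * (n - 1))
    c = [F(0)] + solve(A, [w[i] for i in range(1, n)])
    # State 0: one step, then a first passage from wherever the chain lands.
    m[0] = 1 + sum(P[0][j] * m[j] for j in range(1, n))
    c[0] = w[0] + sum(P[0][j] * c[j] for j in range(1, n))
    g = c[0] / m[0]
    v = [c[i] - g * m[i] for i in range(n)]
    return g, v, m, c

# Hypothetical 3-state chain and costs, chosen only for illustration.
P = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 2), F(1, 2)]]
w = [F(1), F(2), F(4)]

g, v, m, c = theorem1_solution(P, w)
# Residual of system (1): g + v_i - w_i - sum_j p_ij v_j, one entry per state.
residual = [g + v[i] - w[i] - sum(P[i][j] * v[j] for j in range(len(P)))
            for i in range(len(P))]
print(g, v, residual)
```

Rational arithmetic is used so that the residuals vanish exactly; with floating point one would compare against a small tolerance instead.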

Theorem 2 (Uniqueness)

If the numbers $m_{i0}$ and $c_{i0}$, $i = 0, 1, \ldots$, are finite, if $\sum_{j=0}^{\infty}$ [...], $i = 0, 1, \ldots$, converges absolutely, and if $\{g, v_0, v_1, \ldots\}$ is a sequence with $\sum_{j=0}^{\infty} {}_0p^*_{ij} v_j$, $i = 0, 1, \ldots$, converging absolutely, then $\{g, v_0, v_1, \ldots\}$ satisfies (1) if and only if there is a real number $r$ such that

$$g = \frac{c_{00}}{m_{00}} \quad \text{and} \quad v_i = c_{i0} - g\, m_{i0} + r, \qquad i = 0, 1, \ldots. \tag{3}$$

Proof:

It is immediate from the hypotheses and Theorem 1 that $\{g, v_0, v_1, \ldots\}$ defined in (3) satisfies (1) and $\sum_{j=0}^{\infty} {}_0p^*_{ij} v_j$ converges absolutely as well as $\sum_{j=0}^{\infty} p_{ij} v_j$. Let $\{g', v'_0, v'_1, \ldots\}$ be any other solution to (1) with $\sum_{j=0}^{\infty} {}_0p^*_{ij} v'_j$ converging absolutely for $i = 0, 1, \ldots$. Hence $\sum_{k=0}^{\infty} p_{ik} v'_k$ is absolutely convergent. Now pre-multiplying both sides of (1) by $\pi_i = {}_0p^*_{0i}/m_{00}$, summing over $i = 0, 1, \ldots$, using the relations $\sum_{i=0}^{\infty} \pi_i = 1$ and $\pi_j = \sum_{k=0}^{\infty} \pi_k p_{kj}$, $j = 0, 1, \ldots$, and the fact that the interchange of summations is justified, we get

$$g' = \sum_{i=0}^{\infty} \pi_i w_i$$

which is independent of $\{v'_0, v'_1, \ldots\}$. Thus since $\{g, v_0, v_1, \ldots\}$ satisfies (1) we must have $g = g'$.

Letting $\Delta_i = v'_i - v_i$, $i = 0, 1, \ldots$, we get from (1) on subtracting one system from the other that

$$\Delta_i = \sum_{j=0}^{\infty} p_{ij} \Delta_j, \qquad i = 0, 1, \ldots. \tag{4}$$

Let $p^n_{ij} = \Pr(X_n = j \mid X_0 = i)$. Evidently for $N = 1, 2, \ldots$, [...] so

(5) [...], $j = 0, 1, \ldots$.

Since the series on the right side of (5) converges absolutely by hypothesis, and $\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} p^n_{ij} = \pi_j$, we get from the dominated convergence theorem that

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \sum_{j=0}^{\infty} p^n_{ij} \Delta_j = \sum_{j=0}^{\infty} \pi_j \Delta_j. \tag{6}$$

Since from (5), $\sum_{j=0}^{\infty} p^n_{ij} \Delta_j$ converges absolutely we can iterate (4), yielding

$$\Delta_i = \sum_{j=0}^{\infty} p^n_{ij} \Delta_j, \qquad i = 0, 1, \ldots; \quad n = 1, 2, \ldots. \tag{7}$$

Hence on substituting (7) into (6)

$$\Delta_i = \sum_{j=0}^{\infty} \pi_j \Delta_j, \qquad i = 0, 1, \ldots.$$

Thus $\Delta_i$ is independent of $i$, which completes the proof.
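Two facts used in the uniqueness argument can be illustrated on a finite chain: $g$ equals $\sum_i \pi_i w_i$ for the stationary probabilities $\pi_i$, and adding a constant $r$ to every $v_i$ leaves (1) satisfied. The sketch below is an illustration only. The chain `P` and costs `w` are hypothetical, and the values of `pi` and `v` were worked out by hand for this chain ($\pi_i = {}_0p^*_{0i}/m_{00}$ with ${}_0p^*_{0\cdot} = (1, 2, 1)$ and $m_{00} = 4$; $v_i = c_{i0} - g\, m_{i0}$ with $c_{\cdot 0} = (9, 16, 24)$ and $m_{\cdot 0} = (4, 6, 8)$).

```python
from fractions import Fraction as F

# Hypothetical 3-state chain and costs, chosen only for illustration.
P = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 2), F(1, 2)]]
w = [F(1), F(2), F(4)]

# Hand-computed for this chain (see the lead-in): stationary probabilities and v from (2).
pi = [F(1, 4), F(1, 2), F(1, 4)]
v = [F(0), F(5, 2), F(6)]

def satisfies_system(g, v):
    """True when g + v_i = w_i + sum_j p_ij v_j holds exactly for every state i."""
    n = len(P)
    return all(g + v[i] == w[i] + sum(P[i][j] * v[j] for j in range(n))
               for i in range(n))

# pi is stationary: pi_j = sum_k pi_k p_kj for every j.
is_stationary = all(pi[j] == sum(pi[k] * P[k][j] for k in range(3)) for j in range(3))

# g is pinned down by pi and w alone, independently of the v's.
g = sum(pi[i] * w[i] for i in range(3))

# Shifting every v_i by the same constant r keeps (1) satisfied ...
shifted_ok = all(satisfies_system(g, [x + r for x in v])
                 for r in (F(-3), F(0), F(7, 5)))
# ... whereas changing g does not.
wrong_g_fails = not satisfies_system(g + 1, v)

print(is_stationary, g, shifted_ok, wrong_g_fails)
```

The shift invariance holds because each row of `P` sums to one, so the constant $r$ appears once on each side of (1) and cancels.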

Example:

If the sequences $\{m_{i0}\}$ and $\{w_i\}$, $i = 0, 1, \ldots$, are bounded, then so is the sequence $\{c_{i0}\}$, $i = 0, 1, \ldots$, since

$$|c_{i0}| \le \sup_{k,j}\, m_{k0} |w_j|.$$

Thus Theorem 1 applies and in addition the solution to (1) given in (2) is bounded. This result is used in [2].

We remark that since [...] where [...], the series $\sum_{j=0}^{\infty} {}_0p^*_{kj}\, u_j$ is absolutely convergent for every recurrent state $k$ provided that $\sum_{j=0}^{\infty} {}_0p^*_{0j}\, u_j$ is absolutely convergent. Thus the hypotheses of Theorems 1 and 2 could have been stated only for state 0 and the transient states.

References

[1] Chung, K. L. (1960), Markov Chains with Stationary Transition Probabilities, Springer, Berlin.

[2] Derman, Cyrus (1966), "Denumerable State Markovian Decision Processes - Average Cost Criterion," (To appear in Ann. Math. Statist.).

[3] Howard, Ronald (1960), Dynamic Programming and Markov Processes, John Wiley, New York.

UNCLASSIFIED
Security Classification

DOCUMENT CONTROL DATA - R&D

1. ORIGINATING ACTIVITY (Corporate author): Department of Statistics, Stanford University, Stanford, California

2a. REPORT SECURITY CLASSIFICATION: Unclassified

3. REPORT TITLE: A Solution to a Countable System of Equations Arising in Markovian Decision Processes

4. DESCRIPTIVE NOTES (Type of report and inclusive dates): Technical Report

5. AUTHOR(S) (Last name, first name, initial): Derman, Cyrus; Veinott, Arthur F., Jr.

6. REPORT DATE: July 7, 1966

7a. TOTAL NO. OF PAGES: 10

7b. NO. OF REFS: 3

8a. CONTRACT OR GRANT NO.: Contract Nonr-225(53)

8b. PROJECT NO.: NR-042-002

9a. ORIGINATOR'S REPORT NUMBER(S): Technical Report No. 89

9b. OTHER REPORT NO(S): None

10. AVAILABILITY/LIMITATION NOTICES: Distribution of this document is unlimited.

12. SPONSORING MILITARY ACTIVITY: Logistics and Mathematical Statistics Branch, Office of Naval Research, Washington, D. C. 20362

13. ABSTRACT: A countable system of equations arising in Markovian decision processes is studied. Conditions are given ensuring the existence and uniqueness of an explicit solution to the system.

14. KEY WORDS: Markov chains; Dynamic Programming; Markov decision processes
