DEPARTMENT OF ECONOMICS AND FINANCE WORKING PAPER SERIES • September 2004
Dynamic Programming: An Introduction by Example†
Joachim Zietz * Middle Tennessee State University, Murfreesboro, TN
Abstract Some basic dynamic programming techniques are introduced by way of example with the help of the computer algebra system Maple. The emphasis is on building confidence and intuition for the solution of dynamic problems in economics. To better integrate the material, the same examples are used to introduce different techniques. One covers the optimal extraction of a natural resource, another consumer utility maximization, and the final example solves a simple real business cycle model. Every example is accompanied by Maple computer code to make replication and extension easy.
Key words: Dynamic Programming, Computer-Assisted Solutions, Learning by Example.
JEL category: C610, A230
†A revised version is forthcoming in the Journal of Economic Education, Vol. 38, No. 2 (Spring 2007). *Joachim Zietz, Professor, Department of Economics and Finance, Middle Tennessee State University, Murfreesboro, TN 37132, phone: 615-898-5619, email: [email protected], url: www.mtsu.edu/~jzietz.
1 Introduction
The objective of this paper is to provide an introduction to dynamic programming that empha-
sizes intuition over mathematical rigor.1 The discussion is built around a set of examples that cover
a number of key problems from economics. In contrast to other brief introductions to dynamic pro-
gramming, such as King (2002), the paper integrates the use of a computer algebra system.2 Maple
program code is provided for every example.3 Based on classroom experience, this allows readers
not only to replicate and extend all problems easily but also to see their structure more clearly. The
problems in this paper are fully accessible to undergraduates who are familiar with the Lagrangian
multiplier method for solving constrained optimization problems.
Dynamic programming has strong similarities with optimal control, a competing approach to
dynamic optimization. Dynamic programming has its roots in the work of Bellman (1957), while
optimal control techniques rest on the work of the Russian mathematician Pontryagin and his
coworkers in the late 1950s.4 While both dynamic programming and optimal control can be ap-
plied to discrete time and continuous time problems, most current applications in economics appear
to favor dynamic programming for discrete time problems and optimal control for continuous time
problems. To keep the discussion reasonably simple, this paper will only deal with discrete time
dynamic programming problems. These have become very popular in recent years in macroeco-
nomics. However, the discussion will not be limited to macroeconomics but will try to convey the
idea that dynamic programming has applications in other settings as well.
The paper is organized as follows. The next section provides some motivation for the use of
dynamic programming techniques. This is followed by a discussion of finite horizon problems. Both
numerical and non-numerical problems are treated. The subsequent section covers infinite horizon
problems, again using both numerical and non-numerical examples. The last section before the
concluding comments solves a simple stochastic infinite horizon problem of the real business cycle
variety.
2 A Motivating Example
The basic principle of dynamic programming is best illustrated with an example. Consider
for that purpose the problem of an oil company that wants to maximize profits from an oil well.
1Readers interested in more formal mathematical treatments are urged to read Sargent (1987), Stokey and Lucas (1989), or Ljungqvist and Sargent (2000).
2The book by Miranda and Fackler (2002) emphasizes numerical techniques to solve dynamic programming problems. The software package Matlab is used extensively. Adda and Cooper (2003) put less emphasis on computational aspects, but nonetheless make available a fair amount of computer code in both Matlab and Gauss.
3The brief overview of discrete time dynamic programming in Parlar (2000) was very helpful to the author because of its focus on coding in Maple.
4Kamien and Schwartz (1981) contains one of the most thorough treatments of optimal control. Most major textbooks in mathematical methods for economists also contain chapters on optimal control.
Revenue at time t is given as
R_t = p_t u_t,
where p is the price of oil and where u is the amount of oil that is extracted and sold. In dynamic
programming applications, u is typically called a control variable. The company’s cost function is
quadratic in the amount of oil that is extracted,
C_t = 0.05 u_t^2.
The amount of oil remaining in the well follows the recursion or transition equation
x_{t+1} = x_t − u_t,
where x is known as a state variable in dynamic programming language. Since oil is a non-renewable
resource, pumping out oil in the amount of u at time t means that there is exactly that much less
left in the oil well at time t + 1. We assume that the company applies the discount factor β = 0.9
to profit streams that occur in the future. We also assume that the company intends to have the
oil well depleted in 4 years, which means that x_4 = 0. This is known as a boundary condition in
dynamic programming. Given these assumptions, the central question to be solved is how much oil
should be pumped at time t, t = 0, ..., 3, to maximize the discounted profit stream.
If one had never heard of dynamic programming or optimal control, the natural way to solve
this problem is to set up a Lagrangian multiplier problem along the following lines,
max L(x, u, λ) = Σ_{t=0}^{3} β^t (R_t − C_t) + Σ_{t=1}^{4} λ_t (x_t − x_{t−1} + u_{t−1}),
where x = (x_1, x_2, x_3), with x_0 given exogenously and x_4 = 0 by design, u = (u_0, u_1, u_2, u_3),
λ = (λ_1, λ_2, λ_3, λ_4), and where the constraints simply specify that the amount of oil left at period t
is equal to the amount in the previous period minus the amount pumped in the previous period.
Using standard techniques, the above Lagrangian problem can be solved for the decision variables
x and u and the Lagrangian multipliers λ in terms of the exogenous variables {p_0, p_1, p_2, p_3, x_0}. The Maple commands to solve this problem are given as follows:
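The structure of the solution can also be seen without a computer algebra system. The first-order conditions dL/dx_t = 0 force all multipliers to be equal, and dL/du_t = 0 then gives u_t = 10p_t + 10λβ^{−t}; imposing Σ_t u_t = x_0 pins down λ. The following Python sketch solves these conditions directly, using hypothetical numerical prices and an initial stock that the text keeps symbolic:

```python
# First-order conditions of the oil-extraction Lagrangian:
#   dL/dx_t = 0 (t = 1,...,3) forces lambda_1 = ... = lambda_4 = lam,
#   dL/du_t = 0 gives beta^t (p_t - 0.1 u_t) + lam = 0, hence
#   u_t = 10 p_t + 10 lam beta^(-t), with lam pinned down by sum_t u_t = x_0.
# Prices and initial stock are hypothetical; the text keeps them symbolic.
beta = 0.9
prices = [1.0, 1.0, 1.0, 1.0]    # hypothetical p_0, ..., p_3
x0 = 20.0                        # hypothetical initial stock of oil

sum_p = sum(prices)
sum_binv = sum(beta ** (-t) for t in range(4))
lam = (x0 - 10.0 * sum_p) / (10.0 * sum_binv)

u = [10.0 * prices[t] + 10.0 * lam * beta ** (-t) for t in range(4)]

# The stock follows the transition equation x_{t+1} = x_t - u_t.
x = [x0]
for t in range(4):
    x.append(x[-1] - u[t])

print([round(v, 3) for v in u], round(x[-1], 6))
```

With constant prices the extraction path u_t declines over time, since discounting makes early revenue more valuable, and the well is exactly depleted at t = 4.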
Next, the above value function is inserted into the Bellman equation for period 0, which is given as
V_0 = max_{u_0} (p_0 u_0 − 0.05 u_0^2) + 0.9 V_1. (10)
After substituting out all occurrences of x_1 in the above equation with (x_0 − u_0), the right-hand side of equation 10 is maximized with respect to u_0. One obtains
The decision rule for t = 0 still needs calculating. It requires the value function for t = 1. This value
function is equal to the Bellman equation for period 1, with c_1 replaced by the optimal decision
rule, equation 12,
V_1 = ln[a_1(1+r)/(1+β)] + β ln[a_1(1+r)^2 β/(1+β)].
Hence, the Bellman equation for t = 0 is given as
V_0 = max_{c_0} ln c_0 + βV_1
V_0 = max_{c_0} ln c_0 + β ln[a_1(1+r)/(1+β)] + β^2 ln[a_1(1+r)^2 β/(1+β)]
Substituting out a_1 with
a_1 = (1 + r)a_0 − c_0
gives
V_0 = max_{c_0} ln c_0 + β ln[(1+r)((1+r)a_0 − c_0)/(1+β)] + β^2 ln[(1+r)^2 ((1+r)a_0 − c_0)β/(1+β)]
Maximizing the right-hand side with respect to the control variable c_0 results in the following
first-order condition and optimal value for c_0:
dV_0/dc_0 = [−a_0(1+r) + c_0(1+β+β^2)] / [c_0(−a_0(1+r) + c_0)] = 0
c_0 = (1+r)a_0/(1+β+β^2). (13)
A comparison of the consumption decision rules for t = 0, 1, 2 reveals a pattern,
c_t = a_t(1+r)/Σ_{i=0}^{2−t} β^i, t = 0, 1, 2,
or, more generally, for t = 0, ..., n
c_t = a_t(1+r)/Σ_{i=0}^{n−t} β^i = a_t/(β Σ_{i=0}^{n−t} β^i) = a_t/Σ_{i=1}^{n−t+1} β^i, t = 0, ..., n. (14)
The ability to recognize and concisely describe such a pattern plays a key role in applications of
dynamic programming in economics, in particular those applications that deal with infinite horizon
problems. They will be discussed next.
The Maple program that solves the above consumption problem follows.
restart: N:=2: Digits:=4:
V[N]:=unapply(log((1+r)*a),a);
# V[N] is made a function of a
# this value function follows from the boundary condition
for t from N-1 by -1 to 0 do
VV[t]:=log(c)+beta*V[t+1]((1+r)*a-c);
# Since V is a function of a, the term in parenthesis after
# V[t+1] is substituted for a everywhere a appears in V[t+1]
deriv[t]:=diff(VV[t],c);
mu[t]:=solve(deriv[t],c);
V[t]:=unapply(simplify(subs(c=mu[t],VV[t])),a);
od;
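The symbolic backward induction of the Maple loop can be cross-checked numerically. The Python sketch below (with hypothetical values for β, r, and a_0, which the text keeps symbolic) maximizes each Bellman equation by golden-section search and reproduces the decision rule c_0 = (1+r)a_0/(1+β+β^2) of equation 13:

```python
import math

# Numerical backward induction for the three-period consumption problem:
# V_2(a) = ln((1+r)a), then V_1 and V_0 from ln c + beta*V(t+1, (1+r)a - c).
# beta, r, and a_0 are hypothetical numbers; the text keeps them symbolic.
beta, r, a0 = 0.9, 0.1, 100.0

def argmax(f, lo, hi, tol=1e-10):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c1, c2 = hi - g * (hi - lo), lo + g * (hi - lo)
        if f(c1) < f(c2):
            lo = c1
        else:
            hi = c2
    return 0.5 * (lo + hi)

def V(t, a):
    """Value of wealth a at time t (planning horizon ends at t = 2)."""
    if t == 2:
        return math.log((1.0 + r) * a)
    budget = (1.0 + r) * a
    c = argmax(lambda c: math.log(c) + beta * V(t + 1, budget - c),
               1e-6 * budget, (1.0 - 1e-6) * budget)
    return math.log(c) + beta * V(t + 1, budget - c)

budget0 = (1.0 + r) * a0
c0 = argmax(lambda c: math.log(c) + beta * V(1, budget0 - c),
            1e-6 * budget0, (1.0 - 1e-6) * budget0)

# Equation 13: c_0 = (1+r) a_0 / (1 + beta + beta^2)
print(round(c0, 4), round(budget0 / (1.0 + beta + beta**2), 4))
```

The two printed numbers agree, which confirms the pattern of equation 14 for t = 0 without any symbolic algebra.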
4 Infinite Horizon Problems
A question often posed in economics is to find the optimal decision rule if the planning horizon
does not consist of a fixed number n of periods but of an infinite number of periods. Examining the
limiting case n → ∞ may also be of interest if n is not strictly infinite but merely large, because
taking the limit leads to a decision rule that is both simpler and the same for every period. Most
often that is preferred to a situation where one has to deal with a distinct and
rather complex rule for each period from t = 0 to t = n. We shall illustrate this point with the
consumption problem of the last section.
4.1 A Problem Without Numerics
We want to modify the optimal consumption decision rule given in equation 14 so it applies for
an infinite time horizon. For taking the limit, the first part of equation 14 is most useful,
lim_{n→∞} c_t = lim_{n→∞} a_t(1+r)/Σ_{i=0}^{n−t} β^i.
In taking the limit, one needs to make use of the fact that
Σ_{i=0}^{∞} β^i = 1/(1−β).
Application of this rule yields
lim_{n→∞} c_t = a_t(1+r)(1−β) = a_t r.
The decision rule for consumption in period t simplifies to the intuitively obvious: consume in t
only what you get in interest from your endowment in t. The associated transition function for
wealth is given as
a_{t+1} = (1+r)a_t − c_t = (1+r)a_t − a_t r = a_t,
that is, wealth will remain constant over time.
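The convergence of the finite-horizon rule to c_t = a_t r is easy to verify numerically. Note that the last equality above, a_t(1+r)(1−β) = a_t r, holds because the example sets β = 1/(1+r). A small Python check with illustrative numbers:

```python
# The finite-horizon rule c_t = a_t (1+r) / sum_{i=0}^{n-t} beta^i converges
# to the infinite-horizon rule c_t = a_t r, under the example's assumption
# beta = 1/(1+r). The values of r and a_t are illustrative.
r = 0.05263
beta = 1.0 / (1.0 + r)
a_t = 100.0

for n_minus_t in (1, 5, 20, 200):
    S = sum(beta ** i for i in range(n_minus_t + 1))
    print(n_minus_t, round(a_t * (1.0 + r) / S, 4))

limit = a_t * (1.0 + r) * (1.0 - beta)  # equals a_t * r
print(round(limit, 4), round(a_t * r, 4))
```

As the remaining horizon n − t grows, the finite-horizon consumption level falls toward the interest income a_t r.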
This example has illustrated one way of arriving at optimal decision rules that apply to an
infinite planning horizon: simply solve the problem for a finite number of periods, identify and
write down the pattern for the evolution of the control variable, and take the limit n → ∞. It should be obvious that the same procedure can be applied to identify the value function for time
period t. To illustrate this point, recall the sequence of value functions for t = 1, 2
V_2 = ln[(1+r)a_2] (15)
V_1 = ln[a_1(1+r)/(1+β)] + β ln[a_1(1+r)^2 β/(1+β)]. (16)
It is easy although tedious to verify that the value function for t = 0 is given as
V_0 = ln[a_0(1+r)/(1+β+β^2)] + β ln[a_0(1+r)^2 β/(1+β+β^2)] + β^2 ln[a_0(1+r)^3 β^2/(1+β+β^2)]. (17)
One can proceed now as in the case of the consumption rule. The following pattern emerges from
equations 15 to 17,
V_t = Σ_{i=0}^{n−t} β^i ln[a_t(1+r)^{i+1} β^i / Σ_{i=0}^{n−t} β^i]. (18)
It is possible to check the correctness of this pattern by letting a program like Maple calculate the
sequence of the V_t. For equations 15 through 17, the Maple code is given as
n:=2:
seq(V[t]=sum(beta^i*log(a[t]*(1+r)^(i+1)*beta^i/
(sum(beta^(i),i=0..n-t))),i=0..n-t),t=0..n);
After one has verified that the formula is correct, one can take the limit n → ∞
lim_{n→∞} V_t = lim_{n→∞} Σ_{i=0}^{n−t} β^i ln[a_t(1+r)^{i+1} β^i / Σ_{i=0}^{n−t} β^i].
Since interest centers on finding an expression involving the variable a_t, the idea is to isolate a_t.
This can be done as follows
lim_{n→∞} V_t = lim_{n→∞} Σ_{i=0}^{n−t} β^i (ln a_t + ln[(1+r)^{i+1} β^i / Σ_{i=0}^{n−t} β^i])
or
lim_{n→∞} V_t = lim_{n→∞} Σ_{i=0}^{n−t} β^i ln a_t + lim_{n→∞} Σ_{i=0}^{n−t} β^i ln[(1+r)^{i+1} β^i / Σ_{i=0}^{n−t} β^i].
The last expression can be rewritten as
lim_{n→∞} V_t = lim_{n→∞} Σ_{i=0}^{n−t} β^i ln a_t + η (19)
where η is an expression independent of a. Taking the limit leads to
lim_{n→∞} V_t = [1/(1−β)] ln a_t + η = (1 + 1/r) ln a_t + η. (20)
With the help of a computer algebra program one could circumvent equation 18 and the following
algebra altogether by going from equation 17 immediately to an expression like 19 for n = 2. The
sequence of Maple commands below takes equation 17, expands it, and isolates the terms in a.
assume(beta>0,r>0);
assume(a[0],real);
V[0]:=expand(V[0]);
collect(V[0],log(a[0]));
The output of these commands is
V_0 = (1 + β + β^2) ln a_0 + ζ (21)
where ζ consists of a number of terms in β and r. One could do the same with equation 16 if one
had trouble moving from equation 21 immediately to equations 19 and then 20.
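The coefficient collected in equation 21 can also be verified numerically: since every term of equation 17 contains ln a_0 exactly once with weight β^i, the difference V_0(a) − V_0(1) must equal (1 + β + β^2) ln a. A Python check with illustrative values for β and r (the text keeps them symbolic):

```python
import math

# Check of equation 21: V_0(a) - V_0(1) isolates the ln a term of
# equation 17, which must equal (1 + beta + beta^2) ln a.
# The values of beta and r are illustrative.
beta, r = 0.9, 0.1

def V0(a0):
    """Value function of equation 17."""
    s = 1.0 + beta + beta**2
    return (math.log(a0 * (1.0 + r) / s)
            + beta * math.log(a0 * (1.0 + r)**2 * beta / s)
            + beta**2 * math.log(a0 * (1.0 + r)**3 * beta**2 / s))

for a in (0.5, 2.0, 10.0):
    print(a, round(V0(a) - V0(1.0), 6),
          round((1.0 + beta + beta**2) * math.log(a), 6))
```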
4.2 A Numerically Specified Problem
Infinite horizon problems that are numerically specified can be solved by iterating on the value
function. This will be demonstrated for the above consumption optimization problem. The task is
to find the optimal decision rule for consumption regardless of time t. For that purpose we employ
the Bellman equation without time subscripts,
V(a) = max_c ln c + βV(a′),
where a and a′ denote the current and future value of the state variable, respectively. The time
subscripts are eliminated in the Bellman equation to reflect the idea that the value function V(a)
needs to satisfy the Bellman equation for any a. The iteration method starts with an initial guess
of the solution V(a), which we will identify as V^{(0)}(a). Guessing a possible solution is not as difficult
as it sounds. Each class of problem typically has associated with it a general form of the solution.
For example, in the current case we know from solving the consumption problem for t = 2 that
the value function will somehow depend on the log of a,
V^{(0)}(a) = A + B ln a.
We will assume that β = 0.95.
4.2.1 A numerically specified initial guess
We convert the initial guess of the value function into numerical form by assuming A = 0 and B = 1,
V^{(0)}(a) = ln a.
The initial guess V^{(0)}(a) is substituted on the right-hand side of the Bellman equation
V^{(1)}(a) = max_c ln c + 0.95 ln a′.
Next, we make use of the transition equation
a′ = (1 + 0.05263)a − c
to substitute out a′:
V^{(1)}(a) = max_c ln c + 0.95 ln(1.05263a − c).
We maximize the right-hand side of V^{(1)}(a) with respect to the control variable c,
dV^{(1)}(a)/dc = 1/c − 0.95/(1.05263a − c) = 0,
which gives c = 0.53981a. Upon substituting the solution for c back into V^{(1)}(a) one obtains
V^{(1)}(a) = ln(0.53981a) + 0.95 ln(1.05263a − 0.53981a)
or after simplifying
V^{(1)}(a) = −1.251 + 1.95 ln a.
Since
V^{(1)}(a) ≠ V^{(0)}(a),
we need to continue to iterate on the value function. For that purpose, we substitute the new guess,
V (1)(a), into the Bellman equation and get
V^{(2)}(a) = max_c ln c + 0.95(−1.251 + 1.95 ln a′).
Substitution of the transition equation for a0 gives
V^{(2)}(a) = max_c ln c + 0.95(−1.251 + 1.95 ln(1.05263a − c))
which simplifies to
V^{(2)}(a) = max_c ln c − 1.1884 + 1.8525 ln(1.05263a − c).
Maximizing the right-hand side of V^{(2)}(a) with respect to c, one obtains c = 0.36902a. Substituting
the optimal policy rule for c back into V^{(2)}(a) gives
and set the coefficients of ln a in V(a) and V^{(0)}(a) equal to each other,
(1 + 0.95B) = B.
This results in B = 20. To solve for the parameter A, we substitute B = 20 into both V(a) and
V^{(0)}(a), set the two equal, and solve for A,
−2.9444 + 20.0 ln a + 0.95A = A + 20 ln a
The solution is A = −58.888. With the parameters A and B known, the policy function for c is
given as
c = 0.05263a
and the value function is
V (a) = −58.888 + 20.0 ln a.
Both match the ones obtained in earlier sections with other solution methods.
The Maple commands to replicate the above solution method are
restart: Digits:=8:
assume(a>0):
beta:=0.95:
V[0]:=(a)->A+B*log(a);
VV:=(a,c)->ln(c)+beta*V[0]((1+((1/beta)-1))*a-c);
diff(VV(a,c),c);
c_opt:=evalf(solve(%,c));
subs(c=c_opt,VV(a,c));
V[1]:=collect(expand(simplify(%)),ln(a));
B:=solve(diff(V[1],a)*a=B,B);
V0:=A+B*log(a);
A:=solve(V[1]=V0,A);
c_opt;V0;
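The same fixed point can be reached in plain Python by iterating on the coefficients of the guess V(a) = A + B ln a, as in subsection 4.2.1: with that guess the Bellman maximization gives c = (1+r)a/(1+βB), and substituting back maps (A, B) into (βA + ln[(1+r)/(1+βB)] + βB ln[(1+r)βB/(1+βB)], 1 + βB). A sketch assuming β = 0.95 and 1 + r = 1/β, as in the Maple code:

```python
import math

# Value-function iteration on the coefficients of the guess V(a) = A + B ln a,
# assuming beta = 0.95 and a gross return 1 + r = 1/beta, as in the Maple code.
beta = 0.95
R = 1.0 / beta                  # gross return 1 + r = 1.05263...

A, B = 1.0, 1.0                 # any initial guess works
for _ in range(2000):
    # With guess A + B ln a, maximization gives c = R a / (1 + beta B);
    # substituting back into the Bellman equation updates the coefficients:
    A_new = (beta * A
             + math.log(R / (1.0 + beta * B))
             + beta * B * math.log(R * beta * B / (1.0 + beta * B)))
    B_new = 1.0 + beta * B
    A, B = A_new, B_new

c_coef = R / (1.0 + beta * B)   # policy function: c = c_coef * a
print(round(A, 3), round(B, 3), round(c_coef, 5))
```

The iteration converges to B = 20, A = −58.888, and c = 0.05263a, matching the coefficient-matching solution above.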
5 A Stochastic Infinite Horizon Example
Dynamic programming problems can be made stochastic by letting one or more state variables
be influenced by a stochastic disturbance term. Most of the applications of dynamic programming
in macroeconomics follow this path. One example is provided in this section.
We will consider the basic real business cycle (RBC) model discussed in King (2002), which
is very much related to Kydland and Prescott (1982) and Long and Plosser (1983). Many RBC
models are variations of this basic model.
The problem is to maximize the expected discounted sum of present and future utility, where
utility is assumed to be a logarithmic function of consumption (C) and leisure (1− n)
max E_0 Σ_{t=0}^{∞} β^t [ln C_t + δ ln(1 − n_t)],
where n represents labor input and available time is set at unity. Maximization is subject to
the market clearing constraint that output (y) equals the sum of the two demand components,
consumption and investment
y_t = C_t + k_t,
where investment equals the capital stock (k) because capital is assumed to depreciate fully in one
period. There are two transition equations, one for each of the two state variables, output and the
state of technology (A),
y_{t+1} = A_{t+1} k_t^α n_t^{1−α}
A_{t+1} = A_t^ρ e^{ε_{t+1}}.
The transition equation of y is a dynamic Cobb-Douglas production function and the transition
equation of A follows an autoregressive process subject to a disturbance term (ε). C, n, and k
are the problem’s control variables. To make the problem easier to follow, we use numbers for the
parameters: β = 0.9, α = 0.3, δ = 1, ρ = 0.8.
The Bellman equation for this problem is given as
V(y, A) = max_{C,k,n} ln C + ln(1 − n) + βE V^{(0)}(y′, A′),
where E is the expectations operator. The Bellman equation can be simplified by substituting for
C and, hence, reducing the problem to two control variables
V(y, A) = max_{k,n} ln(y − k) + ln(1 − n) + 0.9E V^{(0)}(y′, A′).
The next step in the solution process is to find an initial value function. In the earlier consumption
problem, which was similar in terms of the objective function but had only one state variable (a),
we found that the value function was of the type
V (a) = A+B ln a.
Since there are two state variables in the present problem, we try the analogous solution
V^{(0)}(y, A) = Z + G ln y + H ln A.
From here on, the problem follows the methodology used for the previous section. Substituting
V (0)(y,A) into the Bellman equation gives
V(y, A) = max_{k,n} ln(y − k) + ln(1 − n) + 0.9E(Z + G ln y′ + H ln A′).
Before the right-hand side of V(y, A) can be maximized, the y and A appearing in the expectations
term (the future values) are substituted out using their transition equations,
V(y, A) = max_{k,n} ln(y − k) + ln(1 − n) + 0.9E(Z + G ln(A^{0.8} e^ε k^{0.3} n^{0.7}) + H ln(A^{0.8} e^ε)).
The right-hand side is expanded and then maximized with respect to both k and n. This results in
two first-order conditions that need to be solved simultaneously for the two control variables. The
solution of the maximization process is given as
k = 0.27Gy/(1 + 0.27G), n = 0.63G/(1 + 0.63G).
These solutions are substituted back into the Bellman equation and the expectations operator is
employed to remove the disturbance terms,
V(y, A) = ln(y − 0.27Gy/(1 + 0.27G)) + ln(1 − 0.63G/(1 + 0.63G))
+ 0.9(Z + G(0.8 ln A + 0.3 ln[0.27Gy/(1 + 0.27G)] + 0.7 ln[0.63G/(1 + 0.63G)]) + H·0.8 ln A).
As in the consumption example, coefficients are compared between V(y, A) and V^{(0)}(y, A) to solve
for the unknown values of Z, G, and H. To make a coefficient comparison feasible, we expand the
right-hand side of V(y, A), then simplify, and finally collect the ln y and ln A terms to get
Setting the coefficients of ln y in V (y,A) and V (0)(y,A) equal to each other, we obtain the deter-
mining equation
(1 + 0.27G) = G,
which gives G = 1.37. Similarly, comparing the coefficients of ln A between V(y, A) and V^{(0)}(y, A),
0.72(G+H) = H,
gives H = 3.52. Making use of the solutions for G and H in V (y,A), the value of Z can be
determined from the equation
V(y, A) − 1.37 ln y − 3.52 ln A = Z
as Z = −20.853. With the values of Z, G, and H known, the value function that solves the problem is given as
V(y, A) = −20.853 + 1.37 ln y + 3.52 ln A,
while the policy functions or decision rules for k, n, and C are
k = 0.27y, n = 0.463, C = 0.73y.
We note that the addition of a stochastic term to the variable A has no effect on the optimal
decision rules for k, n, and C. This result is not an accident but, as discussed at the beginning of
this paper, the key reason why dynamic programming is preferred over a traditional Lagrangian
multiplier method in dynamic optimization problems: unpredictable shocks to the state variables
do not change the optimal decision rules for the control variables.7
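The coefficient-matching conditions above have simple closed-form solutions, so the numerical results are easy to verify outside Maple. A Python sketch using the paper's parameter values:

```python
import math

# Closed-form solution of the coefficient-matching conditions, with the
# paper's parameters beta = 0.9, alpha = 0.3, delta = 1, rho = 0.8.
beta, alpha, rho = 0.9, 0.3, 0.8

G = 1.0 / (1.0 - beta * alpha)              # from 1 + 0.27 G = G
H = beta * rho * G / (1.0 - beta * rho)     # from 0.72 (G + H) = H

# Policy functions implied by the first-order conditions
k_coef = beta * alpha * G / (1.0 + beta * alpha * G)            # k = k_coef * y
n = beta * (1.0 - alpha) * G / (1.0 + beta * (1.0 - alpha) * G)
C_coef = 1.0 - k_coef                                           # C = C_coef * y

# Constant: Z = const + beta * Z, where const collects all non-log terms
const = (math.log(1.0 - k_coef) + math.log(1.0 - n)
         + beta * G * (alpha * math.log(k_coef) + (1.0 - alpha) * math.log(n)))
Z = const / (1.0 - beta)

print(round(G, 2), round(H, 2), round(Z, 3),
      round(k_coef, 2), round(n, 3), round(C_coef, 2))
```

Note that k_coef reduces exactly to βα = 0.27, which is why the capital policy k = 0.27y does not depend on G once the fixed point is imposed.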
The above solution process can be automated and repeated with other parameters if one uses
the Maple commands below,
7The stochastic nature of the state variables does affect how the state variables move through time. Different sequences of shocks to the state variables produce different time paths even with the same decision rules for the control variables. From a large number of such stochastic time paths, RBC researchers calculate standard deviations or other moments of the state and control variables and compare them to the moments of actually observed variables to check their models. Compare Adda and Cooper (2003) on this.