
Optimal Control Theory with Applications in Economics

Thomas A. Weber

Foreword by A. V. Kryazhimskiy

The MIT Press
Cambridge, Massachusetts
London, England


© 2011 Massachusetts Institute of Technology

All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

For information about special quantity discounts, please email [email protected].

This book was set in Palatino by Westchester Book Composition. Printed and bound in the United States of America.

Library of Congress Cataloging-in-Publication Data

Weber, Thomas A., 1969–
Optimal control theory with applications in economics / Thomas A. Weber; foreword by A. V. Kryazhimskiy.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-262-01573-8 (hardcover : alk. paper)
1. Economics—Mathematical models. 2. Control theory. 3. Mathematical optimization. 4. Game theory. I. Title.
HB135.W433 2011
330.01′515642—dc22
2010046482

10 9 8 7 6 5 4 3 2 1


For Wim


Contents

Foreword by A. V. Kryazhimskiy ix
Acknowledgments xi

1 Introduction 1
1.1 Outline 3
1.2 Prerequisites 5
1.3 A Brief History of Optimal Control 5
1.4 Notes 15

2 Ordinary Differential Equations 17
2.1 Overview 17
2.2 First-Order ODEs 20
2.3 Higher-Order ODEs and Solution Techniques 67
2.4 Notes 75
2.5 Exercises 76

3 Optimal Control Theory 81
3.1 Overview 81
3.2 Control Systems 83
3.3 Optimal Control—A Motivating Example 88
3.4 Finite-Horizon Optimal Control 103
3.5 Infinite-Horizon Optimal Control 113
3.6 Supplement 1: A Proof of the Pontryagin Maximum Principle 119
3.7 Supplement 2: The Filippov Existence Theorem 135
3.8 Notes 140
3.9 Exercises 141

4 Game Theory 149
4.1 Overview 149
4.2 Fundamental Concepts 155
4.3 Differential Games 188
4.4 Notes 202
4.5 Exercises 203

5 Mechanism Design 207
5.1 Motivation 207
5.2 A Model with Two Types 208
5.3 The Screening Problem 215
5.4 Nonlinear Pricing 220
5.5 Notes 226
5.6 Exercises 227

Appendix A: Mathematical Review 231
A.1 Algebra 231
A.2 Normed Vector Spaces 233
A.3 Analysis 240
A.4 Optimization 246
A.5 Notes 251

Appendix B: Solutions to Exercises 253
B.1 Numerical Methods 253
B.2 Ordinary Differential Equations 258
B.3 Optimal Control Theory 271
B.4 Game Theory 302
B.5 Mechanism Design 324

Appendix C: Intellectual Heritage 333

References 335
Index 349


Foreword

Since the discovery, by L. S. Pontryagin, of the necessary optimality conditions for the control of dynamic systems in the 1950s, mathematical control theory has found numerous applications in engineering and in the social sciences. T. A. Weber has dedicated his book to optimal control theory and its applications in economics. Readers can find here a succinct introduction to the basic control-theoretic methods, and also clear and meaningful examples illustrating the theory.

Remarkable features of this text are rigor, scope, and brevity, combined with a well-structured hierarchical approach. The author starts with a general view on dynamical systems from the perspective of the theory of ordinary differential equations; on this basis, he proceeds to the classical optimal control theory, and he concludes the book with more recent views of game theory and mechanism design, in which optimal control plays an instrumental role.

The treatment is largely self-contained and compact; it amounts to a lucid overview, featuring much of the author's own research. The character of the problems discussed in the book promises to make the theory accessible to a wide audience. The exercises placed at the chapter endings are largely original.

I am confident that readers will appreciate the author's style and students will find this book a helpful guide on their path of discovery.

A. V. Kryazhimskiy
Steklov Institute of Mathematics, Russian Academy of Sciences
International Institute for Applied Systems Analysis


Acknowledgments

This book is based on my graduate course on Applied Optimal Control Theory taught at both Moscow State University and Stanford University. The development of this course was made possible through funding from Moscow State University (MSU), the Steklov Mathematical Institute in Moscow, and Stanford University. In particular, a course development grant from James Plummer and Channing Robertson, dean and former vice dean of the School of Engineering at Stanford University, allowed Elena Rovenskaya to spend time at Stanford to work on problem sets and solutions. The numerous discussions with her were invaluable, and I am very grateful for her contributions. Nikolai Grigorenko, the deputy head of the Optimal Control Department at MSU, was instrumental in making this possible.

I am very grateful to Evgenii Moiseev, dean of the Faculty of Computational Mathematics and Cybernetics at Lomonosov Moscow State University, for his 2007 invitation to deliver a summer course on dynamic optimization with applications in economics. My deepest gratitude also goes to Arkady Kryazhimskiy and Sergey Aseev for their encouragement to write this book and their continuous support and friendship. They have fueled my interest in optimal control since 2001, when I was fortunate enough to meet them while participating in the Young Scientists Summer Program at the International Institute for Applied Systems Analysis in Laxenburg, Austria. They also invited me to the 2008 International Conference on Differential Equations and Topology in Moscow, dedicated to the centennial anniversary of Lev Pontryagin, where the discussions about the summer course continued.

I am indebted to the students in Moscow and Stanford for the many after-class discussions where they often taught me, perhaps unknowingly, just as much as I taught them. Kenneth Gillingham provided a set of impeccable handwritten notes from the course, which helped organize the thoughts for the book. Naveed Chehrazi was an excellent course assistant who contributed numerous insightful suggestions. Elena Rovenskaya and Denis Pivovarchuk organized the practice sessions in Moscow. Stefan Behringer, Andrei Dmitruk, and three MIT Press reviewers provided useful feedback on earlier versions of the book. Markus Edvall from TOMLAB was helpful in debugging several numerical algorithms. I would like to thank Alice Cheyer for her detailed copyediting, as well as Alexey Smirnov and my assistant, Marilynn Rose, for their help with editing earlier versions of the manuscript. Jane Macdonald at the MIT Press believed in this project from the moment I told her about it at the 2009 European Meeting of the Econometric Society in Barcelona. I am very grateful for her helpful advice, encouragement, and great support throughout the publishing process.

I should also like to acknowledge my indebtedness to the great teachers in control theory and economics, whom I encountered at MIT, the University of Pennsylvania, and Stanford University, in particular Kenneth Arrow, Dimitri Bertsekas, David Cass, Richard Kihlstrom, Alexandre Kirillov, Steve Matthews, and Ilya Segal. Richard Vinter introduced me to control theory in 1994 as a wonderful research advisor at Imperial College London. In my thinking and intellectual approach I also owe very much to my advisor at MIT, Alexandre Megretski, who did not believe in books and whose genius and critical mind I admire. Paul Kleindorfer, my dissertation advisor at Wharton, taught me so much, including the fact that broad interests can be a real asset. I am very grateful for his friendship and constant support.

My special thanks go to Sergey Aseev, Eric Clemons, David Luenberger, James Sweeney, and Andrew Whinston for their friendship, as well as Ann and Wim for their sacrifice, endless love, and understanding.

Stanford, California
May 2011


1 Introduction

Our nature consists in movement; absolute rest is death.

—Blaise Pascal

Change is all around us. Dynamic strategies seek to both anticipate and effect such change in a given system so as to accomplish objectives of an individual, a group of agents, or a social planner. This book offers an introduction to continuous-time systems and methods for solving dynamic optimization problems at three different levels: single-person decision making, games, and mechanism design. The theory is illustrated with examples from economics. Figure 1.1 provides an overview of the book's hierarchical approach.

The first and lowest level, single-person decision making, concerns the choices made by an individual decision maker who takes the evolution of a system into account when trying to maximize an objective functional over feasible dynamic policies. An example would be an economic agent who is concerned with choosing a rate of spending for a given amount of capital, each unit of which can either accumulate interest over time or be used to buy consumption goods such as food, clothing, and luxury items.

The second level, games, addresses the question of finding predictions for the behavior and properties of dynamic systems that are influenced by a group of decision makers. In this context the decision makers (players) take each other's policies into account when choosing their own actions. The possible outcomes of the game among different players, say, in terms of the players' equilibrium payoffs and equilibrium actions, depend on which precise concept of equilibrium is applied. Nash (1950) proposed an equilibrium such that players' policies do not give any player an incentive to deviate from his own chosen policy, given that the other players' choices are fixed at the equilibrium policies. A classic example is an economy with a group of firms choosing production outputs so as to maximize their respective profits.

[Figure 1.1: Topics covered in this book — ordinary differential equations (chapter 2), optimal control theory (chapter 3), game theory (chapter 4), and mechanism design (chapter 5).]

The third and highest level of analysis considered here is mechanism design, which is concerned with a designer's creation of an environment in which players (including the designer) can interact so as to maximize the designer's objective functional. Leading examples are the design of nonlinear pricing schemes in the presence of asymmetric information, and the design of markets. Arguably, this level of analysis is isomorphic to the first level, since the players' strategic interaction may be folded into the designer's optimization problem.

The dynamics of the system in which the optimization takes place are described in continuous time, using ordinary differential equations. The theory of ordinary differential equations can therefore be considered the backbone of the theory developed in this book.


1.1 Outline

Ordinary Differential Equations (ODEs) Chapter 2 reviews basic concepts in the theory of ODEs. One-dimensional linear first-order ODEs can be solved explicitly using the Cauchy formula. The key insight from the construction of this formula (via variation of an integration constant) is that the solution to a linear initial value problem of the form

$$\dot{x} + g(t)x = h(t), \quad x(t_0) = x_0,$$

for a given tuple of initial data (t₀, x₀) can be represented as the superposition of a homogeneous solution (obtained when h = 0) and a particular solution to the original ODE (but without concern for the initial condition). Systems of linear first-order ODEs,

$$\dot{x} = A(t)x + b(t), \qquad (1.1)$$

with a dependent variable of the form x = (x₁, . . . , xₙ) and an initial condition x(t₀) = x₀ can be solved if a fundamental matrix Φ(t, t₀) as the solution of a homogeneous equation is available. Higher-order ODEs (containing higher-order derivatives) can generally be reduced to first-order ODEs. This allows limiting the discussion to (nonlinear) first-order ODEs of the form

$$\dot{x} = f(t, x), \qquad (1.2)$$

for t ≥ t₀. Equilibrium points, that is, points x at which a system does not move because f(t, x) = 0, are of central importance in understanding a continuous-time dynamic model. The stability of such points is usually investigated using the method developed by Lyapunov, which is based on the principle that if system trajectories x(t) in the neighborhood of an equilibrium point are such that a certain real-valued function V(t, x(t)) is nonincreasing (along the trajectories) and bounded from below by its value at the equilibrium point, then the system is stable. If this function is actually decreasing along system trajectories, then these trajectories must converge to an equilibrium point. The intuition for this finding is that the Lyapunov function V can be viewed as energy of the system that cannot increase over time. This notion of energy, or, in the context of economic problems, of value or welfare, recurs throughout the book.
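As a concrete illustration of this principle, consider a minimal numerical sketch (an illustrative example, not from the book): for the scalar system ẋ = −x³ with equilibrium x = 0, the function V(x) = x²/2 is bounded from below by its value at the equilibrium and satisfies V̇ = xẋ = −x⁴ ≤ 0 along trajectories, so the equilibrium is stable.

```python
# Illustrative sketch (not from the book): V(x) = x^2/2 acts as a
# Lyapunov function for x' = -x^3 and is nonincreasing along trajectories.
import numpy as np
from scipy.integrate import solve_ivp

sol = solve_ivp(lambda t, x: -x**3, (0.0, 10.0), [1.5],
                t_eval=np.linspace(0.0, 10.0, 101))
V = 0.5 * sol.y[0]**2                # energy along the computed trajectory
print(np.all(np.diff(V) <= 1e-8))    # True: V never increases
```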

Page 17: Optimal Control Theory With Applications in Economics

4 Chapter 1

Optimal Control Theory Given a description of a system in the form of ODEs, and an objective functional J(u) as a function of a dynamic policy or control u, together with a set of constraints (such as initial conditions or control constraints), a decision maker may want to solve an optimal control problem of the form

$$J(u) = \int_{t_0}^{T} h(t, x(t), u(t))\, dt \longrightarrow \max_{u(\cdot)}, \qquad (1.3)$$

subject to ẋ(t) = f(t, x(t), u(t)), x(t₀) = x₀, and u(t) ∈ U, for all t ∈ [t₀, T]. Chapter 3 introduces the notion of a controllable system, which is a system that can be moved using available controls from one state to another. Then it takes up the construction of solutions (in the form of state-control trajectories (x*(t), u*(t)), t ∈ [t₀, T]) to such optimal control problems: necessary and sufficient optimality conditions are discussed, notably the Pontryagin maximum principle (PMP) and the Hamilton-Jacobi-Bellman (HJB) equation. Certain technical difficulties notwithstanding, it is possible to view the PMP and the HJB equation as two complementary approaches to obtain an understanding of the solution of optimal control problems. In fact, the HJB equation relies on the existence of a continuously differentiable value function V(t, x), which describes the decision maker's optimal payoff, with the optimal control problem initialized at time t and the system in the state x. This function, somewhat similar to a Lyapunov function in the theory of ODEs, can be interpreted in terms of the value of the system for a decision maker. The necessary conditions in the PMP can be informally derived from the HJB equation, essentially by restricting attention to a neighborhood of the optimal trajectory.
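To make the two approaches concrete, the following is a minimal numerical sketch (an illustrative example, not from the book): for h(t, x, u) = −(x² + u²), f(t, x, u) = u, U = R, t₀ = 0, T = 1, and x(0) = 1, the Hamiltonian H = −(x² + u²) + ψu is maximized at u = ψ/2, and the PMP reduces to the two-point boundary value problem ẋ = ψ/2, ψ̇ = 2x with x(0) = 1 and ψ(1) = 0.

```python
# Illustrative sketch (not from the book): state-costate boundary value
# problem from the PMP for the simple problem described in the lead-in.
import numpy as np
from scipy.integrate import solve_bvp

def rhs(t, y):
    x, psi = y
    return np.vstack((psi / 2.0, 2.0 * x))    # x' = psi/2, psi' = 2x

def bc(y0, yT):
    return np.array([y0[0] - 1.0, yT[1]])     # x(0) = 1, psi(T) = 0

t = np.linspace(0.0, 1.0, 50)
sol = solve_bvp(rhs, bc, t, np.zeros((2, t.size)))
u_star = sol.sol(t)[1] / 2.0                  # optimal control u*(t) = psi(t)/2
```

For this example the HJB route gives the value function in closed form, V(t, x) = −tanh(T − t)x², so the optimal feedback law is μ(t, x) = −tanh(T − t)x, which the numerical state-costate solution reproduces.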

Game Theory When more than one individual can make payoff-relevant decisions, game theory is used to determine predictions about the outcome of the strategic interactions. To abstract from the complexities of optimal control theory, chapter 4 introduces the fundamental concepts of game theory for simple discrete-time models, along the lines of the classical exposition of game theory in economics. Once all the elements, including the notion of a Nash equilibrium and its various refinements, for instance, via subgame perfection, are in place, attention turns to differential games. A critical question that arises in dynamic games is whether the players can trust each other's equilibrium strategies, in the sense that they are credible even after the game has started. A player may, after a while, find it best to deviate from a Nash equilibrium that relies on a "noncredible threat." The latter consists of an action which, as a contingency, discourages other players from deviating but is not actually beneficial should they decide to ignore the threat. More generally, in a Nash equilibrium that is not subgame-perfect, players lack the ability to commit to certain threatening actions (thus, noncredible threats), leading to "time inconsistencies."

Mechanism Design A simple economic mechanism, discussed in chapter 5, is a collection of a message space and an allocation function. The latter is a mapping from possible messages (elements of the message space) to available allocations. For example, a mechanism could consist of the (generally nonlinear) pricing schedule for bandwidth delivered by a network service provider. A mechanism designer, who is often referred to as the principal, initially announces the mechanism, after which the agent sends a message to the principal, who determines the outcome for both participants by evaluating the allocation function. More general mechanisms, such as an auction, can include several agents playing a game that is implied by the mechanism.

Optimal control theory becomes useful in the design of a static mechanism because of an information asymmetry between the principal and the various agents participating in the mechanism. Assuming for simplicity that there is only a single agent, and that this agent possesses private information that is encapsulated in a one-dimensional type variable θ in a type space $\Theta = [\underline{\theta}, \bar{\theta}]$, it is possible to write the principal's mechanism design problem as an optimal control problem.

1.2 Prerequisites

The material in this book is reasonably self-contained. It is recommended that the reader have acquired some basic knowledge of dynamic systems, for example, in a course on linear systems. In addition, the reader should possess a firm foundation in calculus, since the language of calculus is used throughout the book without necessarily specifying all the details or the arguments if they can be considered standard material in an introductory course on calculus (or analysis).

1.3 A Brief History of Optimal Control

Origins The human quest for finding extrema dates back to antiquity. Around 300 B.C., Euclid of Alexandria found that the minimal distance between two points A and B in a plane is described by the straight line AB, showing in his Elements (Bk I, Prop. 20) that any two sides of a triangle together are greater than the third side (see, e.g., Byrne 1847, 20). This is notwithstanding the fact that nobody has actually ever seen a straight line. As Plato wrote in his Allegory of the Cave¹ (Republic, Bk VII, ca. 360 B.C.), perceived reality is limited by our senses (Jowett 1881). Plato's theory of forms held that ideas (or forms) can be experienced only as shadows, that is, imperfect images (W. D. Ross 1951). While Euclid's insight into the optimality of a straight line may be regarded merely as a variational inequality, he also addressed the problem of finding extrema subject to constraints by showing in his Elements (Bk VI, Prop. 27) that "of all the rectangles contained by the segments of a given straight line, the greatest is the square which is described on half the line" (Byrne 1847, 254). This is generally considered the earliest solved maximization problem in mathematics (Cantor 1907, 266) because

$$\frac{a}{2} \in \arg\max_{x \in \mathbb{R}}\, \{x(a - x)\},$$

for any a > 0. Another early maximization problem, closely related to the development of optimal control, is recounted by Virgil in his Aeneid (ca. 20 B.C.). It involves queen Dido, the founder of Carthage (located in modern-day Tunisia), who negotiated to buy as much land as she could enclose using a bull's hide. To solve her isoperimetric problem, that is, to find the largest area with a given perimeter, she cut the hide into a long strip and laid it out in a circle. Zenodorus, a Greek mathematician, studied Dido's problem in his book On Isoperimetric Figures and showed that a circle is greater than any regular polygon of equal contour (Thomas 1941, 2:387–395). Steiner (1842) provided five different proofs that any figure of maximal area with a given perimeter in the plane must be a circle. He omitted to show that there actually exists a solution to the isoperimetric problem. Such a proof was given later by Weierstrass (1879/1927).²

Remark 1.1 (Existence of Solutions) Demonstrating the existence of a solution to a variational problem is in many cases both important and nontrivial. Perron (1913) commented specifically on the gap left by Steiner in the solution of the isoperimetric problem regarding existence, and he provided several examples of variational problems without solutions (e.g., finding a polygon of given perimeter and maximal surface). A striking problem without a solution was posed by Kakeya (1917). He asked for the set of minimal measure that contains a unit line segment in all directions. One can think of such a Kakeya set (or Besicovitch set) as the minimal space that an infinitely slim car would need to turn around in a parking spot. Somewhat surprisingly, Besicovitch (1928) was able to prove that the measure of the Kakeya set cannot be bounded from below by a positive constant. □

1. In the Allegory of the Cave, prisoners in a cave are restricted to a view of the real world (which exists behind them) solely via shadows on a wall in front of them.

2. Weierstrass's numerous contributions to the calculus of variations, notably on the existence of solutions and on sufficient optimality conditions, are summarized in his extensive lectures on Variationsrechnung, published posthumously based on students' notes.

The isoperimetric constraint appears naturally in economics as a budget constraint, which was recognized by Frisi in his written-in commentary on Verri's (1771) notion that a political economy shall be trying to maximize production subject to the available labor supply (Robertson 1949). Such budget-constrained problems are natural in economics.³ For example, Sethi (1977) determined a firm's optimal intertemporal advertising policy based on a well-known model by Nerlove and Arrow (1962), subject to a constraint on overall expenditure over a finite time horizon.

3. To be specific, let C(t, x, u) be a nonnegative-valued cost function and B > 0 a given budget. Then along a trajectory (x(t), u(t)), t ∈ [t₀, T], a typical isoperimetric constraint is of the form $\int_{t_0}^{T} C(t, x(t), u(t))\, dt \le B$. It can be rewritten as $\dot{y}(t) = C(t, x(t), u(t))$, y(t₀) = 0, y(T) ≤ B. The latter formulation falls squarely within the general optimal-control formalism developed in this book, so isoperimetric constraints do not need special consideration.

Calculus of Variations The infinitesimal calculus (or later just calculus) was developed independently by Newton and Leibniz in the 1670s. Newton formulated the modern notion of a derivative (which he termed fluxion) in his De Quadratura Curvarum, published as an appendix to his treatise on Opticks in 1704 (Cajori 1919, 17–36). In 1684, Leibniz published his notions of derivative and integral in the Acta Eruditorum, a journal that he had co-founded several years earlier and that enjoyed a significant circulation in continental Europe. With the tools of calculus in place, the time was ripe for the calculus of variations, the birth of which can be traced to the June 1696 issue of the Acta Eruditorum. There, Johann Bernoulli challenged his contemporaries to determine the path from point A to point B in a vertical plane that minimizes the time for a mass point M to travel under the influence of gravity between A and B. This problem of finding a brachistochrone (figure 1.2) was posed earlier (but not solved) by Galilei (1638).⁴ In addition to his own solution, Johann Bernoulli obtained four others, by his brother Jakob Bernoulli, Leibniz, de l'Hôpital, and Newton (an anonymous entry). The last was recognized immediately by Johann ex ungue leonem ("one knows the lion by his claw").

[Figure 1.2: Brachistochrone connecting the points A and B in parametric form: $(x(\varphi), y(\varphi)) = (\alpha(\varphi - \sin\varphi), \alpha(\cos\varphi - 1))$, where $\varphi = \varphi(t) = \sqrt{g/\alpha}\, t$, and g ≈ 9.81 meters per second squared is the gravitational constant. The parameter α and the optimal time t = T* are determined by the endpoint condition $(x(\varphi(T^*)), y(\varphi(T^*))) = B$.]

Euler (1744) investigated the more general problem of finding extrema of the functional

$$J = \int_0^T L(t, x(t), \dot{x}(t))\, dt, \qquad (1.4)$$

subject to suitable boundary conditions on the function x(·). He derived what is now called the Euler equation (see equation (1.5)) as a necessary optimality condition used to this day to construct solutions to variational problems. In his 1744 treatise on variational methods, Euler did not create a name for his complex of methods and referred to variational calculus simply as the isoperimetric method. This changed with a 1755 letter from Lagrange to Euler informing the latter of his δ-calculus, with δ denoting variations (Goldstine 1980, 110–114). The name "calculus of variations" was officially born in 1756, when the minutes of meeting no. 441 of the Berlin Academy on September 16 note that Euler read "Elementa calculi variationum" (Hildebrandt 1989).

4. Huygens (1673) discovered that a body which is bound to fall following a cycloid curve oscillates with a periodicity that is independent of the starting point on the curve, so he termed this curve tautochrone. The brachistochrone is also a cycloid and thus identical to the tautochrone, which led Johann Bernoulli to remark that "nature always acts in the simplest possible way" (Willems 1996).

Remark 1.2 (Extremal Principles) Heron of Alexandria explained the equality of angles in the reflection of light by the principle that nature must take the shortest path, for "[i]f Nature did not wish to lead our sight in vain, she would incline it so as to make equal angles" (Thomas 1941, 2:497). Olympiodorus the younger, in a commentary (ca. 565) on Aristotle's Meteora, wrote, "[T]his would be agreed by all . . . Nature does nothing in vain nor labours in vain" (Thomas 1941, 2:497).

In the same spirit, Fermat in 1662 used the principle of least time (now known as Fermat's principle) to derive the law of refraction for light (Goldstine 1980, 1–6). More generally, Maupertuis (1744) formulated the principle of least action, that in natural phenomena a quantity called action (denoting energy × time) is to be minimized (cf. also Euler 1744). The calculus of variations helped formulate more such extremal principles, for instance, d'Alembert's principle, which states that along any virtual displacement the sum of the differences between the forces and the time derivatives of the moments vanishes. It was this principle that Lagrange (1788/1811) chose over Maupertuis's principle in his Mécanique Analytique to firmly establish the use of differential equations to describe the evolution of dynamic systems. Hamilton (1834) subsequently established that the law of motion on a time interval [t₀, T] can be derived as extremal of the functional in equation (1.4) (principle of stationary action), where L is the difference between kinetic energy and potential energy. Euler's equation in this variational problem is also known as the Euler-Lagrange equation,

$$\frac{d}{dt}\,\frac{\partial L(t, x(t), \dot{x}(t))}{\partial \dot{x}} - \frac{\partial L(t, x(t), \dot{x}(t))}{\partial x} = 0, \qquad (1.5)$$

for all t ∈ [t₀, T]. With the Hamiltonian function $H(t, x, \dot{x}, \psi) = \langle \psi, \dot{x} \rangle - L(t, x, \dot{x})$, where $\psi = \partial L/\partial \dot{x}$ is an adjoint variable, one can show that (1.5) is in fact equivalent to the Hamiltonian system,⁵

$$\dot{x}(t) = \frac{\partial H(t, x(t), \dot{x}(t), \psi(t))}{\partial \psi}, \qquad (1.6)$$

$$\dot{\psi}(t) = -\frac{\partial H(t, x(t), \dot{x}(t), \psi(t))}{\partial x}, \qquad (1.7)$$

for all t ∈ [t₀, T]. To integrate the Hamiltonian system, given some initial data (t₀, x₀), Jacobi (1884, 143–157) proposed to introduce an action function,

$$V(t, x) = \int_{t_0}^{t} L(s, x(s), \dot{x}(s))\, ds,$$

on an extremal trajectory, which satisfies (1.6)–(1.7) on [t₀, t] and connects the initial point (t₀, x₀) to the point (t, x). One can now show (see, e.g., Arnold 1989, 254–255) that

$$\frac{dV(t, x(t))}{dt} = \frac{\partial V(t, x(t))}{\partial t} + \left\langle \frac{\partial V(t, x(t))}{\partial x}, \dot{x}(t) \right\rangle = \langle \psi(t), \dot{x}(t) \rangle - H(t, x(t), \dot{x}(t), \psi(t)),$$

so that H = −∂V/∂t and ψ = ∂V/∂x, and therefore the Hamilton-Jacobi equation,

$$-\frac{\partial V(t, x(t))}{\partial t} = H\!\left(t, x(t), \dot{x}(t), \frac{\partial V(t, x(t))}{\partial x}\right), \qquad (1.8)$$

holds along an extremal trajectory. This result is central for the construction of sufficient as well as necessary conditions for solutions to optimal control problems (see chapter 3). Extremal principles also play a role in economics. For example, in a Walrasian exchange economy, prices and demands will adjust so as to maximize a welfare functional. □

5. To see this, note first that (1.6) holds by definition and that irrespective of the initial conditions,

$$0 = \frac{dH}{dt} - \frac{dH}{dt} = \frac{\partial H}{\partial t} + \left\langle \frac{\partial H}{\partial x}, \dot{x} \right\rangle + \left\langle \frac{\partial H}{\partial \psi}, \dot{\psi} \right\rangle - \left( \langle \dot{\psi}, \dot{x} \rangle + \langle \psi, \ddot{x} \rangle - \frac{\partial L}{\partial t} - \left\langle \frac{\partial L}{\partial x}, \dot{x} \right\rangle - \left\langle \frac{\partial L}{\partial \dot{x}}, \ddot{x} \right\rangle \right),$$

whence, using ψ = ∂L/∂ẋ and ẋ = ∂H/∂ψ, we obtain

$$0 = \frac{\partial H}{\partial t} + \frac{\partial L}{\partial t} + \left\langle \frac{\partial H}{\partial x} + \frac{\partial L}{\partial x}, \dot{x} \right\rangle + \left\langle \frac{\partial H}{\partial \psi}, \dot{\psi} \right\rangle - \langle \dot{\psi}, \dot{x} \rangle = \left\langle \frac{\partial H}{\partial x} + \frac{\partial L}{\partial x}, \dot{x} \right\rangle.$$

Thus, ∂H/∂x = −∂L/∂x, so the Euler-Lagrange equation (1.5) immediately yields (1.7).

Remark 1.3 (Problems with Several Independent Variables) Lagrange (1760) raised the problem of finding a surface of minimal measure, given an intersection-free closed curve. The Euler-Lagrange equation for this problem expresses the fact that the mean curvature of the surface must vanish everywhere. This problem is generally referred to as Plateau's problem, even though Plateau was born almost half a century after Lagrange had formulated it originally. (Plateau conducted extended experiments with soap films, leading him to discover several laws that were later proved rigorously by others.) Plateau's problem was solved independently by Douglas (1931) and Radó (1930). For historical details see, for instance, Fomenko (1990) and Struwe (1989). This book considers only problems where the independent variable is one-dimensional, so all systems can be described using ordinary (instead of partial) differential equations. □

In an article about beauty in problems of science, the economist Paul Samuelson (1970) highlighted several problems in the calculus of variations, such as the brachistochrone problem, and connected those insights to important advances in economics. For example, Ramsey (1928) formulated an influential theory of saving in an economy that determines an optimal growth path using the calculus of variations. The Ramsey model, which forms the basis of the theory of economic growth, was further developed by Cass (1965) and Koopmans (1965).⁶

Feedback Control Before considering the notion of a control system, one can first define a system as a set of connected elements, where the connection is an arbitrary relation among them. The complement of this set is the environment of the system. If an element of the system is not connected to any other element of the system, then it may be viewed as part of the environment. When attempting to model a real-world system, one faces an age-old trade-off between veracity and usefulness. In the fourteenth century William of Occam formulated the law of parsimony (also known as Occam's razor), entia non sunt multiplicanda sine necessitate, to express the postulate that "entities are not to be multiplied without necessity" (Russell 1961, 453).⁷ The trade-off between usefulness and veracity of a system model has been rediscovered many times, for instance, by Leonardo da Vinci ("simplicity is the ultimate sophistication") and by Albert Einstein ("make everything as simple as possible, but not simpler").⁸

A control system is a system with an input (or control) u(t) that can be influenced by human intervention. If the state x(t) of the system can also be observed, then the state can be used by a feedback law u(t) = μ(t, x(t)) to adjust the input, which leads to a feedback control system (figure 1.3).

[Figure 1.3: Feedback control system — the feedback law maps the observed state of the system back into its control input.]

There is a rich history of feedback control systems in technology, dating back at least to Ktesibios's float regulator in the third century B.C. for a water clock, similar to a modern flush toilet (Mayr 1970). Wedges were inserted in the water flow to control the speed at which a floating device would rise to measure the time. In 1788, Watt patented the design of the centrifugal governor for regulating the speed of a rotary steam engine, which is one of the most famous early feedback control systems. Rotating flyballs, flung apart by centrifugal force, would throttle the engine and regulate its speed. A key difference between Ktesibios's and Watt's machines is that the former does not use feedback to determine the control input (the number and position of the wedges), which is therefore referred to as open-loop control. Watt's flyball mechanism, on the other hand, uses the state of the system (engine rotations) to determine the throttle position that then influences the engine rotations, which is referred to as closed-loop (or feedback) control. Wiener (1950, 61) noted that "feedback is a method of controlling a system by reinserting into it the results of its past performance." He suggested the term cybernetics (from the Greek word κυβερνητης—governor) for the study of control and communication systems (Wiener 1948, 11–12).⁹

9. The term was suggested more than a hundred years earlier for the control of sociopolitical systems by Ampère (1843, 140–141).

Maxwell (1868) analyzed the stability of Watt's centrifugal governor by linearizing the system equation and showing that it is stable, provided its eigenvalues have strictly negative real parts. Routh (1877) worked out a numerical algorithm to determine when a characteristic equation (or equivalently, a system matrix) has stable roots. Hurwitz (1895) solved this problem independently, and to this day a stable system matrix A in equation (1.1) carries his name (see lemma 2.2). The stability of nonlinear systems of the form (1.2) was advanced by the seminal work of Lyapunov (1892), who showed that if an energy function V(t, x) can be found such that it is bounded from below and decreasing along any system trajectory x(t), t ≥ t₀, then the system is (asymptotically) stable, that is, any trajectory that starts close to an equilibrium state converges to that equilibrium state. In variational problems the energy function V(t, x) is typically referred to as a value function and plays an integral role in establishing optimality conditions, such as the Hamilton-Jacobi equation (1.8) or, more generally, the Hamilton-Jacobi-Bellman equation (3.16).

In 1892, Poincaré published the first in a three-volume treatise on celestial mechanics containing many path-breaking advances in the theory of dynamic systems, such as integral invariants, Poincaré maps, the recurrence theorem, and the first description of chaotic motion. In passing, he laid the foundation for a geometric and qualitative analysis of dynamic systems, carried forward, among others, by Arnold (1988). An important alternative to system stability in the sense of asymptotic convergence to equilibrium points is the possibility of a limit cycle. Based on Poincaré's work between 1881 and 1885,¹⁰ Bendixson (1901) established conditions under which a trajectory of a two-dimensional system constitutes a limit cycle (see proposition 2.13); as a by-product, this result implies that chaotic system behavior can arise only if the state-space dimension is at least 3. The theory of stability in feedback control systems has proved useful for the description of real-world phenomena. For example, Lotka (1920) and Volterra (1926) proposed a model for the dynamics of a biological predator-prey system that features limit cycles (see example 2.8).

In technological applications (e.g., when stabilizing an airplane) it is often sufficient to linearize the system equation and minimize a cost that is quadratic in the magnitude of the control and quadratic in the deviations of the system state from a reference state (or tracking trajectory)¹¹ in order to produce an effective controller. The popularity of this linear-quadratic approach is due to its simple closed-form solvability. Kalman and Bucy (1961) showed that the approach can also be very effective in dealing with (Gaussian) noise by incorporating a state-estimation component, resulting in a continuous-time version of the Kalman filter, which was first developed by Rudolf Kalman for discrete-time systems. To deal with control constraints in a noisy environment, the linear-quadratic approach has been used in receding-horizon control (or model predictive control), where a system is periodically reoptimized over the same fixed-length horizon.¹² More recently, this approach has been applied in financial engineering, for example, portfolio optimization (Primbs 2007).

10. The relevant series of articles was published in the Journal de Mathématiques, reprinted in Poincaré (1928, 3–222); see also Barrow-Green (1997).

11. A linear-quadratic regulator is obtained by solving an optimal control problem of the form (1.3), with linear system function f(t, x, u) = Ax + Bu and quadratic payoff function h(t, x, u) = −x′Rx − u′Su (with R, S positive definite matrices); see example 3.3.

12. Receding-horizon control has also been applied to the control of nonlinear systems, be they discrete-time (Keerthi and Gilbert 1988) or continuous-time (Mayne and Michalska 1990).
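As an illustration of the linear-quadratic regulator in footnote 11, the following minimal sketch (not from the book; the system matrices are illustrative) computes the stationary optimal feedback u = −Kx from the algebraic Riccati equation:

```python
# Illustrative sketch (not from the book): an infinite-horizon LQR for a
# double integrator, with payoff -x'Rx - u'Su as in footnote 11.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])          # system matrix (double integrator)
B = np.array([[0.0],
              [1.0]])               # input matrix
R = np.eye(2)                       # state weight (positive definite)
S = np.eye(1)                       # control weight (positive definite)

P = solve_continuous_are(A, B, R, S)     # stabilizing Riccati solution
K = np.linalg.solve(S, B.T @ P)          # optimal feedback gain: u = -K x
print(np.linalg.eigvals(A - B @ K))      # closed-loop eigenvalues have
                                         # negative real parts (Hurwitz)
```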

Optimal Control In the 1950s the classical calculus of variations underwent a transformation driven by two major advances. Both advances were fueled by the desire to find optimal control interventions for given feedback control systems, in the sense that the optimal control trajectory u*(t), t ∈ [t₀, T], would maximize an objective functional J(u) by solving a problem of the form (1.3). The first advance, by Richard Bellman, was to incorporate a control function into the Hamilton-Jacobi variational equation, leading to the Hamilton-Jacobi-Bellman equation,¹³

$$-V_t(t, x) = \max_{u \in U}\, \{ h(t, x, u) + \langle V_x(t, x), f(t, x, u) \rangle \}, \qquad (1.9)$$

which, when satisfied on the rectangle [t₀, T] × X (where the state space X contains all the states), together with the endpoint condition V(T, x) ≡ 0, serves as a sufficient condition for optimality. The optimal feedback law μ(t, x) is obtained as the optimal value for u on the right-hand side of (1.9), so the optimal state trajectory x*(t), t ∈ [t₀, T], solves the initial value problem (IVP)

$$\dot{x} = f(t, x, \mu(t, x)), \quad x(t_0) = x_0,$$

which yields the optimal control

$$u^*(t) = \mu(t, x^*(t)),$$

for all t ∈ [t₀, T]. This approach to solving optimal control problems by trying to construct the value function is referred to as dynamic programming (Bellman 1957).¹⁴ The second advance, by Lev Pontryagin and his students, is related to the lack of differentiability of the value function V(t, x) in (1.9), even for the simplest problems (see, e.g., Pontryagin et al. 1962, 23–43, 69–73), together with the difficulties of actually solving the partial differential equation (1.9) when the value function is differentiable. Pontryagin (1962), together with his students, provided a rigorous proof for a set of necessary optimality conditions for optimal control problems of the form (1.3). As shown in section 3.3, the conditions of the Pontryagin maximum principle (in its most basic version) can be obtained, at least heuristically, from the Hamilton-Jacobi-Bellman equation. A rigorous proof of the maximum principle usually takes a different approach, using needle variations introduced by Weierstrass (1879/1927). As Pontryagin et al. (1962) pointed out,

The method of dynamic programming was developed for the needs of optimal control processes which are of a much more general character than those which are describable by systems of differential equations. Therefore, the method of dynamic programming carries a more universal character than the maximum principle. However, in contrast to the latter, this method does not have the rigorous logical basis in all those cases where it may be successfully made use of as a valuable heuristic tool. (69)

In line with these comments, the Hamilton-Jacobi-Bellman equation is often used in settings that are more complex than those considered in this book, for instance for the optimal control of stochastic systems. The problem with the differentiability of the value function was addressed by Francis Clarke by extending the notion of derivative, leading to the concept of nonsmooth analysis (Clarke 1983; Clarke et al. 1998).¹⁵ From a practical point of view, that is, to solve actual real-world problems, nonsmooth analysis is still in need of exploration. In contrast to this, an abundance of optimal control problems have been solved using the maximum principle and its various extensions to problems with state-control constraints, pure state constraints, and infinite time horizons. For example, Arrow (1968) and Arrow and Kurz (1970a) provided an early overview of optimal control theory in models of economic growth.

13. Subscripts denote partial derivatives.

14. The idea of dynamic programming precedes Bellman's work: for example, von Neumann and Morgenstern (1944, ch. 15) used backward induction to solve sequential decision problems in perfect-information games.

15. Vinter (2000) provided an account of optimal control theory in the setting of nonsmooth analysis.

1.4 Notes

An overview of the history and content of mathematics as a discipline can be found in Aleksandrov et al. (1969) and Campbell and Higgins (1984). Blåsjö (2005) illuminates the background of the isoperimetric problem. The historical development of the calculus of variations is summarized by Goldstine (1980) and Hildebrandt and Tromba (1985). For a history of technological feedback control systems, see Mayr (1970).


2 Ordinary Differential Equations

Natura non facit saltus. (Nature does not make jumps.)

—Gottfried Wilhelm Leibniz

2.1 Overview

An ordinary differential equation (ODE) describes the evolution of a variable x(t) as a function of time t. The solution of such an equation depends on the initial state x₀ at a given time t₀. For example, x(t) might denote the number of people using a certain product at time t ≥ t₀ (e.g., a mobile phone). An ordinary differential equation describes how the (dependent) variable x(t) changes as a function of time and its own current value. The change of state from x(t) to x(t + δ) between the time instants t and t + δ as the increment δ tends to zero defines the time derivative

$$\dot{x}(t) = \lim_{\delta \to 0} \frac{x(t + \delta) - x(t)}{\delta}. \qquad (2.1)$$

In an economic system such as a market, the change of a state can often be described as a function of time and the state at that time, in the form

$$\dot{x}(t) = f(t, x(t)), \qquad (2.2)$$

where the system function f is usually given. The last relation is referred to as an ordinary differential equation. It is ordinary because the independent variable t is one-dimensional.¹ The descriptive question of how to find an appropriate representation f of the system is largely ignored here because it typically involves observation and appropriate inference from data, requiring techniques that are different from optimal control, the main focus of this book.

1. An equation that involves partial derivatives of functions with respect to components of a multidimensional independent variable is referred to as a partial differential equation (PDE). An example of such a PDE is the Hamilton-Jacobi-Bellman equation in chapter 3.

Example 2.1 (Product Diffusion) To see how to construct a system model in practice, let us consider the adoption of a new product, for example, a high-tech communication device. Let x(t) ∈ [0, 1] denote the installed base at time t ≥ t₀ (for some given t₀), that is, the fraction of all potential adopters who at time t are in possession of the device. The fraction of new adopters between the instants t and t + δ as δ → 0 is referred to as the hazard rate,

$$h(t) \equiv \lim_{\delta \to 0} \frac{x(t + \delta) - x(t)}{\delta\,(1 - x(t))} = \frac{\dot{x}(t)}{1 - x(t)},$$

which is defined using the concept of a derivative in equation (2.1). Based on empirical evidence on the adoption of television, Bass (1969) postulated an affine relation between the hazard rate and the installed base, such that

$$h(t) = \alpha x(t) + \beta,$$

referring to α as the coefficient of imitation and to β as the coefficient of innovation. A positive coefficient α can be attributed to a word-of-mouth effect, which increases the (conditional) likelihood of adoption proportionally to the installed base. A positive coefficient β increases that likelihood irrespective of the installed base. The last relation implies a system equation of the form (2.2),

$$\dot{x}(t) = (1 - x(t))(\alpha x(t) + \beta),$$

for all t ≥ t₀, where f(t, x) = (1 − x)(αx + β) is in fact independent of t, or time-invariant. Despite its simplicity, the Bass diffusion model has often been shown to fit data of product-adoption processes astonishingly well, which may at least in part explain its widespread use (Bass et al. 1994). For more details on how to find the trajectories generated by the Bass model, see exercise 2.1c. □
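Although the closed-form trajectories are deferred to exercise 2.1c, the Bass dynamics are easy to trace numerically. A small simulation sketch (not from the book; the parameter values are illustrative):

```python
# Illustrative sketch (not from the book): simulating the Bass diffusion
# model x' = (1 - x)(alpha*x + beta) from an empty installed base.
import numpy as np
from scipy.integrate import solve_ivp

alpha, beta = 0.4, 0.03            # imitation and innovation coefficients
def bass(t, x):
    return (1.0 - x) * (alpha * x + beta)

t = np.linspace(0.0, 30.0, 301)
sol = solve_ivp(bass, (0.0, 30.0), [0.0], t_eval=t)
# sol.y[0] traces the familiar S-shaped adoption path from x = 0
# toward the saturation equilibrium x = 1.
```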

Section 2.2 discusses how to analyze a differential equation of the form (2.2). Indeed, when the system function f is sufficiently simple, it may be possible to obtain an explicit solution x(t), which generally depends on a given initial state x(t₀) = x₀.² However, in many interesting applications, either the system function is too complex for obtaining a closed-form solution of the system, or the closed-form solution is too complicated to be useful for further analysis. In fact, the reader should not be shocked by this generic unsolvability of ordinary differential equations but should come to expect this as the modal case. Luckily, explicit solutions as a function of time are often not necessary to obtain important insights into the behavior of a system. For example, when the problem is well-posed (see section 2.2.3), then small variations of the system and the initial data will lead to small variations of the solution as well. This implies that the behavior of complicated systems that are sufficiently close to a simple system will be similar to the behavior of the simple system. This structural stability justifies the analysis of simple (e.g., linearized) systems and the subsequent use of perturbation techniques to account for the effects of nonlinearities. An important system property that can be checked without explicitly solving the system equation (2.2) is the existence and stability of equilibria, which are points x at which the system function vanishes. The Bass diffusion model in example 2.1 for α > 0 and β = 0 has two equilibria, at x = 0 and x = 1. When the initial state x₀ = x(t₀) of the system at time t₀ coincides with one of these equilibrium states, the system will stay at rest there forever, that is, x(t) = x₀ for all t ≥ t₀, because the rate of change ẋ is zero. Yet, small perturbations of the initial state away from an equilibrium can have different consequences, giving rise to the notion of stability of an equilibrium. For example, choosing an initial state x₀ = ε for a small ε > 0 will lead the state to move further away from zero, until the installed base x(t) approaches saturation. Thus, x₀ = 0 is an unstable equilibrium. On the other hand, starting from an initial state 1 − ε for small positive ε, the system will tend to the state x₀ = 1, which is therefore referred to as a stable equilibrium. Stability properties such as these, including the convergence of system trajectories to limit cycles, can often be analyzed by examining the monotonicity properties of a suitably defined energy or value function, usually referred to as a Lyapunov function. A generalized version of a Lyapunov function is used in chapter 3 to derive optimality conditions for optimal control problems.

2. The Bass diffusion model introduced in example 2.1 is solved for β = 0 in example 2.3 and for the general case in exercise 2.1c.

In section 2.3 the framework for the analysis of first-order ordinary differential equations is extended to differential equations with higher-order time derivatives by reducing the latter to systems of the former. That section also discusses a few more sophisticated solution techniques for systems of ordinary differential equations, such as the Laplace transform.


2.2 First-Order ODEs

2.2.1 Definitions
Let n ≥ 1 be the dimension of the (real-valued) dependent variable

$$x(t) = (x_1(t), \ldots, x_n(t)),$$

which is also called the state. A (first-order) ordinary differential equation is of the form

$$F(t, x(t), \dot{x}(t)) = 0,$$

where ẋ(t) is the total derivative of x(t) with respect to t, and F : R^{1+2n} → R^n is a continuously differentiable function. The differential equation is referred to as ordinary because the independent variable t is an element of the real line, R. Instead of using the preceding implicit representation of an ODE, it is usually more convenient to use an explicit representation of the form

$$\dot{x}(t) = f(t, x(t)), \qquad (2.3)$$

where f : D → R^n is a continuous function that directly captures how the derivative ẋ depends on the state x and the independent variable t. The domain D is assumed to be a nonempty connected open subset of R^{1+n}. Throughout this book it is almost always assumed that first-order ODEs are available in the explicit representation (2.3).³

Let I ⊂ R be a nontrivial interval. A differentiable function x : I → R^n is called a solution to (or integral curve of) the ODE (2.3) (on I) if

$$(t, x(t)) \in D \quad \text{and} \quad \dot{x}(t) = f(t, x(t)), \quad \forall\, t \in I. \qquad (2.4)$$

Thus, x(t) solves an initial value problem (IVP) relative to a given point (t₀, x₀) ∈ D,⁴ with t₀ ∈ I, if in addition to (2.4) the initial condition

$$x(t_0) = x_0 \qquad (2.5)$$

is satisfied (figure 2.1).

[Figure 2.1: Solution to an ODE ẋ = f(t, x) with initial condition x(t₀) = x₀ on I.]

3. As long as $F_{\dot{x}} \neq 0$ at a point (t, x(t), ẋ(t)), by the implicit function theorem (proposition A.7 in appendix A) it is possible to solve for ẋ(t), at least locally.

4. The IVP (2.4)–(2.5) is sometimes also referred to as the Cauchy problem (see footnote 9). Augustin-Louis Cauchy provided the first result on the existence and uniqueness of solutions in 1824 (see Cauchy 1824/1913, 399ff).

2.2.2 Some Explicit Solutions When n = 1
Let n = 1. For certain classes of functions f it is possible to obtain direct solutions to an IVP relative to a given point (t₀, x₀) ∈ D ⊆ R².

Separability The function f is called separable if f(t, x) = g(t)h(x) for all (t, x) ∈ D. If h(x) is nonzero for all x ∈ {ξ ∈ R : ∃ (s, ξ) ∈ D}, then

$$H(x) \equiv \int_{x_0}^{x} \frac{d\xi}{h(\xi)} = \int_{t_0}^{t} g(s)\, ds \equiv G(t)$$

holds for all (t, x) ∈ D, and H(x) is invertible. Thus, x(t) = H⁻¹(G(t)) solves the IVP because in addition to (2.4), the initial condition x(t₀) = H⁻¹(G(t₀)) = H⁻¹(0) = x₀ is satisfied.

Example 2.2 (Exponential Growth) For a given parameter α > 0, consider the ODE ẋ = αx with initial condition x(t₀) = x₀ for some (t₀, x₀) ∈ D = R² with x₀ > 0. Since the right-hand side of the ODE is separable,

$$\ln(x) - \ln(x_0) = \int_{x_0}^{x} \frac{d\xi}{\xi} = \alpha \int_{t_0}^{t} ds = \alpha(t - t_0),$$

so x(t) = x₀ e^{α(t−t₀)} is the unique solution to the IVP for all t ∈ R. □

Example 2.3 (Logistic Growth) The initial size of a population is x₀ > 0. Let x(t) denote the size of this population at time t ≥ t₀, which evolves according to

$$\dot{x} = \alpha\left(1 - \frac{x}{\bar{x}}\right) x, \quad x(t_0) = x_0,$$

where α > 0 is the (maximum) relative growth rate, and x̄ > x₀ is a carrying capacity, which is a tight upper bound for x(t). Analogous to example 2.2,

$$\frac{1}{\bar{x}}\left[\ln\left(\frac{x}{\bar{x} - x}\right) - \ln\left(\frac{x_0}{\bar{x} - x_0}\right)\right] = \int_{x_0}^{x} \frac{d\xi}{(\bar{x} - \xi)\,\xi} = \frac{\alpha}{\bar{x}} \int_{t_0}^{t} ds = \frac{\alpha}{\bar{x}}\,(t - t_0),$$

so $x(t)/(\bar{x} - x(t)) = \left[x_0/(\bar{x} - x_0)\right] e^{\alpha(t - t_0)}$ for all t ≥ t₀. Hence,

$$x(t) = \frac{x_0\, \bar{x}}{x_0 + (\bar{x} - x_0)\, e^{-\alpha(t - t_0)}}, \quad t \geq t_0,$$

solves the IVP with logistic growth (figure 2.2). □

[Figure 2.2: Exponential and logistic growth (see examples 2.2 and 2.3).]
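A quick numerical cross-check of this closed-form solution (an illustrative sketch, not from the book; the parameter values are arbitrary):

```python
# Illustrative sketch (not from the book): the closed-form logistic
# solution of example 2.3 against a direct numerical integration.
import numpy as np
from scipy.integrate import solve_ivp

alpha, xbar, x0, t0 = 0.8, 10.0, 0.5, 0.0
t = np.linspace(t0, 10.0, 101)

num = solve_ivp(lambda t, x: alpha * (1.0 - x / xbar) * x,
                (t0, 10.0), [x0], t_eval=t).y[0]
closed = x0 * xbar / (x0 + (xbar - x0) * np.exp(-alpha * (t - t0)))
print(np.max(np.abs(num - closed)))    # small: the two trajectories agree
```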

Homogeneity The function f is called homogeneous (of degree zero) if f(t, x) = ρ(x/t).⁵ Set φ(t) = x(t)/t (for t ≥ t₀ > 0); then φ̇ = ẋ/t − x/t² = (ẋ − φ)/t. Thus, the ODE (2.3) becomes ẋ = tφ̇ + φ = ρ(φ), so

$$\dot{\varphi} = \frac{1}{t}\left(\rho(\varphi) - \varphi\right) \equiv g(t)\, h(\varphi)$$

is a first-order ODE with separable right-hand side. A solution φ(t) to the corresponding IVP with initial condition φ(t₀) = x₀/t₀ implies a solution x(t) = tφ(t) to the original IVP with initial condition (2.5).

5. The function f : D → R^n is homogeneous of degree k ≥ 0 if for any α > 0 and any (t, x) ∈ D the relation f(αt, αx) = α^k f(t, x) holds. Thus, for k = 0 and α = 1/t > 0, we obtain that f(t, x) = ρ(x/t) as long as ρ(x/t) = f(1, x/t).
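To see the substitution φ = x/t at work on a concrete instance (an illustrative example, not from the book): for ρ(x/t) = 1 + x/t, the transformed equation is φ̇ = 1/t, so φ(t) = ln(t/t₀) + x₀/t₀ and hence x(t) = t ln(t/t₀) + (x₀/t₀)t, which a direct numerical integration confirms:

```python
# Illustrative sketch (not from the book): homogeneous ODE x' = 1 + x/t,
# solved via the substitution phi = x/t and checked numerically.
import numpy as np
from scipy.integrate import solve_ivp

t0, x0 = 1.0, 2.0
t = np.linspace(t0, 5.0, 101)

num = solve_ivp(lambda t, x: 1.0 + x / t, (t0, 5.0), [x0], t_eval=t).y[0]
closed = t * np.log(t / t0) + (x0 / t0) * t   # x = t*phi, phi = ln(t/t0) + x0/t0
print(np.max(np.abs(num - closed)))           # small: substitution solution agrees
```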

Generalized Homogeneity If $f(t, x) = \rho\left(\frac{at + bx + c}{\alpha t + \beta x + \gamma}\right)$, where a, b, c, α, β, γ are constants, then it is possible to find a solution using the previous methods after a suitable affine coordinate transformation. To see this, let $A = \begin{bmatrix} a & b \\ \alpha & \beta \end{bmatrix}$ and assume that (a, b) ≠ 0 and that γ ≠ 0, without loss of generality.⁶ Depending on whether the matrix A is singular or not, two cases can be distinguished:

• Case 1. det A = 0, that is, there is a λ such that (α, β) = λ(a, b). Since in that case

$$\frac{at + bx + c}{\alpha t + \beta x + \gamma} = \frac{at + bx + c}{\lambda(at + bx) + \gamma} = \frac{1 + c/(at + bx)}{\lambda + \gamma/(at + bx)},$$

consider only f(t, x) = ρ(at + bx). Set φ = at + bx; then

$$\dot{\varphi} = a + b\rho(\varphi) \equiv h(\varphi)$$

is an ODE with separable right-hand side.

• Case 2. det A ≠ 0, which implies that it is possible to find a reference point (t̄, x̄) such that⁷

$$\frac{at + bx + c}{\alpha t + \beta x + \gamma} = \frac{a(t - \bar{t}) + b(x - \bar{x})}{\alpha(t - \bar{t}) + \beta(x - \bar{x})} = \frac{a + b\left(\frac{x - \bar{x}}{t - \bar{t}}\right)}{\alpha + \beta\left(\frac{x - \bar{x}}{t - \bar{t}}\right)} = \frac{a + b\,(\xi/\tau)}{\alpha + \beta\,(\xi/\tau)},$$

where τ = t − t̄ and ξ(τ) = x(t) − x̄. Thus ξ̇(τ) = ẋ(t), which implies that the original ODE, using $\tilde{\rho}(\xi/\tau) = \rho\left(\frac{a + b(\xi/\tau)}{\alpha + \beta(\xi/\tau)}\right)$, can be replaced by

$$\dot{\xi}(\tau) = \tilde{\rho}(\xi/\tau),$$

an ODE with homogeneous right-hand side. Given a solution ξ(τ) to that ODE, a solution to the original ODE is then x(t) = ξ(t − t̄) + x̄.

6. Otherwise, if f is not already separable or homogeneous, simply switch the labels of (a, b, c) with (α, β, γ) and use a suitable definition of ρ.

7. Indeed, $\begin{bmatrix} \bar{t} \\ \bar{x} \end{bmatrix} = -A^{-1} \begin{bmatrix} c \\ \gamma \end{bmatrix}$.


Linear First-Order ODE An important special case is when f(t, x) = −g(t)x + h(t). The resulting (first-order) linear ODE is usually written in the form

$$\dot{x} + g(t)x = h(t). \qquad (2.6)$$

The linear ODE is called homogeneous if h(t) ≡ 0. A so-called homogeneous solution x_h(t; C) for that case is obtained immediately by realizing that f(t, x) = −g(t)x is separable:

$$x_h(t; C) = C \exp\left[-\int_{t_0}^{t} g(s)\, ds\right], \qquad (2.7)$$

where C is a suitable (nonzero) constant, which is determined by an initial condition. The solution x(t) to the linear ODE (2.6) subject to the initial condition (2.5) can be provided as the sum of the homogeneous solution x_h(t; C) in (2.7) and any particular solution x_p(t) of (2.6). To construct the particular solution, one can use the so-called variation-of-constants method, dating back to Lagrange (1811),⁸ where one takes the homogeneous solution x_h(t; C) in (2.7) but allows the constant to vary with t. That is, one sets x_p(t) = x_h(t; C(t)). Substituting this in (2.6), one obtains the ODE

$$\dot{C}(t) = h(t) \exp\left[\int_{t_0}^{t} g(s)\, ds\right],$$

which implies (by separability of the right-hand side) that

$$C(t) = C_0 + \int_{t_0}^{t} h(s) \exp\left[\int_{t_0}^{s} g(\theta)\, d\theta\right] ds,$$

where C(t₀) = C₀. Without any loss of generality one can set C₀ = 0, which entails that x_p(t₀) = 0. Moreover,

$$x(t) = x_h(t; x_0) + x_p(t).$$

This is often referred to as the Cauchy formula.

Proposition 2.1 (Cauchy Formula) The unique solution to the linear IVP

$$\dot{x} + g(t)x = h(t), \quad x(t_0) = x_0, \qquad (2.8)$$

is given by⁹

$$x(t) = \left( x_0 + \int_{t_0}^{t} h(s) \exp\left[\int_{t_0}^{s} g(\theta)\, d\theta\right] ds \right) \exp\left[-\int_{t_0}^{t} g(s)\, ds\right]. \qquad (2.9)$$

8. Joseph-Louis Lagrange communicated the method in 1808 to the French Academy of Sciences; in concrete problems it was applied earlier by Leonhard Euler and Daniel Bernoulli.

9. The term Cauchy formula is adopted for convenience. Instead, one can also refer to equation (2.9), which, after all, was obtained by Lagrange's variation-of-constants method, as the "solution formula to the (linear) Cauchy problem" (see footnote 4).

The formula (2.9) is very helpful in practice, and it is used frequently in this book.

Remark 2.1 (Duhamel Principle) The Cauchy formula (2.9) can be writ-ten in the form

x(t) = xh(t; x0) + (k ∗ h)(t), (2.10)

where the second term on the right-hand side is referred to as aconvolution product with kernel k(t, s) = xh(t; 1)

(xh(s; 1)

)−1,

(k ∗ h)(t) =∫ t

t0

k(t, s)h(s) ds =∫ t

t0

exp[−∫ t

sg(θ ) dθ

]h(s) ds.

Equation (2.10) may be easier to remember than the Cauchy formula(2.9). It makes plain that the solution to the linear IVP (2.8) is obtainedas a superposition of the homogeneous solution (which depends onlyon the function g(t)) and a solution that is directly generated bythe disturbance function h(t). �

Figure 2.3 summarizes the methods used to solve several well-knownclasses of IVPs for n = 1.

Example 2.4 (Effect of Advertising on Sales) (Vidale and Wolfe 1957)Let x(t) represent the sales of a certain product at time t ≥ 0, and letinitial sales x0 ∈ [0, x] at time t = 0 be given, where x > 0 is an estimatedsaturation level. A well-known model for the response of sales to a con-tinuous rate of advertising expenditure u(t), t ≥ 0, can be written in theform of a linear IVP,

x = r(

1 − xx

)u(t) − λx, x(t0) = x0,

where r ∈ (0, 1] is the response coefficient and λ > 0 a sales decay con-stant. The former describes how effective advertising expenditure is in

9. The term Cauchy formula is adopted for convenience. Instead, one can also refer toequation (2.9), which, after all, was obtained by Lagrange’s variation-of-constants method,as the “solution formula to the (linear) Cauchy problem” (see footnote 4).

Page 39: Optimal Control Theory With Applications in Economics

26 Chapter 2

ODETypes

SolutionMethods

Riccati

Bernoulli

Change ofVariables

Change ofVariables

Change ofVariables

Change ofVariables

+ ParticularSolution

HomogeneousCase

Linear

Separable

Homogeneous

GeneralizedHomogeneous

InitialCondition

InitialCondition

Cauchy Formula

Direct Integration

Variation ofConstants

Figure 2.3Solution of well-known types of IVPs when n = 1.

generating sales, and the latter defines the sales response when adver-tising is stopped altogether. Setting g(t) = λ+ ru(t)/x and h(t) = ru(t),the Cauchy formula (2.9) yields that

x(t) =(

x0 exp[− r

x

∫ t

0u(θ ) dθ

]

+ r∫ t

0u(s) exp

[λs − r

x

∫ t

su(θ ) dθ

]ds)

e−λt,

for all t ≥ 0. For example, if the rate of advertising expenditure is equalto the constant u0 > 0 on the time interval [0, T] for some campaign hori-zon T, and zero thereafter, then the previous expression specializes to

x(t) ={

x0e−(λ+ru0/x)t + x ru0λx+ru0

(1 − e−(λ+ru0/x)t

)if t ∈ [0, T],

x(T)e−λ(t−T) if t > T.

Page 40: Optimal Control Theory With Applications in Economics

Ordinary Differential Equations 27

Thus, an infinite-horizon advertising campaign (as T → ∞) with ex-penditure rate u0 would cause sales x(t) to approach the level x∞ ≡(1 + λx

ru0)−1x < x for t → ∞. �

Bernoulli Equation The nonlinear ODE

x + g(t)x + h(t)xα = 0, α �= 1, (2.11)

can be transformed into a linear ODE. Indeed, multiplying (2.11) by x−α

yields

x−α x + g(t)x1−α + h(t) = 0.

With the substitution of ϕ = x1−α/(1 −α) the last ODE becomes linear,

ϕ+ (1 −α)g(t)ϕ+ h(t) = 0, (2.12)

and can be solved using the Cauchy formula (see exercise 2.1a).

Riccati Equation The nonlinear ODE

x + ρ(t)x + h(t)x2 = σ (t) (2.13)

cannot generally be solved explicitly. Yet, if a particular solution xp isknown, then it is possible to compute all other solutions. Let x be anothersolution to (2.13). Then the difference, � = x − xp, satisfies the ODE

�+ ρ(t)�+ h(t) (x2 − x2p)︸ ︷︷ ︸

(x+xp)(x−xp)

= �+ ρ(t)�+ h(t)(�+ 2xp)� = 0.

In other words, the difference � satisfies the Bernoulli equation

�+ g(t)�+ h(t)�2 = 0,

where g(t) = ρ(t) + 2xp(t)h(t). Thus, using the substitution ϕ = −(1/�),one obtains the linear ODE (2.12) with α = 2, which can be solvedusing the Cauchy formula. Any particular solution xp to the Riccatiequation (2.13) therefore implies all other solutions in the form

x = xp − 1ϕ

,

where ϕ is any solution to (2.12). Example 2.14 shows how to solvea matrix Riccati equation with some special structure by reducing itto a system of linear ODEs. The Riccati equation plays an important

Page 41: Optimal Control Theory With Applications in Economics

28 Chapter 2

role for the optimal control of a linear system with quadratic objectivefunctional, which is often referred to as a linear-quadratic regulator (seeexample 3.3).

2.2.3 Well-Posed ProblemsIn accord with a notion by Hadamard (1902) for mathematical modelsof physical phenomena, an IVP is said to be well-posed if the followingthree requirements are satisfied:

• There exists a solution.• Any solution is unique.• The solution depends continuously on the available data (parametersand initial condition).

Problems that do not satisfy at least one of these requirements arecalled ill-posed.10 A deterministic economic system is usually describedhere in terms of a well-posed IVP. This ensures that system responsescan be anticipated as unique consequences of outside intervention andthat these responses remain essentially unaffected by small changes inthe system, leading to a certain robustness of the analysis with respectto modeling and identification errors. In what follows, easy-to-verifyconditions are established under which an IVP is well-posed.

Existence and Uniqueness The existence of a solution to an IVP isguaranteed when the right-hand side of the ODE (2.3) is continuous, asassumed from the outset.

Proposition 2.2 (Existence) (Peano 1890) Let f ∈ C0(D). For any (t0, x0)∈ D the IVP

x = f (t, x), x(t0) = x0,

has a solution on a nontrivial interval I ⊂ R that contains t0. Any suchsolution can be extended (in the direction of both positive and negativetimes t) such that it comes arbitrarily close to the boundary of D.

Proof See, for example, Walter (1998, 73–78).11 n

10. A well-known class of ill-posed problems is that of inverse problems, where a modelis to be determined from data. To reduce the sensitivity of solutions to data errors, onecan use regularization methods, e.g., the one by Tikhonov (1963) for linear models (see,e.g., Kress 1998, 86–90).11. The proof of this result is nonconstructive and is therefore omitted.

Page 42: Optimal Control Theory With Applications in Economics

Ordinary Differential Equations 29

The phrase “arbitrarily close to the boundary of D” in proposition 2.2means that any solution x(t) can be extended for all times t ∈ R, unlessa boundary of D can be reached in finite time, in which case it is pos-sible to approach that escape time arbitrarily closely. A solution to theIVP that has been fully extended to the closure � ⊂ D of a nonemptyconnected open set� such that � contains the initial point (t0, x0) is saidto be maximal on �.

Example 2.5 (Nonuniqueness) Let D = R2, and consider the IVP

x = √|x|, x(0) = 0.

Note first that the function f (t, x) = √|x| on the right-hand side is contin-uous. Proposition 2.2 guarantees the existence of a solution to this IVP,but one cannot expect uniqueness, the second of Hadamard’s require-ments for well-posedness, to hold. Indeed, since the right-hand sideis symmetric, with any solution x(t) of the IVP the function −x(−t) isalso a solution. Note also that the function x(t) = 0 is a solution forall t ∈ R. Since the system function f is separable, one can computeanother solution by direct integration,

2√

x =∫ x

0

dξ√ξ

=∫ t

0ds = t,

for (t, x) ≥ 0. Using the aforementioned symmetry, this results in a sec-ond solution to the IVP, of the form x(t) = t|t|/4, which is defined forall t ∈ R. Besides x(t) ≡ 0 and x(t), there are (infinitely) many moresolutions,

x(t; C1, C2) =

⎧⎪⎨⎪⎩

− (t+C1)2

4 if t ≤ −C1,0 if t ∈ [−C1, C2],(t−C2)2

4 if t ≥ C2,

indexed by the nonnegative constants C1, C2, with the two earliersolutions at diametrically opposed extremes, such that

x(t; 0, 0) = x(t), and limC1,C2→∞

x(t; C1, C2) = 0 (pointwise),

for all t ∈ R (figure 2.4). �

The function f (t, x) is said to be Lipschitz (with respect to x, on D) ifthere exists a nonnegative constant L such that

‖ f (t, x) − f (t, x)‖ ≤ L‖x − x‖, ∀ (t, x), (t, x) ∈ D, (2.14)

Page 43: Optimal Control Theory With Applications in Economics

30 Chapter 2

Figure 2.4Nonuniqueness of solutions to the IVP x = √|x|, x(0) = 0 (see example 2.5).

where ‖ · ‖ is any suitable norm on Rn. If, instead, for any point (t, x) ∈ Dthere exist εx > 0 and Lx ≥ 0 such that

‖ f (t, x) − f (t, x)‖ ≤ Lx‖x − x‖,

∀ (t, x) ∈ {(t, ξ ) ∈ D : ‖(t, x) − (t, ξ )‖<εx},(2.15)

then the function f (t, x) is called locally Lipschitz (with respect to x, on D).Clearly, if a function is Lipschitz, then it is also locally Lipschitz, but notvice versa.12 On the other hand, if f (t, x) is locally Lipschitz, then itis Lipschitz on any compact subset of D. Both properties are, broadlyspeaking, implied if the function f (t, x) is continuously differentiablein x on D.

Lemma 2.1 Assume that the function f (t, x) is continuously differen-tiable in x on D. (1) The function f (t, x) is locally Lipschitz with respectto x on D. (2) If D is convex and the Jacobian matrix fx = [∂fi/∂xj] isbounded on D, then the function f (t, x) is Lipschitz with respect to x on D.

Proof (1) Fix any point (t, x) ∈ D, and select εx > 0 such that the ball

Bεx (t, x) = {(τ , ξ ) ∈ R1+n : ‖(t, x) − (τ , ξ )‖ ≤ εx}is contained in D. This is possible, since D is an open set. Consider (t, x) ∈Bεx (t, x), and apply the mean-value theorem (proposition A.14) to the ith

12. Consider f (t, x) = tx2 on D = R2, which is locally Lipschitz but does not satisfy (2.14).

Page 44: Optimal Control Theory With Applications in Economics

Ordinary Differential Equations 31

coordinate function fi, i ∈ {1, . . . , n}, of f = ( f1, . . . , fn)′. Then

fi(t, x) − fi(t, x) = 〈 fi,x(t, λi x + (1 − λi)x), x − x〉,for some λi ∈ [0, 1]. Since the derivative fi,x is continuous on the compactset Bεx (t, x), it is also bounded there. Hence, there exists a constant Li,x ≥0 suchthat

‖ fi(t, x) − fi(t, x)‖ ≤ Li,x‖x − x‖.

Combining this argument for the n coordinate functions (e.g., using themaximum norm together with the norm equivalence in remark A.1)yields that there exists a constant Lx ≥ 0 such that

‖ f (t, x) − f (t, x)‖ = ‖ fx(t, λx + (1 − λ)x)(x − x

) ‖ ≤ Lx‖x − x‖,

for all (t, x) ∈ Bεx (t, x), which in turn implies that f (t, x) is locallyLipschitz with respect to x on D.

(2) Since D is convex, for any two points (t, x) and (t, x) in D, (t, λx +(1 − λ)x) ∈ D for all λ ∈ [0, 1]. As in part (1) (with Bεx (t, x) replaced by D),applying the mean-value theorem yields that f (t, x) is Lipschitz withrespect to x on D. n

Proposition 2.3 (Local Existence and Uniqueness) If f is locally Lip-schitz with respect to x on D, then for any given (t0, x0) ∈ D the IVP

x = f (t, x), x(t0) = x0, (2.16)

has a unique solution on a nontrivial interval I ⊂ R which contains t0.

Proof Let I = [t0, t0 + δ] be a nontrivial interval for a suitably smallpositive δ. If x(t) is a solution to the IVP (2.16) on I, then by integrationone obtains the integral equation

x(t) = x0 +∫ t

t0

f (s, x(s)) ds, (2.17)

for all t ∈ I. Since (2.17) in turn implies (2.16), this integral equation isin fact an equivalent integral representation of the IVP. The statements forproving (2.17) follow.

(1) Existence. Consider the Banach space13 X = C0(I) of all continuousfunctions x : I → Rn, equipped with the maximum norm ‖ · ‖∞ that isdefined by

13. A Banach space is a linear space that is equipped with a norm and that is complete inthe sense that any Cauchy sequence converges. See appendix A.2 for more details.

Page 45: Optimal Control Theory With Applications in Economics

32 Chapter 2

‖x‖∞ = maxt∈[t,t0+δ] ‖x(t)‖

for all x ∈ X . The right-hand side of the integral equation (2.17) maps anyfunction x ∈ X to another element Px in X , where P : X → X is a con-tinuous functional. For any given time t ∈ I the right-hand side of (2.17)evaluates to (Px)(t). With this notation, the integral equation (2.17) on Ican be equivalently rewritten as a fixed-point problem of the form

x = Px. (2.18)

Let Sr = {x ∈ X : ‖x − x0‖∞ ≤ r} be a closed ball of radius r > 0 in theBanach space X centered on the constant function x0. Now note thatfor a small enough r, P : Sr → Sr (i.e., P maps the ball Sr onto itself) andmoreover P is a contraction mapping on Sr, that is, it satisfies the Lipschitzcondition

‖Px − Px‖∞ ≤ K‖x − x‖∞, (2.19)

for all x, x ∈ Sr, with a Lipschitz constant K < 1. This would allow theapplication of the Banach fixed-point theorem (proposition A.3), whichguarantees the existence of a unique solution to the fixed-point prob-lem (2.18). In addition, the fixed point can be obtained by successiveiteration from an arbitrary starting point in Sr.

For this, let Mf = ‖ f (·, x0)‖∞ be the maximum of ‖ f (t, x0)‖ over allt ∈ I. Take an arbitrary function x ∈ Sr. Since

(Px)(t) − x0 =∫ t

t0

f (s, x(s)) ds

=∫ t

t0

(f (s, x(s)) − f (s, x0) + f (s, x0)

)ds,

this implies (using the assumption that f is Lipschitz with respect to xwith constant L) that

‖(Px)(t) − x0‖ ≤∫ t

t0

‖f (s, x(s))‖ ds

=∫ t

t0

(‖ f (s, x(s)) − f (s, x0)‖ + ‖ f (s, x0)‖) ds

≤∫ t

t0

(L‖x(s) − x0‖ + Mf

)ds

Page 46: Optimal Control Theory With Applications in Economics

Ordinary Differential Equations 33

≤ (t − t0)(Lr + Mf )

≤ δ(Lr + Mf ),

for all t ∈ I. Thus, as long as δ ≤ r/(Lr + Mf ),

‖Px − x0‖∞ = maxt∈[t0,t0+δ] ‖(Px)(t) − x0‖ ≤ δ(Lr + Mf ) ≤ r,

which implies that Px ∈ Sr. Hence, it has been shown that when δ issmall enough P maps Sr into itself. Now select any x, x ∈ Sr. Then

‖(Px)(t) − (Px)(t)‖ =∥∥∥∥∫ t

t0

(f (s, x(s)) − f (s, x(s))

)ds∥∥∥∥

≤∫ t

t0

‖ f (s, x(s)) − f (s, x(s))‖ds

≤∫ t

t0

L‖x(s) − x(s)‖ds ≤ Lδ‖x − x‖∞,

for all t ∈ [t0, t0 + δ], which implies that

‖Px − Px‖∞ ≤ Lδ‖x − x‖∞,

that is, the Lipschitz condition (2.19) for K = Lδ < 1, provided thatδ < 1/L. Hence, as long as

δ < min{

rLr + Mf

,1L

}, (2.20)

the continuous functional P : Sr → Sr is a contraction mapping on theconvex set Sr. The hypotheses of the Banach fixed-point theoremare satisfied, so (2.18) has a unique solution x, which is obtained as thepointwise limit of the sequence {xk}∞k=0, where x0 = x0 and xk+1 = Pxk

for all k ≥ 0, that is,

x(t) = limk→∞

xk(t) = limk→∞

(Pkx0)(t),

for all t ∈ I, where Pk denotes the k-fold successive application of P.(2) Uniqueness. To show that the solution x(t) established earlier is

unique, it is enough to demonstrate that x(t) must stay inside theball Br(x0) = {ξ ∈ Rn : ‖ξ − x0‖ ≤ r}. Let t0 +� be the first intersectiontime, so that x(t0 +�) − x0 = r. If � > δ, then there is nothing to prove.Thus, suppose that � ≤ δ. Then

Page 47: Optimal Control Theory With Applications in Economics

34 Chapter 2

‖x(t0 +�) − x0‖ = r,

and for all t ∈ [t0, t0 +�], as before,

‖x(t) − x0‖ ≤ �(Lr + Mf ).

In particular, r = ‖x(t0 +�) − x0‖ ≤ �(Lr + Mf ), so by virtue of (2.20)it is

δ ≤ rLr + Mf

≤ �,

which implies that the solution x(t) cannot leave the ball Br(x0) forall t ∈ I. Therefore any continuous solution x must be an element of Sr,which by the Banach fixed-point theorem implies that the solution isunique.

This completes the proof of proposition 2.3. n

Remark 2.2 (Picard-Lindelöf Error Estimate) Consider the interval I =[t0, t0 + δ], where the constant δ satisfies (2.20). By the Banach fixed-pointtheorem (proposition A.3) the iteration scheme featured in the proof ofproposition 2.3 (often referred to as successive approximation) convergesto a unique solution. If, starting from any initial function x0 ∈ Sr,

‖x1(t) − x0(t)‖ ≤ M|t − t0| +μ

with appropriate positive constants μ, M for all t ∈ I (e.g., M = Mf

and μ = r), then

‖xk(t) − xk−1(t)‖ = ‖(Pxk−1)(t) − (Pxk−2)(t)‖

≤ L∣∣∣∣∫ t

t0

‖xk−1(s) − xk−2(s)‖ds∣∣∣∣

≤ · · · ≤ Lk−1∫ t

t0

(M|t − t0| +μ) ds

= Lk−1M|t − t0|2

2+ Lk−1μ|t − t0|,

for all k ≥ 2. Carrying the recursion for the integration toward the leftin this chain of inequalities yields

‖xk(t) − x k−1(t)‖ ≤ ML

(L|t − t0|)k

k! +μ(L|t − t0|)k−1

(k − 1)! .

Page 48: Optimal Control Theory With Applications in Economics

Ordinary Differential Equations 35

In addition,

x = xn +∞∑

k=n+1

(xk − xk−1),

for all n ≥ 0, which, using the previous inequality, implies (by summingover k) the Picard-Lindelöf error estimate

‖xn(t) − x(t)‖ ≤ ML

∞∑k=n+1

(L(t − t0))k

k! +μ

∞∑k=n

(L|t − t0|)k−1

(k − 1)!

≤(

ML

(L|t − t0|)n+1

(n + 1)! +μ(L|t − t0|)n

n!)

eL|t−t0|

≤(

n + 1+μ

)(Lδ)neLδ

n! ,

for all t ∈ I. �

Proposition 2.4 (Global Existence and Uniqueness) If f is uniformlybounded and Lipschitz with respect to x on D, then for any given (t0, x0)the IVP (2.16) has a unique solution.14

Proof By proposition 2.3 there exists a local solution on [t0, t0 + δ] forsome small (but finite) δ > 0. Since f is Lipschitz on the domain D anduniformly bounded, the constants r, Mf , L, K in the proof of the localresult become independent of the point x0. Hence, it is possible to extendthe solution forward starting from the initial data (t0 + δ, x(t0 + δ)) on theinterval I2 = [t0 + δ, t0 + 2δ], and so forth, on Ik = [t0 + (k − 1)δ, t0 + kδ]for all k ≥ 2 until the solution approaches the boundary of D. The samecan be done in the direction of negative times. n

A system with finite escape time is such that it leaves any compact sub-set � of an unbounded state space X in finite time (assuming that D =[t0, ∞) × X ). Under the assumptions of proposition 2.4 a system cannothave a finite escape time.

Example 2.6 (Finite Escape Time) Consider the IVP x = x2, x(t0) = x0, forsome (t0, x0) ∈ D = R2++. The function x2 is locally (but not globally)Lipschitz with respect to x on D. Thus, by proposition 2.3 there is aunique solution,

14. More specifically, the maximal solution on any� ⊂ D, as in the definition of a maximalsolution following proposition 2.2, exists and is unique.

Page 49: Optimal Control Theory With Applications in Economics

36 Chapter 2

x(t) = x0

1 − x0(t − t0),

which can be computed by direct integration (using separability of theright-hand side of the system equation). Since

limt→t−e

x(t) = ∞,

for te = t0 + (1/x0) < ∞, the system has finite escape time (equalto te). �

Continuous Dependence Continuous dependence, the third condi-tion for well-posedness of an IVP, requires that small changes in initialdata as well as in the system equation have only a small impact on thesolution (figure 2.5). Assume thatα ∈ Rp is an element of a p-dimensionalEuclidean parameter space, where p ≥ 1 is a given integer. Now con-sider perturbations of a parameterized IVP,

x = f (t, x,α), x(t0) = x0, (2.21)

where the function f : D × Rp is assumed to be continuous and (t0, x0) ∈D. For a given α = α0 ∈ Rp the parameterized IVP is called a nominalIVP, relative to which small model perturbations can be examined.

Proposition 2.5 (Continuous Dependence) If the function f (t, x,α) islocally Lipschitz with respect to x on D, then for any ε > 0 thereexists δε > 0 such that

|t0 − t0| + ‖x0 − x0‖ + ‖α−α0‖ ≤ δε ⇒ ‖x(t, α) − x(t,α0)‖ ≤ ε,

for all t in an open intervalI, where x(t,α0) is the solution to the IVP (2.21)for α = α0 and x(t, α) is the solution to (2.21), for α = α and (t0, x0) =(t0, x0).

Proof Since f (t, x,α0) is locally Lipschitz, by applying proposition 2.3in both directions of time, one can find� > 0 such that there is a uniquesolution x(t,α0) of the (nominal) IVP (2.21) forα = α0 on the interval [t0 −�, t0 +�]. Fix ε1 ∈ (0, min{�, ε, ε1}) such that the ε1-neighborhood of thecorresponding trajectory is contained in D, that is, such that

N = {(τ , ξ ) ∈ D ∩ [t0 −�, t0 +�] × Rn : ‖x(τ ,α0) − ξ‖ ≤ ε1} ⊂ D.

The constant ε1 is determined in equation (2.23). Note that the set N(which can be interpreted geometrically as a tube containing the nominaltrajectory) is compact. Fix an arbitrary a > 0. Without loss of generality,

Page 50: Optimal Control Theory With Applications in Economics

Ordinary Differential Equations 37

Figure 2.5Continuous dependence on initial conditions and parameter α of solutions to the IVP x =f (t, x) with initial conditions x(t0;α) = x0 and x(t0;α) = x0.

one can restrict attention to parameters α that lie in a closed ball Ba(α0)of radius a > 0, with the nominal parameter α0 ∈ Rp at its center, so that

‖α−α0‖ ≤ a.

Since f (t, x,α) is by assumption continuous on D × Rp, it is uniformlycontinuous on the compact subset N × Ba(α0). Thus, there exists a ρ ∈(0, a] such that

‖α−α‖ ≤ ρ ⇒ ‖ f (t, x, α) − f (t, x,α)‖ ≤ ε1. (2.22)

Using an integral representation of the solution to the parameterizedIVP (2.21), analogous to (2.17) in the proof of proposition 2.3, it is

x(t, α) − x(t,α0) = x0 +∫ t

t0

f (s, x(s, α), α) ds

−(

x0 +∫ t

t0

f (s, x(s,α0),α0) ds)

Page 51: Optimal Control Theory With Applications in Economics

38 Chapter 2

= x0 − x0 +∫ t0

t0

f (s, x(s, α), α) ds

+∫ t

t0

(f (s, x(s, α), α) − f (s, x(s,α0),α0)

)ds,

for all t ∈ [t0 −�, t0 +�]. Hence, provided that

|t0 − t0| + ‖x0 − x0‖ + ‖α−α0‖ ≤ δ

for some δ ∈ (0, min{ε1, ρ}], and taking into account (2.22),15

‖x(t, α) − x(t,α0)‖ ≤ δ+ δMf

+∣∣∣∣∫ t

t0

‖ f (s, x(s, α), α) − f (s, x(s,α0),α0)‖ds∣∣∣∣

≤ (1 + Mf )δ+ ε1|t − t0|

+∣∣∣∣∫ t

t0

‖ f (s, x(s, α),α0) − f (s, x(s,α0),α0)‖ds∣∣∣∣

≤ (1 + Mf )δ+ ε1�+ L∣∣∣∣∫ t

t0

‖x(s, α) − x(s,α0)‖ds∣∣∣∣ ,

for all t ∈ [t0 −�, t0 +�], where Mf = max(t,x,α)∈N×Ba(α0) ‖ f (t, x,α)‖ andL ≥ 0 is the (local) Lipschitz constant of f with respect to x. Applyingthe Gronwall-Bellman inequality (proposition A.9) to the last inequalityyields that

‖x(t, α) − x(t,α0)‖ ≤ ((1 + Mf )δ+ ε1�

)eL|t−t0| ≤ (

(1 + Mf )δ+ ε1�)

eL�,

for all t ∈ [t0 −�, t0 +�]. Since by construction δ ≤ ε1, it is

‖x(t, α) − x(t,α0)‖ ≤ ε,

for all t ∈ [t0 −�, t0 +�], as long as

ε1 ≤ ε1 ≡ min{

εe−L�

1 + Mf +�,�}

, (2.23)

and δ = δε ≤ min{ε1, ρ}, which completes the proof. n

15. Note also that

‖ f (s, x, α) − f (s, x,α0)‖ ≤ ‖ f (s, x, α) − f (s, x,α0)‖ + ‖ f (s, x,α0) − f (s, x,α0)‖≤ ε1 + ∥∥ f (s, x,α0) − f (s, x,α0)

∥∥ .

Page 52: Optimal Control Theory With Applications in Economics

Ordinary Differential Equations 39

Remark 2.3 (Sensitivity Analysis) If, in addition to the assumptions inproposition 2.5, the function f (t, x,α) is continuously differentiablein (x,α), then it is possible to differentiate the solution

x(t,α) = x0 +∫ t

t0

f (s, x(s,α),α) ds

to the parameterized IVP (2.21) in a neighborhood of (t0,α0) with respectto α,

xα(t,α) =∫ t

t0

(fx(s, x(s,α),α)xα(s,α) + fα(s, x(s,α),α)

)ds,

and subsequently with respect to t. Thus, the sensitivity matrix S(t) =xα(t,α0) (with values in Rn×p) satisfies the linear IVP

S(t) = A(t)S(t) + B(t), S(t0) = 0,

where A(t) = fx(t, x(t,α0),α0) ∈ Rn×n and B(t) = fα(t, x(t,α0),α0) ∈ Rn×p.This results in the approximation16

x(t,α) − x(t,α0) = xα(t,α0)(α−α0) + O((α−α0)2)

= S(t)(α−α0) + O((α−α0)2) , (2.24)

for (t,α) in a neighborhood of (t0,α0). �

2.2.4 State-Space AnalysisInstead of viewing the trajectory of the solution to an IVP on I as a graph

{(t, x(t)) : t ∈ I} ⊂ R1+n

that lies in the (1 + n)-dimensional domain D of f (t, x), it is often con-venient to consider the projections D(t) = {x ∈ R : (t, x) ∈ D} and D0 ={t ∈ R : ∃ (t, x) ∈ D}, and restrict attention to the graph

{x(t) ∈ D(t) : t ∈ D0} ⊂ Rn

that lies in the state space Rn. This is especially true when n ∈ {2, 3}because then it is possible to graphically represent the state trajectoriesin the state space. When viewed in the state space, the right-hand side ofthe ODE x = f (t, x) defines a vector field, and the unique solution to any

16. The Landau notation O( · ) describes the limiting behavior of the function in its argu-ment. For example, �(ξ ) = O(‖ξ‖2

2) if and only if there exists M > 0, such that ‖�(ξ )‖ ≤M‖ξ‖2

2 in a neighborhood of the origin, i.e., for all ξ such that ‖ξ‖ < ε for some ε > 0.

Page 53: Optimal Control Theory With Applications in Economics

40 Chapter 2

well-posed IVP is described by the flow φ : R × D → Rn of this vectorfield, which for a time increment τ maps the initial data (t0, x0) ∈ D tothe state x(t0 + τ ). Furthermore,

x(t0 + τ ) = φ(τ , t0, x0) = x0 +∫ t0+τ

t0

f (s, x(s)) ds

solves the IVP (2.16) on some time interval. The flow is often a conve-nient description of a system trajectory subject to an initial condition.It emphasizes the role of an ODE in transporting initial conditions toendpoints of trajectories (as solutions to the associated well-posed IVPs).

Remark 2.4 (Group Laws) The flow φ(τ , t, x) of the system x = f (t, x)satisfies the group laws 17

φ(0, t, x) = x and φ(τ + σ , t, x) = φ(σ , t + τ ,φ(τ , t, x)) (2.25)

for all (t, x) ∈ D and increments τ , σ . �

Remark 2.5 (Flow of Autonomous System) When considering an ODE ofthe form x = f (x), where the function f : Rn → Rn does not depend on tand is Lipschitz and bounded, one can set the initial time t0 to zerowithout any loss in generality. The flow, with simplified notation φ(t, x),describes the time-t value of a solution to an IVP starting at the point x.The group laws (2.25) for the flow of this autonomous system can bewritten in the more compact form

φ(0, x) = x and φ(t + s, x) = φ(s,φ(t, x)), (2.26)

for all x ∈ Rn and all s, t ∈ R (figure 2.6). �

2.2.5 Exact ODEs and Potential FunctionA vector-valued function v = (v0, v1, . . . , vn) : D → R1+n has a potential(on D) if there exists a real-valued potential function V : D → R suchthat

(Vt(t, x), Vx(t, x)) = v(t, x), ∀ (t, x) ∈ D.

17. In algebra, a group (G, ◦) consists of a set G together with an operation ◦ that combinesany two elements a, b of G to form a third element a ◦ b, such that for all a, b, c ∈ G thefollowing four conditions (group axioms) are satisfied: (1) (closure) a ◦ b ∈ G; (2) (associa-tivity) (a ◦ b) ◦ c = a ◦ (b ◦ c); (3) (identity) ∃ e ∈ G : e ◦ a = a ◦ e = e; (4) (inverse) ∃ a−1 ∈ Gsuch that a−1 ◦ a = a ◦ a−1 = e, where e is the identity. For the group (D, +) with D =R

1+n the function φ : R × D → D with (τ , (t, x)) �→ φ(τ , (t, x)) = (t + τ ,φ(τ , t, x)) forms aone-parameter group action, with identity φ(0, ·) and such that φ(τ + σ , ·) = φ(σ , φ(τ , ·)).

Page 54: Optimal Control Theory With Applications in Economics

Ordinary Differential Equations 41

Figure 2.6Group laws for the flow of a two-dimensional autonomous system.

Consider now the ODE (in implicit representation)

g(t, x) + 〈h(t, x), x〉 = 0, (2.27)

where g : D → R and h : D → Rn are given continuously differentiablefunctions. This ODE is called exact if the function v(t, x) = ( g(t, x), h(t, x))has a potential on D.

Proposition 2.6 (Poincaré Lemma) Let v = (v0, v1, . . . , vn) : D → R1+n

be a continuously differentiable function, defined on a contractibledomain D.18 The function v has a potential (on D) if and only if

∂v0

∂xi= ∂vi

∂tand

∂vi

∂xj= ∂vj

∂xi, ∀ i, j ∈ {1, . . . , n}, (2.28)

on D.

18. A domain is contractible if it can be deformed to a point using a suitable continuousmapping (which is referred to as a homotopy).

Page 55: Optimal Control Theory With Applications in Economics

42 Chapter 2

Proof⇒: Let V be a potential of the function v on D so that the gradi-

ent V( t,x) = v. Since v is by assumption continuously differentiable, thepotential function V is twice continuously differentiable. This impliesthat the Hessian matrix of second derivatives of V is symmetric on D(see, e.g., Zorich 2004, 1:459–460), that is, condition (2.28) holds.

⇐: See, for example, Zorich (2004, 2:353–354). n

Remark 2.6 (Computation of Potential Function) If a function

v = (v0, v1, . . . , vn)

has a potential V on the simply connected domain D, then

V(t, x) − V(t0, x0) =∫ (t,x)

(t0,x0)〈v(τ , ξ ), d(τ , ξ )〉 =

∫ 1

0〈v(γ (s)), dγ (s)〉, (2.29)

where the integration is carried out along any differentiable path γ :[0, 1] → D which is such that γ (0) = (t0, x0) and γ (1) = (t, x). �

An exact ODE can be written equivalently in the form

dV(t, x)dt

= Vt(t, x) + 〈Vx(t, x), x〉 = 0.

As a result, V(t, x(t)) ≡ V(t0, x0) for any solution x(t) of an exact ODEthat satisfies the initial condition x(t0) = x0. The potential function for anexact ODE is often referred to as a first integral (of (2.27)). Note that a firstintegral confines trajectories to an (n − 1)-dimensional subset (manifold)of the state space. For a given ODE of the form (2.3) it may be possibleto find n − 1 different exact ODEs of the form (2.27) with first integrals,in which case the vector field f in (2.3) is called completely integrable. Thecomplete integrability of vector fields is related to the controllability ofnonlinear systems.19

Remark 2.7 (Integrating Factor / Euler Multiplier) Given a function vwhich does not have a potential, it is sometimes possible to finda (continuously differentiable) integrating factor (or Euler multiplier)μ : D → R such that μv has a potential. By the Poincaré lemma thisis the case if and only if

∂(μv0)∂xi

= ∂(μvi)∂t

and∂(μvi)∂xj

= ∂(μvj)∂xi

, ∀ i, j ∈ {1, . . . , n}. (2.30)

19. For details see the references in section 3.2.2.

Page 56: Optimal Control Theory With Applications in Economics

Ordinary Differential Equations 43

In order to find an integrating factor, it is often useful to assumethat μ is separable in the different variables, for instance, μ(t, x) =μ0(t)μ1(x1) · · ·μn(xn). �

Example 2.7 (Potential for Linear First-Order ODE) Consider the linearfirst-order ODE (2.6) on the simply connected domain D = R2, whichcan equivalently be written in the form

g(t)x − h(t) + x = v0(t, x) + v1(t, x)x = 0,

where v0(t, x) = g(t)x − h(t) and v1(t, x) = 1. With the use of an Eulermultiplier μ(t), condition (2.30) becomes

μ(t)g(t) = μ(t).

The latter is satisfied as long as the (nonzero) integrating factor is ofthe form

μ(t) = μ0 exp[∫ t

t0

g(s) ds]

,

where μ0 �= 0. Thus, integrating the exact ODE μv0 +μv1x = 0 alongany path from the initial point (t0, x0) ∈ D to the point (t, x) ∈ D asin (2.29) yields the potential function

V(t, x) =∫ t

t0

v0(s, x0)μ(s) ds +∫ x

x0

v1(t, ξ )μ(t) dξ

= μ(t)x −μ0x0 −∫ t

t0

μ(s)h(s) ds.

On any solution x(t) to the linear IVP (2.8), the potential functionV(t, x(t)) stays constant, so that V(t, x(t)) ≡ V(t0, x0) = 0, which impliesthe Cauchy formula in proposition 2.1,

x(t) = μ0x0

μ(t)+∫ t

t0

μ(s)μ(t)

h(s) ds

= x0 exp[−∫ t

t0

g(s) ds]

+∫ t

t0

exp[−∫ t

sg(θ ) dθ

]h(s) ds,

for all t ≥ t0. �

Page 57: Optimal Control Theory With Applications in Economics

44 Chapter 2

2.2.6 Autonomous SystemsAn autonomous system is such that the right-hand side of the ODE (2.3)does not depend on t. It is represented by the ODE

x = f (x),

where f : D ⊂ Rn → Rn. Note that the domain D of f does not containtime and is therefore a nonempty connected open subset of Rn. In thespecial case where n = 2,

dx2

dx1= f2(x)

f1(x)

describes the phase diagram.

Example 2.8 (Predator-Prey Dynamics) Consider the evolution of twointeracting populations, prey and predator. The relative growth of thepredator population depends on the availability of prey. At the sametime, the relative growth of the prey population depends on the presenceof predators. Lotka (1920) and Volterra (1926) proposed a simple linearmodel of such predator-prey dynamics. Let ξ1(τ ) and ξ2(τ ) be sizes ofthe prey and predator populations at time τ ≥ 0, respectively, with givenpositive initial sizes of ξ10 and ξ20. The Lotka-Volterra predator-preyIVP is of the form

ξ1 = ξ1(a − bξ2), ξ1(0) = ξ10,

ξ2 = ξ2(cξ1 − d), ξ2(0) = ξ20,

where a, b, c, d are given positive constants. To reduce the number ofconstants to what is necessary for an analysis of the system dynamics,it is useful to first de-dimensionalize the variables using a simple lineartransformation.20 For this, one can set

t = aτ , x1(t) = (c/d)ξ1(τ ), x2(t) = (b/a)ξ2(t),

which yields the following equivalent but much simplified IVP:

20. Afundamental result in dimensional analysis is the Theorem�by Buckingham (1914),which (roughly speaking) states that if in a mathematical expression n variables are mea-sured in k (≤ n) independent units, then n − k variables in that expression may be rendereddimensionless (generally, in more than one way). For the significance of this result in thetheory of ODEs, see Bluman and Kumei (1989). Exercise 2.2a gives another example ofhow to de-dimensionalize a system.

Page 58: Optimal Control Theory With Applications in Economics

Ordinary Differential Equations 45

x1 = x1(1 − x2), x1(0) = x10, (2.31)

x2 = αx2(x1 − 1), x2(0) = x20, (2.32)

where α = d/a > 0, and x0 = (x10, x20) = (cξ10/d, bξ20/b).Solving the IVP (2.31)–(2.32) is difficult, but one can use a simple

state-space analysis to reduce the dimensionality of the system to 1.Then, by finding a first integral, it is possible to determine the state-space trajectories of the predator-prey system. Provided that x2 �= 1, theODE

dx2

dx1= α

(x1 − 1

x1

)(x2

1 − x2

)

has a separable right-hand side, so (via straightforward integration)

V(x1, x2) ≡ αx1 + x2 − ln xα1 x2 = C, (2.33)

where C ≥ 1 +α is a constant, describes a phase trajectory (figure 2.7).21

Note that V(x1, x2) in (2.33) is a potential function for the exact ODEderived from (2.31)–(2.32),

μ(x1, x2)x1(1 − x2)

x1 − μ(x1, x2)αx2(x1 − 1)

x2 = 0,

where μ(x1, x2) = α(x1 − 1)(1 − x2) is an integrating factor. �

2.2.7 Stability AnalysisAn equilibrium (also referred to as steady state or stationary point) of theODE x = f (t, x) (with domain D = R+ × X ) is a point x ∈ {ξ : (τ , ξ ) ∈D for some τ ∈ R+} = D0 such that

f (t, x) = 0, ∀ t ∈ {τ : (τ , x) ∈ D} = D0. (2.34)

If instead of (2.34) there exists a nontrivial interval I such that I × {x} ⊂D and f (t, x) = 0 for all t ∈ I, then the point x is a (temporary) equilibriumon the time interval I. For autonomous systems, where the right-handside of the ODE does not depend on time, any temporary equilibriumis also an equilibrium. To obtain a good qualitative understanding ofthe behavior of a given dynamic system, it is important to examine thetrajectories in the neighborhood of its equilibria, which is often referredto as (local) stability analysis. For practical examples of stability analyses,see exercises 2.2–2.4.

21. The lowest possible value for C can be determined by minimizing the left-hand sideof (2.33). It is approached when (x1, x2) → (1, 1).

Page 59: Optimal Control Theory With Applications in Economics

46 Chapter 2

Figure 2.7Periodic solutions x(t; C1), x(t; C2), x(t; C3) of the Lotka-Volterra predator-prey IVP (2.31)–(2.32), characterized by a constant potential V(x) = C ∈ {C1, C2, C3} in (2.33) with 1 +α <

C1 < C2 < C3 < ∞.

Intuitively, at an equilibrium the system described by an ODE cancome to a rest, so that (at least for some time interval) a solution of thesystem stays at the same state.22 An equilibrium can be found by settingall components of the system function f (x) = (f1(x), . . . , fn(x)) to zero.Each such equation fi(x) = 0, i ∈ {1, . . . , n}, defines a nullcline. For exam-ple, when n = 2 each nullcline usually corresponds to a line in the plane,with system equilibria appearing at intersection points. In exercises 2.2–2.4, which all deal with two-dimensional systems, nullclines play animportant role.

22. As seen in example 2.5, the solution to an ODE may not be unique. According tothe definition of a (temporary) equilibrium x, the IVP for some initial data (t0, x) has thesolution x(t) = x (on a nontrivial time interval that contains t0), but it may have othersolutions as well. Requiring IVPs to be well-posed (see section 2.2.3) leads to uniquenessof trajectories, increasing the significance of an equilibrium for the description of thesystem dynamics.

Page 60: Optimal Control Theory With Applications in Economics

Ordinary Differential Equations 47

We assume that the system function f (t, x) is locally Lipschitz in xand that x with [c, ∞) × {x} ⊂ D, for some c ∈ R, is an equilibrium. Theequilibrium x is said to be stable if for any t0 ≥ c and any solution x(t) ofthe ODE (2.3) for t ≥ t0,

∀ ε > 0, ∃δ > 0 : ‖x(t0) − x‖ < δ ⇒ ‖x(t) − x‖ < ε, ∀ t ≥ t0.

Otherwise the equilibrium x is called unstable. Finally, if there existsa δ > 0 (independent of t0) such that

‖x(t0) − x‖ < δ ⇒ limt→∞ x(t) = x,

then the equilibrium x is called asymptotically stable;23 an asymptoticallystable equilibrium x is said to be exponentially stable if there exist δ, C, λ >0 (independent of t0) such that

‖x(t0) − x‖ < δ ⇒ ‖x(t0) − x‖ ≤ Ce−λ(t−t0), ∀ t ≥ t0.

Example 2.9 In the Vidale-Wolfe advertising model (see example 2.4) itis easy to verify that the sales limit x∞ of a stationary advertising policyis an exponentially stable equilibrium. It is in fact globally exponentiallystable, since limt→∞ φ(t, x0) = x∞ for any initial sales level x0 ≥ 0. �

Figure 2.8 provides an overview of possible system behaviors close toan equilibrium. To establish the stability properties of a given equi-librium, it is useful to find continuously differentiable real-valuedfunctions V : D → R that are monotonic along system trajectories, atleast in a neighborhood of the equilibrium point. It was noted in sec-tion 2.2.5 that if such a function is constant along system trajectories(and otherwise nonconstant), it is a first integral of the ODE (2.3) andcan be interpreted as a potential function of an associated exact ODE.On the other hand, if V is nonincreasing along system trajectories inthe neighborhood of an equilibrium x where V also has a local mini-mum, then it is referred to as a Lyapunov function (figure 2.9). Lyapunov(1892) realized that the existence of such functions provides valuableinformation about the stability of an equilibrium.

Stability in Autonomous Systems Now consider the stability of agiven equilibrium for an autonomous system, described by the ODE

x = f (x), (2.35)

23. For nonautonomous systems it is common to add the word uniformly to the terms stableand asymptotically stable to emphasize the fact that the definitions are valid independentof the starting time t0.

Page 61: Optimal Control Theory With Applications in Economics

48 Chapter 2

(a) Stable Node

(d) Stable Focus (f) Unstable Focus

(b) Saddle (c) Unstable Node

(e) Center

Figure 2.8Classification of the local system behavior in the neighborhood of an equilibrium x accord-ing to the eigenvalues λi , i ∈ {1, 2}, of the system matrix A = fx(x): (a) stable node (λi realand negative); (b) saddle (λi real, with different signs); (c) unstable node (λi real and pos-itive); (d) stable focus (λi conjugate complex, with negative real parts); (e) center (λi , withzero real parts); ( f ) unstable focus (λi conjugate complex, with positive real parts).

where f : D ⊂ Rn → Rn is the system function. The key to establish-ing stability results for autonomous systems is to find a function V(x),defined on D or a suitable neighborhood of the equilibrium, which issuch that its values decrease along a system trajectory x(t), that is,

V(x(t)) ≡ 〈Vx(x(t)), x(t)〉 = 〈Vx(x(t)), f (x(t))〉is nonpositive, or even negative, in a neighborhood of the equilibrium.

Proposition 2.7 (Local Stability) (Lyapunov 1892) Let x be an equi-librium point of an autonomous system, and let V : D → R be acontinuously differentiable function.24 (1) If there exists ε > 0 such that

24. Note that V(x) = 〈Vx(x), f (x)〉 is interpreted as a function of x ∈ D (i.e., it does notdepend on t). Thus, while V(x) can have a local minimum at the point x, it is at thesame time possible that V(x) ≤ 0 in a neighborhood of x. This implies that V(x(t)) isnonincreasing along system trajectories in the neighborhood of x.

Page 62: Optimal Control Theory With Applications in Economics

Ordinary Differential Equations 49

Figure 2.9A Lyapunov function V(x) decreases along a trajectory x(t) of an (autonomous) systemx = f (x) in the neighborhood of an asymptotically stable equilibrium point x, so V(x(t)) =〈Vx(x(t)), f (x(t))〉 < 0.

0 < ‖x − x‖ < ε ⇒ V(x) < V(x) and V(x) ≤ 0,

then x is locally stable. (2) If there exists ε > 0 such that

0 < ‖x − x‖ < ε ⇒ V(x) < V(x) and V(x) < 0,

then x is asymptotically stable. (3) If for all ε > 0 the set Sε = {x ∈ Rn :‖x − x‖ < ε, V(x) > V(x)} is nonempty and V > 0 on Sε \ {x},25 then x isunstable (Cetaev 1934).

Proof (1) Without loss of generality, let ε > 0 be such that the closedball Bε(x) = {ξ ∈ Rn : ‖ξ − x‖ ≤ ε} is contained in D; otherwise just selectan appropriate ε1 ∈ (0, ε]. Furthermore, let mV be equal to the minimumof V(x) on the boundary of Bε(x), so that

mV = minx∈∂Bε (x)

V(x) > V(x),

since V(x) > V(x) for x �= x by assumption. Now let μ = (mV − V(x))/2,which implies that there exists δ ∈ (0, ε) such that

Bδ(x) ⊆ {ξ ∈ Bε(x) : V(ξ ) ≤ μ} � Bε(x),

the last (strict) inclusion being implied by the fact that μ < mV . SinceV(x) ≤ 0, it is not possible that any trajectory starting at an arbitrary

25. The set Sε denotes the closure of Sε .

Page 63: Optimal Control Theory With Applications in Economics

50 Chapter 2

point x ∈ Bδ(x) leaves the set Bε(x), which implies that the equilibriumx is stable.

(2) Along any trajectory x(t) = φ(t, x) starting at a point x ∈ Bε(x)(with ε small enough as in part (1)) the function V(x(t)) decreases. Since Vis bounded from below by V(x), it follows that limt→∞ V(x(t)) = V∞exists and that V∞ ≥ V(x). If V∞ > V(x), then let

ν = max{V(ξ ) : V∞ ≤ V(ξ ), ξ ∈ Bε(x)}be the largest gradient V evaluated at points at which V is not smallerthan V∞ (which excludes a neighborhood of x). Thus, ν < 0 and, usingthe fundamental theorem of calculus (proposition A.6),

V∞ − V(x) ≤ V(φ(t, x)) − V(x) =∫ t

0V(φ(s, x)) ds ≤ νt < 0,

which leads to a contradiction for times t > (V(x) − V∞)/( − ν) ≥ 0.Hence, V∞ = V(x), which by strictness of the minimum of V at x im-plies that limt→∞ φ(t, x) = x for all x ∈ Bε(x); thus x is asymptoticallystable.

(3) Let ε > 0 be small enough, so that Sε ⊂ D. The set Sε is open,since with any point x ∈ Sε points close enough to x also satisfythe strict inequalities in the definition of that set. Thus, fixing anypoint x0 ∈ Sε , the flow φ(t, x0) stays in Sε for some time τ > 0, andit is V(φ(τ , x0)) > V(x), since V > 0 on Sε . Set ν = inf{V(x) : V(x) ≥V(x0), x ∈ Sε}; then ν > 0, and

V(φ(t, x0)) − V(x0) =∫ t

0V(φ(s, x0)) ds ≥ νt > 0,

for all t > 0, so that, because V is bounded on Sε , the trajectory musteventually leave the set Sε . Since V(φ(t, x0)) > V(x0) > V(x), this cannotbe through the boundary where V(x) = V(x) but must occur throughthe boundary of Sε where ‖x − x‖ = ε. This is true for all (small enough)ε > 0, so the equilibrium x must be unstable. n

Note that proposition 2.7 can be applied without the need for explicitsolutions to an ODE.

Example 2.10 Let n = 1, and consider an autonomous system, de-scribed by the ODE x = f (x), where xf (x) < 0 for all x �= 0 (which impliesthat f (0) = 0). Then

Page 64: Optimal Control Theory With Applications in Economics

Ordinary Differential Equations 51

V(x) = −∫ x

0f (ξ ) dξ

is a Lyapunov function for the equilibrium point x = 0, since V(x) >V(x) = 0 for all x �= x. Furthermore,

V(x) = Vx(x)f (x) = −f 2(x) < 0

for all x �= x, which by virtue of proposition 2.7 implies asymptoticstability of x. �

Example 2.11 (Stability of Linear System) Consider the linear ODE (withconstant coefficients)

x = Ax, (2.36)

where A ∈ Rn×n is a given nonsingular system matrix. To examine thestability properties of the only equilibrium point x = 0,26 consider aquadratic Lyapunov function V(x) = x′Qx, where Q is a symmetric pos-itive definite matrix. This guarantees that V(x) > V(x) = 0 for all x �= x.Then

V(x) = x′Qx + x′Qx = x′QAx + x′A′Qx = x′ (QA + A′Q)

x.

Thus, if Q is chosen such that the (symmetric) matrix R = −(QA + A′Q) ispositive definite, then V(x) < 0 for all x �= x, and the origin is an asymp-totically stable equilibrium. As shown in section 2.3.2 (see equation 2.57),the flow of a linear system is

φ(t, x) = eAtx = [PeJtP−1]x = P

[r∑

i=1

eJit

]P−1x, ∀ t ≥ 0, (2.37)

where P ∈ Cn×n is a (possibly complex) nonsingular matrix that trans-forms the system into its Jordan canonical form,27

J = P−1AP = block diag( J1, . . . , Jr).

26. If the system matrix A is singular (i.e., det (A) = 0), then A has a nontrivial null space,every point of which is an equilibrium. In particular, the set of equilibria is itself a linearsubspace and in every neighborhood of an equilibrium there exists another equilibrium(i.e., no equilibrium can be isolated).27. For any matrix A and any nonsingular matrix P, the matrix P−1AP has the sameeigenvalues as A, which is why this procedure is referred to as similarity transform. Notealso that the Jordan canonical form is a purely conceptual tool and is not used in actualcomputations, as the associated numerical problem tends to be ill-conditioned.

Page 65: Optimal Control Theory With Applications in Economics

52 Chapter 2

For an eigenvalue λi (of multiplicity mi) in A’s set {λ1, . . . , λr} of r ∈{1, . . . , n} distinct complex eigenvalues (such that m1 + · · · + mr = n), thecorresponding Jordan block Ji is

Ji =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

λi 1 0 . . . . . . 00 λi 1 0 . . . 0...

. . ....

.... . . 0

.... . . 1

0 . . . . . . . . . 0 λi

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

∈ Cmi×mi ,

and therefore

eJit = eλi t

⎡⎢⎢⎢⎢⎢⎢⎢⎣

1 t t2

2! · · · tmi−1

(mi−1)!0 1 t · · · tmi−2

(mi−2)!...

. . ....

0 · · · 0 1 t0 0 · · · 0 1

⎤⎥⎥⎥⎥⎥⎥⎥⎦

. (2.38)

For convenience, the linear system (2.36) or the system matrix A is calledstable (asymptotically stable, unstable) if the equilibrium x = 0 is sta-ble (asymptotically stable, unstable). The representation (2.37) directlyimplies the following characterization for the stability of A in terms ofits eigenvalues being located in the left half of the complex plane or not.

Lemma 2.2 (Stability of Linear System) Let {λ1, . . . , λr} be the set of rdistinct eigenvalues of the nonsingular system matrix A ∈ Rn×n. (1) A isstable if and only if Re(λi) ≤ 0 for all i ∈ {1, . . . , r}, and Re(λi) = 0 ⇒ Ji =λi. (2) A is asymptotically stable (or A is Hurwitz) if and only if Re(λi) < 0for all i ∈ {1, . . . , r}.Proof (1) If A has an eigenvalue λi with positive real part, then it isclear from (2.37) that eJit is unbounded for t → ∞. Hence, a necessarycondition for stability is that Re(λi) ≤ 0 for all i ∈ {1, . . . , r}. Part (2) ofthis proof implies that when all these inequalities are strict, one obtainsstability. Consider now the case where Re(λi) = 0 for some eigenvalue λi

with multiplicity mi > 1. Then, as can be seen from (2.38), the term eJit

becomes unbounded for t → ∞, so the implication Re(λi) = 0 ⇒ Ji = λi

(i.e., mi = 1) is necessary for the stability of A as well.(2) Asymptotic stability of A obtains if and only if φ(t, x) = eAtx → 0

as t → ∞ for any starting point x ∈ Rn. From the representation (2.37) of

Page 66: Optimal Control Theory With Applications in Economics

Ordinary Differential Equations 53

the flow in Jordan canonical form it is immediately clear that the origin isan asymptotically stable equilibrium, provided that all eigenvalues of Ahave negative real parts. If there exists an eigenvalue λi with Re(λi) ≥ 0,then not all trajectories are converging to the origin. n

For a linear system asymptotic stability is equivalent to global asymp-totic stability, in the sense that limt→∞ φ(t, x) = 0 independent of theinitial state x. The following result links the stability of the systemmatrix A to the Lyapunov functions in proposition 2.7.

Lemma 2.3 (Characterization of Hurwitz Property) Let R be an arbi-trary symmetric positive definite matrix (e.g., R = I). The systemmatrix A of the linear system (2.36) is Hurwitz if and only if thereis a symmetric positive definite matrix Q that solves the Lyapunovequation,

QA + A′Q + R = 0. (2.39)

Furthermore, if A is Hurwitz, then (2.39) has a unique solution.

Proof Let the symmetric positive definite matrix R be given.⇐: If Q solves the Lyapunov equation (2.39), then, as shown ear-

lier, V(x) = x′Qx is a Lyapunov function and by proposition 2.7(2) theorigin x = 0 is asymptotically stable, so that by lemma 2.2 the systemmatrix A is Hurwitz.

⇒ : Let

Q =∫ ∞

0eA′tR eAtdt.

Since A is Hurwitz, one can see from the Jordan canonical form in (2.37)that the matrix Q is well-defined (the integral converges); it is alsopositive semidefinite and symmetric. Furthermore,

QA + A′Q =∫ ∞

0

(eA′tR eAtA + A′eA′tQ eAt

)dt

=∫ ∞

0

ddt

eA′tR eAtdt =[eA′tR eAt

]∞0

= −R,

so that the Lyapunov equation (2.39) is satisfied. Last, the matrix Qis positive definite because otherwise there is a nonzero vector x suchthat x′Qx = 0, which implies that eAtx ≡ 0 on R+. But this is possibleonly if x = 0, a contradiction, so Q is positive definite.

Page 67: Optimal Control Theory With Applications in Economics

54 Chapter 2

Uniqueness: Given two symmetric positive definite solutions Q, Q tothe Lyapunov equation (2.39), it is

(Q − Q)A + A′(Q − Q) = 0,

so

0 = eA′t[(Q − Q)A + A′(Q − Q)]eAt = ddt

eA′t(Q − Q)eAt,

for all t ≥ 0. Thus, eA′t(Q − Q)eAt is constant for all t ≥ 0, so

eA′t(Q − Q)eAt = eA′0(Q − Q)eA0 = Q − Q = limt→∞ eA′t(Q − Q)eAt = 0,

which implies that Q = Q. n

When f (x) is differentiable, it is possible to linearize the system in aneighborhood of an equilibrium x by setting

A = fx(x),

so that f (x) = A(x − x) + O((x − x)2

), and then to consider the linearized

system

x = A(x − x)

instead of the nonlinear system, in a neighborhood of the equilibrium x.Figure 2.8 classifies an equilibrium x in the Euclidean plane accordingto the eigenvalues λ1, λ2 of the system matrix A, for n = 2.

The following linearization criterion, which allows statements aboutthe local stability of a nonlinear system based on the eigenvalues of itslinearized system matrix at an equilibrium, is of enormous practicalrelevance. Indeed, it proves very useful for the stability analyses inexercises 2.2–2.4.

Proposition 2.8 (Linearization Criterion) Assume that f (x) = 0. (1) Iffx(x) is Hurwitz, then x is an asymptotically stable equilibrium. (2) Iffx(x) has an eigenvalue with positive real part, then x is an unstableequilibrium.

Proof (1) If the matrix A = fx(x) is Hurwitz, then by lemma 2.3 for R = Ithere exists a unique symmetric positive definite solution Q to theLyapunov equation (2.39), so that V(ξ ) = ξ ′Qξ , with ξ = x − x, is a natu-ral Lyapunov-function candidate for the nonlinear system in a neigh-borhood of the equilibrium x. Indeed, if one sets �(ξ ) = f (ξ ) − Aξ =O(‖ξ‖2

2), its total derivative with respect to time is

Page 68: Optimal Control Theory With Applications in Economics

Ordinary Differential Equations 55

V(ξ ) = ξ ′Qf (ξ ) + f ′(ξ )Qξ

= ξ ′Q(Aξ +�(ξ )

)+ (ξ ′A′ +�′(ξ ))

= ξ ′(QA + A′Q)ξ + 2ξ ′Q�(ξ )

= − ξ ′Rξ + 2ξ ′Q�(ξ )

= −‖ξ‖22 + O(‖ξ‖3

2),

that is, negative for ξ �= 0 in a neighborhood of the origin. Therefore, byproposition 2.7 the equilibrium ξ = 0 is asymptotically stable, that is, xin the original coordinates is asymptotically stable.

(2) Assuming initially no eigenvalues on the imaginary axis, the mainidea is to decompose the system into a stable and an unstable part bya similarity transform (to a real-valued Jordan canonical form) suchthat

SAS−1 =[

−A+ 00 A−

],

with S ∈ Rn×n nonsingular and the square matrices A+, A− both Hurwitz(corresponding to the eigenvalues with positive and negative real parts,respectively). Using the new dependent variable

η =[η+η−

]= Sξ ,

whereη+, η− are compatible with the dimensions of A+, A−, respectively,one obtains

η = SAS−1η+ S�(S−1η) =[

−A+ 00 A−

][η+η−

]+[�+(η)�−(η)

],

so

η+ = − A+η+ + �+(η) = −A+η+ + O(‖η‖22),

η− = A−η− + �−(η) = A−η− + O(‖η‖22).

Since A+ and A− are Hurwitz, by lemma 2.3 there exist unique sym-metric positive definite matrices Q+, Q− such that Q+A+ + A′+Q+ =Q−A− + A′−Q− = −I. With these matrices, introduce the function

V(η) = η′[

Q+ 00 Q−

]η = η′

+Q+η+ − η′−Q−η−,

Page 69: Optimal Control Theory With Applications in Economics

56 Chapter 2

which (for η �= 0) is positive on the subspace {η ∈ Rn : η− = 0} and neg-ative on the subspace {η ∈ Rn : η+ = 0}. Hence, for all ε > 0 the set Sε ={η ∈ Rn : ‖η‖2 < ε, V(η) > V(0) = 0} is nonempty, and (with comput-ations as in part (1) of the proof),

V(η) = −η′+(Q+A+ + A′

+Q+)η+ − η′−(Q−A− + A′

−Q−)η+

+ 2(η′+Q+�+(η) − η′

−Q−�−(η))

= ‖η+‖22 + ‖η−‖2

2 + O(‖η‖32) = ‖η‖2

2 + O(‖η‖32) > 0

on Sε \ {0} as long as ε > 0 is small enough. Thus, by proposition 2.7(3),η = 0, and thus ξ = 0, or equivalently, x = x in the original coordinatesis unstable. The case where A has eigenvalues on the imaginary axis (inaddition to the ones with positive real parts) can be treated in the sameway as before by considering Aδ = A − δI instead for some small δ �= 0,so Aδ does not have any eigenvalues on the imaginary axis. Then all theprevious arguments remain valid and all (strict) inequalities continueto hold for δ → 0, which concludes the proof. n

Example 2.12 (Generic Failure of Linearization Criterion) Consider the non-linear system x = αx3. The point x = 0 is the only equilibrium. Lin-earization at that point yields A = fx(x) = 3αx2|x=x = 0, so nothingcan be concluded from proposition 2.8. Using the Lyapunov func-tion V(x) = x 4 gives V = 4αx 6. Thus, by proposition 2.7 the equilib-rium x is stable if α ≤ 0, unstable if α > 0, and asymptotically stableif α < 0. �

A (nonempty) set S ⊂ X is called invariant (with respect to theautonomous system (2.35)) if any trajectory starting in S remains therefor all future times, that is, if

x(t0) = x0 ∈ S ⇒ φ(t, x0) ∈ S, ∀ t ≥ t0.

For example, given any equilibrium point x of the system, the single-ton S = {x} must be invariant. The following remark shows that, moregenerally, the set of points from which trajectories converge to x (whichincludes those trajectories) is invariant.

Remark 2.8 (Region of Attraction) The region of attraction A(x) of anequilibrium x is defined as the set of points from which a trajectorywould asymptotically approach x. That is,

A(x) = {ξ ∈ Rn : limt→∞φ(t, ξ ) = x}.

Page 70: Optimal Control Theory With Applications in Economics

Ordinary Differential Equations 57

Clearly, if x is not asymptotically stable, then its region of attraction isa singleton containing only x. On the other hand, if x is asymptoticallystable, then A(x) is an invariant open set and its boundary is formed by systemtrajectories. The fact that A(x) is invariant is trivial. To show that theregion of attraction is open, pick an arbitrary point x0 ∈ A(x). Since x isasymptotically stable, there exists ε > 0 such that

‖x − x‖ < ε ⇒ x ∈ A(x). (2.40)

Since limt→∞ φ(t, x0) = x, there exists a time T > 0 such that ‖φ(T, x0) −x‖ < ε/2 (figure 2.10). By continuous dependence on initial conditions(proposition 2.5) there exists δ > 0 such that

‖x0 − x0‖ < δ ⇒ ‖φ(T, x0) −φ(T, x0)‖ ≤ ε/2,

so

‖x0 − x0‖<δ⇒ ‖φ(T, x0) − x‖ ≤ ‖φ(T, x0) −φ(T, x0)‖+ ‖φ(T, x0) − x‖<ε. (2.41)

Figure 2.10The region of attraction A(x) of an asymptotically stable equilibrium x is an open set.

Page 71: Optimal Control Theory With Applications in Economics

58 Chapter 2

Hence, from (2.40) and (2.41) it follows that with any x0 ∈ A(x) an openδ-neighborhood of x0 is also contained in A(x), so the region of attrac-tion is indeed an open set. The fact that the boundary of A(x) is formedby trajectories can be seen as follows. Consider any boundary pointx ∈ ∂A(x). Since x /∈ A(x), by the definition of the region of attrac-tion A(x), φ(t, x) /∈ A(x) for all t ≥ 0. On the other hand, taking anysequence of points, {xk}∞k=0 ⊂ A(x) such that xk → x as k → ∞, by thecontinuity of the flow (which is guaranteed by proposition 2.5) it is

limk→∞

φ(t, xk) = φ(t, x), ∀ t ≥ 0,

which implies that φ(t, x) ∈ ∂A(x) for all t ≥ 0. �

Proposition 2.9 (Global Asymptotic Stability) (Barbashin and Kraso-vskii 1952) Assume that x is an equilibrium of an autonomous sys-tem, and let V : Rn → R be a continuously differentiable function,which is coercive in the sense that V(x) → ∞ as ‖x‖ → ∞. If V(x) >V(x) and V(x) < 0 for all x �= x, then A(x) = Rn, that is, x is globallyasymptotically stable.

Proof For any initial point x ∈ Rn the set Ω = {ξ ∈ Rn : V(ξ) ≤ V(x)} is compact, because V is coercive. It is also invariant: no trajectory starting inside the set can increase the value of V. The rest of the proof is analogous to part (2) of the proof of proposition 2.7. ∎

Stability in Time-Variant Systems Let D = R+ × X. Now consider the stability properties of equilibria of the time-variant system

ẋ = f (t, x), (2.42)

where f : D → Rn is continuous and locally Lipschitz with respect to x. The discussion is limited to asymptotic stability of an equilibrium.

Proposition 2.10 (Asymptotic Stability) Let x̄ be an equilibrium of the time-variant system (2.42), and let V : D → R be a continuously differentiable function. If there exist δ > 0 and continuous,28 increasing functions ρi : [0, δ) → R+ satisfying ρi(0) = 0, i ∈ {1, 2, 3}, such that

ρ1(‖x − x̄‖) ≤ V(t, x) ≤ ρ2(‖x − x̄‖)

and

28. Instead of being continuous on their domain, it is enough if all ρi, i ∈ {1, 2, 3}, areright-continuous at the origin.


V̇(t, x) = Vt(t, x) + 〈Vx(t, x), f (t, x)〉 ≤ −ρ3(‖x − x̄‖)

for all (t, x) ∈ D with t ≥ t0 and ‖x − x̄‖ < δ, then x̄ is asymptotically stable.

Proof Fix ε ∈ (0, δ), and for any time t ≥ 0, let Ωt,ε = {x ∈ X : V(t, x) ≤ ρ1(ε)} be the set of states x ∈ X at which the function V(t, x) does not exceed ρ1(ε). By assumption ρ1 ≤ ρ2 on [0, ε], so

ρ2(‖x − x̄‖) ≤ ρ1(ε) ⇒ x ∈ Ωt,ε ⇒ ρ1(‖x − x̄‖) ≤ V(t, x) ≤ ρ1(ε) ⇒ ‖x − x̄‖ ≤ ε,

for all (t, x) ∈ D. Since V̇(t, x) < 0 for all x ∈ X with 0 < ‖x − x̄‖ ≤ ε, for any initial data (t0, x0) ∈ D the set Ωt0,ε is invariant, that is,

x0 ∈ Ωt0,ε ⇒ φ(t − t0, t0, x0) ∈ Ωt0,ε, ∀ t ≥ t0,

where φ(t − t0, t0, x0) denotes the flow of the vector field f (t, x) starting at (t0, x0). Without loss of generality assume that ρ2(‖x0 − x̄‖) ≤ ρ1(ε) and x(t0) = x0 for some t0 ≥ 0. Thus, using the invariance of Ωt0,ε and the fact that

ρ2⁻¹(V(t, x(t))) ≤ ‖x(t) − x̄‖

gives

V̇(t, x(t)) ≤ −ρ3(‖x(t) − x̄‖) ≤ −ρ3(ρ2⁻¹(V(t, x(t)))), ∀ t ≥ t0. (2.43)

Let ρ : [0, ρ1(ε)] → R+ be an increasing, locally Lipschitz function that satisfies ρ(0) = 0 and ρ ≤ ρ3 ◦ ρ2⁻¹. From (2.43) it follows that if ν(t) is a solution to the autonomous IVP

ν̇ = −ρ(ν), ν(t0) = ν0, (2.44)

for t ≥ t0, and ν0 = V(t0, x0) ∈ [0, ρ1(ε)], then it is V(t, x(t)) ≤ ν(t) for all t ≥ t0. The ODE in (2.44) has a separable right-hand side, so (see section 2.2.2)

ν(t) = H⁻¹(t − t0; ν0) if ν0 > 0, and ν(t) = 0 if ν0 = 0,

where H(ν; ν0) = ∫_ν^{ν0} dξ/ρ(ξ) > 0 is a decreasing function on (0, ν0) for 0 < ν0 ≤ ρ1(ε), so its inverse, H⁻¹(·; ν0), exists. Note that since limν→0+ H(ν; ν0) = ∞,29 any trajectory ν(t) starting at a point ν0 > 0 converges to the origin as t → ∞, that is,

limt→∞ ν(t) = limt→∞ H⁻¹(t − t0; ν0) = 0.

Hence, it has been shown that

‖φ(t − t0, t0, x0) − x̄‖ = ‖x(t) − x̄‖ ≤ ρ1⁻¹(V(t, x(t))) ≤ ρ1⁻¹(ν(t)) → ρ1⁻¹(0) = 0

as t → ∞, that is, limt→∞ φ(t − t0, t0, x0) = x̄, which implies that x̄ is asymptotically stable. ∎

Remark 2.9 (Global Asymptotic Stability) If the assumptions of proposition 2.10 are satisfied for all δ > 0, and ρ1, ρ2 are coercive in the sense that limξ→∞ ρi(ξ) = ∞ for i ∈ {1, 2}, then limξ→∞ ρ2⁻¹(ρ1(ξ)) = ∞, so any x ∈ X is contained in Ωt,ε for large enough ε > 0, which implies that the equilibrium x̄ is globally asymptotically stable. Moreover, if ρi(ξ) = αi ξ^c for some positive c and αi, i ∈ {1, 2, 3}, then x̄ is (globally) exponentially stable, since (see the proof of proposition 2.10) ρ(ξ) = ρ3(ρ2⁻¹(ξ)) = (α3/α2)ξ is locally Lipschitz and

‖φ(t − t0, t0, x0) − x̄‖ ≤ (α2/α1)^{1/c} ‖x0 − x̄‖ exp[−(α3/(cα2))(t − t0)],

for all t ≥ t0. □

Example 2.13 (Linear System) Consider the linear ODE (with variable coefficients)

ẋ = A(t)x, (2.45)

for t ≥ t0 ≥ 0, where the matrix function A(t) is continuous with values in Rn×n for all t ≥ 0. The point x̄ = 0 is an equilibrium of (2.45), and in analogy to the autonomous case (see example 2.11), one tries the Lyapunov function candidate V(t, x) = x′Q(t)x, for all (t, x) ∈ D, where Q(t) is continuous, with symmetric positive definite values. Then

V̇(t, x) = x′(Q̇(t) + Q(t)A(t) + A′(t)Q(t))x

29. The notation g(ξ0⁺) = limξ→ξ0⁺ g(ξ) = limε→0 g(ξ0 + ε²) denotes a right-sided limit as ξ ∈ R approaches the point ξ0 on the real line from the right. A left-sided limit g(ξ0⁻) is defined analogously.


is negative if Q(t) solves the Lyapunov differential equation30

Q̇ + QA(t) + A′(t)Q + R(t) = 0 (2.46)

for some continuous symmetric positive definite matrix R(t) (e.g., R(t) ≡ I). If in addition there exist positive constants α1, α2, α3 such that α1I ≤ Q(t) ≤ α2I and R(t) ≥ α3I for all t ≥ t0, then α1‖x‖₂² ≤ V(t, x) ≤ α2‖x‖₂² and V̇(t, x) ≤ −α3‖x‖₂² for all (t, x) ∈ D, so by proposition 2.10 and remark 2.9 the equilibrium x̄ = 0 is globally asymptotically stable. □
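In the time-invariant special case A(t) ≡ A, a constant solution Q of (2.46) satisfies the algebraic Lyapunov equation A′Q + QA + R = 0, which standard software solves directly. The following sketch (added here for illustration; the matrices are arbitrary and not from the original text) verifies the construction of example 2.13:

```python
# Time-invariant special case of example 2.13 (illustrative values):
# solve A'Q + QA = -R and check that V(x) = x'Qx is a Lyapunov function.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])             # eigenvalues -1 and -2 (Hurwitz)
R = np.eye(2)
Q = solve_continuous_lyapunov(A.T, -R)   # solves A'Q + QA = -R

assert np.all(np.linalg.eigvalsh(Q) > 0)      # Q symmetric positive definite
x = np.array([1.0, -1.0])
print("V'(x) =", x @ (A.T @ Q + Q @ A) @ x)   # = -x'Rx = -2 < 0
```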

Proposition 2.11 (Generalized Linearization Criterion) Let x̄ be an equilibrium of the time-variant system (2.42). Assume that f is continuously differentiable in a neighborhood of x̄ and that the Jacobian matrix fx is bounded and Lipschitz with respect to x. Set A(t) = fx(t, x̄) for all t ≥ 0; then x̄ is exponentially stable if it is an exponentially stable equilibrium of the linear time-variant system ẋ = A(t)(x − x̄).

Proof Let Q(t) be a symmetric positive definite solution to the Lyapunov differential equation (2.46) for R(t) ≡ I such that α1I ≤ Q(t) ≤ α2I for some α1, α2 > 0, and let V(t, x) = (x − x̄)′Q(t)(x − x̄), for all t ≥ 0. Then, with the abbreviation Δ(t, x) = f (t, x) − A(t)(x − x̄), and given some δ > 0,

V̇(t, x) = (x − x̄)′Q̇(t)(x − x̄) + (x − x̄)′Q(t)f (t, x) + f ′(t, x)Q(t)(x − x̄)
= (x − x̄)′(Q̇(t) + Q(t)A(t) + A′(t)Q(t))(x − x̄) + 2(x − x̄)′Q(t)Δ(t, x)
= −(x − x̄)′R(t)(x − x̄) + 2(x − x̄)′Q(t)Δ(t, x)
≤ −‖x − x̄‖₂² + 2α2k‖x − x̄‖₂³
≤ −(1 − 2α2kδ)‖x − x̄‖₂², (2.47)

for all ‖x − x̄‖ ≤ δ, as long as δ < 1/(2α2k); here one uses that Δ(t, x), as a truncated Taylor expansion, can be majorized by k‖x − x̄‖₂² for some k > 0. Thus, by remark 2.9 the point x̄ is (locally) exponentially stable. ∎

2.2.8 Limit Cycles and Invariance

In many practically important situations system trajectories do not converge to an equilibrium but instead approach a certain periodic trajectory, which is referred to as a limit cycle. For instance, in the Lotka-Volterra predator-prey system of example 2.8 every single trajectory is

30. One can show that if A(t) and R(t) are also bounded, then (2.46) has a solution (for t ≥ t0), which is of the form Q(t) = ∫_t^∞ Φ′(s, t)R(s)Φ(s, t) ds, where Φ(t, t0) is the fundamental matrix of the homogeneous linear system (2.45); see also section 2.3.2.


a limit cycle. For a given initial state x, the set

L+x = {ξ ∈ Rn : ∃ {tk}∞k=0 ⊂ R s.t. limk→∞ tk = ∞ and limk→∞ ‖φ(tk, x) − ξ‖ = 0}

is called the positive limit set of x. It is useful to characterize such a limit set, particularly when it describes a limit cycle instead of an isolated equilibrium.

Lemma 2.4 If for a given x ∈ D the trajectory φ(t, x) is bounded and lies in D for all t ≥ 0, then the positive limit set L+x is nonempty, compact, and invariant.

Proof (1) Nonemptiness. Since the trajectory φ(t, x) is by assumption bounded, by the Bolzano-Weierstrass theorem (proposition A.2) there is a sequence {tk}∞k=0 of time instances tk, with tk < tk+1 for all k ≥ 0, such that limk→∞ x(tk) exists, which implies that the positive limit set L+x is nonempty.

(2) Compactness. By the previous argument the set L+x is also bounded. Consider now any converging sequence {xj}∞j=0 ⊂ L+x such that limj→∞ ‖x̂ − xj‖ = 0 for some state x̂. Then (omitting some of the details) in every neighborhood of x̂ there must be infinitely many points of the trajectory, which implies that x̂ ∈ L+x, so L+x is also closed and thus compact.

(3) Invariance. Let x̂ ∈ L+x and consider the trajectory φ(t, x̂), which needs to stay in L+x for all t ≥ 0 to obtain invariance. Since x̂ is in the positive limit set L+x, there exists a sequence {tk}∞k=0 of time instances tk, with tk < tk+1 for all k ≥ 0, such that limk→∞ ‖x̂ − φ(tk, x)‖ = 0. Hence, by the group laws for the flow (see remark 2.5) it is

φ(t + tk, x) = φ(t, φ(tk, x)) = φ(t, x(tk)),

which by continuity of the flow (see proposition 2.5) implies that

limk→∞ φ(t + tk, x) = limk→∞ φ(t, x(tk)) = φ(t, x̂),

so φ(t, x̂) is a limit of points on the trajectory and therefore lies in L+x, which concludes the proof. ∎

Proposition 2.12 (Invariance Principle) (LaSalle 1968) Let Ω ⊂ Rn be a compact invariant set and V : Ω → R be a continuously differentiable function such that V̇(x) ≤ 0 for all x ∈ Ω. If M denotes the largest invariant set in {ξ ∈ Ω : V̇(ξ) = 0}, then31

31. A trajectory x(t) = φ(t, x) converges to a (nonempty) set M for t → ∞ if limt→∞ inf_{x̂∈M} ‖x̂ − φ(t, x)‖ = 0.


φ(t, x) → M (as t → ∞), ∀ x ∈ Ω.

Proof Let x ∈ Ω. By invariance of Ω, the system trajectory x(t) = φ(t, x) stays in Ω for all t ≥ 0. Since V̇ is by assumption nonpositive on Ω, the function V(x(t)) is nonincreasing in t. In addition, the continuous function V is bounded from below on the compact set Ω, so V(x(t)) has a limit V∞ as t → ∞. Since Ω is closed, it contains the positive limit set L+x. Thus, for any x̂ ∈ L+x there exists a sequence {tk}∞k=0 with limk→∞ tk = ∞ and limk→∞ φ(tk, x) = x̂. Hence, by continuity of V,

V∞ = limk→∞ V(φ(tk, x)) = V(limk→∞ φ(tk, x)) = V(x̂).

By lemma 2.4 the set L+x is invariant, so V is constant (equal to V∞) along trajectories in L+x and V̇ vanishes on L+x. As a result, it must be that L+x ⊂ M ⊂ {ξ ∈ Ω : V̇(ξ) = 0} ⊂ Ω. Since Ω is bounded and invariant, it is x(t) → L+x, which implies that x(t) → M as t → ∞. ∎

If, for some T > 0, a system trajectory x(t) satisfies the relation

x(t) = x(t + T), ∀ t ≥ 0,

then it is called a closed (or periodic) trajectory, and the curve γ = x([0, ∞)) is referred to as a periodic orbit (or limit cycle).

When the state space is two-dimensional, it is possible to conclude from a lack of equilibria in a positive limit set that this set must be a periodic orbit. The following result is therefore very useful in practice (see, e.g., exercise 2.3c).32

Proposition 2.13 (Poincaré-Bendixson Theorem) Let n = 2. If the positive limit set L+x, for some x, does not contain any equilibrium, then it is a periodic orbit (limit cycle).

Proof The proof proceeds along a series of four claims. A transversal Σ ⊂ D of the autonomous system ẋ = f (x) is a segment of a line defined by a′x − b = 0 for some a ∈ R2 and b ∈ R such that a′f (x) ≠ 0 for all x ∈ Σ. Consider now any such transversal, assumed to be open such that it does not contain any endpoints.

32. The Poincaré-Bendixson theorem does not generalize to state spaces of dimension greater than 2. Indeed, the well-known Lorenz oscillator (Lorenz 1963), described by ẋ1 = σ(x2 − x1), ẋ2 = x1(ρ − x3) − x2, and ẋ3 = x1x2 − βx3, where, in the original interpretation as an atmospheric convection system, σ > 0 is the Prandtl number, ρ > 0 a (normalized) Rayleigh number, and β > 0 a given constant, produces trajectories that do not converge to a limit cycle (e.g., for (σ, ρ, β) = (10, 28, 8/3), as discussed by Lorenz himself). The positive limit sets of such systems are often referred to as strange attractors.


Claim 1 For any x̄ ∈ Σ and any ε > 0 there exists δ > 0 such that ‖x̄ − x(0)‖ < δ implies that x(t) ∈ Σ for some t ∈ (−ε, ε). If the trajectory is bounded, then the intersection point x(t) tends to x̄ as ε → 0+. Let x̄ ∈ Σ and set g(t, x) = a′φ(t, x) − b. Thus, φ(t, x) ∈ Σ if and only if g(t, x) = 0. Since x̄ ∈ Σ, it is g(0, x̄) = 0 and gt(0, x̄) = a′f (φ(0, x̄)) = a′f (x̄) ≠ 0. Thus, by the implicit function theorem (proposition A.7) the continuous map τ(x) can for some ρ > 0 be implicitly defined by g(τ(x), x) = 0 for ‖x − x̄‖ ≤ ρ, with τ(x̄) = 0. By continuity of τ, for any ε > 0 there exists δ > 0 such that

‖x − x̄‖ < δ ⇒ |τ(x) − τ(x̄)| = |τ(x)| < ε.

Furthermore, choose δ ∈ (0, min{ε, ρ}) and let μ = sup{‖f (φ(t, x))‖ : ‖x − x̄‖ ≤ ρ, t ∈ R} (which is finite by assumption); then

‖x̄ − φ(τ(x), x)‖ ≤ ‖x̄ − x‖ + ‖x − φ(τ(x), x)‖ < ε + μ|τ(x)| < (1 + μ)ε,

because ‖φ(0, x) − φ(τ(x), x)‖ ≤ μ|τ(x)|.

because ‖φ(0, x) −φ(τ (x), x)‖ ≤ μ|τ (x)|.Claim 2 If there exist t0 < t1 < t2 such that {t ∈ [t0, t2] : x(t) ∈ �} ={t0, t1, t2}, then x(t1) = (1 − λ)x(t0) + λx(t2) for someλ ∈ (0, 1). Let xk = x(tk)for k ∈ {0, 1, 2}. If the claim fails, then without any loss of generality onecan assume that λ > 1, that is, x2 lies between x0 and x1. Let γ denote theclosed curve that is obtained by joining the trajectory segment x([t0, t1])with the line segment x1x0 (figure 2.11). By the Jordan curve theorem(proposition A.4) γ has an inside S and an outside R2 \ S. Assumingthat f points into S on the transversal�,33 the set S is invariant and thuscontains the trajectory segment φ((t1, t2), x). But since f (x2) points into S,φ(t2 − ε, x) must lie on the outside of γ , which is a contradiction.

Figure 2.11  Intuition for claim 2 in the proof of the Poincaré-Bendixson theorem.

Claim 3 If x̂ ∈ L+x for some x, then φ(t, x̂) can intersect Σ in at most one point. Suppose that for some x̂ ∈ L+x the trajectory φ(t, x̂) intersects the transversal Σ in two points, x1 ≠ x2, that is, {x1, x2} ⊆ φ(R+, x̂) ∩ Σ. Now choose two open subintervals Σ1, Σ2 ⊂ Σ such that xj ∈ Σj for j ∈ {1, 2}. Given a small ε > 0, define for each j ∈ {1, 2} the flow box Bj = {φ(t, ξ) : t ∈ (−ε, ε), ξ ∈ Σj}.34 Since x1, x2 ∈ L+x, by definition there exists an increasing sequence {tk}∞k=1 with tk → ∞ as k → ∞ such that φ(t2k−1, x) ∈ B1 and φ(t2k, x) ∈ B2 for all k ≥ 1. Because of the unidirectionality of the flow in each flow box, one can without any loss of generality assume that φ(tk, x) ∈ Σ for all k ≥ 1. But note that even though tk < tk+1 < tk+2, there exists no λ ∈ (0, 1) such that φ(tk+1, x) = (1 − λ)φ(tk, x) + λφ(tk+2, x), a contradiction to claim 2. Hence x1 = x2, and by claim 1, as ε → 0+ the intersection points φ(tk, x) → x1 = x2 as k → ∞. Thus, the set φ(R+, x̂) ∩ Σ must be a singleton, equal to {x1}.

Claim 4 If the positive limit set L+x of a bounded trajectory (starting at some point x ∈ D) contains a periodic orbit γ, then L+x = γ. Let γ ⊆ L+x be a periodic orbit. It is enough to show that φ(t, x) → γ, because then necessarily L+x ⊆ γ, and thus γ = L+x. Let x̄ ∈ γ and consider a transversal Σ with x̄ ∈ Σ. Thus, since x̄ ∈ L+x, there exists an increasing and diverging sequence {tk}∞k=0 such that

φ(R+, x) ∩ Σ = {φ(tk, x)}∞k=0 ⊂ Σ,

and limk→∞ φ(tk, x) = x̄. By claim 2, the sequence {φ(tk, x)}∞k=0 must be monotonic in Σ in the sense that any point φ(tk, x) (for k ≥ 2) cannot lie strictly between any two previous points of the sequence. Moreover, φ(tk, x) → x̄ monotonically as k → ∞. Because γ is by assumption a periodic orbit, there is T > 0 such that φ(T, x̄) = φ(0, x̄) = x̄. Fix a small ε > 0 and take δ from claim 1. Then there exists k̄ > 0 such that

k ≥ k̄ ⇒ ‖x̄ − xk‖ ≤ δ,

where xk = φ(tk, x), so that as a consequence of claim 1, φ(t + T, xk) ∈ Σ for some t ∈ (−ε, ε). Hence, tk+1 − tk ≤ T + ε for all k ≥ k̄. Because of the continuous dependence of solutions to ODEs on initial data (proposition 2.5), for any ε̂ > 0 there exists δ̂ > 0 such that

‖x̄ − xk‖ ≤ δ̂ ⇒ ‖φ(t, x̄) − φ(t, xk)‖ < ε̂, ∀ t : |t| ≤ T + ε.

Thus, by choosing ε̂ = ε and δ ∈ (0, δ̂] it is

k ≥ k0 ⇒ ‖x̄ − xk‖ ≤ δ ⇒ ‖φ(t, x̄) − φ(t, xk)‖ < ε, ∀ t : |t| ≤ T + ε,

for some k0 ≥ k̄. For t ∈ [tk, tk+1] and k ≥ k0 therefore,

inf_{y∈γ} ‖y − φ(t, x)‖ ≤ ‖φ(t − tk, x̄) − φ(t, x)‖ = ‖φ(t − tk, x̄) − φ(t − tk, xk)‖ < ε,

since |t − tk| ≤ T + ε and φ(t, x) = φ(t − tk, φ(tk, x)) = φ(t − tk, xk) by the group laws for flows in remark 2.5. Hence, it has been shown that φ(t, x) → γ as t → ∞, whence γ = L+x.

Proof of Proposition 2.13 By lemma 2.4, the positive limit set L+x is a nonempty compact invariant set. Let x̂ ∈ L+x and let ŷ ∈ L+x̂ ⊆ L+x, and consider a transversal Σ that contains the point ŷ, which by assumption cannot be an equilibrium. Claim 3 implies that φ(t, x̂) can intersect Σ in at most one point. In addition, there is an increasing sequence {tk}∞k=0 with tk → ∞ as k → ∞ such that limk→∞ φ(tk, x̂) = ŷ. The trajectory φ(t, x̂) must therefore cross the transversal Σ infinitely many times, so that (taking into account that L+x does not contain any equilibrium) necessarily φ(t, x̂) = φ(t + T, x̂) for some T > 0. Claim 4 now implies that L+x is a periodic orbit, for it contains a periodic orbit. ∎

33. If f points toward the outside, the same reasoning applies by reversing the orientation of γ and accordingly switching the inside and the outside of the closed curve.
34. These flow boxes exist as a consequence of the rectifiability theorem for vector fields (see, e.g., Arnold and Il’yashenko 1988, 14).

In order to exclude the existence of limit cycles for a given two-dimensional system, the following simple result can be used.35

Proposition 2.14 (Poincaré-Bendixson Criterion) Let f : D → R2 be continuously differentiable. If on a simply connected domain D ⊂ R2 the function div f = ∂f1/∂x1 + ∂f2/∂x2 does not vanish identically and has no sign changes, then the system has no periodic orbit in D.

35. In physics, an electrostatic field E(x) = (E1(x), E2(x)), generated by a distributed charge of density ρ(x) ≥ 0 (not identically zero), satisfies the Maxwell equation div E(x) = ρ(x) and therefore cannot produce closed trajectories (field lines). By contrast, a magnetostatic field B(x) = (B1(x), B2(x)) satisfies the Maxwell equation div B(x) = 0 for all x and does always produce closed field lines (limit cycles).


Proof Assume that γ : [0, 1] → D is a periodic orbit in D (i.e., with γ(0) = γ(1)). Then, by Green’s theorem (see J. M. Lee 2003, 363) it is

0 = ∫_γ (f2(x) dx1 − f1(x) dx2) = ∫_S div f (x) dx,

where the surface S is the interior of the curve γ such that ∂S = γ([0, 1]). But the last term cannot vanish, since by assumption div f (x) is either > 0 or < 0 on a nonzero-measure subset of S (using the continuity of the derivatives of f ). This is a contradiction, so there cannot exist a periodic orbit that lies entirely in D. ∎
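Proposition 2.14 is straightforward to apply symbolically. The sketch below (added here for illustration; the system is an arbitrary damped oscillator, not from the original text) computes div f for ẋ1 = x2, ẋ2 = −x1 − x2; since the divergence is identically −1 on the simply connected domain R², no periodic orbit can exist:

```python
# Symbolic check of the Poincaré-Bendixson (divergence) criterion.
import sympy as sp

x1, x2 = sp.symbols("x1 x2", real=True)
f1, f2 = x2, -x1 - x2                       # damped oscillator
div_f = sp.diff(f1, x1) + sp.diff(f2, x2)
print("div f =", sp.simplify(div_f))        # -> -1: no limit cycles in R^2
```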

2.3 Higher-Order ODEs and Solution Techniques

Let n, r ≥ 1. An rth order ODE is of the form

F(t, x(t), ẋ(t), ẍ(t), . . . , x(r)(t)) = 0, (2.48)

where x(k)(t) = d^k x(t)/dt^k, k ∈ {1, . . . , r}, is the kth total derivative of x(t) with respect to t, using the convenient abbreviations x(1) = ẋ and x(2) = ẍ. The function F : R1+n+nr → Rn is assumed to be continuously differentiable (at least on a nonempty connected open subset of R1+n+nr). As before, the (real-valued) dependent variable x(t) = (x1(t), . . . , xn(t)) varies in Rn as a function of the independent variable t ∈ R.

2.3.1 Reducing Higher-Order ODEs to First-Order ODEs
Any rth order ODE in the (implicit) form (2.48) can be transformed to an equivalent first-order ODE: by introducing r functions y1, . . . , yr (setting x(0) = x) such that

yk = x(k−1), k ∈ {1, . . . , r}, (2.49)

one can rewrite (2.48) in the form

F̃(t, y, ẏ) = (ẏ1 − y2, ẏ2 − y3, . . . , ẏr−1 − yr, F(t, y1, . . . , yr, ẏr)) = 0,

where y = (y1, . . . , yr) ∈ Rñ with ñ = nr is the dependent variable, and F̃ : R1+2ñ → Rñ is a continuously differentiable function.


Remark 2.10 When F(t, x, ẋ, ẍ, . . . , x(r)) = x(r) − f (t, x, ẋ, ẍ, . . . , x(r−1)) for some continuous function f : R1+nr → Rn, the variables introduced in (2.49) can be used to transform a higher-order ODE in explicit form,

x(r) = f (t, x, ẋ, ẍ, . . . , x(r−1)),

to a first-order ODE in explicit form,

ẏ = f̃ (t, y),

where the continuous function f̃ : R1+ñ → Rñ is such that

f̃ (t, y) = (y2, . . . , yr, f (t, y1, . . . , yr)). □

Since it is possible to reduce higher-order ODEs to first-order ODEs, all techniques described for n > 1 in section 2.2 may in principle also be applied to higher-order ODEs.
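In numerical practice this reduction is exactly how higher-order ODEs are fed to standard solvers. The short sketch below (added for illustration; the damped oscillator ẍ = −x − 0.1ẋ is an arbitrary choice, not from the original text) applies the substitution (2.49) with y1 = x, y2 = ẋ:

```python
# Reduction (2.49) in practice: integrate x'' = -x - 0.1*x' as a system.
import numpy as np
from scipy.integrate import solve_ivp

def F(t, y):
    y1, y2 = y                     # y1 = x, y2 = x'
    return [y2, -y1 - 0.1 * y2]    # [x', x''] in the stacked variables

sol = solve_ivp(F, (0.0, 10.0), [1.0, 0.0], t_eval=np.linspace(0.0, 10.0, 101))
print("x(10) ≈", round(sol.y[0, -1], 4))
```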

2.3.2 Solution Techniques
General analytical solution techniques are available for linear systems of the form

ẋ = A(t)x + b(t), (2.50)

where A(t) and b(t) are continuous functions with values in Rn×n and Rn, respectively. When A, b are constant, the linear ODE (2.50) is said to have constant coefficients (see example 2.11). Using appropriate variable transformations, nonlinear systems can sometimes be reduced to linear systems.

Example 2.14 (Riccati Equation) Consider a matrix Riccati IVP of the form

Ẋ + XA(t) + A′(t)X + XP(t)X = R(t), X(t0) = K,

where A(t), P(t), R(t) ∈ Rn×n are continuous, bounded matrix functions such that P(t) and R(t) are symmetric positive definite, for all t ≥ t0. The matrix K ∈ Rn×n of initial values is assumed to be symmetric and positive definite. Because of the structure of the ODE, a solution X(t) of the matrix Riccati IVP will have symmetric positive definite values. Moreover, the solution can be written in the form X = ZY⁻¹, where Y(t), Z(t), with values in Rn×n, solve the linear IVP

Ẏ = A(t)Y + P(t)Z, Y(t0) = I,
Ż = R(t)Y − A′(t)Z, Z(t0) = K,

for all t ≥ t0. Indeed, by direct differentiation,

Ẋ = (d/dt)(ZY⁻¹) = ŻY⁻¹ − ZY⁻¹ẎY⁻¹
= (R(t)Y − A′(t)Z)Y⁻¹ − ZY⁻¹(A(t)Y + P(t)Z)Y⁻¹
= R(t) − A′(t)X − XA(t) − XP(t)X,

for all t ≥ t0, and X(t0) = K = Z(t0)Y⁻¹(t0). □
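The linearization trick of example 2.14 is also numerically convenient, since it replaces a quadratic matrix ODE by a linear one. The following sketch (added for illustration; constant coefficient matrices and all values are assumptions made here for brevity) integrates the (Y, Z) system and recovers X = ZY⁻¹:

```python
# Sketch of example 2.14 with constant A, P, R, K (illustrative values).
import numpy as np
from scipy.integrate import solve_ivp

n = 2
A = np.array([[0.0, 1.0], [-1.0, -1.0]])
P, R, K = np.eye(n), np.eye(n), np.eye(n)

def rhs(t, v):
    Y, Z = v[:n*n].reshape(n, n), v[n*n:].reshape(n, n)
    return np.concatenate(((A @ Y + P @ Z).ravel(),
                           (R @ Y - A.T @ Z).ravel()))

v0 = np.concatenate((np.eye(n).ravel(), K.ravel()))
sol = solve_ivp(rhs, (0.0, 1.0), v0, rtol=1e-9, atol=1e-9)
Y = sol.y[:n*n, -1].reshape(n, n)
Z = sol.y[n*n:, -1].reshape(n, n)
X = Z @ np.linalg.inv(Y)
# consistent with the example: X(t) stays symmetric positive definite
print("X(1) symmetric:", np.allclose(X, X.T, atol=1e-6))
print("eigenvalues of X(1):", np.linalg.eigvalsh(X))
```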

Linear Systems with Variable Coefficients
As in the solution of the linear ODE for n = 1 (see section 2.2.2), the linear system (2.50) is homogeneous when b(t) ≡ 0 and otherwise inhomogeneous.

Homogeneous Linear System Given the initial data (t0, x0) ∈ R1+n, consider the linear homogeneous IVP

ẋ = A(t)x, x(t0) = x0. (2.51)

By proposition 2.3 a solution to (2.51) exists and is unique. To find the solution, it is convenient to study the solutions of the linear matrix IVP

Ẋ = A(t)X, X(t0) = I, (2.52)

where X(t) has values in Rn×n for all t ∈ R. A solution Φ(t, t0) to (2.52) is called the fundamental matrix (or state-transition matrix), and its determinant, det Φ(t, t0), is usually referred to as the Wronskian.

Lemma 2.5 (Properties of the Fundamental Matrix) Let s, t, t0 ∈ R. Then the (unique) fundamental matrix Φ(t, t0) satisfies the following properties:

1. Φt(t, t0) = A(t)Φ(t, t0), Φ(t0, t0) = I.
2. Φt0(t, t0) = −Φ(t, t0)A(t0).
3. Φ(t, t) = I.
4. Φ(t0, t) = Φ⁻¹(t, t0).
5. Φ(t, t0) = Φ(t, s)Φ(s, t0).
6. det Φ(t, t0) = exp[∫_{t0}^t trace A(θ) dθ] > 0. (Liouville Formula)


Proof The six properties are best proved in a slightly different order. The statement of property 1 is trivial, since by assumption for any t, t0 ∈ R the fundamental matrix Φ(t, t0) solves (2.52). Property 3 follows directly from property 1. Property 5 is a direct consequence of the group laws for flows of nonautonomous systems (see remark 2.4). To prove property 4, note that I = Φ(t0, t0) = Φ(t0, t)Φ(t, t0) by properties 3 and 5. To prove property 2, note first that by the chain rule of differentiation

0 = (d/dt0) I = (d/dt0) [Φ(t, t0)Φ⁻¹(t, t0)] = Φt0(t, t0)Φ⁻¹(t, t0) + Φ(t, t0) (∂Φ⁻¹(t, t0)/∂t0).

On the other hand, switching t and t0 in property 1, multiplying by Φ(t, t0) from the left, and using property 4 yields

Φ(t, t0)Φt0(t0, t) = Φ(t, t0) (∂Φ⁻¹(t, t0)/∂t0) = Φ(t, t0)A(t0)Φ⁻¹(t, t0),

which, together with the preceding relation, implies the result. Finally, to prove property 6: the matrix Φ(s, t0) corresponds to the flow of the matrix ODE in (2.52) at time s, starting from the initial value Φ(t0, t0) = I at time t0. By linearity, the solution to the IVP

Ẋ = A(t)X, X(s) = Φ(s, t0),

is Φ(t, s)Φ(s, t0), and by uniqueness of the solution to the IVP (2.52) (using proposition 2.3) this is equal to Φ(t, t0).36 Denote by Φi the ith column vector of Φ, so that Φ(t, t0) = [Φ1(t, t0), . . . , Φn(t, t0)]; it is

(∂/∂t) det Φ(t, t0) = Σ_{i=1}^n det[Φ1(t, t0), . . . , Φi−1(t, t0), ∂Φi(t, t0)/∂t, Φi+1(t, t0), . . . , Φn(t, t0)].

Furthermore, using the initial condition Φi(t0, t0) = ei (with ei the ith Euclidean unit vector) together with property 1 gives

(∂/∂t)|_{t=t0} det Φ(t, t0) = Σ_{i=1}^n det[e1, . . . , ei−1, A(t0)ei, ei+1, . . . , en] = trace A(t0).

36. This is also consistent with the group laws for flows in remark 2.4.


By properties 4 and 5, Φ(t0, t) = Φ(t0, s)Φ(s, t) = Φ(t0, s)Φ⁻¹(t, s), so

trace A(t) = (∂/∂t0)|_{t0=t} det Φ(t0, t) = [(∂/∂t0)|_{t0=t} det Φ(t0, s)] / det Φ(t, s) = [(∂/∂t) det Φ(t, s)] / det Φ(t, s).

Thus, setting s = t0 and ϕ(t) = det Φ(t, t0), and using property 3, one obtains that ϕ solves the linear IVP

ϕ̇ = trace A(t) ϕ, ϕ(t0) = det Φ(t0, t0) = 1,

which, using the Cauchy formula in proposition 2.1, has the unique solution

ϕ(t) = exp[∫_{t0}^t trace A(s) ds].

This in turn implies that the Wronskian associated with the fundamental matrix Φ(t, t0) is always positive. ∎
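Property 6 can be checked numerically by integrating the matrix IVP (2.52) directly. The sketch below (added here for illustration; the time-varying matrix A(t) is an arbitrary assumption) compares the Wronskian with the Liouville formula:

```python
# Numerical check of the Liouville formula det Φ(t,t0) = exp(∫ trace A).
import numpy as np
from scipy.integrate import solve_ivp, quad

A = lambda t: np.array([[0.0, 1.0], [-1.0, -np.cos(t)]])

rhs = lambda t, x: (A(t) @ x.reshape(2, 2)).ravel()
t0, t1 = 0.0, 2.0
sol = solve_ivp(rhs, (t0, t1), np.eye(2).ravel(), rtol=1e-10, atol=1e-10)
Phi = sol.y[:, -1].reshape(2, 2)

wronskian = np.linalg.det(Phi)
liouville = np.exp(quad(lambda s: np.trace(A(s)), t0, t1)[0])
print(f"det Φ = {wronskian:.8f},  exp(∫ trace A) = {liouville:.8f}")
```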

Remark 2.11 (Peano-Baker Formula) In general, for an arbitrary A(t), it is not possible to directly compute the fundamental matrix Φ(t, t0) in closed form. An approximation can be obtained by recursively solving the fixed-point problem

X(t) = I + ∫_{t0}^t A(s)X(s) ds,

which is equivalent to the matrix IVP (2.52). This results in the Peano-Baker formula for the fundamental matrix,

Φ(t, t0) = I + Σ_{k=1}^∞ ∫_{t0}^t ∫_{t0}^{s1} · · · ∫_{t0}^{sk−1} A(s1)A(s2) · · · A(sk) dsk · · · ds2 ds1,

for all t, t0 ∈ R. Truncating the series on the right-hand side yields an estimate for the fundamental matrix. □
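The recursion behind the Peano-Baker formula is simply Picard iteration on the fixed-point problem above, and it is easy to carry out on a time grid. The sketch below (added for illustration; A(t), the grid, and the iteration count are arbitrary assumptions) compares a few iterations with a high-accuracy reference solution:

```python
# Truncated Peano-Baker series via Picard iteration on a time grid.
import numpy as np
from scipy.integrate import cumulative_trapezoid, solve_ivp

A = lambda t: np.array([[0.0, t], [-t, 0.0]])
ts = np.linspace(0.0, 1.0, 801)

X = np.tile(np.eye(2), (len(ts), 1, 1))               # X_0(t) = I
for _ in range(6):                                    # iterate the map
    AX = np.array([A(t) @ Xt for t, Xt in zip(ts, X)])
    X = np.eye(2) + cumulative_trapezoid(AX, ts, axis=0, initial=0)

ref = solve_ivp(lambda t, x: (A(t) @ x.reshape(2, 2)).ravel(),
                (0.0, 1.0), np.eye(2).ravel(),
                rtol=1e-10).y[:, -1].reshape(2, 2)
print("error after 6 iterations:", np.abs(X[-1] - ref).max())
```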

Given the fundamental matrix Φ(t, t0) as solution to the matrix IVP (2.52), the homogeneous solution to the linear IVP (2.51) is

xh(t) = Φ(t, t0)x0, (2.53)

for all t ∈ R.

Inhomogeneous Linear System Consider the inhomogeneous linear IVP

ẋ = A(t)x + b(t), x(t0) = x0. (2.54)

As in the case where n = 1 (see section 2.2.2), the homogeneous solution in (2.53) is useful to determine the solution to (2.54) by appropriate superposition.

Proposition 2.15 (Generalized Cauchy Formula) The unique solution to the inhomogeneous linear IVP (2.54) is

x(t) = Φ(t, t0)x0 + ∫_{t0}^t Φ(t, s)b(s) ds,

for all t ∈ R, provided that the fundamental matrix Φ(t, t0) solves the matrix IVP (2.52) on R.

Proof This proof uses the same variation-of-constants idea as the proof of the one-dimensional Cauchy formula in section 2.2.2. Let C = C(t), and consider xh(t; C) = Φ(t, t0)C(t) as a candidate for a particular solution xp(t) of the ODE ẋ = Ax + b. Substituting xp(t) = xh(t; C) into this ODE yields, by virtue of lemma 2.5(4), that

Ċ(t) = Φ⁻¹(t, t0)b(t) = Φ(t0, t)b(t),

and using the initial condition C(t0) = 0, it is

C(t) = ∫_{t0}^t Φ(t0, s)b(s) ds.

Thus, taking into account lemma 2.5(5), the particular solution becomes

xp(t) = xh(t; C(t)) = ∫_{t0}^t Φ(t, t0)Φ(t0, s)b(s) ds = ∫_{t0}^t Φ(t, s)b(s) ds,

whence the generalized Cauchy formula is obtained by superposition,

x(t) = xh(t; x0) + xp(t) = Φ(t, t0)x0 + ∫_{t0}^t Φ(t, s)b(s) ds,

as the (by proposition 2.3, unique) solution to the inhomogeneous linear IVP (2.54). ∎

Linear Systems with Constant Coefficients When the ODE (2.50) has constant coefficients, the fundamental matrix can be determined explicitly by direct integration,

Φ(t, t0) = e^{A(t−t0)} = Σ_{k=0}^∞ A^k (t − t0)^k/k!, (2.55)

where e^{At} is a so-called matrix exponential, defined by a power series.37

Using the generalized Cauchy formula in proposition 2.15, the unique solution to the inhomogeneous linear IVP with constant coefficients,

ẋ = Ax + b(t), x(t0) = x0, (2.56)

becomes

x(t) = e^{A(t−t0)}x0 + ∫_{t0}^t e^{A(t−θ)}b(θ) dθ, (2.57)

for all t ∈ R.
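Formula (2.57) translates directly into code via the matrix exponential. The sketch below (added for illustration, with t0 = 0 and arbitrary data assumed here) evaluates the formula by quadrature and compares it with a direct numerical integration:

```python
# Check of the constant-coefficient solution formula (2.57) with t0 = 0.
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp, quad_vec

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
b = np.array([1.0, 0.0])
x0 = np.array([0.0, 1.0])
t = 1.5

x_formula = expm(A * t) @ x0 + quad_vec(lambda s: expm(A * (t - s)) @ b, 0.0, t)[0]
x_numeric = solve_ivp(lambda s, x: A @ x + b, (0.0, t), x0,
                      rtol=1e-10, atol=1e-12).y[:, -1]
print(np.allclose(x_formula, x_numeric, atol=1e-6))   # -> True
```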

Remark 2.12 (Nilpotent Linear Systems) A linear system (with constant coefficients) is nilpotent if there exist a real number r ∈ R and a positive integer κ such that (A − rI)^κ = 0.38 It is clear that for nilpotent systems the series expansion for the matrix exponential in (2.55) has only a finite number of terms, since

e^{At} = e^{rt}e^{(A−rI)t} = e^{rt}(I + (A − rI)t + (A − rI)² t²/2 + · · · + (A − rI)^{κ−1} t^{κ−1}/(κ − 1)!).

Note that a matrix is nilpotent if and only if all its eigenvalues are zero. □

Remark 2.13 (Laplace Transform) Instead of solving a system of linear ODEs directly, it is also possible to consider the image of this problem under the Laplace transform. Let x(t), t ≥ t0, be a real-valued function such that |x(t)| ≤ Ke^{αt} for some positive constants K and α. The Laplace transform of x(t) is given by

X(s) = L[x](s) = ∫_{t0}^∞ e^{−st}x(t) dt, s ∈ C, Re(s) > α.

The main appeal of the Laplace transform for linear ODEs stems from the fact that the Laplace transform is a linear operator and that L[ẋ] = sX(s) − x(t0). Thus, an IVP in the original time domain becomes an algebraic equation after applying the Laplace transform. Solving the latter

37. Moler and Van Loan (2003) provide a detailed survey of various numerical methods to compute e^{At}.
38. That is, the matrix A − rI is nilpotent.


Table 2.1
Properties of the Laplace Transform (α, β, τ ∈ R with τ > 0; t0 = 0)

Property              t-Domain                                   s-Domain
1 Linearity           αx(t) + βy(t)                              αX(s) + βY(s)
2 Similarity          x(αt), α ≠ 0                               (1/α)X(s/α)
3 Right-translation   x(t − τ)                                   e^{−τs}X(s)
4 Left-translation    x(t + τ)                                   e^{τs}(X(s) − ∫_0^τ e^{−st}x(t) dt)
5 Frequency shift     e^{rt}x(t)                                 X(s − r), r ∈ C
6 Differentiation     ẋ(t)                                       sX(s) − x(0+)
7 Integration         ∫_0^t x(θ) dθ                              (1/s)X(s)
8 Convolution         (x ∗ y)(t) = ∫_0^t x(t − θ)y(θ) dθ         X(s)Y(s)

Table 2.2
Common Laplace Transforms (α, τ ∈ R with τ > 0; n ∈ N; t0 = 0)

     x(t)                        X(s)                        Domain
1    δ(t − α)                    exp(−αs)                    s ∈ C
2    (t^n/n!) exp(−αt) 1R+       1/(s + α)^{n+1}             Re(s) > −α
3    t^α 1R+                     s^{−(1+α)} Γ(1 + α)         Re(s) > 0
4    ln(t/τ) 1R+                 −(1/s)(ln(τs) + γ)          Re(s) > 0

equation for X(s), it is possible to obtain the original solution using the inverse Laplace transform,

x(t) = L⁻¹[X](t) = (1/(2πi)) ∫_{c−i∞}^{c+i∞} e^{st}X(s) ds,

where c is such that X(s) is analytic for Re(s) ≥ c. The most important properties of the Laplace transform are listed in table 2.1 and common transform pairs in table 2.2.39 □

Example 2.15 (Linear System with Constant Coefficients) Consider the linear IVP with constant coefficients (2.56) for t0 = 0, which, using the Laplace transform (with properties 1 and 6 in table 2.1), can be written in the s-domain as

sX(s) − x0 = AX(s) + B(s),

39. More detailed tables are readily available (see, e.g., Bronshtein et al. 2004, 708–721, 1061–1065). The Laplace transform and its inverse can also be applied when x(t) is vector-valued, in which case X(s) is also vector-valued, as in example 2.15.


where B(s) = L[b](s). Solving this algebraic equation for the unknown X(s) yields

X(s) = −(A − sI)⁻¹(B(s) + x0).

By the linearity property and entry 1 in table 2.2, L⁻¹[B + x0](t) = b(t) + x0δ(t), where δ(t) is a Dirac distribution.40 In addition, the matrix analogue of entry 2 in table 2.2 yields that L⁻¹[−(A − sI)⁻¹](t) = exp(At). Hence, properties 1 and 8 in table 2.1, in conjunction with the filter property of the Dirac distribution, immediately imply (2.57). □

2.4 Notes

The theory of ODEs is well developed (see, e.g., Coddington and Levinson 1955; Pontryagin 1962; Hartman 1964; Petrovski[i] 1966). The presentation in this chapter is motivated in part by Arnold (1973), Walter (1998), and Khalil (1992). Arnold (1988) discusses more advanced geometrical methods. Godunov (1994) gives a very detailed account of linear systems with constant coefficients, and Sachdev (1997) provides a compendium of solution methods for nonlinear ODEs.

There are three main situations when the standard theory of ODEs needs to be extended. The first is when the function f on the right-hand side of (2.3) is non-Lipschitz, and thus the problem may not be well-posed in the sense that small changes in initial conditions may lead to arbitrarily large changes in solutions and also to nonuniqueness of solutions. Wiggins (2003) gives an introduction to nonlinear dynamics and chaos.

The second situation occurs when considering the dependence of systems on parameters, which raises questions about the structural stability of solutions. Thom (1972) provided the first coherent account on the matter, which essentially founded the field of catastrophe theory. Arnold (1992) is an excellent nontechnical introduction, while Arnold et al. (1999) is a more rigorous account.41

The third situation which requires additional theory is when the right-hand side of (2.34) is discontinuous (Filippov 1988) or set-valued (Smirnov 2002). This third situation is of some importance for the control

40. The Dirac distribution is defined as the limit δ(t) = limε→0+ (1/ε) max{0, 1 − |t/ε|}. Its characteristic properties are (1) (normalization) ∫_{−ε}^{ε} δ(t) dt = 1 for any ε > 0; and (2) (filter property) ∫_{−ε}^{ε} δ(t)y(t) dt = y(0) for any continuous function y and any ε > 0.
41. Arnold (2000) compares rigid versus flexible models in mathematics with the aid of several interesting examples.


of dynamic systems. Discontinuous controls arise naturally as solutions to standard optimal control problems (see chapter 3). More generally, the relevant dynamic system, when allowing for a choice of controls from a certain set, is described by a differential inclusion (featuring a set-valued right-hand side).

Many of the examples in this chapter have their roots in theoretical biology; see Murray (2007) for further discussion in that context.

2.5 Exercises

2.1 (Growth Models) Given the domain D = R2++ and initial data (t0, x0) ∈ D such that (t0, x0) ≫ 0,42 solve the initial value problem

ẋ = f (t, x), x(t0) = x0,

where f (t, x) : D → R is one of the continuous functions given in parts a–c. For this, assume that x̄ > x0 is a finite carrying capacity and that α, β, γ > 0 are given parameters.

a. (Generalized Logistic Growth) f (t, x) = αγ(1 − (x/x̄)^{1/γ})x.

b. (Gompertz Growth) f (t, x) = αx ln(x̄/x). Show that Gompertz growth is obtained from generalized logistic growth in part a as γ → ∞.

c. (Bass Diffusion) f (t, x) = (1 − x/x̄)(αx + β)ρ(t), where ρ(t) = 1 + δu(t). The continuous function u(t) with values in [0, ū] describes the relative growth of advertising expenditure for some ū > 0, and the constant δ ≥ 0 determines the sensitivity of the Bass product-diffusion process to advertising.43

d. (Estimation of Growth Models) Let δ = 0. Find some product-diffusion data for your favorite innovation, and estimate the values of the parameters α, β, γ for the models in parts a–c. Plot the three different growth curves together with your data. (There is no need to develop econometrically rigorous estimation procedures; the point of this exercise is to become familiar with some software that can produce graphical output and to experiment with the different growth models.)

42. Given two vectors a, b ∈ Rn, write that a = (a1, . . . , an) ≫ (b1, . . . , bn) = b if and only if ai > bi for all i ∈ {1, . . . , n}.
43. This diffusion model was developed by Frank M. Bass in 1969 for δ = 0 (see example 2.1). The original model fits diffusion data surprisingly well, as shown by Bass et al. (1994). It was included in the Management Science special issue on “Ten Most Influential Titles of Management Science’s First Fifty Years.”


2.2 (Population Dynamics with Competitive Exclusion) Consider the evolution of two interacting populations, ξ1(τ) and ξ2(τ). At time τ ≥ 0, the evolution of the populations, with given positive initial sizes ξ10 and ξ20, is described by the following initial value problem:

ξ̇1 = a1ξ1(1 − ξ1/ξ̄1 − b12 ξ2/ξ̄1), ξ1(0) = ξ10,
ξ̇2 = a2ξ2(1 − ξ2/ξ̄2 − b21 ξ1/ξ̄2), ξ2(0) = ξ20,

where the constants ai, ξ̄i > ξi0, i ∈ {1, 2}, are positive and b12, b21 ∈ R.

a. Using a linear variable transformation, de-dimensionalize the model, such that the evolution of the two populations is described by functions x1(t) and x2(t) that satisfy

ẋ1 = x1(1 − x1 − β12x2), x1(0) = x10, (2.58)
ẋ2 = αx2(1 − x2 − β21x1), x2(0) = x20, (2.59)

where α, β12, β21, x10, x20 are appropriate positive constants. Interpret the meaning of these constants. The description (2.58)–(2.59) is used for parts b–c below.

b. Determine any steady states of the system (2.58)–(2.59). Show that there is at most one positive steady state (i.e., strictly inside the positive quadrant of the (x1, x2)-plane). Under what conditions on the parameters does it exist?

c. Determine the stability properties of each steady state determined in part b.

d. Under the assumption that there is a positive steady state, draw a qualitative diagram of the state trajectories in phase space (given any admissible initial conditions).

e. How does the phase diagram in part d support the conclusion (from evolutionary biology) that when two populations compete for the same limited resources, one population usually becomes extinct? Discuss your findings using a practical example in economics.

2.3 (Predator-Prey Dynamics with Limit Cycle) Consider two interacting populations of prey x1(t) and predators x2(t), which for t ≥ 0 evolve according to

ẋ1 = x1(1 − x1) − δx1x2/(x1 + β), x1(0) = x10, (2.60)
ẋ2 = αx2(1 − x2/x1), x2(0) = x20, (2.61)

where α, β, δ are positive constants, and x0 = (x10, x20) ≫ 0 is a given initial state.

a. Find all steady states of the system (2.60)–(2.61). Show that there is a unique positive steady state x∗.

b. What is the smallest value δ for which the system can exhibit unstable behavior around the positive steady state?

c. Show that if the positive steady state is unstable, there exists a (stable?) limit cycle. For that case, draw a qualitative diagram of solution trajectories in the phase space.

d. Discuss qualitative differences of the diagram obtained in part c from the Lotka-Volterra predator-prey model. Which model do you expect to fit data more robustly? Explain.

2.4 (Predator-Prey Dynamics and Allee Effect) Consider the co-evolution of two related technologies. Let xi(t) denote the installed base (the number of users) for technology i ∈ {1, 2}. The dynamic interaction of the two user populations for times t ≥ 0 is described by the system of ODEs

ẋ1 = x1(g(x1) − x2),
ẋ2 = x2(x1 − h(x2)),

where the functions g, h : R+ → R+ have the following properties:

• g(·) has a single peak at x̂1, is concave on [0, x̂1], satisfies g(x̂1) > g(0) > 0, and limx1→∞ g(x1) = 0.
• h(·) is concave, increasing, and h(0) > 0.

The initial state of the system, x(0) = x0 > 0, is known.

Part 1: System Description

a. Sketch the functions g and h, and briefly discuss how the behavior of this system qualitatively differs from a classical Lotka-Volterra predator-prey system.


b. Describe why this system exhibits the Allee effect.44

Part 2: Stability Analysis

c. Is the origin a stable or an unstable equilibrium? Prove your statement.

d. Show that there exists a unique positive equilibrium x̄ = (x̄1, x̄2), and examine its stability properties. Determine conditions on g and h that would guarantee stability or instability. (Hint: Write the system in the standard form ẋ = f (x), and draw the nullclines in the state space; also distinguish the cases where x̄1 < x̂1 and where x̄1 > x̂1.)

Part 3: Equilibrium Perturbations and Threshold Behavior

e. Consider the case where x̄ = (x̄1, x̄2) ≫ 0 is an equilibrium such that x̄1 > x̂1. Assume that the system is perturbed away from x̄ so that at time t = 0 it starts at the state x0 > x̄. Provide an intuitive proof for the fact that the system behavior becomes qualitatively different as ‖x0 − x̄‖ increases and passes a threshold. How does this threshold relate to the Allee effect?

f. Consider the case where x̄ = (x̄1, x̄2) ≫ 0 is an equilibrium such that x̄1 < x̂1. Based on your answer in part d, discuss what happens when the system is perturbed away from the steady state x̄. (Hint: The Poincaré-Bendixson theorem might be of help.)

Part 4: Interpretation and Verification

g. Interpret the insights obtained from the qualitative analysis in parts 1–3 in the context of technological innovation. Try to find a particular real-world example to illustrate your argument.

h. Provide suitable specifications for the functions g and h to illustrate the system behavior and your earlier conclusions using computer-generated phase diagrams (including trajectories). Describe any interesting observations that the preceding qualitative analysis may have missed.

44. The effect is named after Allee (1931; 1938); see also the discussion by Stephens et al. (1999). It refers to the observation that (at least below a certain population threshold) the growth rate of a population increases with its size.


3 Optimal Control Theory

Imagination is good. But it must always be critically controlled by the available facts.

—Albert Einstein

3.1 Overview

Chapter 2 discussed the evolution of a system from a known initial state x(t0) = x0 as a function of time t ≥ t0, described by the differential equation ẋ(t) = f (t, x(t)). For well-posed systems the initial data (t0, x0) uniquely determine the state trajectory x(t) for all t ≥ t0. In actual economic systems, such as the product-diffusion process in example 2.1, the state trajectory may be influenced by a decision maker’s actions, in which case the decision maker exerts control over the dynamic process. For example, product diffusion might depend on the price a firm charges and on its marketing effort. The lower the price, or the higher the marketing effort, the faster one expects the product’s consumer base to increase.

When all the decision maker’s choice variables are collected in a vector-valued control u(t), the evolution of the state is now described by an augmented ordinary differential equation (ODE),

ẋ(t) = f (t, x(t), u(t)),

which includes the impact of the control. Through the choice of the control variable, the decision maker may be able to steer the system from the given initial state x0 at time t0 to a desired final state xT at time T. For example, by charging a very low price and investing in a marketing campaign, a firm might double its consumer base within 1 year.

If a given system can be steered from any x0 to any xT in finite time using an admissible control u(t), t ∈ [t0, T], it is said to be controllable.


Not every system is controllable, especially if the set of feasible controls is constrained. For example, when a firm’s marketing budget is small and a product’s elevated marginal cost imposes a high minimum price, it may well be impossible to double the current consumer base within any amount of time, especially if consumers are quick to discard the product. Exercise 3.1 discusses this example in greater detail. Section 3.2 provides a characterization of controllability for linear systems, both with bounded and unbounded sets of feasible controls. Controllability properties of general nonlinear systems are more difficult to establish and are therefore examined here using ad hoc techniques.

Beyond the question of the mere possibility of steering a system from one given state to another, a decision maker is often interested in finding an input trajectory u(t), t ∈ [t0, T], so as to maximize the objective functional

J(u) = ∫_{t0}^T h(t, x(t), u(t)) dt + S(T, x(T)),

where h(t, x(t), u(t)) is an instantaneous payoff and S(T, x(T)) is a terminal value, for instance, of selling the firm at time T (see remark 3.14).

where h(t, x(t), u(t)) is an instantaneous payoff and S(T, x(T)) is a termi-nal value, for instance, of selling the firm at time T (see remark 3.14).Section 3.3 discusses this simplest version of an optimal control problemand presents necessary as well as sufficient conditions for its solution.The maximum principle developed by Lev Pontryagin and his studentsprovides necessary conditions. The main idea of these conditions is thatthey define the joint evolution of the state x(t) and a co-state ψ(t) (alsoknown as adjoint variable), where the co-state represents the per-unitvalue (or shadow price) of the current velocity of the system at time t.Sufficient conditions, on the other hand, can be formulated in termsof the so-called Hamilton-Jacobi-Bellman equation, which describes thedecrease of the value function V(t, x) with respect to time t as the maxi-mum attainable sum of instantaneous payoff and the value of the currentvelocity of the system. The value function V(t, x) represents the optimalpayoff of the system over the interval [t, T] when started at the time-tstate x(t) = x. Because this payoff decreases to zero, or more precisely,to the system’s terminal value, it makes sense that at the optimum thedecrease of this value be as fast as possible. If a (necessarily unique)value function is found that solves the Hamilton-Jacobi-Bellman equa-tion, then it is also possible to derive from it the solution to the optimalcontrol problem.

Section 3.4 considers the general finite-horizon optimal control problem with endpoint constraints and state-control constraints, which can be effectively dealt with using a more general version of the Pontryagin maximum principle (PMP), the proof of which is discussed in section 3.6. Section 3.5 deals with optimal control problems where the decision horizon T goes to infinity. While one might argue that in practice it seems hard to imagine that the decision horizon would actually be infinity, setting T = ∞ removes end-of-horizon effects and thus expresses a going concern by the decision maker. For example, when considering the optimal control problem of fishing in a lake over a finite time horizon, then at the end of that horizon the lake would usually contain no fish (provided it is cheap enough to catch the fish) because there is no future value of conservation. An infinite-horizon version of this problem would result in a solution that would most likely approach a steady state (also referred to as a turnpike) where the fish reproduction and the catch balance out so as to guarantee an optimal stream of catch. Determining such a steady state is usually much easier than solving the entire optimal control problem. Section 3.7 discusses the existence of solutions to optimal control problems, notably the Filippov existence theorem.

3.2 Control Systems

A control system Σ (figure 3.1) may be viewed as a relation between m + l signals, whereby each signal is a real-valued function, defined on a common nonempty time set T ⊂ R.1 The first m signals, denoted by u1, . . . , um, are referred to as inputs (or controls), and the last l signals, denoted by y1, . . . , yl, are called outputs. Thus, a system Σ = {(u1, . . . , um; y1, . . . , yl)} specifies what (m + l)-tuples of signals are admissible. Note that in principle a system need have neither inputs nor outputs.

Based on the theory developed in chapter 2, the focus here is on continuous-time control systems, where the time set T is a nontrivial interval I ⊂ R, and where the relation between the m-dimensional control u = (u1, . . . , um) (with m ≥ 1) and the l-dimensional output y = (y1, . . . , yl) (with l ≥ 1) is given in state-space form,

ẋ(t) = f (t, x(t), u(t)), (3.1)
y(t) = g(t, x(t), u(t)), (3.2)

for all t ∈ I. The continuous function f : I × X × U → Rn is referred to as the system function, and the continuous function g : I × X × U → Rl is

1. This definition includes discrete-time systems, where T ⊂ Z. Such discrete-time systems can be obtained by sampling a continuous-time system at countably many time instants. The word relation is to be interpreted as follows: given m + l ≥ 0 signals, it can be decided (based on the relation) whether they are consistent with the system or not.


Figure 3.1  A system with input and output.

called the output function. The nonempty convex open set X ⊂ Rn is the state space, and the nonempty convex compact set U ⊂ Rm is the control set (or control-constraint set). The special case where U is a singleton and g(t, x(t), u(t)) ≡ x(t) represents a control system without inputs and with state output (see chapter 2).

The system equation (3.1) determines the evolution of the state x(t) at time t given a control u(t). The output equation (3.2) determines the output y(t) at time t as a function of the state x(t) and the control u(t). Fundamental for the analysis of systems are the notions of controllability, reachability, and observability. A system in state-space form is controllable if it can be steered from any state x ∈ X to any other state x̃ ∈ X in finite time. A system is said to be reachable if all its states can be reached from a particular state (say, the origin) in finite time, that is, if it is controllable from that state. A controllable system is reachable from any state. A system is observable if, when its output is recorded over some finite time interval, its initial state can be determined.

Remark 3.1 In this section the controllability and reachability properties are particularly needed because the focus is on systems that can be steered from any given state to a particular optimal state. The observability property plays a lesser role here as it is trivially satisfied in economic systems where the states are readily available and can therefore be considered as output. □

Example 3.1 (Linear Control System) A linear time-invariant2 control system is of the form

ẋ = Ax + Bu, (3.3)
y = Cx + Du, (3.4)

2. To avoid any possible confusion, in the context of control systems the term time-invariant is preferred to autonomous (which was used in chapter 2 for time-invariant ODEs). The reason is that the term autonomous system is sometimes used in the literature for a control system without inputs.


where A ∈ Rn×n, B ∈ Rn×m, C ∈ Rl×n, and D ∈ Rl×m are given matrices. The question naturally arises, Under what conditions is this linear control system controllable, reachable, and observable? Note that the notions of controllability and reachability coincide for linear systems because the set of states that can be reached in finite time, say, from the origin, is a linear subspace of Rn. □

3.2.1 Linear Controllability and Observability
The following result characterizes the controllability and observability of linear systems in terms of a simple algebraic condition.

Proposition 3.1 (Linear Controllability and Observability) Consider a linear control system Σ described by (3.3)–(3.4). (1) Σ is controllable if and only if the controllability matrix R[A, B] = [B, AB, A²B, . . . , A^{n−1}B] has rank n. (2) Σ is observable if and only if the observability matrix

O[A, C] = [C; CA; . . . ; CA^{n−1}]

(whose blocks are stacked vertically) has rank n.

Proof (1) First observe that for any row vector v ∈ Rn it is

v e^{At} B = Σ_{k=0}^∞ v A^k B t^k/k!. (3.5)

Furthermore, by the Cayley-Hamilton theorem3 it is A^n = Σ_{k=0}^{n−1} αk A^k, where A^0 = I, and where α0, . . . , αn−1 are the coefficients of the characteristic polynomial of A, det(sI − A) = s^n − Σ_{k=0}^{n−1} αk s^k. Thus, any power A^k for k ≥ n can be written as a linear combination of A^0, . . . , A^{n−1}. One can therefore conclude that

rank R[A, B] < n ⇔ ∃ v ∈ Rn \ {0} : v A^k B = 0, ∀ k ≥ 0. (3.6)

Let RT(x) ⊂ Rn be the set of all states that can be reached by the system Σ in time T ≥ 0 starting from x; that is, for any x̃ ∈ RT(x) there exists a bounded measurable control u : [0, T] → Rm such that φu(T, x) = x̃, where φu denotes the flow of the system under the control u.

3. The Cayley-Hamilton theorem states that every square matrix A ∈ Rn×n satisfies its own characteristic equation; i.e., if p(λ) = det(A − λI) is its characteristic polynomial as a function of λ, then p(A) = 0 (Hungerford 1974, 367).


Then

Σ is controllable ⇔ R1(0) = Rn. (3.7)

The last statement is true, since for any T > 0 the set RT(0) is a linear subspace of the set R(0) of all states that are reachable from the origin in any time, which implies that in fact RT(0) = R(0) for all T > 0. Now the task is to establish the equivalence of the rank condition and controllability of Σ.

⇒: If Σ is not controllable, then by (3.7) there is a nonzero vector v ∈ Rn such that v ⊥ R(0), that is, 〈v, x〉 = 0 for all x ∈ R(0). Consider the control u(t) = B′e^{A′(1−t)}v′ for all t ∈ [0, 1], and let x = φu(1, 0) ∈ R(0) be the state it reaches from the origin at time 1. Then

0 = 〈v, x〉 = ∫_0^1 〈v, e^{A(1−t)}Bu(t)〉 dt = ∫_0^1 〈v, e^{As}Bu(1 − s)〉 ds = ∫_0^1 〈v, e^{As}BB′e^{A′s}v′〉 ds = ∫_0^1 ‖v e^{As}B‖₂² ds,

so v e^{At}B = 0 for all t ∈ [0, 1], which by virtue of (3.6) implies that rank R[A, B] < n, that is, the rank condition fails.

⇐: If rank R[A, B] < n, then by (3.6) there exists a nonzero v ∈ Rn such that vA^kB = 0 for all k ≥ 0. Then, given any admissible control u(t) for t ∈ [0, 1], it is by (3.5)

〈v, φu(1, 0)〉 = 〈v, ∫_0^1 e^{At}Bu(1 − t) dt〉 = ∫_0^1 〈v, e^{At}Bu(1 − t)〉 dt = 0,

so v ⊥ R(0) ≠ Rn, that is, by (3.7) the system is not controllable.

(2) Note first that (O[A, C])′ = R[A′, C′]; namely, by part (1) the observability of the pair (A, C) can be understood in terms of the controllability of the dual system Σ′ = (A′, C′). Let T > 0. Given an (n − 1)-times continuously differentiable control u : [0, T] → Rm, one can differentiate the output relation (3.4) successively to obtain, on the time interval [0, T],

(y, ẏ, ÿ, . . . , y^{(n−1)})′ = O[A, C] x + M (u, u̇, ü, . . . , u^{(n−1)})′,

where the stacked vectors collect the output (respectively the control) together with its first n − 1 time derivatives, and where M is the lower block-triangular Toeplitz matrix whose first block column is (D, CB, CAB, . . . , CA^{n−2}B)′ and whose diagonal blocks all equal D. The preceding relation can be used to infer the evolution of the state x(t) on [0, T] if and only if O[A, C] has rank n (by using any left-inverse of O[A, C]). ∎

Example 3.2 (1) The linear control system (a harmonic oscillator)

ẋ1 = x2 + u,
ẋ2 = −x1,

with A = [0 1; −1 0] and B = [1; 0] is controllable because R[A, B] = [B, AB] = [1 0; 0 −1] has full rank. (2) The linear control system

ẋ1 = x1 + u,
ẋ2 = x2,

with A = I and B = [1; 0] is not controllable because

rank R[A, B] = rank [1 1; 0 0] = 1 < n = 2.

The system in fact can be controlled only in the direction of x1; that is, the set of all states that are reachable from the origin is R(0) = R × {0}. □

In general, the control u(t) can take on values only in the bounded control set U ⊂ Rm, which may severely limit the set of states that can be reached from the origin. The set of all U-controllable states is denoted by RU(0) ⊂ Rn.

Proposition 3.2 (Bounded Controllability) Let the control set U be bounded and such that it contains a neighborhood of the origin. Then the linear control system Σ described by (3.3)–(3.4) is controllable if and only if the controllability matrix R[A, B] has rank n and the system matrix A has no eigenvalues with negative real part.

Proof See Sontag (1998, 117–122). ∎

The intuition for this result is that since RU(0) ⊂ R(0), the controllability matrix must have full rank, as in proposition 3.1. Furthermore, if A has an eigenvalue with negative real part, then there will be a system trajectory that would tend toward the origin if the applied control is too small, limiting the reachability of states far enough away from the origin. To be able to reach those states, the system matrix cannot have asymptotically stable components. On the other hand, if these two properties are satisfied, then any state in Rn can be reached using controls with values in the bounded set U.

Remark 3.2 (Output Controllability) A system Σ is called output controllable if it is possible to steer it from any point y ∈ Rl in finite time to any other point ỹ ∈ Rl. In the case of the linear time-invariant system (3.3)–(3.4) with D = 0 it is clear, based on proposition 3.1, that output controllability obtains if and only if the output controllability matrix CR[A, B] = [CB, CAB, . . . , CA^{n−1}B] has rank l. □

3.2.2 Nonlinear Controllability
The question of whether a general nonlinear system Σ described by (3.1)–(3.2) is controllable is difficult. Good answers are available only when the system has a special structure, for example, when it is affine in the control. The tools needed for the corresponding analysis (e.g., Lie algebra and differential geometry) are beyond the scope of this book. Both Isidori (1995) and Sontag (1998) provide good introductory presentations of this advanced topic, establishing local results, at least for time-invariant, control-affine systems.

Remark 3.3 (Controllability of Time-Varying Systems) To extend the notion of controllability to time-varying systems, it is useful to think of events $(t,x)$ instead of points in the state space. An event $(t,x)$ can be controlled to an event $(\hat t, \hat x)$ (with $t < \hat t$) if (using a feasible control input) there is a system trajectory $x(s)$, $s \in [t, \hat t]$, such that $x(t) = x$ and $x(\hat t) = \hat x$. Hence, the state $x$ can be controlled to the state $\hat x$ if there exist $t, T \geq 0$ such that the event $(t,x)$ can be controlled to the event $(t+T, \hat x)$. □

3.3 Optimal Control—A Motivating Example

3.3.1 A Simple Optimal Control Problem
To motivate the discussion of optimal control theory, first consider a simple finite-horizon dynamic optimization problem for a system with unconstrained states. Given an interval $I = [t_0, T]$ with the finite time horizon $T > t_0 \geq 0$, and a system in state-space form with state output, that is, where $y = x$, a decision maker would like to find a bounded (measurable) control $u : [t_0,T] \to U$, where $U$ is a nonempty convex


compact control set, so as to maximize an objective (functional) of the form

$$J(u) = \int_{t_0}^{T} h(t,x(t),u(t))\,dt, \qquad (3.8)$$

where the continuous function $h : [t_0,T] \times \mathcal{X} \times U \to \mathbb{R}$ describes a decision maker's instantaneous benefit at time $t$, given the state $x(t)$ and the control $u(t)$. The latter corresponds to an action taken by the decision maker at time $t$. The fact that there are no state constraints is taken into account by the assumption that $\mathcal{X} = \mathbb{R}^n$.⁴

Given an initial state $x_0 \in \mathcal{X}$, the decision maker's optimal control problem (OCP) can therefore be written in the form

$$J(u) \longrightarrow \max_{u(\cdot)}, \qquad (3.9)$$
$$\dot x(t) = f(t,x(t),u(t)), \qquad (3.10)$$
$$x(t_0) = x_0, \qquad (3.11)$$
$$u(t) \in U, \qquad (3.12)$$

for all $t \in [t_0,T]$. In other words, the optimal control problem consists in maximizing the objective functional in (3.9) (defined by (3.8)), subject to the state equation (3.10), the initial condition (3.11), the control constraint (3.12), and possibly an endpoint constraint,

$$x(T) = x_T, \qquad (3.13)$$

for a given $x_T \in \mathcal{X}$, which is relevant in many practical situations.

3.3.2 Sufficient Optimality Conditions
To derive sufficient optimality conditions for the optimal control problem (3.9)–(3.12), note first that given any continuously differentiable function $V : [t_0,T] \times \mathcal{X} \to \mathbb{R}$ which satisfies the boundary condition

$$V(T,x) = 0, \quad \forall\, x \in \mathcal{X}, \qquad (3.14)$$

it is possible to replace $h(t,x,u)$ by

$$\hat h(t,x,u) = h(t,x,u) + \dot V(t,x) = h(t,x,u) + V_t(t,x) + \langle V_x(t,x), f(t,x,u)\rangle,$$

4. A weaker assumption is to require that the state space $\mathcal{X}$ be invariant given any admissible control input $u$, which includes the special case where $\mathcal{X} = \mathbb{R}^n$.


without changing the problem solution. This is true because the corresponding objective functional,

$$\hat J(u) = \int_{t_0}^{T} \hat h(t,x(t),u(t))\,dt = \int_{t_0}^{T} h(t,x(t),u(t))\,dt + V(T,x(T)) - V(t_0,x_0) = J(u) - V(t_0,x_0), \qquad (3.15)$$

is, up to the constant $V(t_0,x_0)$, identical to the original objective functional $J(u)$. Moreover, if the function $V$ is such that, in addition to the boundary condition (3.14), it satisfies the so-called Hamilton-Jacobi-Bellman (HJB) inequality

$$0 \geq \hat h(t,x,u) = h(t,x,u) + V_t(t,x) + \langle V_x(t,x), f(t,x,u)\rangle$$

for all $(t,x,u) \in [t_0,T] \times \mathcal{X} \times U$, then by integration

$$0 \geq \hat J(u) = J(u) - V(t_0,x_0)$$

for any admissible control $u$.⁵ Hence, the constant $V(t_0,x_0)$ is an upper bound for the objective functional $J(u)$, which, when attained for some admissible control trajectory $u^*(t)$, $t \in [t_0,T]$, would imply the optimality of that trajectory. The optimal control $u^*$ then renders the HJB inequality binding, maximizing its right-hand side. The following sufficient optimality condition has therefore been established.

Proposition 3.3 (HJB Equation) (1) Let $V : [t_0,T] \times \mathcal{X} \to \mathbb{R}$ be a continuously differentiable function such that it satisfies the HJB equation

$$-V_t(t,x) = \max_{u \in U}\,\{h(t,x,u) + \langle V_x(t,x), f(t,x,u)\rangle\}, \quad \forall\,(t,x) \in [t_0,T] \times \mathcal{X}, \qquad (3.16)$$

together with the boundary condition (3.14). Take any measurable feedback law $\mu : [t_0,T] \times \mathcal{X} \to U$ such that

$$\mu(t,x) \in \arg\max_{u \in U}\,\{h(t,x,u) + \langle V_x(t,x), f(t,x,u)\rangle\}, \quad \forall\,(t,x) \in [t_0,T] \times \mathcal{X}, \qquad (3.17)$$

and let $x^*(t)$ be a solution to the corresponding initial value problem (IVP)

$$\dot x^*(t) = f(t, x^*(t), \mu(t, x^*(t))), \quad x(t_0) = x_0, \qquad (3.18)$$

5. The term admissible control is defined in assumption A2 in section 3.4.

Page 104: Optimal Control Theory With Applications in Economics

Optimal Control Theory 91

Figure 3.2
The boundary condition (3.14) for the HJB equation can be either (a) active, when the terminal state $x(T)$ is free, or (b) inactive, in the presence of the endpoint constraint (3.13).

for all $t \in [t_0,T]$. Then $u^*(t) = \mu(t,x^*(t))$, $t \in [t_0,T]$, is a solution to the optimal control problem (3.9)–(3.12), and

$$V(t,x^*(t)) = \int_t^T h(s,x^*(s),u^*(s))\,ds, \quad \forall\, t \in [t_0,T], \qquad (3.19)$$

with $V(t_0,x_0) = J(u^*)$ its (unique) optimal value for any initial data $(t_0,x_0) \in [0,T] \times \mathcal{X}$. (2) If a continuously differentiable function $V : [t_0,T] \times \mathcal{X} \to \mathbb{R}$ solves the HJB equation (3.16) (without boundary condition), then the state-control trajectory $(x^*(t),u^*(t))$, determined by (3.17)–(3.18) with $u^*(t) = \mu(t,x^*(t))$, subject to (3.13), solves the endpoint-constrained optimal control problem (3.9)–(3.13).

The intuition for why the boundary condition (3.14) becomes inactive in the presence of the endpoint constraint (3.13) is that the endpoint constraint forces all trajectories to the given point $x_T$ at the end of the horizon. This is illustrated in figure 3.2.

Remark 3.4 (Uniqueness of Solution to HJB Equation) The HJB equation (3.16) with boundary condition (3.14) has at most one continuously differentiable solution $V(t,x)$ on $[t_0,T] \times \mathcal{X}$. To see this, consider without loss of generality the case where $t_0 = 0$. Proposition 3.3 implies that $V(\tau,\xi)$ is the optimal value of an optimal control problem (of the type (3.9)–(3.12)) with initial data $(\tau,\xi) \in (0,T] \times \mathcal{X}$. Since any such value is unique and $V$ is continuous on $[0,T] \times \mathcal{X}$, the function $V(t,x)$ is uniquely determined on $[0,T] \times \mathcal{X}$. □


Remark 3.5 (Principle of Optimality) The HJB equation (3.16) implies that the optimal policy $u^*(t) = \mu(t,x^*(t))$ does not depend on how a particular state $x^*(t)$ was reached; it requires only that all subsequent decisions be optimal. This means that an optimal policy is fundamentally obtained by backward induction, starting at the end of the time horizon, where the value of any further decision vanishes and the boundary condition (3.14) holds. The principle of optimality is usually attributed to Bellman (1957, ch. III.3). The idea of backward induction was mentioned earlier (see chapter 1, footnote 14). □

Remark 3.6 (HJB Equation with Discounting and Salvage Value) Let $t_0 = 0$.⁶ If the objective functional in (3.8) is replaced with the seemingly more general

$$J(u) = \int_0^T e^{-rt} h(t,x(t),u(t))\,dt + e^{-rT} S(x(T)),$$

where $r \geq 0$ is a given discount rate and $S : \mathcal{X} \to \mathbb{R}$ is a continuously differentiable function that can be interpreted as the terminal value (or salvage value) of the state at the end of the time horizon, then proposition 3.3 can still be used to obtain a sufficient optimality condition.

Let

$$\hat h(t,x,u) = e^{-rt} h(t,x,u) + \langle e^{-rT} S_x(x), f(t,x,u)\rangle;$$

the resulting objective functional $\hat J$ differs from the original objective functional $J$ only by a constant:

$$\hat J(u) = \int_0^T \hat h(t,x(t),u(t))\,dt = \int_0^T \left( e^{-rt} h(t,x(t),u(t)) + e^{-rT}\,\frac{d}{dt} S(x(t)) \right) dt = J(u) - e^{-rT} S(x_0),$$

so the optimization problem remains unchanged when substituting $\hat h$ for $h$. By proposition 3.3 the HJB equation becomes

$$-\hat V_t(t,x) = \max_{u \in U}\,\{\hat h(t,x,u) + \langle \hat V_x(t,x), f(t,x,u)\rangle\} = \max_{u \in U}\,\{e^{-rt} h(t,x,u) + \langle \hat V_x(t,x) + e^{-rT} S_x(x), f(t,x,u)\rangle\},$$

for all $(t,x) \in [0,T] \times \mathcal{X}$, with boundary condition $\hat V(T,x) = 0$ for all $x \in \mathcal{X}$. Multiplying the HJB equation by $e^{rt}$ on both sides and setting $V(t,x) \equiv e^{rt}\left(\hat V(t,x) + e^{-rT} S(x)\right)$ yields the following HJB equation with discounting and salvage value:

$$rV(t,x) - V_t(t,x) = \max_{u \in U}\,\{h(t,x,u) + \langle V_x(t,x), f(t,x,u)\rangle\}, \qquad (3.20)$$

for all $(t,x) \in [0,T] \times \mathcal{X}$, with boundary condition

$$V(T,x) = S(x), \quad \forall\, x \in \mathcal{X}. \qquad (3.21)$$

Note that along an optimal state-control trajectory $(x^*(t),u^*(t))$,

$$V(t,x^*(t)) = \int_t^T e^{-r(s-t)} h(s,x^*(s),u^*(s))\,ds + e^{-r(T-t)} S(x^*(T)),$$

for all $t \in [0,T]$, so $V(t,x^*(t))$ is the optimal discounted time-$t$ value. □

6. In the case where $t_0 \neq 0$ it is enough to multiply all objective values by $e^{rt_0}$.
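When no closed-form solution is available, (3.20)–(3.21) can be attacked on a grid by marching backward in time from the terminal condition. The following is a minimal sketch (Python with NumPy; the dynamics $\dot x = u$ with $u \in [-1,1]$, the benefit $h = -x^2$, the salvage value $S = 0$, and all grid parameters are illustrative assumptions, not taken from the text), using a monotone upwind discretization of the maximized term:

import numpy as np

# Explicit upwind scheme for r*V - V_t = max_u { h + V_x f }, V(T, x) = S(x),
# on the toy problem x-dot = u, u in [-1, 1], h = -x^2, S = 0, r = 0.1.
r, T = 0.1, 2.0
x = np.linspace(-2.0, 2.0, 201)
dx = x[1] - x[0]
dt = 0.5 * dx                    # CFL-type step size (|f| <= 1 here)
V = np.zeros_like(x)             # boundary condition V(T, x) = 0

for _ in range(int(T / dt)):
    Dp = np.empty_like(V); Dm = np.empty_like(V)
    Dp[:-1] = (V[1:] - V[:-1]) / dx; Dp[-1] = 0.0   # forward differences
    Dm[1:] = (V[1:] - V[:-1]) / dx; Dm[0] = 0.0     # backward differences
    ham = np.maximum(np.maximum(Dp, -Dm), 0.0)      # upwinded max_u { u * V_x }
    V = V + dt * (-r * V - x**2 + ham)              # one step backward in time

print(V[100], V[0])   # V(0, 0) = 0 (stay at the origin); V < 0 away from it

The scheme steps backward because, by the principle of optimality in remark 3.5, the value at each time is built from the values of all subsequent decisions.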

Example 3.3 (Linear-Quadratic Regulator) Consider the problem of steering a linear time-variant system,

$$\dot x = f(t,x,u) = A(t)x + B(t)u,$$

where $A(t)$, $B(t)$ are continuous bounded matrix functions, with values in $\mathbb{R}^{n\times n}$ and $\mathbb{R}^{n\times m}$, respectively, from a given initial state $x(t_0) = x_0 \in \mathbb{R}^n$ (for some $t_0 \geq 0$) over the finite time horizon $T > t_0$ so as to maximize the discounted (with rate $r \geq 0$) quadratic objective functional

$$e^{-rt_0} J(u) = -\int_{t_0}^{T} e^{-rt}\left( x'(t)R(t)x(t) + u'(t)S(t)u(t) \right) dt - e^{-rT}\, x'(T)\,K\,x(T),$$

where $R(t)$ and $S(t)$ are continuous bounded matrix functions, with symmetric positive definite values in $\mathbb{R}^{n\times n}$ and $\mathbb{R}^{m\times m}$, respectively. The terminal-cost matrix $K \in \mathbb{R}^{n\times n}$ is symmetric positive definite. Assuming that the convex compact control set $U$ is large enough (so as to allow for an unconstrained maximization), the corresponding optimal control problem is of the form (3.9)–(3.12), and the HJB equation (3.20) becomes

$$rV(t,x) - V_t(t,x) = \max_{u \in U}\,\{-x'R(t)x - u'S(t)u + \langle V_x(t,x), A(t)x + B(t)u\rangle\},$$

for all $(t,x) \in [t_0,T] \times \mathbb{R}^n$. Recall that in the analysis of the stability properties of linear time-variant systems, a quadratic Lyapunov function proved to be useful (see example 2.13). It turns out that essentially the same functional form can be used to solve the HJB equation for the linear-quadratic regulator problem. Let

$$V(t,x) = -x'Q(t)x,$$


for all $(t,x) \in [t_0,T] \times \mathbb{R}^n$, where $Q(t)$ is a continuously differentiable matrix function with symmetric positive definite values in $\mathbb{R}^{n\times n}$. Substituting $V(t,x)$ into the HJB equation yields

$$x'\dot Q x = \max_{u \in U}\,\{x'\left( rQ - QA(t) - A'(t)Q - R(t) \right)x - u'S(t)u - 2x'QB(t)u\},$$

for all $(t,x) \in [t_0,T] \times \mathbb{R}^n$. Performing the maximization on the right-hand side provides the optimal control in terms of a linear feedback law,

$$\mu(t,x) = -S^{-1}(t)B'(t)Q(t)x, \quad \forall\,(t,x) \in [t_0,T] \times \mathbb{R}^n,$$

so that the HJB equation is satisfied if and only if $Q(t)$ solves the Riccati differential equation

$$\dot Q - rQ + QA(t) + A'(t)Q - QB(t)S^{-1}(t)B'(t)Q = -R(t), \qquad (3.22)$$

for all $t \in [t_0,T]$. The boundary condition (3.21) is satisfied if $V(T,x) = -x'Q(T)x = -x'Kx$ for all $x \in \mathbb{R}^n$, or equivalently, if

$$Q(T) = K. \qquad (3.23)$$

As shown in chapter 2, the solution to this Riccati IVP can be written in the form $Q(t) = Z(t)Y^{-1}(t)$, where the matrices $Y(t)$, $Z(t)$, $t \in [0,T]$, solve the linear IVP

$$\begin{bmatrix} \dot Y \\ \dot Z \end{bmatrix} = \begin{bmatrix} A(t) - (r/2)I & -B(t)S^{-1}(t)B'(t) \\ -R(t) & -A'(t) + (r/2)I \end{bmatrix} \begin{bmatrix} Y \\ Z \end{bmatrix}, \qquad \begin{bmatrix} Y(T) \\ Z(T) \end{bmatrix} = \begin{bmatrix} I \\ K \end{bmatrix}.$$

As an illustration consider the special case where the system is one-dimensional (with $m = n = 1$) and time-invariant, and the cost is also time-invariant (except for the discounting), so that $A = \alpha < 0$, $B = \beta \neq 0$, $R = \rho > 0$, $S = \sigma > \rho(\beta/\alpha)^2$, and $K = k = 0$. The corresponding Riccati IVP becomes

$$\dot q + 2aq - bq^2 = -\rho, \quad q(T) = k,$$

for all $t \in [t_0,T]$, where $a = \alpha - (r/2) < 0$ and $b = \beta^2/\sigma > 0$. Thus, setting $q(t) = z(t)/y(t)$ leads to the linear IVP

$$\begin{bmatrix} \dot y \\ \dot z \end{bmatrix} = \begin{bmatrix} a & -b \\ -\rho & -a \end{bmatrix} \begin{bmatrix} y \\ z \end{bmatrix}, \qquad \begin{bmatrix} y(T) \\ z(T) \end{bmatrix} = \begin{bmatrix} 1 \\ k \end{bmatrix},$$

on the time interval $[t_0,T]$, which has the unique solution

$$\begin{bmatrix} y(t) \\ z(t) \end{bmatrix} = \exp\left( \begin{bmatrix} a & -b \\ -\rho & -a \end{bmatrix} (t - T) \right) \begin{bmatrix} 1 \\ k \end{bmatrix}.$$


Using the abbreviation $\kappa = \sqrt{a^2 + \rho b}$ for the absolute value of the eigenvalues of the system and the fact that by assumption $k = 0$ yields

$$q(t) = \frac{z(t)}{y(t)} = \left( \frac{\rho}{\kappa - a} \right) \frac{1 - e^{-2\kappa(T-t)}}{1 + \left( \frac{\kappa + a}{\kappa - a} \right) e^{-2\kappa(T-t)}},$$

for all $t \in [t_0,T]$. Therefore, the optimal state feedback is $\mu(t,x) = -(\beta/\sigma)\,q(t)\,x$, and in turn the optimal state trajectory becomes

$$x^*(t) = x_0 \exp\left( \alpha(t - t_0) - \frac{\beta^2}{\sigma} \int_{t_0}^{t} q(s)\,ds \right),$$

so the optimal control trajectory is

$$u^*(t) = \mu(t,x^*(t)) = -x_0 \left( \frac{\beta\,q(t)}{\sigma} \right) \exp\left( \alpha(t - t_0) - \frac{\beta^2}{\sigma} \int_{t_0}^{t} q(s)\,ds \right),$$

for all $t \in [t_0,T]$. □
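A quick numerical cross-check of the scalar case is straightforward. The sketch below (Python with NumPy/SciPy; all parameter values are illustrative choices, not from the text) integrates the scalar Riccati equation $\dot q + 2aq - bq^2 = -\rho$ backward from $q(T) = 0$ and compares the result with the closed-form expression for $q(t)$ given above.

import numpy as np
from scipy.integrate import solve_ivp

# Illustrative parameters with alpha < 0 and sigma > rho*(beta/alpha)^2.
alpha, beta, rho, sigma, r, T = -1.0, 1.0, 1.0, 2.0, 0.1, 5.0
a, b = alpha - r / 2, beta**2 / sigma
kappa = np.sqrt(a**2 + rho * b)

# Integrate q' = -rho - 2 a q + b q^2 backward from q(T) = 0,
# via the time reversal s = T - t.
sol = solve_ivp(lambda s, q: rho + 2 * a * q - b * q**2,
                (0.0, T), [0.0], dense_output=True, rtol=1e-10, atol=1e-12)

def q_closed_form(t):
    E = np.exp(-2 * kappa * (T - t))
    return (rho / (kappa - a)) * (1 - E) / (1 + (kappa + a) / (kappa - a) * E)

t_grid = np.linspace(0.0, T, 6)
q_numeric = sol.sol(T - t_grid)[0]       # map s back to t
print(np.max(np.abs(q_numeric - q_closed_form(t_grid))))  # ~1e-9: they agree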

3.3.3 Necessary Optimality Conditions
In principle, the HJB equation in proposition 3.3 requires too much in that it must hold for all $(t,x) \in [t_0,T] \times \mathcal{X}$. But it is clear that for optimality it needs to hold only in a neighborhood of an optimal trajectory $x^*(t)$, $t \in [t_0,T]$.⁷ This is illustrated in figure 3.3. This section examines the local properties of the HJB equation in a neighborhood of a solution to the optimal control problem (3.9)–(3.12) in order to derive the necessary optimality conditions.

Differentiating the HJB equation (3.16) with respect to $x$, the envelope theorem (proposition A.15 in appendix A) yields

$$0 = h_x(t,x,\mu(t,x)) + V_{tx}(t,x) + V_{xx}(t,x)\,f(t,x,\mu(t,x)) + V_x(t,x)\,f_x(t,x,\mu(t,x))$$
$$= h_x(t,x,\mu(t,x)) + \dot V_x(t,x) + V_x(t,x)\,f_x(t,x,\mu(t,x)), \qquad (3.24)$$

for all $(t,x) \in [t_0,T] \times \mathcal{X}$. Similarly, differentiating (3.16) with respect to $t$ and applying the envelope theorem once more gives

$$0 = h_t(t,x,\mu(t,x)) + V_{tt}(t,x) + \langle V_{tx}(t,x), f(t,x,\mu(t,x))\rangle + \langle V_x(t,x), f_t(t,x,\mu(t,x))\rangle$$
$$= h_t(t,x,\mu(t,x)) + \dot V_t(t,x) + \langle V_x(t,x), f_t(t,x,\mu(t,x))\rangle, \qquad (3.25)$$

7. The fact that the HJB equation needs to hold on an optimal state trajectory $x^*(t)$, $t \in [0,T]$, follows immediately by substituting the optimal value-to-go $V(t,x^*(t))$ as defined in equation (3.19).


Figure 3.3
Development of necessary optimality conditions, such as transversality, in the neighborhood of an optimal state trajectory.

for all $(t,x) \in [t_0,T] \times \mathcal{X}$. To formulate necessary optimality conditions it is convenient to introduce the so-called Hamilton-Pontryagin function (or Hamiltonian, for short) $H : [t_0,T] \times \mathcal{X} \times U \times \mathbb{R}^n \to \mathbb{R}$ such that

$$(t,x,u,\psi) \mapsto H(t,x,u,\psi) = h(t,x,u) + \langle\psi, f(t,x,u)\rangle$$

for all $(t,x,u,\psi) \in [t_0,T] \times \mathcal{X} \times U \times \mathbb{R}^n$.

Consider now an optimal state-control tuple $(x^*(t),u^*(t))$ on $[t_0,T]$, and introduce the adjoint variables $\psi(t) = V_x(t,x^*(t))$ and $\psi^0(t) = V_t(t,x^*(t))$. Then equation (3.24) implies the adjoint equation

$$\dot\psi(t) = -H_x(t,x^*(t),u^*(t),\psi(t)), \quad \forall\, t \in [t_0,T], \qquad (3.26)$$

and the boundary condition (3.14) gives the transversality condition

$$\psi(T) = 0. \qquad (3.27)$$

The maximization in (3.17) implies the maximality condition

$$u^*(t) \in \arg\max_{u \in U} H(t,x^*(t),u,\psi(t)), \quad \forall\, t \in [t_0,T]. \qquad (3.28)$$


Finally, equation (3.25), in which $-\psi^0(t)$ by virtue of the HJB equation (3.16) is equal to the maximized Hamiltonian

$$H^*(t) = -V_t(t,x^*(t)) = H(t,x^*(t),u^*(t),\psi(t)),$$

gives the envelope condition

$$\dot H^*(t) = H_t(t,x^*(t),u^*(t),\psi(t)), \quad \forall\, t \in [t_0,T]. \qquad (3.29)$$

Provided that a continuously differentiable solution to the HJB equation (3.16) with boundary condition (3.14) exists, the following result has been established.⁸

Proposition 3.4 (Pontryagin Maximum Principle) (1) Given an admissible state-control trajectory $(x^*(t),u^*(t))$, $t \in [t_0,T]$, that solves the optimal control problem (3.9)–(3.12), there exists an absolutely continuous function $\psi : [t_0,T] \to \mathbb{R}^n$ such that conditions (3.26)–(3.29) are satisfied. (2) If $(x^*(t),u^*(t))$, $t \in [t_0,T]$, is an admissible state-control trajectory that solves the endpoint-constrained optimal control problem (3.9)–(3.13), then there exists an absolutely continuous function $\psi : [t_0,T] \to \mathbb{R}^n$ such that conditions (3.26) and (3.28)–(3.29) are satisfied.

Remark 3.7 (Endpoint Constraints) Analogous to the brief discussion after proposition 3.3 (see figure 3.2), the endpoint constraint (3.13) affects the transversality condition. Without endpoint constraint, one finds from the HJB boundary condition (3.14) that $\psi(T) = V_x(T,x^*(T)) = 0$, which corresponds to the transversality condition (3.27). The endpoint constraint (3.13), however, frees up $V(T,x)$ for all $x \neq x_T$, so that the transversality condition $V_x(T,x) = 0$ is no longer valid, and consequently $\psi(T)$ does not have to vanish. This insight applies to each constrained component $x_i(T)$ of $x(T)$ and the corresponding component $\psi_i(T)$ of $\psi(T)$, for $i \in \{1,\ldots,n\}$. Furthermore, inequality constraints on $x_i(T)$ translate to inequality constraints on $\psi_i(T)$. Section 3.4 discusses the general finite-horizon optimal control problem and formulates the corresponding necessary optimality conditions provided by the PMP. □

Example 3.4 (Optimal Consumption) Consider an investor who, at time $t = 0$, is endowed with an initial capital of $x(0) = x_0 > 0$. At any time $t \in [0,T]$ (where $T > 0$ is given) he decides about his rate of consumption

8. The precise proof of the PMP is more complicated; see section 3.6 and some further remarks in section 3.8. The informal derivation given here provides the full intuition but may be technically incorrect if the value function $V(t,x)$ is not continuously differentiable (which is possible, even for problems with smooth primitives).


$c(t) \in [0,\bar c]$, where $\bar c > 0$ is a large maximum allowable rate of consumption.⁹ Thus, his capital stock evolves according to

$$\dot x = \alpha x - c(t),$$

where $\alpha > 0$ is a given rate of return. The investor's time-$t$ utility for consuming at a rate $c(t)$ is $U(c(t))$, where $U : \mathbb{R}_+ \to \mathbb{R}$ is his increasing, strictly concave utility function. The investor's problem is to find a consumption plan $c(t)$, $t \in [0,T]$, so as to maximize his discounted utility

$$J(c) = \int_0^T e^{-rt} U(c(t))\,dt,$$

where $r \geq 0$ is a given discount rate, subject to the solvency constraint that the capital stock $x(t)$ must stay positive for all $t \in [0,T)$.¹⁰ To deal with the solvency constraint, first observe that any optimal state trajectory $x^*(t)$ takes on its smallest value $x_T$ at the end of the horizon, that is, $x_T = \min\{x^*(t) : t \in [0,T]\}$. Consuming at the maximum rate $\bar c$ implies that $x(t) = \left(x_0 - (\bar c/\alpha)(1 - e^{-\alpha t})\right)e^{\alpha t}$, whence it is possible to reach $x(T) = 0$ within the given time horizon if and only if $\bar c \geq \alpha x_0/(1 - e^{-\alpha T})$. Assume that the last inequality holds (otherwise $c^*(t) \equiv \bar c$ is the unique optimal consumption plan and $x^*(t) = \left(x_0 - (\bar c/\alpha)(1 - e^{-\alpha t})\right)e^{\alpha t} \geq 0$ on $[0,T]$) and that the investor maximizes his discounted utility subject to the endpoint constraint $x(T) = 0$. He therefore faces an endpoint-constrained optimal control problem of the form (3.9)–(3.13). The PMP in proposition 3.4(2) provides the corresponding necessary optimality conditions. Given the Hamiltonian $H(t,x,c,\psi) = e^{-rt}U(c) + \psi(\alpha x - c)$, on an optimal state-control trajectory $(x^*(t),c^*(t))$, $t \in [0,T]$, the adjoint equation (3.26) becomes

$$\dot\psi(t) = -H_x(t,x^*(t),c^*(t),\psi(t)) = -\alpha\psi(t), \quad t \in [0,T],$$

so $\psi(t) = \psi_0 e^{-\alpha t}$, where $\psi_0 = \psi(0) > 0$ is an initial value that will be determined by the endpoint constraint. Positivity of $\psi_0$ is consistent with the maximality condition (3.28), which yields

$$c^*(t;\psi_0) = \min\{U_c^{-1}(\psi_0 e^{-(\alpha-r)t}),\,\bar c\} \in \arg\max_{c \in [0,\bar c]}\,\{e^{-rt}U(c) - \psi_0 e^{-\alpha t} c\},$$

9. One practical reason for such an upper bound may be that authorities tend to investigate persons believed to be living beyond their means, relative to declared income.
10. The solvency constraint is used to simplify the problem so that it can be tackled with the optimality conditions already discussed. When the solvency constraint is relaxed to the nonnegativity constraint $x(t) \geq 0$ for all $t \in [0,T]$, then it may be optimal to consume all available capital by some time $\hat T < T$, so the spending horizon $\hat T \in [0,T]$ becomes subject to optimization as well.


for all $t \in [0,T]$. Based on the last optimality condition, the Cauchy formula in proposition 2.1 provides the optimal state trajectory,¹¹

$$x^*(t;\psi_0) = x_0 e^{\alpha t} - \int_0^t e^{\alpha(t-s)}\,c^*(s;\psi_0)\,ds, \quad \forall\, t \in [0,T].$$

The missing constant $\psi_0$ is determined by the endpoint constraint $x^*(T;\psi_0) = 0$, namely,

$$x_0 = \int_0^T e^{-\alpha t}\,\min\{U_c^{-1}(\psi_0 e^{-(\alpha-r)t}),\,\bar c\}\,dt = \begin{cases} \dfrac{\bar c}{\alpha}\left(1 - e^{-\alpha\tau}\right) + \displaystyle\int_\tau^T e^{-\alpha t}\,U_c^{-1}(\psi_0 e^{-(\alpha-r)t})\,dt & \text{if } \alpha < r, \\[2mm] U_c^{-1}(\psi_0)\,(1 - e^{-\alpha T})/\alpha & \text{if } \alpha = r, \\[2mm] \displaystyle\int_0^\tau e^{-\alpha t}\,U_c^{-1}(\psi_0 e^{-(\alpha-r)t})\,dt + \dfrac{\bar c}{\alpha}\left(e^{-\alpha\tau} - e^{-\alpha T}\right) & \text{otherwise}, \end{cases} \qquad (3.30)$$

where the switching time for $\alpha \neq r$,

$$\tau = \min\left\{ T,\ \left[ \frac{\ln\left(\psi_0/U_c(\bar c)\right)}{\alpha - r} \right]_+ \right\},$$

depends on $\psi_0$ and lies in $(0,T)$ if $\psi_0$, as a solution to (3.30), is strictly between $U_c(\bar c)$ and $U_c(\bar c)\,e^{(\alpha-r)T}$. If $\tau \in \{0,T\}$, then the solution is interior in the sense that the control constraint $c^*(t) \leq \bar c$ is automatically satisfied; in other words, the optimal control remains unchanged when the constraint is relaxed. In the special case where $\alpha = r$, one obtains from (3.30) that $\psi_0 = U_c\left(\alpha x_0/(1 - e^{-\alpha T})\right) > U_c(\bar c)$, which yields the optimal state-control trajectory $(x^*,c^*)$ with

$$c^*(t) = \frac{\alpha x_0}{1 - e^{-\alpha T}} \quad\text{and}\quad x^*(t) = \frac{e^{\alpha T} - e^{\alpha t}}{e^{\alpha T} - 1}\,x_0,$$

for all $t \in [0,T]$. Interior solutions to the investor's optimal consumption problem for $\alpha \neq r$ are discussed in example 3.5. □

11. Implicitly assume here that $x^*(t) > 0$ for all $t \in [0,T)$. If, contrary to this assumption, it is best for the investor to consume all his wealth before the end of the horizon, then necessarily the utility of consuming at a rate of zero must be finite, i.e., $U(0) > -\infty$. By considering $\hat U(c) = U(c) - U(0)$ instead of $U(c)$, and solving the investor's problem over the interval $[0,\hat T]$ instead of $[0,T]$, with a variable horizon $\hat T \in [0,T]$ subject to optimization, it is possible to leave all arguments (and assumptions) of the analysis in place (with $T$ replaced by $\hat T$) and to simply add an additional optimality condition for $\hat T$. The corresponding optimal control problem with general endpoint constraints is discussed in section 3.4. The resulting additional optimality condition amounts to requiring that the Hamiltonian vanish at $t = \hat T^*$ if $\hat T^* \in (0,T)$ is an optimal (interior) value for $\hat T$.
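The closed-form plan for the special case $\alpha = r$ is easy to validate numerically. The sketch below (Python with NumPy/SciPy; the parameter values are illustrative assumptions) integrates the capital dynamics $\dot x = \alpha x - c^*$ under the constant optimal consumption rate and confirms both the closed-form trajectory and the endpoint constraint $x(T) = 0$.

import numpy as np
from scipy.integrate import solve_ivp

alpha = r = 0.05          # no excess return: alpha = r
x0, T = 100.0, 20.0       # illustrative initial capital and horizon

c_star = alpha * x0 / (1 - np.exp(-alpha * T))   # constant optimal consumption

sol = solve_ivp(lambda t, x: alpha * x - c_star, (0.0, T), [x0],
                dense_output=True, rtol=1e-10, atol=1e-12)

t = np.linspace(0.0, T, 5)
x_closed = (np.exp(alpha * T) - np.exp(alpha * t)) / (np.exp(alpha * T) - 1) * x0
print(np.max(np.abs(sol.sol(t)[0] - x_closed)))  # ~1e-9: trajectories agree
print(sol.sol([T])[0, 0])                        # ~0: capital exhausted at T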

Remark 3.8 (Calculus of Variations) The PMP can be used to reestablish well-known classical optimality conditions in the following simplest problem of the calculus of variations,

$$\int_0^T F(t,x(t),\dot x(t))\,dt \longrightarrow \max_{x(\cdot)}, \qquad (3.31)$$

subject to $x(0) = x_0$ and $x(T) = x_T$, where the boundary points $x_0, x_T \in \mathbb{R}^n$ and the time horizon $T > 0$ are given. The function $F : \mathbb{R}^{1+2n} \to \mathbb{R}$ is assumed twice continuously differentiable, bounded, and strictly concave in $\dot x$. Introducing the control $u = \dot x$ on $[0,T]$, by setting $h(t,x,u) \equiv F(t,x,u)$ and $f(t,x,u) \equiv u$ one obtains an endpoint-constrained optimal control problem of the form (3.9)–(3.13), where the control constraint (3.12) is assumed inactive at the optimum. The Hamilton-Pontryagin function is

$$H(t,x,u,\psi) = F(t,x,u) + \langle\psi, u\rangle,$$

so along an optimal state-control trajectory $(x^*(t),u^*(t))$, $t \in [0,T]$, the adjoint equation (3.26) becomes

$$\dot\psi(t) = -F_x(t,x^*(t),u^*(t)), \quad \forall\, t \in [0,T],$$

and the maximality condition (3.28) yields

$$F_{\dot x}(t,x^*(t),u^*(t)) + \psi(t) = 0, \quad \forall\, t \in [0,T].$$

Differentiating the last relation with respect to time gives the Euler equation along an optimal solution $x^*(t)$, $t \in [0,T]$, of the initial problem,

$$F_x(t,x^*(t),\dot x^*(t)) = \frac{d}{dt}\,F_{\dot x}(t,x^*(t),\dot x^*(t)), \quad \forall\, t \in [0,T], \qquad (3.32)$$

provided the time derivative $\dot x^*(t) = u^*(t)$ is absolutely continuous. The Euler equation in its integral form,

$$F_{\dot x}(t,x^*(t),\dot x^*(t)) = \int_0^t F_x(s,x^*(s),\dot x^*(s))\,ds + C, \quad \forall\, t \in [0,T], \qquad (3.33)$$

where $C$ is an appropriate constant, is often referred to as the DuBois-Reymond equation and is valid even when $\dot x^*(t) = u^*(t)$ exhibits jumps. □


Figure 3.4
Geometric intuition for a (maximized) Hamiltonian (see remark 3.9).

Remark 3.9 (Geometric Interpretation of Hamiltonian) Building on the calculus-of-variations problem (3.31), consider the Lagrange problem,

$$-\int_0^T F(t,x(t),\dot x(t))\,dt \longrightarrow \max_{x(\cdot)},$$

with the initial condition $x(0) = x_0$, subject to the state equation $\dot x = f(t,x,u)$ and the control constraint $\dot x = u \in U$, where $U \subset \mathbb{R}$ is a nonempty convex compact control set. Denote by

$$F^*(t,x,v) = \inf_{u \in U}\,\{F(t,x,u) : v = f(t,x,u)\}$$

the lower envelope of $F$ with respect to the feasible slopes of the state trajectory. The corresponding (maximized) Hamiltonian is then

$$H^*(t,x,\psi) = \max_{u \in U}\,\{\langle\psi, f(t,x,u)\rangle - F(t,x,u)\} = \sup_{v \in \mathbb{R}^n}\,\{\langle\psi, v\rangle - F^*(t,x,v)\}.$$

The right-hand side of the last relation is also called the dual (or Young-Fenchel transform) of $F^*$ (see, e.g., Luenberger 1969; Magaril-Il'yaev and Tikhomirov 2003). Figure 3.4 provides the geometric intuition (with $v^* = f(t,x,u^*)$). Thus, the maximized Hamiltonian $H^*$ can be interpreted as the dual of $F^*$.¹² □

Example 3.5 (Optimal Consumption, Revisited) To see the use of the Euler equation introduced in remark 3.8, consider interior solutions to the optimal consumption problem discussed in example 3.4. Indeed, the

12. Gamkrelidze (2008) provides another interesting perspective on this duality relation.


investor's endpoint-constrained problem of maximizing his discounted utility can be rewritten in the form (3.31) for

$$F(t,x,\dot x) = e^{-rt}\,U(\alpha x - \dot x),$$

with the constraints $x(0) = x_0$ and $x(T) = 0$. The Euler equation (3.32) becomes

$$\alpha e^{-rt}\,U_c(c(t)) = -\frac{d}{dt}\left[ e^{-rt}\,U_c(c(t)) \right] = r e^{-rt}\,U_c(c(t)) - e^{-rt}\,U_{cc}(c(t))\,\dot c(t),$$

for all $t \in [0,T]$, provided that $c(t)$ is interior, namely, that $0 < c(t) < \bar c$ almost everywhere. This optimality condition can also be written as an autonomous ODE, in the form

$$\rho_A(c)\,\dot c = -\frac{U_{cc}(c)}{U_c(c)}\,\dot c = \alpha - r,$$

where $\rho_A(c) = -U_{cc}(c)/U_c(c)$ is the investor's (Arrow-Pratt) coefficient of absolute risk aversion. This condition means that it is optimal for the investor to consume such that the (absolute) growth, $\dot c$, of the optimal consumption path is equal to the ratio of the excess return, $\alpha - r$, and the investor's absolute risk aversion, $\rho_A(c) > 0$. Thus, with the strictly increasing first integral of absolute risk aversion,

$$R_A(c;c_0) = \int_{c_0}^{c} \rho_A(\zeta)\,d\zeta,$$

for $c \in (0,\bar c)$ and some reference level $c_0$, the inverse $R_A^{-1}(\cdot\,;c_0)$ exists, and the optimal consumption path becomes

$$c^*(t;c_0) = R_A^{-1}((\alpha - r)\,t;\,c_0),$$

where $c_0 = c^*(0) \in [0,\bar c]$ is a constant that can be determined from the zero-capital endpoint constraint,

$$x_0 = \int_0^T e^{-\alpha t}\,c^*(t;c_0)\,dt.$$

For example, if the investor has constant absolute risk aversion (CARA) so that $\rho_A(c) \equiv \rho > 0$, then $R_A(c;c_0) = \rho(c - c_0)$ and $c^*(t;c_0) = c_0 + (\alpha - r)(t/\rho)$; that is, the optimal consumption path is linearly increasing (resp. decreasing) if the excess return $\alpha - r$ is positive (resp. negative). If the investor has constant relative risk aversion (CRRA) so that $c\,\rho_A(c) \equiv \rho_R > 0$, then $R_A(c;c_0) = \rho_R \ln(c/c_0)$ and $c^*(t;c_0) = c_0 \exp\left[(\alpha - r)(t/\rho_R)\right]$; that is, the optimal consumption path is exponentially increasing (resp. decreasing) if the excess return $\alpha - r$ is positive (resp. negative).¹³ □
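The CRRA case lends itself to a quick numerical illustration. The sketch below (Python with NumPy/SciPy; all parameter values are illustrative assumptions) calibrates the initial consumption level $c_0$ from the zero-capital endpoint constraint by bracketing root-finding and reports the resulting exponential consumption path.

import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

alpha, r, rho_R = 0.07, 0.05, 2.0   # positive excess return alpha - r
x0, T = 100.0, 20.0

def c_star(t, c0):
    """CRRA path c*(t; c0) = c0 * exp((alpha - r) t / rho_R)."""
    return c0 * np.exp((alpha - r) * t / rho_R)

def endpoint_gap(c0):
    """Residual of x0 = int_0^T e^(-alpha t) c*(t; c0) dt."""
    integral, _ = quad(lambda t: np.exp(-alpha * t) * c_star(t, c0), 0.0, T)
    return integral - x0

c0 = brentq(endpoint_gap, 1e-6, 100.0 * x0)   # generous bracket on purpose
print(c0, c_star(T, c0))   # initial and terminal consumption rates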

3.4 Finite-Horizon Optimal Control

Consider the following general finite-horizon optimal control problem:

$$J(u,\omega) = \int_{t_0}^{T} h(t,x(t),u(t))\,dt + K^0(\omega) \longrightarrow \max_{u(\cdot),\,\omega}, \qquad (3.34)$$
$$\dot x(t) = f(t,x(t),u(t)), \quad x(t_0) = x_0, \quad x(T) = x_T, \qquad (3.35)$$
$$K^1(\omega) \geq 0, \quad K^2(\omega) = 0, \qquad (3.36)$$
$$R(t,x,u^1) \geq 0, \qquad (3.37)$$
$$u = (u^1,u^2), \quad u^2(t) \in U^2(t)\ \forall\,t, \qquad (3.38)$$
$$t \in [t_0,T], \quad t_0 < T, \qquad (3.39)$$

where $\omega = (t_0,x_0;T,x_T)$ is the vector of endpoint data, $x = (x_1,\ldots,x_n)$ with values in $\mathbb{R}^n$ is the state variable, and $u = (u_1,\ldots,u_m)$ with values in $\mathbb{R}^m$ is the control variable. The control variable $u$ is represented as a tuple of the form¹⁴

$$u = (u^1,u^2),$$

where $u^1 = (u_1,\ldots,u_{m_1})$ and $u^2 = (u_{m_1+1},\ldots,u_m)$ with $m_1 \in \{1,\ldots,m\}$. The first control component, $u^1$, appears in the state-control constraint (3.37), and the second control component, $u^2$, satisfies the geometric control constraint (3.38). The general optimal control problem (3.34)–(3.39) is considered under the following assumptions.

A1. The functions $h : \mathbb{R}^{1+n+m} \to \mathbb{R}$, $f : \mathbb{R}^{1+n+m} \to \mathbb{R}^n$, $R : \mathbb{R}^{1+n+m_1} \to \mathbb{R}^{k_R}$ are continuously differentiable with respect to $(x,u)$ (resp. $(x,u^1)$) for a.a. (almost all) $t$, and measurable in $t$ for any $(x,u)$, where $k_R = \dim(R)$. On any bounded set these functions and their partial derivatives with

13. A CARA investor has a utility function of the form $U(c) = -e^{-\rho c}$, whereas a CRRA investor has a utility function of the form $U(c) = c^{1-\rho_R}/(1-\rho_R)$ for $\rho_R \neq 1$ and $U(c) = \ln(c)$ for $\rho_R = 1$. In either case, the investor's utility function is determined only up to a positive affine transformation $\alpha U(c) + \beta$ with $\alpha > 0$ and $\beta \in \mathbb{R}$.
14. This division of the control variable into two vectors goes back to a seminal paper by Dubovitskii and Milyutin (1965).


respect to $(x,u)$ (resp. $(x,u^1)$) are bounded and continuous, uniformly in $(t,x,u)$.

A2. The set of admissible controls is

$$\{u(\cdot) = (u^1(\cdot),u^2(\cdot)) \in \mathcal{L}_\infty : u^2(t) \in U^2(t)\ \forall\,t\},$$

where $t \mapsto U^2(t) \subseteq \mathbb{R}^{m-m_1}$ is a measurable set-valued mapping such that $U^2(t) \neq \emptyset$ for all $t$.¹⁵

A3. The functions $K^j : \mathbb{R}^{2(1+n)} \to \mathbb{R}^{k_j}$, for $j \in \{0,1,2\}$, are continuously differentiable, where $k_j = \dim(K^j)$.

A4. The endpoint constraints (3.36) are regular, that is, for any $\omega = (t_0,x_0;T,x_T)$ satisfying (3.36) it is $\operatorname{rank}(K^2_\omega(\omega)) = \dim(K^2)$, and there exists $\bar\omega = (\bar t_0,\bar x_0;\bar T,\bar x_T) \in \mathbb{R}^{2(1+n)}$ such that

$$K^1_i(\omega) = 0 \;\Rightarrow\; \left\langle \bar\omega, \frac{\partial K^1_i}{\partial\omega}(\omega) \right\rangle > 0, \quad i \in \{1,\ldots,k_1\},$$

and

$$\frac{\partial K^2_i(\omega)}{\partial\omega}\,\bar\omega = 0, \quad i \in \{1,\ldots,k_2\}.$$

A5. The state-control constraints (3.37) are regular in the sense that for any $c > 0$ there exists $\delta > 0$ such that for any $x$, $u^1$ and a.a. $t$ for which

$$\|x\| \leq c, \quad \|u^1\| \leq c, \quad |t| \leq c, \quad R_j(t,x,u^1) \geq -\delta, \quad j \in \{1,\ldots,k_R\},$$

there exists $v = v(t,x,u^1) \in \mathbb{R}^{m_1}$ with $\|v\| \leq 1$ satisfying

$$R_j(t,x,u^1) \leq \delta \;\Rightarrow\; \left\langle v, \frac{\partial R_j}{\partial u^1}(t,x,u^1) \right\rangle \geq \delta, \quad j \in \{1,\ldots,k_R\}.$$

Remark 3.10 If $R$ and the gradient $R_{u^1}$ are continuously differentiable, the condition in A5 can be reformulated as follows: if $(t,x,u^1)$ satisfies (3.37), then the set of vectors

$$\left\{ \frac{\partial R_j}{\partial u^1}(t,x,u^1) : R_j(t,x,u^1) = 0,\ j \in \{1,\ldots,k_R\} \right\}$$

is positive-linearly independent, that is, there exists $v(t) \in \mathbb{R}^{m_1}$ such that

$$R_j(t,x,u^1) = 0 \;\Rightarrow\; \left\langle v(t), \frac{\partial R_j}{\partial u^1}(t,x,u^1) \right\rangle > 0, \quad j \in \{1,\ldots,k_R\},$$

for (almost) all $t$.

15. A measurable function $\sigma(\cdot)$ is called a measurable selector of the set-valued mapping $U^2(\cdot)$ if $\sigma(t) \in U^2(t)$ for a.a. $t$. A set-valued mapping $U(\cdot)$ is called measurable if there exists a sequence of measurable selectors $\{\sigma^k(\cdot)\}_{k\in\mathbb{N}}$ such that the set $\{\sigma^k(t)\}_{k\in\mathbb{N}}$ is everywhere dense in $U(t)$ for a.a. $t$. A set $S$ is dense in $U(t)$ if every open set that contains a point of $U(t)$ has a nonempty intersection with $S$.

The optimal control problem (3.34)–(3.39) is very general, except for the possibility of state constraints (see remark 3.12). Assumptions A1–A3 ensure the regularity of the primitives of the problem. Assumptions A4 and A5, also referred to as Mangasarian-Fromovitz conditions, guarantee that the endpoint and state-control constraints (3.36)–(3.37) are regular in the sense that whenever a constraint is binding, the gradient of the relevant constraint function with respect to the decision variables is nonzero. As a result, the decision maker is never indifferent about the choice of control or endpoints whenever the constraints are binding. Note that when endpoint constraints are imposed, one implicitly assumes that the underlying dynamic system is sufficiently controllable over the available time horizon (see the discussion of controllability issues in section 3.2).

Remark 3.11 (Lagrange, Mayer, and Bolza Problems) Depending on the form of the objective functional in (3.34), one may distinguish several cases: the Lagrange problem when $K^0 = 0$, the Mayer problem when $h = 0$, and the Bolza problem in the general (mixed) case. □

Remark 3.12 (State Constraints) This book intentionally avoids state constraints because reasonable economic problems can usually be formulated without them. Some state constraints appear because they describe undesirable possibilities. For instance, an investor's account balance may be constrained to be nonnegative in order to ensure liquidity (see example 3.4). In that situation, it is typically possible to tolerate a violation of the constraint at an appropriate cost: the investor may be able to borrow additional funds at higher interest rates. Those rates would realistically escalate as the borrowed amount increases, which effectively replaces the state constraint with a barrier that imposes a cost penalty, which in turn will naturally tend to move the optimal state-control path away from the state constraint. Other state constraints express fundamental impossibilities. For instance, when considering the evolution of the consumer base for a given product (e.g., when designing an optimal dynamic pricing policy), it is impossible for this consumer base to become smaller than zero or larger than 100 percent (see exercise 3.1). Yet, in those cases the system dynamics usually support the set of reasonable states $\mathcal{X}$ as an invariant set. For example, as the consumer base approaches 100 percent, the speed of adoption tends to zero. A similar phenomenon occurs when the consumer base is near zero: the discard rate must tend to zero because there are fewer and fewer consumers who are able to discard the product. Hence, in this example the state space $\mathcal{X} = [0,1]$ must be invariant under any admissible control.¹⁶ From a theoretical point of view, the PMP with state constraints (formulated for the first time by Gamkrelidze (1959) and more completely by Dubovitskii and Milyutin (1965)) involves the use of measures and can create discontinuities in the adjoint variable. In the presence of state constraints, the optimality conditions provided by the PMP are usually not complete; only rarely can they be used (at least with reasonable effort) to construct an optimal solution.¹⁷ □

Pontryagin et al. (1962) formulated the necessary optimality conditions for the optimal control problem (3.34)–(3.39); the version provided here corresponds to the one given by Arutyunov (2000), which is in part based on Dubovitskii and Milyutin (1965; 1981). The following three additional assumptions simplify its statement.

• Condition S (smoothness) The optimal control problem (3.34)–(3.39) is said to satisfy the smoothness condition if the functions $f(t,x,u)$, $h(t,x,u)$, and $R(t,x,u^1)$ are continuously differentiable (in all arguments) and the multivalued mapping $U^2(\cdot)$ is constant.
• Condition B (boundedness) An admissible process $(x(\cdot),u(\cdot),\omega)$ is said to satisfy the boundedness condition if all sets $U(t,x) = U^1(t,x) \times U^2(t)$, with

$$U^1(t,x) = \{u^1 \in \mathbb{R}^{m_1} : R(t,x,u^1) \geq 0\},$$

are uniformly bounded relative to all $(t,x)$ that lie in neighborhoods of the points $(t_0,x_0)$ and $(T,x_T)$.
• Condition C (compactness) An admissible process $(x(\cdot),u(\cdot),\omega)$ is said to satisfy the (additional) compactness condition if there are neighborhoods of the time endpoints $t_0$ and $T$ such that for a.a. $t$ belonging to $[t_0,T]$ the sets $U^2(t)$ are compact. The multivalued mapping $U^2(\cdot)$ and the mapping $g = (f,h)$ are left-continuous at the point $t_0$ and right-continuous at the point $T$, and the sets $g(t_0,x_0,U(t_0,x_0))$ and $g(T,x_T,U(T,x_T))$ are convex (see footnote 30).

16. This simple insight is very broad. Because of the inherent finiteness of all human endeavors, any realistic finite-dimensional state space can reasonably be assumed bounded (typically even compact) and invariant.
17. Hartl et al. (1995) review results concerning necessary optimality conditions for state-constrained optimal control problems. Even though the conditions provided by Dubovitskii, Milyutin, and their students for problems with state constraints may appear complete, their application is usually very difficult and impractical.

The Hamiltonian (or Hamilton-Pontryagin function) is given by

$$H(t,x,u,\psi,\lambda_0) = \lambda_0\,h(t,x,u) + \langle\psi, f(t,x,u)\rangle, \qquad (3.40)$$

where $\lambda_0 \in \mathbb{R}$ is a constant and $\psi \in \mathbb{R}^n$ is the adjoint variable. The (small) Lagrangian is given by

$$L(\omega,\lambda) = \lambda_0 K^0(\omega) + \langle\lambda^1, K^1(\omega)\rangle + \langle\lambda^2, K^2(\omega)\rangle, \qquad (3.41)$$

where $\lambda^1 \in \mathbb{R}^{k_1}$, $\lambda^2 \in \mathbb{R}^{k_2}$, and $\omega = (t_0,x_0;T,x_T)$. The necessary optimality conditions for the general optimal control problem (3.34)–(3.39) can now be formulated as follows.

Proposition 3.5 (Pontryagin Maximum Principle: General OCP) Let assumptions A1–A5 be satisfied, and let $(x^*(\cdot),u^*(\cdot),\omega^*)$, with the endpoint data $\omega^* = (t_0^*,x_0^*;T^*,x_T^*)$, be a solution to the optimal control problem (3.34)–(3.39), such that conditions B and C hold. (1) There exist a vector $\lambda \in \mathbb{R}^{1+k_1+k_2}$, a measurable, essentially bounded function $\rho : [t_0^*,T^*] \to \mathbb{R}^{k_R}$, and an absolutely continuous function $\psi : [t_0^*,T^*] \to \mathbb{R}^n$ such that the following optimality conditions are satisfied:

• Adjoint equation

$$-\dot\psi(t) = H_x(t,x^*(t),u^*(t),\psi(t),\lambda_0) + \rho(t)\,R_x(t,x^*(t),u^{1*}(t)), \qquad (3.42)$$

for all $t \in [t_0^*,T^*]$.
• Transversality

$$\psi(t_0^*) = -L_{x_0}(\omega^*,\lambda) \quad\text{and}\quad \psi(T^*) = L_{x_T}(\omega^*,\lambda). \qquad (3.43)$$

• Maximality

$$u^*(t) \in \arg\max_{u \in U(t,x^*(t))} H(t,x^*(t),u,\psi(t),\lambda_0), \qquad (3.44)$$
$$\rho_j(t) \geq 0 \quad\text{and}\quad \rho_j(t)\,R_j(t,x^*(t),u^{1*}(t)) = 0, \quad \forall\, j \in \{1,\ldots,k_R\}, \qquad (3.45)$$
$$H_{u^1}(t,x^*(t),u^*(t),\psi(t),\lambda_0) - \rho(t)\,R_{u^1}(t,x^*(t),u^{1*}(t)) = 0, \qquad (3.46)$$

for a.a. $t \in [t_0^*,T^*]$.
• Endpoint optimality

$$\lambda_0 \geq 0, \quad \lambda^1 \geq 0, \quad \forall\, j \in \{1,\ldots,k_1\}: \lambda_{1,j}\,K^1_j(\omega^*) = 0, \qquad (3.47)$$
$$\sup_{u \in U(t_0^*,x_0^*)} H\left(t_0^*,x_0^*,u,-L_{x_0}(\omega^*,\lambda),\lambda_0\right) - L_{t_0}(\omega^*,\lambda) = 0, \qquad (3.48)$$
$$\sup_{u \in U(T^*,x_T^*)} H\left(T^*,x_T^*,u,L_{x_T}(\omega^*,\lambda),\lambda_0\right) + L_T(\omega^*,\lambda) = 0. \qquad (3.49)$$

• Nontriviality¹⁸

$$\|\lambda\| + \|\psi(t)\| \neq 0, \quad \forall\, t \in [t_0^*,T^*]. \qquad (3.50)$$

(2) If condition S also holds, then in addition to these optimality conditions, the following obtains:

• Envelope condition

$$\dot H(t,x^*(t),u^*(t),\psi(t),\lambda_0) = H_t(t,x^*(t),u^*(t),\psi(t),\lambda_0) + \langle\rho(t), R_t(t,x^*(t),u^{1*}(t))\rangle, \qquad (3.51)$$

for a.a. $t \in [t_0^*,T^*]$.

Proof See section 3.6. ∎

The first five optimality conditions are sometimes termed the weakened maximum principle. The envelope condition can also be viewed as an adjoint equation with respect to time, where $\psi^0(t) = -H(t,x^*(t),u^*(t),\psi(t),\lambda_0)$ is the corresponding adjoint variable (see section 3.3.3, where the same adjoint variable appears).

Remark 3.13 (Simplified OCP) In many practical models, the initial data are fixed, there are no state-control constraints, and the control set is constant ($U(t,x) \equiv U$), which yields the following simplified optimal control problem:

$$J(u,T,x_T) = \int_{t_0}^{T} h(t,x(t),u(t))\,dt \longrightarrow \max_{u(\cdot),\,T,\,x_T}, \qquad (3.52)$$
$$\dot x(t) = f(t,x(t),u(t)), \quad x(t_0) = x_0, \quad x(T) = x_T, \qquad (3.53)$$
$$K^1(T,x_T) \geq 0, \quad K^2(T,x_T) = 0, \qquad (3.54)$$
$$u(t) \in U, \quad \forall\,t, \qquad (3.55)$$
$$t \in [t_0,T], \quad t_0 < T, \qquad (3.56)$$

given $(t_0,x_0)$.

18. As shown in section 3.6, if condition S holds, the nontriviality condition (3.50) can be strengthened to $\lambda_0 + \|\psi(t)\| > 0$ for all $t \in (t_0^*,T^*)$.

Note that in the PMP for the simplified optimal control problem (3.52)–(3.56) one can without loss of generality assume that $\lambda_0 = 1$, because the optimality conditions in proposition 3.5 become positively homogeneous with respect to $\psi(\cdot)$ and $\lambda_0$. Thus, $\lambda_0$ can be dropped from consideration without any loss of generality. □

Remark 3.14 (Salvage Value in the Simplified OCP) The objective functional (3.52) admits a continuously differentiable salvage value. Let

$$\hat J(u,T,x_T) = \int_{t_0}^{T} h(t,x(t),u(t))\,dt + S(T,x(T)), \qquad (3.57)$$

where $S : \mathbb{R}^{1+n} \to \mathbb{R}$ is a continuously differentiable function describing the terminal value. The objective functional (3.57) can be written in the simple integral form (3.52), since

$$\hat J(u,T,x_T) = \int_{t_0}^{T} h(t,x(t),u(t))\,dt + S(T,x(T))$$
$$= \int_{t_0}^{T} h(t,x(t),u(t))\,dt + \int_{t_0}^{T} \frac{d}{dt}\,S(t,x(t))\,dt + S(t_0,x_0)$$
$$= \int_{t_0}^{T} \left[ h(t,x(t),u(t)) + \langle S_x(t,x(t)), f(t,x(t),u(t))\rangle + S_t(t,x(t)) \right] dt + S(t_0,x_0)$$
$$= \int_{t_0}^{T} \tilde h(t,x(t),u(t))\,dt + S(t_0,x_0) = \tilde J(u,T,x_T) + S(t_0,x_0),$$

provided one sets $\tilde h(t,x,u) \equiv h(t,x,u) + \langle S_x(t,x), f(t,x,u)\rangle + S_t(t,x)$.¹⁹ □

Proposition 3.6 (Pontryagin Maximum Principle: Simplified OCP) Let assumptions A1–A5 be satisfied, and let $(x^*(\cdot),u^*(\cdot),T^*,x_T^*)$ be a solution to the optimal control problem (3.52)–(3.56) such that conditions B and C hold. (1) Then there exist a vector $\lambda \in \mathbb{R}^{1+k_1+k_2}$ and an absolutely continuous function $\psi : [t_0,T^*] \to \mathbb{R}^n$ such that the following optimality conditions are satisfied:

• Adjoint equation

$$\dot\psi(t) = -H_x(t,x^*(t),u^*(t),\psi(t)), \quad \forall\, t \in [t_0,T^*]. \qquad (3.58)$$

• Transversality

$$\psi(T^*) = L_{x_T}(T^*,x_T^*,\lambda). \qquad (3.59)$$

• Maximality

$$u^*(t) \in \arg\max_{u \in U} H(t,x^*(t),u,\psi(t)), \quad \forall\, t \in [t_0,T^*]. \qquad (3.60)$$

• Endpoint optimality

$$\lambda^1 \geq 0, \quad \lambda_{1,i}\,K^1_i(T^*,x_T^*) = 0, \quad \forall\, i \in \{1,\ldots,k_1\}, \qquad (3.61)$$
$$\sup_{u \in U} H\left(T^*,x_T^*,u,L_{x_T}(T^*,x_T^*,\lambda)\right) + L_T(T^*,x_T^*,\lambda) = 0. \qquad (3.62)$$

(2) If condition S also holds, then in addition to these optimality conditions, the following condition is satisfied:

• Envelope condition

$$\dot H(t,x^*(t),u^*(t),\psi(t)) = H_t(t,x^*(t),u^*(t),\psi(t)), \quad \forall\, t \in [t_0,T^*]. \qquad (3.63)$$

19. In remark 3.6 a similar idea for substitution was used.

Remark 3.15 (Discounting and Current-Value Formulation) In many economic applications, the underlying system is time-invariant, that is, described by the IVP $\dot x = f(x,u)$, $x(t_0) = x_0$, and the kernel of the objective functional $J$ comprises time only in the form of an exponential discount factor, so

$$J(u) = \int_{t_0}^{T} e^{-rt}\,h(x(t),u(t))\,dt,$$

where $r \geq 0$ is a given discount rate. In that case, the adjoint equation (3.58) (and correspondingly the entire Hamiltonian system (3.53),(3.58) for the variables $x$ and $\psi$) admits an alternative time-invariant current-value formulation, which may be more convenient. This formulation is often used when considering optimal control problems with an infinite time horizon (see section 3.5).

To obtain a current-value formulation of the (simplified) PMP, first introduce the current-value adjoint variable

$$\nu(t) \equiv e^{rt}\,\psi(t).$$

Then the Hamiltonian (3.40) in problem (3.52)–(3.56) can be written in the form

$$H(t,x,u,\psi) = e^{-rt}\,\hat H(x,u,\nu),$$

where the current-value Hamiltonian $\hat H$ is given by

$$\hat H(x,u,\nu) = h(x,u) + \langle\nu, f(x,u)\rangle. \qquad (3.64)$$

Clearly, maximizing the Hamiltonian $H(t,x,u,\psi)$ in (3.40) with respect to $u$ is equivalent to maximizing the current-value Hamiltonian $\hat H(x,u,\nu)$ in (3.64). Hence, the maximality condition (3.60) remains essentially unchanged, in the form

$$u^*(t) \in \arg\max_{u \in U} \hat H(x^*(t),u,\nu(t)), \quad \forall\, t \in [t_0,T^*]. \qquad (3.65)$$

The current-value version of the adjoint equation (3.58) can be obtained as follows:

$$\dot\nu(t) = \frac{d}{dt}\left(e^{rt}\psi(t)\right) = r e^{rt}\psi(t) + e^{rt}\dot\psi(t) = r\nu(t) - e^{rt}\,H_x(t,x(t),u(t),\psi(t)) = r\nu(t) - \hat H_x(x(t),u(t),\nu(t)). \qquad (3.66)$$

The transversality condition (3.59) becomes

$$\nu(T^*) = e^{rT^*}\,L_{x_T}(T^*,x_T^*,\lambda). \qquad (3.67)$$

Condition (3.62) translates to

$$\sup_{u \in U} \hat H\left(x_T^*,u,e^{rT^*}L_{x_T}(T^*,x_T^*,\lambda)\right) = -e^{rT^*}\,L_T(T^*,x_T^*,\lambda). \qquad (3.68)$$

To summarize, the necessary optimality conditions (3.58)–(3.62) are equivalent to conditions (3.66), (3.67), (3.65), (3.61), (3.68), respectively. □

Remark 3.16 (Fixed Time Horizon) In the case where the time horizon $T$ is fixed, the optimality conditions in proposition 3.6 specialize to the version of the PMP formulated in proposition 3.4. □

The PMP can be used to identify solution candidates for a given optimal control problem. If a solution exists (see the brief discussion in section 3.8), then uniqueness of the solution candidate also implies its global optimality. The following result by Mangasarian (1966) provides sufficient optimality conditions in the framework of the PMP (rather than in the context of the HJB equation).

Proposition 3.7 (Mangasarian Sufficiency Theorem) Consider the admissible state-control trajectory $(x^*(t),u^*(t))$, $t \in [t_0,T]$, for the optimal control problem (3.52)–(3.56), with fixed finite horizon $T > t_0$ and free endpoint $x_T$.

(1) If $H(t,x,u,\psi)$ is concave in $(x,u)$, and there exists an absolutely continuous function $\psi : [t_0,T] \to \mathbb{R}^n$ such that

$$\dot\psi(t) = -H_x(t,x^*(t),u^*(t),\psi(t)), \quad \forall\, t \in [t_0,T], \qquad \psi(T) = 0,$$

and

$$u^*(t) \in \arg\max_{u \in U} H(t,x^*(t),u,\psi(t)), \quad \forall\, t \in [t_0,T],$$

then $(x^*(t),u^*(t))$, $t \in [t_0,T]$, is an optimal state-control trajectory.
(2) If, in addition to the hypotheses in (1), $H(t,x,u,\psi)$ is strictly concave in $(x,u)$, then the optimal state-control trajectory $(x^*(t),u^*(t))$, $t \in [t_0,T]$, is unique.

Proof (1) The state-control trajectory $(x^*(t),u^*(t))$, $t \in [t_0,T]$, is optimal if and only if for any admissible state-control trajectory $(x(t),u(t))$, $t \in [t_0,T]$,

$$\Delta = \int_{t_0}^{T} h(t,x^*(t),u^*(t))\,dt - \int_{t_0}^{T} h(t,x(t),u(t))\,dt \geq 0.$$

But since $H(t,x,u,\psi) = h(t,x,u) + \langle\psi, f(t,x,u)\rangle = h(t,x,u) + \langle\psi, \dot x\rangle$, it follows that

$$\Delta = \int_{t_0}^{T} \left[ H(t,x^*(t),u^*(t),\psi(t)) - H(t,x(t),u(t),\psi(t)) + \langle\psi(t), \dot x(t) - \dot x^*(t)\rangle \right] dt.$$

By assumption, the Hamiltonian $H(t,x,u,\psi)$ is concave in $(x,u)$, so

$$\left\langle \frac{\partial H(t,x^*,u^*,\psi)}{\partial(x,u)},\,(x^*,u^*) - (x,u) \right\rangle \leq H(t,x^*,u^*,\psi) - H(t,x,u,\psi);$$

the maximality condition implies further that

$$0 \leq \langle H_u(t,x^*,u^*,\psi),\,u^* - u\rangle.$$

Combining the last two inequalities with the adjoint equation yields

$$\langle\dot\psi,\,x - x^*\rangle = \langle H_x(t,x^*,u^*,\psi),\,x^* - x\rangle \leq H(t,x^*,u^*,\psi) - H(t,x,u,\psi).$$

Hence, using the transversality condition $\psi(T) = 0$ and the fact that $x(t_0) = x^*(t_0) = x_0$, it is²⁰

$$\Delta \geq \int_{t_0}^{T} \left[ \langle\dot\psi(t),\,x(t) - x^*(t)\rangle + \langle\psi(t),\,\dot x(t) - \dot x^*(t)\rangle \right] dt = \langle\psi(T),\,x(T) - x^*(T)\rangle = 0,$$

which in turn implies that $(x^*(t),u^*(t))$, $t \in [t_0,T]$, is an optimal state-control trajectory.

(2) Because of the strict concavity of $H(t,x,u,\psi)$, if there are two optimal state-control trajectories $(x^*(t),u^*(t))$ and $(\hat x^*(t),\hat u^*(t))$ (for $t \in [t_0,T]$), then any convex combination $\lambda(x^*(t),u^*(t)) + (1-\lambda)(\hat x^*(t),\hat u^*(t))$ for $\lambda \in (0,1)$ yields a strictly higher value of the objective functional unless $(x^*(t),u^*(t))$ and $(\hat x^*(t),\hat u^*(t))$ are the same (possibly up to a measure-zero set of time instances in $[t_0,T]$), which concludes the proof. ∎

3.5 Infinite-Horizon Optimal Control

In practical decision problems, it may be either difficult or impossible to specify a plausible planning horizon $T$. For example, an electric power utility may not operate with any finite time horizon in mind, reflecting a going concern for preserving its economic viability indefinitely. Similarly, when considering the evolution of an economy in terms of its capital growth, for example, it is natural to assume an infinite time horizon. Accordingly, consider the following (simple) infinite-horizon optimal control problem:

$$J(u) = \int_{t_0}^{\infty} e^{-rt}\,h(x(t),u(t))\,dt \longrightarrow \max_{u(\cdot)}, \qquad (3.69)$$
$$\dot x(t) = f(x(t),u(t)), \quad x(t_0) = x_0, \qquad (3.70)$$
$$u(t) \in U, \quad \forall\,t, \qquad (3.71)$$
$$t \in [t_0,\infty), \qquad (3.72)$$

given $(t_0,x_0)$, where $r > 0$ is a given discount rate, and the control set $U \subset \mathbb{R}^m$ is nonempty, convex, and compact. Also assume that

20. Note that this inequality, and with it the Mangasarian sufficiency theorem, remains valid if the problem is in part endpoint-constrained.


the primitives of the problem satisfy assumptions A1 and A2 (see section 3.4). In addition, assume either that $h$ is bounded, or that the state space $\mathcal{X} \subset \mathbb{R}^n$ is a compact invariant set. In either case the image $h(\mathcal{X},U)$ is bounded, so the objective functional $J(u)$ must be bounded as well.

An infinite-horizon optimal control problem of this kind was considered by Pontryagin et al. (1962, ch. 4). With the exception of the transversality condition (3.27), the optimality conditions (3.26)–(3.29) of the finite-horizon PMP, as formulated in proposition 3.4, carry over to the infinite-horizon problem (3.69)–(3.72) by taking the limit for $T \to \infty$. Intuitively, one might expect that

$$\lim_{T\to\infty}\psi(T) = \lim_{T\to\infty} e^{-rT}\nu(T) = 0 \qquad (3.73)$$

would be a natural transversality condition for the infinite-horizon optimal control problem (3.69)–(3.72). Yet, the following example demonstrates that the natural transversality condition (3.73) should not be expected to hold in general.

Example 3.6 (Halkin 1974) Assume that the discount rate $r$ is zero, and consider the infinite-horizon optimal control problem

$$J(u) = \int_0^\infty (1 - x(t))\,u(t)\,dt \longrightarrow \max_{u(\cdot)},$$
$$\dot x(t) = (1 - x(t))\,u(t), \quad x(0) = 0,$$
$$u(t) \in [0,1], \quad \forall\,t,$$
$$t \in [0,\infty).$$

Using the state equation and the initial condition, it is clear that

$$J(u) = \lim_{T\to\infty}\int_0^T \dot x(t)\,dt = \lim_{T\to\infty} x(T) \leq 1.$$

Indeed, integrating the state equation directly yields

$$x(t) = 1 - \exp\left[ -\int_0^t u(s)\,ds \right],$$

so that any $u(t) \in [0,1]$, $t \geq 0$, with $\int_0^t u(s)\,ds \to \infty$ as $t \to \infty$ is optimal, and for any such optimal control $u^*(t)$, the optimal value of the objective function is $J^* = J(u^*) = 1$. For example, if one chooses the constant optimal control $u^*(t) \equiv u_0 \in (0,1)$, then by the maximality condition of the PMP

$$u_0 \in \arg\max_{u \in [0,1]}\,\{(1 - x^*(t))(1 + \psi(t))\,u\},$$

so $\psi(t) \equiv -1$. Hence, $\psi(t)$ cannot converge to zero as $t \to \infty$. □
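A short numerical illustration of Halkin's example (Python with NumPy; the constant control value is an arbitrary illustrative choice) shows the objective approaching its supremum of 1, while the adjoint stays at $\psi(t) \equiv -1$, bounded away from zero:

import numpy as np

u0 = 0.5                       # any constant control in (0, 1) is optimal
for T in (10.0, 50.0, 100.0):
    # With u(t) = u0, the state is x(T) = 1 - exp(-u0*T); the truncated
    # objective over [0, T] equals x(T), since the integrand is x-dot.
    print(T, 1.0 - np.exp(-u0 * T))   # -> 1 as T grows

# The maximality condition forces psi(t) = -1 for all t, so the
# "natural" transversality condition lim psi(T) = 0 fails here.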

The available transversality condition in infinite-horizon problems is often weaker (and sometimes stronger) than the natural transversality in (3.73).²¹ From a practical point of view it is often useful to first try natural transversality, and then to prove the optimality of the resulting trajectory by a different argument, using the specific structure of the problem at hand. The most likely solution scenario is that the candidate solution determined by the conditions of the PMP converges to a (dynamically optimal) steady state $\bar x$ as $T \to \infty$. The latter is determined as part of a turnpike $(\bar x,\bar\nu;\bar u)$, which consists of an equilibrium state/co-state $(\bar x,\bar\nu)$ of the Hamiltonian system and an equilibrium control $\bar u$, so that

$$0 = f(\bar x,\bar u), \qquad (3.74)$$
$$0 = r\bar\nu - \hat H_x(\bar x,\bar u,\bar\nu), \qquad (3.75)$$
$$\bar u \in \arg\max_{u \in U} \hat H(\bar x,u,\bar\nu). \qquad (3.76)$$

Intuitively, a turnpike is an optimal equilibrium state, which would be maintained at the optimum should the system be started at that state.²² The following remark provides a different perspective on how the turnpike can be obtained computationally.

Remark 3.17 (Implicit Programming Problem) Feinstein and Luenberger (1981) developed an alternative approach to determining a turnpike. Instead of thinking of the turnpike $(\bar x,\bar u)$ as an equilibrium of the Hamiltonian system, they showed that any solution to the so-called implicit programming problem

$$\bar x \in \arg\max_{(x,u) \in \mathcal{X}\times U} h(x,u), \qquad (3.77)$$

subject to

$$f(x,u) = r(x - \bar x), \quad u \in U, \qquad (3.78)$$

constitutes in fact a turnpike. The problem is termed an implicit programming problem because it essentially determines $\bar x$ as a fixed point, featuring its solution $\bar x$ in (3.77) and in the constraint (3.78). Indeed, the optimality conditions for the constrained optimization problem (3.77)–(3.78) are equivalent to finding an equilibrium of the Hamiltonian system and using the maximality condition, that is, equivalent to relations (3.74)–(3.76). □

21. For example, from the Mangasarian sufficiency theorem (proposition 3.7) one can obtain the transversality condition $\liminf_{T\to\infty}\langle\psi(T), x(T) - x^*(T)\rangle = 0$, given any admissible trajectory $x(t)$, $t \geq 0$. Other transversality conditions can be obtained by successive approximation of the infinite-horizon problem by a sequence of finite-horizon problems (Aseev 1999). The corresponding results are, because of their technical complexity, at present available only in specialized monographs (Aseev and Kryazhimskii 2007; Aseev 2009).
22. It is important to note the difference between a turnpike and an optimal steady-state tuple $(x^0,u^0)$. The optimal steady-state tuple $(x^0,u^0)$ maximizes $h(x,u)$ subject to $f(x,u) = 0$ and $u \in U$, and is therefore independent of the discount rate. In economics, this is often referred to as the golden rule. At the turnpike, on the other hand, the system evolves optimally over time, taking into account that moving to a different state (such as $x^0$, implied by the golden rule) may be too costly (or that moving away from a state such as $x^0$ using a nonstationary control may increase the decision maker's discounted payoff).

Example 3.7 (Infinite-Horizon Optimal Consumption) Consider an infinite-horizon version of the optimal consumption problem in example 3.4. To keep things as simple as possible, the present example concentrates on the special case where the investor's utility function is $U(c) = \ln(c)$, and there is no excess return, so that $\alpha = r$. Then the investor solves the infinite-horizon optimal control problem

$$J(c) = \int_0^\infty e^{-rt}\ln(c(t))\,dt \longrightarrow \max_{c(\cdot)},$$
$$\dot x(t) = r\,x(t) - c(t), \quad x(0) = x_0,$$
$$c(t) \in [0,c_{\max}], \quad \forall\, t \geq 0,$$
$$t \in [0,\infty),$$

where the initial capital $x_0 > 0$ and the spending limit $c_{\max} \geq r x_0$ are given constants. The current-value Hamiltonian for this problem is given by

$$\hat H(x,c,\nu) = \ln(c) + \nu(rx - c).$$

From the PMP, one obtains the adjoint equation $\dot\nu(t) = r\nu(t) - r\nu(t) \equiv 0$, so the current-value adjoint variable is constant, $\nu(t) \equiv \nu(0)$. By the maximality condition, $c(t) = 1/\nu(t) \equiv 1/\nu(0)$. Thus, integrating the state equation over any finite time horizon $T > 0$, it is

$$x(T) = x_0 e^{rT} - \int_0^T \frac{e^{rt}}{\nu(0)}\,dt = x_0 e^{rT} - \frac{e^{rT} - 1}{r\nu(0)} \equiv x_0,$$

for all $T > 0$, provided that $\nu(0) = 1/(r x_0)$. Hence, the unique turnpike is $(\bar x,\bar c) = (x_0, r x_0)$. The golden rule of optimal consumption for this problem is that when there is no excess return, it is best to maintain a constant capital and to consume at a rate equal to the interest revenue $r x_0$. □
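The turnpike can also be checked by brute force. The sketch below (Python with NumPy/SciPy; parameter values and the long truncation horizon are illustrative assumptions) compares the discounted utility of the stationary policy $c \equiv r x_0$ with perturbed constant policies, treating plans that exhaust capital before the horizon as infeasible.

import numpy as np
from scipy.integrate import solve_ivp, quad

r, x0, T = 0.05, 100.0, 400.0   # long truncated horizon approximates infinity

def discounted_utility(c):
    """J for a constant consumption rate c, given x-dot = r x - c."""
    sol = solve_ivp(lambda t, x: r * x - c, (0.0, T), [x0], dense_output=True)
    if np.any(sol.y[0] <= 0.0):
        return -np.inf                      # infeasible: capital exhausted
    val, _ = quad(lambda t: np.exp(-r * t) * np.log(c), 0.0, T)
    return val

for c in (0.8 * r * x0, r * x0, 1.2 * r * x0):
    print(c, discounted_utility(c))         # c = r*x0 does best among these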

Example 3.8 (Optimal Advertising) Consider the system

$$\dot x = (1-x)\,a^\kappa - \beta x \qquad (3.79)$$

with some constants $\beta > 0$ and $\kappa \in (0,1)$. The control variable $a \in [0,\bar a]$ (with some upper bound $\bar a > 0$) denotes the intensity of a firm's advertising activity and the state variable $x \in [0,1]$ the installed base, namely, the percentage of the entire population of potential customers who have already bought the firm's product. The problem is, given an initial state $x(0) = x_0 \in (0,1)$, to maximize the firm's discounted profits²³

$$J(a) = \int_0^\infty e^{-rt}\left( (1-x(t))\,a^\kappa(t) - c\,a(t) \right) dt,$$

where $c > 0$ is the cost of advertising. Note first that a simple integration by parts using (3.79) yields $J(a) = c\,\tilde J(u) - x_0$, where $u = a^\kappa$ and

$$\tilde J(u) = \int_0^\infty e^{-rt}\left( \gamma\,x(t) - u^{1/\kappa}(t) \right) dt,$$

with $\gamma = (r+\beta)/c$. Given an optimal state-control trajectory $(x^*(t),u^*(t))$, $t \geq 0$, by the maximality condition of the PMP it is

$$u^*(t) = \min\left\{ \bar a^\kappa,\ \left(\kappa(1-x^*(t))\,\nu(t)\right)^{\frac{\kappa}{1-\kappa}} \right\} > 0, \quad \forall\, t \geq 0,$$

and for $\bar a > (\kappa\nu)^{1/\kappa}$ the Hamiltonian system becomes

$$\dot x^* = (1-x^*)^{\frac{1}{1-\kappa}}\,(\kappa\nu)^{\frac{\kappa}{1-\kappa}} - \beta x^*,$$
$$\dot\nu = -\gamma + (r+\beta)\,\nu + \left(\kappa(1-x^*)\right)^{\frac{\kappa}{1-\kappa}}\,\nu^{\frac{1}{1-\kappa}}.$$

This system possesses a unique equilibrium $(\bar x,\bar\nu)$, characterized by $(\dot x,\dot\nu)\big|_{(\bar x,\bar\nu)} = 0$, or equivalently

23. The demand is equal to the positive inflow to the installed base, $(1-x)a^\kappa$. The firm is assumed to be a price taker in a market for durable goods, where the price has been normalized to 1. The products have a characteristic lifetime of $1/\beta$ before being discarded. The exponent $\kappa$ models the effect of decreasing returns to investment in advertising.


Figure 3.5
Trajectories of a Hamiltonian system (see example 3.8).

$$\kappa\gamma = \left( \frac{r+\beta}{(1-\bar x)^{\frac{1}{\kappa}}} + \frac{\beta\bar x}{(1-\bar x)^{\frac{1-\kappa^2}{\kappa(1-\kappa)}}} \right) (\beta\bar x)^{\frac{1-\kappa}{\kappa}}, \qquad \bar\nu = \frac{(\beta\bar x)^{\frac{1-\kappa}{\kappa}}}{\kappa\,(1-\bar x)^{\frac{1}{\kappa}}}.$$

For $\bar a$ large enough there is a (unique) turnpike; a corresponding qualitative phase diagram is given in figure 3.5. Weber (2006) showed that the adjoint variable $\nu$ is bounded. But the only trajectories that can satisfy such bounds are those that asymptotically converge to the turnpike. Because the equilibrium $(\bar x,\bar\nu)$ is a saddle point, all other trajectories of the Hamiltonian system cannot satisfy the bounds on the adjoint variable. Thus, uniqueness, asymptotic convergence, and the precise form of the optimal solution are essentially implied when the adjoint variable is bounded.²⁴ □

24. This example is based on Weber's (2006) discussion of a classic model for optimal advertising spending by Vidale and Wolfe (1957), as an application of a stronger version of the maximum principle for a class of infinite-horizon problems, which guarantees (instead of a standard transversality condition) that the current-value adjoint variable $\nu$ is bounded.
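The first turnpike relation above is a single nonlinear equation in $\bar x \in (0,1)$ and is easy to solve numerically. The sketch below (Python with NumPy/SciPy; all parameter values are illustrative assumptions, not from the text) finds $\bar x$ by bracketing root-finding and recovers $\bar\nu$ from the second relation.

import numpy as np
from scipy.optimize import brentq

r, beta, kappa, c = 0.1, 0.5, 0.5, 1.0   # illustrative parameters
gamma = (r + beta) / c

def turnpike_residual(x):
    """kappa*gamma minus the right-hand side of the equilibrium equation."""
    e = (1 - kappa**2) / (kappa * (1 - kappa))   # exponent on (1 - x)
    rhs = ((r + beta) / (1 - x)**(1 / kappa)
           + beta * x / (1 - x)**e) * (beta * x)**((1 - kappa) / kappa)
    return kappa * gamma - rhs

# The residual is positive near x = 0 and tends to -infinity as x -> 1,
# so a sign change is guaranteed on the bracket below.
x_bar = brentq(turnpike_residual, 1e-9, 1 - 1e-9)
nu_bar = (beta * x_bar)**((1 - kappa) / kappa) / (kappa * (1 - x_bar)**(1 / kappa))
print(x_bar, nu_bar)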


Remark 3.18 (Bellman Equation) Because of the time invariance of the infinite-horizon optimal control problem (3.69)–(3.72), its value function $V(x)$ does not depend on time. Consequently, the relevant HJB equation (3.20) for $T \to \infty$ simplifies to

$$rV(x) = \max_{u \in U}\,\{h(x,u) + \langle V_x(x), f(x,u)\rangle\}, \quad \forall\, x \in \mathcal{X}, \qquad (3.80)$$

which is commonly referred to as the Bellman equation. □

Example 3.9 (Infinite-Horizon Linear-Quadratic Regulator) Consider atime-invariant infinite-horizon version of the linear-quadratic regula-tor problem in example 3.3, where all the matrices are constant, andthe time horizon T → ∞. If one sets V(x) = −x′Qx for some positivedefinite matrix Q ∈ Rn×n, then the Bellman equation (3.80) becomes

0 = maxu∈U

{−x′ (R − rQ + QA + A′Q)

x − u′Su}, ∀ x ∈ Rn,

which, as in the finite-horizon case, yields a linear optimal feedback law,

μ(x) = −S−1B′Qx, ∀ x ∈ Rn,

where Q solves the algebraic matrix Riccati equation

−rQ + QA + A′Q + QBS−1B′Q = −R.

Hence, the optimal control becomes u∗(t) = μ(x∗(t)), for all t ≥ 0, which,together with the IVP x∗(t) = Ax∗(t) + Bu∗(t), x∗(0) = x0, determines theoptimal state-control trajectory. �

3.6 Supplement 1: A Proof of the Pontryagin Maximum Principle

This section provides an (almost complete) rigorous proof of the PMP asformulated in proposition 3.5 for the finite-horizon optimal control prob-lem (3.34)–(3.39) with state-control constraints. The proof has two majorparts. The first part establishes a simplified version of the maximumprinciple for a problem without state-control constraints and withoutendpoint constraints; the second part considers the missing constraintsin the linear-concave case, where (strictly speaking) the system func-tion f is linear in the control and the cost function h is concave in thecontrol. In that case it is further assumed that the control-constraint setis always convex and compact.

Both parts of the proof are very instructive. The first part shows howto construct appropriate variations in the endpoint data and the control,

Page 133: Optimal Control Theory With Applications in Economics

120 Chapter 3

which leads to the adjoint equation (including transversality), the maxi-mality condition, and the endpoint-optimality conditions. The commonintuition for these optimality conditions is that at the optimum the vari-ations vanish, quite similar to the standard interpretation of Fermat’slemma in calculus (see proposition A.11). The techniques in the sec-ond part of the proof are geared toward constructing a sequence ofsimplified optimal control problems that converges to the constrainedoptimal control problem (3.34)–(3.39). In the simplified optimal controlproblems, the constraints are relaxed and violations of these constraintsare increasingly penalized. One can then show that the sequence ofoptimality conditions (corresponding to the sequence of simplifiedproblems) converges toward optimality conditions of the constrainedproblem.

Some of the techniques used in the second part of the proof are beyondthe scope of this book. The reader can skip that part without any con-sequence. Restricting attention to a somewhat simpler case clarifies themain intuition and required tools. Note also that all applications withstate-control constraints in this book (see chapter 5) are linear-concave.A complete proof of the PMP for the optimal control problem (3.34)–(3.39) with additional pure state constraints (see remark 3.12) is given,for example, by Arutyunov (2000, ch. 2).

3.6.1 Problems without State-Control ConstraintsConsider a simplified version of the finite-horizon optimal control prob-lem (3.34)–(3.39), where the functions K1, K2, and R vanish, and the con-trol u(t) lies in the uniformly bounded, measurable control-constraintset U(t). The resulting problem is therefore free of constraints on theendpoint data ω and does not exhibit any state-control constraints:

J(u,ω) =∫ T

t0

h(t, x(t), u(t)) dt + K0(ω) −→ maxu(·),ω

, (3.81)

x(t) = f (t, x(t), u(t)), x(t0) = x0, x(T) = xT , (3.82)

u ∈ U(t), ∀ t, (3.83)

t ∈ [t0, T], t0 < T. (3.84)

Let (x∗, u∗,ω∗), with ω∗ = (t∗0, x∗0; T∗, x∗

T), be a solution to the initial valueproblem (3.82), so that

x∗(t) = f (t, x∗(t), u∗(t)), x∗(t∗0) = x∗0, ∀ t ∈ [t∗0, T∗],

Page 134: Optimal Control Theory With Applications in Economics

Optimal Control Theory 121

and by compatibility of the endpoint data with the state trajectory, it isx∗(T∗) = x∗

T . Consider now the adjoint equation (3.42),

ψ(t) = −λ0hx(t, x∗(t), u∗(t)) −ψ ′(t)fx(t, x∗(t), u∗(t)). (3.85)

By proposition 2.15 there exists a solution ψ(t), t ∈ [t∗0, T∗], to this lineartime-varying ODE, with initial condition

ψ(t∗0) = −λ0K0x0

(ω∗). (3.86)

Let λ0 = 1. The following three steps show that the adjoint variable ψ ,which by construction satisfies the adjoint equation, is also compatiblewith the transversality, maximality, and endpoint-optimality conditionsof the PMP in proposition 3.5.

Step 1: Transversality Fix a vector x ∈ Rn. Then, by continuous depen-dence of the solutions to a well-posed ODE with respect to initialconditions,25 for any α ≥ 0 the IVP

x(t) = f (t, x(t), u∗(t)), x(t∗0) = x∗0 +αx (3.87)

has a solution x(t,α), t ∈ [t∗0, T∗]. Let

ω(α) = (t∗0, x(t∗0,α); T∗, x(T∗,α))

be the corresponding vector of endpoint data, and let J∗ = J(u∗,ω∗) bethe maximized objective. Then

J(u∗,ω(α)) − J∗

α≤ 0, ∀α > 0.

Taking the limit for α → 0+ implies that dJ(u∗,ω(α))dα

∣∣∣α=0+ ≤ 0, that is,

〈K0x0

(ω∗), x〉 + 〈K0xT

(ω∗), xα(T∗, 0)〉

+∫ T∗

t∗0〈hx(t, x(t, 0), u∗(t)), xα(t, 0)〉dt ≤ 0. (3.88)

Recall the discussion in chapter 2 (remark 2.3) on the evolution of thesensitivity matrix S(t) = xα(t, 0),26 which satisfies the linear time-variantIVP

25. If one sets y = x −αx, y0 = x∗0 +αx, and f (t, y,α) ≡ f (t, y +αx, u∗(t)), then the IVP

y = f (t, y,α), y(t0) = y0, is equivalent to the IVP (3.87), and proposition 2.5 guaranteescontinuous dependence.26. The sensitivity “matrix” here is just a vector.

Page 135: Optimal Control Theory With Applications in Economics

122 Chapter 3

S = fx(t, x(t, 0), u∗(t))S, S(t∗0) = x. (3.89)

Since x∗(t) = x(t, 0) and ψ(t) satisfies the adjoint equation (3.85),

− ddt

〈ψ(t), S(t)〉 = 〈Hx(t, x∗(t), u∗(t),ψ(t)), S(t)〉−〈ψ(t), fx(t, x∗(t), u∗(t))S(t)〉= 〈hx(t, x∗(t), u∗(t)), S(t)〉,

for all t ∈ [t∗0, T∗]. Integrating the last identity between t = t∗0 and t = T∗,and combining the result with inequality (3.88), yields that

〈ψ(t∗0), S(t∗0)〉 − 〈ψ(T∗), S(T∗)〉 =∫ T∗

t∗0〈hx(t, x∗(t), u∗(t)), S(t)〉dt

≤ −〈K0x0

(ω∗), x〉 − 〈K0xT

(ω∗), S(T∗)〉.Let �(t, t∗0) be the fundamental matrix (see section 2.3.2) sothat �(t∗0, t∗0) = I, and S(T∗) = �(T∗, t∗0)x. Using the initial condi-tion (3.86) and the initial condition in (3.89) gives

0 ≤ 〈ψ(T∗) − K0xT

(ω∗),�(T∗, t∗0)x〉,independent of the chosen x. Hence, the transversality condition,

ψ(T∗) = K0xT

(ω∗),

must necessarily hold because �(T∗, t∗0) is a nonsingular matrix bylemma 2.5(6), which states that the Wronksian, det�(t, t∗0), is positive.

Step 2: Maximality Since (in the absence of state-control constraints)the set-valued mapping U(t) is measurable, there exists a sequence ofmeasurable selectors {σ k( · )}k∈N such that the set {σ k(t)}k∈N is everywheredense in U(t) for a.a. t ∈ [t∗0, T∗] (see footnote 15). It can be shown thatthe maximality condition holds for a given Lebesgue point t ∈ (t∗0, T∗)of the function f (t, x∗(t), u∗(t)).27 For a given integer k ≥ 0 and realnumber α > 0, let

uk(t,α) ={σ k(t) if t ∈ (t −α, t),u∗(t), otherwise,

27. The point t is a Lebesgue point of a measurable function ϕ(t) if

limε→0

(1/ε)∫ t

t−ε‖ϕ(t) −ϕ(t)‖ dt = 0,

that is, ifϕ does not vary too much at that point. Because the control is essentially bounded,almost all points are Lebesgue points.

Page 136: Optimal Control Theory With Applications in Economics

Optimal Control Theory 123

be a needle variation of the optimal control u∗. Let xk(t,α) be the solutionto the corresponding IVP,

x(t) = f (t, x(t), uk(t,α)), x(t∗0) = x∗0;

this solution exists on [t∗0, T∗], as long as α is sufficiently small. Then,because t is a Lebesgue point, it follows that

x∗(t) − x∗(t −α)α

= f (t, x∗(t), u∗(t)) + O(α),

and

xk(t,α) − x∗(t −α)α

= f (t, xk(t,α), σ k(t)) + O(α),

where O( · ) is the familiar Landau notation (see footnote 16). Hence, thelimit

�(t) ≡ limα→0+

xk(t,α) − x∗(t)α

= f (t, x∗(t), σ k(t)) − f (t, x∗(t), u∗(t)) (3.90)

is well-defined. Since the optimal control u∗(t) is applied for t > t, thestate trajectories x∗(t) and xk(t,α) satisfy the same ODE for those t. Bythe same logic as in remark 2.3 (on sensitivity analysis),

�(t) = fx(t, x∗(t), u∗(t))�(t), ∀ t ∈ (t, T∗),

where �(t) = xkα(t, 0). Thus, using the adjoint equation (3.85), it is

ddt

〈ψ(t),�(t)〉 = −〈hx(t, x∗(t), u∗(t)),�(t)〉, ∀ t ∈ (t, T∗).

Integrating the last equation between t = t and t = T∗ and using thetransversality condition (3.86) at the right endpoint (see step 1) yields

〈ψ(t),�(t)〉 =∫ T∗

t〈hx(s, x∗(s), u∗(s)),�(s)〉 ds + 〈K0

xT(ω∗),�(T∗)〉. (3.91)

If one sets J∗ = J(u∗,ω∗), then

0 ≤ limα→0+

J∗ − J(uk,ω(α))α

= limα→0+

∫ t

t−αh(t, x∗(t), u∗(t)) − h(t, xk(t,α), σ k(t))

αdt

Page 137: Optimal Control Theory With Applications in Economics

124 Chapter 3

+ limα→0+

∫ T∗

t

h(t, x∗(t), u∗(t)) − h(t, xk(t,α), u∗(t))α

dt

+ limα→0+

K0(ω∗) − K0(ω(α))α

= h(t, x∗(t), u∗(t)) − h(t, x∗(t), σ k(t)) −∫ T∗

t〈hx(t, x∗(t), u∗(t)),�(t)〉dt

− 〈K0xT

(ω∗),�(T∗)〉= h(t, x∗(t), u∗(t)) − h(t, x∗(t), σ k(t)) − 〈ψ(t),�(t)〉,

where the last equal sign is due to (3.91). Thus substituting the ex-pression for �(t) in (3.90) gives

H(t, x∗(t), σ k(t),ψ(t)) ≤ H(t, x∗(t), u∗(t),ψ(t)),

namely, maximality at t = t. The previous inequality holds for any k ≥ 0.The maximality condition (3.44) is therefore established at t, becausethe sequence {σ k(t)}∞k=0 is by assumption everywhere dense in U(t). Thisimplies that maximality holds a.e. (almost everywhere) on [t∗0, T∗].

Step 3: Endpoint Optimality Now consider the transversality withrespect to the endpoint constraint. If for α > 0 one sets ω(α) = (t∗0, x∗

0;T∗ −α, x∗(T∗ −α)), then

0 ≤ J∗ − J(u∗,ω(α))α

= 1α

∫ T∗

T∗−αh(t, x∗(t), u∗(t)) dt + K0(t∗0, x∗

0; T∗, x∗T)

α

− K0(t∗0, x∗0; T∗ −α, x∗(T∗ −α))

α

= 1α

∫ T∗

T∗−α

(h(t, x∗(t), u∗(t)) + dK0(t∗0, x∗

0; t, x∗(t))dt

)dt

=∫ T∗

T∗−α

H(t, x∗(t), u∗(t), K0xT

(t∗0, x∗0; t, x∗(t))) + K0

T(t∗0, x∗0; t, x∗(t))

αdt

= supu∈U (T∗,x∗

T )H(T∗, x∗

T , u, K0xT

(ω∗)) + K0T(ω∗) + O(α),

Page 138: Optimal Control Theory With Applications in Economics

Optimal Control Theory 125

where J∗ = J(u∗,ω∗). Recall the small Lagrangian in (3.41), L = λ0K0 =K0, so that taking the limit for α → 0+ implies, via Fermat’s lemma, thatthe endpoint-optimality condition (3.49) holds. Condition (3.48) obtainsin a completely analogous manner.

3.6.2 Problems with State-Control ConstraintsNow consider the general finite-horizon optimal control problem (3.34)–(3.39), referred to here as problem (P). The proof outline proceeds in sixsteps (and omits some technical details).

Step 1: Approximate Problem (P) by a Sequence of Problems {(Pk)}∞k=1Choose positive numbers ε, δ, and28

u ≥ 2 + ess supt∈[t∗0,T∗]

‖u∗(t)‖,

relative to which the sequence {(Pk)}∞k=1 of relaxed problems will bedefined. For any k ≥ 1, let

hk(t, x, u) = h(t, x, u) − δ‖u(t) − u∗(t)‖2 − k‖R−(t, x, u1)‖2, (3.92)

where R− = min{0, R} has nonzero (negative) components wheneverthe state-control constraint (3.37) in (P) is violated. Similarly, for anyfeasible boundary data ω ∈ R2(1+n) set

K0,k(ω) = K0(ω) − ‖ω−ω∗‖2 − k(‖K1

−(ω)‖2 + ‖K2(ω)‖2) , (3.93)

where K1− = min{0, K1}, to penalize deviations from the endpoint con-straints (3.36) in (P). For any k ≥ 1 the relaxed problem (Pk) can nowbe formulated, given an optimal solution (x∗, u∗,ω∗) to the originalproblem (P):⎧⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎩

Jk(u,ω) = ∫ Tt0

hk(t, x(t), u(t)) dt + K0,k(ω) −→ maxu(·),ω

s.t.

x(t) = f (t, x(t), u(t)), x(t0) = x0, x(T) = xT ,

ε ≥ ‖x(t) − x∗(t)‖∞ + ‖ω−ω∗‖2, ( ∗ )

u(t) ∈ Uε,u(t, x(t)), ∀ t,

t ∈ [t0, T], t0 < T,

⎫⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎬⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎭

(Pk)

28. The essential supremum of a nonempty set S ⊂ R, denoted by ess sup S, is the small-est upper bound M such that the set of elements in S greater than M is of measurezero.

Page 139: Optimal Control Theory With Applications in Economics

126 Chapter 3

where (with e = (1, . . . , 1) ∈ RkR+ and ε > 0 small) the relaxed constraint

set is given by

Uε,u(t, x) = {(u1, u2) ∈ Rm1 × U2(t) : ‖(u1, u2)‖ ≤ u, R(t, x, u1) ≥ −εe}.Denote a solution to the relaxed problem (Pk) by (xk, uk ,ωk), where ωk =(tk

0, xk0; Tk, xk

T) is the vector of endpoint data. For all t /∈ [tk0, Tk] extend any

such solution continuously in such a way that the state trajectory xk isconstant outside [tk

0, Tk].

Step 2: Show That the Relaxed Problem (Pk) Has a Solution for Allk ≥ 1 Let {(xk,j, uk,j,ωk,j)}∞j=1 be an admissible maximizing sequence for

the problem (Pk).29 Since uk,j takes values in the closed ball of Rn at 0of radius u, and x(t) lies, by virtue of the constraint (*) in (Pk), in aneighborhood of the (uniformly bounded) x∗(t) for all t, this maximiz-ing sequence is uniformly bounded, which allows the following threeconclusions for an appropriate subsequence (for simplicity the originalmaximizing sequence is identified with this subsequence by relabel-ing the indices if necessary). First, from the definition of an admissiblesequence {xk,j}∞j=1 ⊂ W1,∞ (see remark A.2) and the uniform bounded-ness of {xk,j}∞j=1 this sequence of state trajectories is equicontinuous,30 soby the Arzelà-Ascoli theorem (proposition A.5) it converges uniformlyto xk . Second, one obtains pointwise convergence of ωk,j to ωk as j → ∞.Third, since the space of admissible controls L∞ is a subset of the space L2

(see example A.2), uk,j converges weakly to uk as j → ∞.31

Now one can show that in fact the above limits coincide with thesolution to (Pk), that is,

(xk , uk,ωk) = (xk, uk, ωk). (3.94)

For any t ∈ [tk0, Tk] it is

xk,j(t) = xk,j(tk,j0 ) +

∫ t

tk,j0

f (ϑ , xk,j(ϑ), uk,j(ϑ)) dϑ ,

so taking the limit for j → ∞ gives

29. The maximizing sequence is such that the corresponding sequence of objective val-ues J(uk,j,ωk,j) converges to the optimal value (Gelfand and Fomin, 1963, 193).30. Equicontinuity is defined in appendix A, footnote 10.31. By the Banach-Alaoglu theorem the unit ball in L2 is weakly∗ (and therefore weakly)compact, so that by the Eberlein-Šmulian theorem it is also weakly sequentially compact(Megginson 1998, 229,248). This property of reflexive Banach spaces can also be deducedfrom the uniform boundedness principle (Dunford and Schwartz 1958, ch. 2).

Page 140: Optimal Control Theory With Applications in Economics

Optimal Control Theory 127

xk(t) = xk(tk0) +

∫ t

tk0

f (ϑ , xk(ϑ), uk(ϑ)) dϑ .

The limiting tuple (xk, uk , ωk), with ωk = (tk0, xk

0; Tk, x0T), solves the IVP

˙xk(t) = f (t, x(t), uk(t)), xk(tk0) = xk

0,

for all t ∈ [tk0, Tk]. The state constraint ε ≥ ‖xk − x∗‖∞ + ‖ωk −ω∗‖2 is

satisfied by uniform convergence of the maximizing sequence. Last,the control constraint uk ∈ Uε,u is a.a. satisfied because each uk,j, j =1, 2, . . . , is feasible (u has been chosen appropriately large). The weakconvergence uk,j w→ uk as j → ∞ implies, by Mazur’s compactnesstheorem (Megginson 1998, 254), that there exists a sequence {vk,j}∞j=1

with elements in the convex hull co {uk,j}∞j=1, which converges stronglyto uk in Lm

2 [t∗0, T∗].32 Therefore, equation (3.94) holds, that is, the limitpoint (xk, uk, ωk) of the maximizing sequence describes an admissiblesolution to the relaxed problem (Pk).

Step 3: Show That the Solutions of (Pk)k≥1 Converge to the Solutionof (P) As before, there exists an admissible tuple (x, u, ω) such that xk ⇒x, uk → u (a.e.), and ωk → ω. Now one can show that

(x, u, ω) = (x∗, u∗,ω∗), (3.95)

in particular that xk ⇒ x∗, uk → u∗ (a.e.), and ωk → ω∗.Let Jk(u,ω) = ∫ T

t0hk(t, x(t), u(t)) dt + K0,k(ω), as in step 1. The uniform

boundedness of the state-control trajectories implies that there existsa constant M > 0 such that M ≥ J(uk ,ωk) − J(u∗,ω∗) for all k. SinceJk(uk,ωk) ≥ Jk(u∗,ω∗) = J(u∗,ω∗), it is

Mk

≥∫ Tk

tk0

(δ‖uk−u∗‖2

k+ ‖R−‖2

)dt+‖ωk −ω∗‖2

k+ ‖K1

−(ωk)‖2+‖K2(ωk)‖2

≥ 0.

Taking the limit for k → ∞ we obtain by continuity of K1−( · ) andK2( · ) that ω satisfies the endpoint constraints K1(ω) ≥ 0 and K2(ω) = 0.Moreover,

limk→∞

∫ Tk

tk0

‖R−(t, xk(t), u1,k(t))‖2dt = 0,

32. The convex hull of a set of points ξ 1, . . . , ξ l of a real vector space, denoted byco {ξ 1, . . . , ξ l}, is the minimal convex set containing these points.

Page 141: Optimal Control Theory With Applications in Economics

128 Chapter 3

whence R−(t, x(t), u1(t)) = 0 a.e. on [t∗0, T∗]. Hence, it has been shownthat the limit (x, u, ω) is admissible in problem (P). This implies

J(u∗,ω∗) ≥ J(u, ω). (3.96)

On the other hand, Jk(uk,ωk) ≥ Jk(u∗,ω∗) = J(u∗,ω∗), so

J(uk,ωk) − ‖ωk −ω∗‖2 − δ

∫ Tk

tk0

‖uk(t) − u∗(t)‖2 dt ≥ J(u∗,ω∗),

for all k ≥ 1. Taking the limit for k → ∞ yields

J(u, ω) − ‖ω−ω∗‖2 − δ limk→∞

∫ Tk

tk0

‖uk(t) − u∗(t)‖2 dt ≥ J(u∗,ω∗),

which together with (3.96) implies that ω = ω∗, u = u∗, and

limk→∞

∫ T∗

t∗0‖uk(t) − u∗(t)‖2 dt = 0,

so the sequence {uk}∞k=1 converges to u∗ a.e. on [t∗0, T∗].

Step 4: Show That the Problem (Pk) becomes a Standard Optimal Con-trol Problem (P′

k) for Large k Because of the uniform convergence ofthe optimal state trajectories xk and the pointwise convergence of theboundary data ωk (as k → ∞) to the corresponding optimal state tra-jectory x∗ and optimal boundary data ω∗ of the original problem (P),respectively, the state constraint in the relaxed problem (Pk) is notbinding, namely,

ε > ‖xk − x∗‖∞ + ‖ωk −ω∗‖2,

as long as k is sufficiently large. Hence, for fixed constants ε, δ, and u(see step 1) there exists a k0 = k0(ε, δ, u) ≥ 1 such that for all k ≥ k0 theproblem (Pk) can be rewritten equivalently in the form⎧⎪⎪⎪⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎪⎪⎪⎩

Jk(u,ω) = ∫ Tt0

hk(t, x(t), u(t)) dt + K0,k(ω) −→ maxu(·),ω

s.t.

x(t) = f (t, x(t), u(t)), x(t0) = x0, x(T) = xT ,

u(t) ∈ Uε,u(t, x(t)), ∀ t,

t ∈ [t0, T], t0 < T.

⎫⎪⎪⎪⎪⎪⎪⎪⎪⎬⎪⎪⎪⎪⎪⎪⎪⎪⎭

(P′k)

Page 142: Optimal Control Theory With Applications in Economics

Optimal Control Theory 129

Necessary optimality conditions for this type of optimal control problemwithout state-control constraints were proved in section 3.6.1.

Step 5: Obtain Necessary Optimality Conditions for (P′k) Let Hk(t, x,

u,ψk , λk0) = λk

0hk(t, x, u) + 〈ψ k, f (t, x, u)〉 be the Hamiltonian associatedwith problem (P

′k), where λk

0 ∈ R is a constant multiplier, and ψk ∈ Rn

is the adjoint variable. The Hamiltonian represents the instantaneouspayoff to the decision maker, including the current benefit of the statevelocities. The shadow price of the instantaneous payoff is λk

0, and theshadow price of the state velocity is given by the adjoint variable ψ k .Before formulating the necessary optimality conditions for problem (P

′k),

note that because of the definition of the constraint set Uε,u(t, xk), whichrelaxes the state-control constraints by a finite increment ε, this set isin fact independent of the state for k large enough. In what follows it istherefore assumed that the sequence of relaxed problems has progressedsufficiently.

Maximum Principle for Problem (P′k) If (xk, uk,ωk) is an optimal solu-

tion for the problem (P′k), then there exist an absolutely continuous

functionψk : [tk0, Tk] → Rn and a constant λk

0 > 0 such that the followingrelations hold:

• Adjoint equation

−ψk(t) = Hkx(t, xk(t), uk(t),ψk(t), λk

0). (3.97)

• Transversality

ψk(tk0) = − λk

0K0,kx0

(ωk), (3.98)

ψk(Tk) = λk0K0,k

xT(ωk). (3.99)

• Maximality

uk(t) ∈ arg maxu∈Uε,u(t,xk (t))

Hk(t, xk(t), u,ψk(t), λk0), (3.100)

a.e. on [tk0, Tk].

• Endpoint optimality

supu∈Uε,u(t0,xk

0)

Hk(tk0, xk

0, u, −λk0K0,k

x0(ωk), λk

0) − λk0K0,k

t0(ωk) = 0, (3.101)

supu∈Uε,u(T,xk

T )

Hk(Tk, xk0, u, λk

0K0,kT (ωk), λk

0) + λk0K0,k

T (ωk) = 0. (3.102)

Page 143: Optimal Control Theory With Applications in Economics

130 Chapter 3

Applying the necessary optimality conditions (3.97)–(3.102) to therelaxed problem (P

′k) yields the adjoint equation

−ψ k = λk0hx(t, xk(t), uk(t)) + (ψk)′fx(t, xk(t), uk(t))

+ ρk(t)Rx(t, xk(t), u1,k(t)), (3.103)

for all t ∈ [t∗0, T∗], where

ρk(t) = −2kλk0R−(t, xk(t), u1,k(t)) ∈ R

kR+ . (3.104)

The maximality condition (3.100) contains a constrained optimizationproblem, for which there exist Lagrange multipliers ζ k(t) ∈ R

kR+ andς k(t) ∈ R+ such that

λk0Hk

u(t, xk(t), uk(t),ψk(t), λk0) + ζ k(t)Ru(t, x∗(t), u1,k(t)) + ς k(t)uk(t) = 0,

with complementary slackness conditions

ζ kj (t)(Rj(t, xk(t), u1,k(t)) − ε) = 0, j ∈ {1, . . . , kR},

and

ς k(t)(‖uk(t)‖ − u) = 0,

for all t ∈ [tk0, Tk] (with the usual extension if needed). By step 3, uk → u∗

a.e. on [t∗0, T∗]. Hence, by Egorov’s theorem (Kirillov and Gvishiani 1982,24), for any δ > 0, there is a subset�δ of [t∗0, T∗] such that

∫[t∗0,T∗]\�δ dt < δ

and uk ⇒ u∗ uniformly on �δ . Since u∗ is feasible, this uniform con-vergence implies that ‖uk‖ < u on �δ for k large enough. By virtue ofcomplementary slackness, the corresponding Lagrange multipliers ζ k

andς k therefore vanish on�δ as long as k is large enough. In other words,

λk0Hk

u(t, xk(t), uk(t),ψk(t), λk0) = 0, (3.105)

a.e. on [t∗0, T∗] as long as k is large enough.

Step 6: Derive Necessary Optimality Conditions for (P) The sequence{λk

0}∞k=1 is uniformly bounded and {ψk}∞k=1 is also equicontinuous. Hence,as in step 2, there exist ψδ,u and λδ,u such that

ψk ⇒ ψδ,u, λk0 → λδ,u.

As already indicated through the notation, the limitsψδ,u and λδ,u gener-ically depend on the constants δ and u. More specifically, these limitscorrespond to the optimal solution to (P) if h is replaced by h − δ‖u − u∗‖2

and the additional constraint ‖u‖ ≤ u is introduced.

Page 144: Optimal Control Theory With Applications in Economics

Optimal Control Theory 131

Adjoint Equation Since by the maximum principle for problem (P′k)

(see step 5) it is λk0 > 0, relations (3.97)–(3.99) are positively homoge-

neous of degree 1 in ψk/λk0, and relation (3.100) is positively homoge-

neous of degree zero, it is possible to multiply equations (3.97)–(3.99)with positive numbers (and relabel the variables λk

0 and ψk back) suchthat

0 < λk0 + max

t∈[tk0,Tk ]

‖ψk(t)‖2 ≤ 1. (3.106)

Integrating the components of the adjoint equation (3.103) yields, usingthe transversality condition (3.99),

ψk(t) =∫ T∗

t(λk

0hx(s, xk(s), uk(s)) + (ψ k(s))′fx(s, xk(s), uk(s))) ds

+∫ T∗

tρk(s)Rx(s, xk(s), u1,k(s)) ds (3.107)

for all t ∈ [t∗0, T∗] (using the standard extension from [tk0, Tk] to [t∗0, T∗]

explained at the end of step 1).Since the total variation ofψ k on [t∗0, T∗] is uniformly bounded for all k

by (3.106) (every absolutely continuous function is of bounded varia-tion on a compact interval (Taylor 1965, 412)), and the sequence {ψk} isalso uniformly bounded as a consequence of (3.107), by Helly’s selec-tion theorem (Taylor 1965, 398) there exists a function ψ such that (asubsequence of) the sequence {ψk} converges to ψ . By taking the limitin (3.107) for k → ∞, with ρk → ρ, one obtains

ψ(t) =∫ T∗

t(Hx(s, x∗(s), u∗(s), ψ(s), λ0) + ρ(s)Rx(s, x∗(s), u1∗(s))) ds,

for all t ∈ [t∗0, T∗], where ψ = ψδ,u. The adjoint equation (3.42) thenfollows.

Transversality Since ωk → ω∗ as k → ∞ (see step 3), by setting

λ1 = − limk→∞

2kλk0K1

−(ωk) ∈ Rn1+ and λ2 = − lim

k→∞2kλk

0K2(ωk) ∈ Rn2 ,

one obtains from the transversality condition (3.98) for k → ∞ that

ψ(t∗0) = −2∑

j=0

λjKjx0 (ω∗) = −Lx0 (ω∗, λ),

Page 145: Optimal Control Theory With Applications in Economics

132 Chapter 3

where L is the small Lagrangian, and λ = (λ0, λ1, λ2). Similarly, for k →∞ the transversality condition (3.99) yields that

ψ(T∗) =2∑

j=0

λjKjxT (ω∗) = LxT (ω∗, λ).

This establishes the transversality conditions (3.43) in proposition 3.5.

Maximality Consider the maximality condition (3.105) for problemPk which holds for k large enough. Since xk ⇒ x∗ and uk → u∗ (a.e. on[t∗0, T∗]) for k → ∞, one obtains from (3.97) for k → ∞ (with u = (u1, u2))that

Hu1 (t, x∗(t), u∗(t), ψ(t), λ0) + ρ(t)Ru1 (t, x∗(t), u1∗(t)) = 0.

Using the definition (3.104) of ρk(t) ≥ 0, which implies that

ρkj (t)Rj(t, xk(t), u1,k(t)) = 0,

and taking the limit for k → ∞ yields the complementary slacknesscondition

ρj(t)Rj(t, x∗(t), u1∗(t)) = 0, ∀ j ∈ {1, . . . , kR}.This complementary slackness condition and the maximality condi-tion (3.100) together imply, for k → ∞, and then ε → 0+ and u → ∞,that

u∗(t) ∈ arg maxu∈U (t,x∗(t))

H(t, x∗(t), u, ψ(t), λ0), ∀ t ∈ [t∗0, T∗].

The maximality conditions (3.44)–(3.46) have thus been establisheda.e. on [t∗0, T∗].Endpoint Optimality The inequalities and complementary slacknesscondition in (3.47) follows immediately from the definition of themultipliers λ0 and λ1. Using the endpoint-optimality conditions (3.101)–(3.102) together with the definitions (3.92)–(3.93) yields

supu∈U (t∗0,x∗

0)H

⎛⎝t∗0, x∗

0, u, −2∑

j=0

λjKjt0

(ω∗), λ0

⎞⎠−

2∑j=0

λjKjt0

(ω∗) = 0

and

Page 146: Optimal Control Theory With Applications in Economics

Optimal Control Theory 133

supu∈U (T∗ ,x∗

T )H

⎛⎝T∗, x∗

T , u,2∑

j=0

λjKjT(ω∗), λ0

⎞⎠+

2∑j=0

λjKjT(ω∗) = 0,

that is, the endpoint-optimality conditions (3.48) and (3.49).

Nontriviality If λ andψ are trivial, then (λ,ψ) must vanish identicallyon [t∗0, T∗]. In particular this means thatλk

j → 0, j ∈ {0, 1, 2} andψk(t) ⇒ 0as k → ∞. For each problem k, λk

0 > 0, so all relations of the maximumprinciple (multiplying with the same positive constant) for (P

′k) can be

renormalized such that

‖λk‖ + supt∈[tk

0,Tk]‖ψk(t)‖ +

∫ Tk

tk0

‖ρk(t)‖2dt = 1, ∀ k ≥ 1.

Thus, taking the limit for k → ∞ yields

‖λ‖ + supt∈[t∗0,T∗]

‖ψ(t)‖ +∫ T∗

t∗0‖ρ(t)‖2dt = 1. (3.108)

From the maximality condition for problem (P′k),

λk0hk

u1 + (ψk)′fu1 +kR∑j=1

ρkj Rj,u1 = 0.

By assumption A5, for any j ∈ {1, . . . , kR} there exists a vector v ∈ Rm1

with ‖v‖ ≤ 1 such that 〈v, Rj,u1〉 ≥ ε whenever Rj ≤ ε. Hence, thereis a positive constant κ > 0 (independent of j) such that ‖Rj,u1‖ ≥ κ

whenever Rj ≤ min{ε, κ}. Omitting a few technical details (see, e.g.,Arutyunov 2000), the fact that λk

0 → 0 and ψk ⇒ 0 as k → ∞ thereforeimplies that limk→∞ ‖ρk(t)‖ = 0 a.e. on [t∗0, T∗]. But this is a contradictionto (3.108), which in turn establishes the nontriviality condition (3.50).

Remark 3.19 (Strengthening Nontriviality) If Condition S holds, it ispossible to strengthen the nontriviality relation (3.50) to

λ0 + ‖ψ(t)‖ > 0, ∀ t ∈ (t∗0, T∗) (3.109)

(see footnote 18). Indeed, if (3.109) is violated, then there exists τ ∈(t∗0, T∗) such that λ0 = 0 and ‖ψ(τ )‖ = 0. By assumption A5 (regularityof the state-control constraint) there exists a function v(t) with valuesin Rm1 such that for some δ > 0,

Page 147: Optimal Control Theory With Applications in Economics

134 Chapter 3

Rj(t, x, u1) = 0 ⇒⟨v(t),

∂Rj

∂u1 (t, x, u1)⟩

≥ δ, j ∈ {1, . . . , kR},

a.e. on [t∗0, T∗]. Hence, if the maximality condition (3.46) is scalar-multiplied with v(t), than (analogous to the earlier analysis) there is apositive constant μ such that 0 ≤ ρ(t) ≤ μ‖ψ(t)‖ a.e. on [t∗0, T∗]. Hence,invoking the adjoint equation (3.42), there is a constant μ > 0 suchthat ‖ψ(t)‖ ≤ μ‖ψ(t)‖ a.e. on [t∗0, T∗]. By the (simplified) Gronwall-Bellman inequality (proposition A.9 and remark A.4), it is ψ(t) ≡ 0, sothat with initial condition ψ(τ ) = 0, it is ψ(t) ≡ 0. Using the endpoint-regularity assumption A4 together with the transversality conditionsin (3.43) therefore yields that λ1 and λ2 both vanish. But this yields acontradiction to the nontriviality condition (3.50), which establishes thestronger condition (3.109). �

Envelope Condition To prove the envelope condition, consider thefollowing autonomous (time-invariant) optimal control problem overthe fixed time interval [t∗0, T∗], which features the control variable u =(u, v) ∈ Rm+1, the state x = (ξ , x) ∈ R1+n, and the endpoint data ω = (ξ0,x0; ξT , xT):

Jk(u, ω) =∫ T∗

t∗0(1 + v(t))h(ξ (t), x(t), u(t)) dt + K0(ω) −→ max

u(·),ωs.t.

x(t) = f (ξ (t), x(t), u(t)), x(t0) = x0, x(T) = xT ,

ξ (t) = 1 + v(t), ξ (t0) = ξ0, ξ (T) = ξT ,

0 ≤ K1(ω),

0 = K2(ω),

0 ≤ R(ξ (t), x(t), u1(t)),

u = (u1, u2), u2(t) ∈ U(t), ∀ t,

t ∈ [t0, T], t0 < T.

(P)

Note first that any admissible (x, u, ω) for (P) is such that the correspond-ing (x, u,ω) with

x(θ ) = (t, x(t)), u(θ ) = (u(t), 0), ω = (t0, x0; T, xT), (3.110)

where θ = ξ−1(t), is admissible for (P) (i.e., the general finite-hori-zon optimal control problem (3.34)–(3.39)). The converse also holds;

Page 148: Optimal Control Theory With Applications in Economics

Optimal Control Theory 135

therefore (x∗, u∗,ω∗) solves (P) if and only if the corresponding (x∗, u∗, ω∗)solves (P).

As a result, the already established conditions of the PMP can be ap-plied to (P). In particular, by proposition 3.5 there exist an absolutecontinuous adjoint variable ψ = (ψ0,ψ) : [t∗0, T∗] → R1+n, an essentiallybounded functionρ : [t∗0, T∗] → RkR , and a nonnegative constantλ0 suchthat (restricting attention to its first component) the adjoint equation

−ψ0(t) = (1 + v∗(t))(λ0hξ (ξ ∗(t), x∗(t), u∗(t)) + 〈ψ(t), fξ (ξ∗(t), x∗(t), u∗(t))〉)

+ ρ(t)Rξ (ξ∗(t), x∗(t), u∗(t))

holds for all t ∈ [t∗0, T∗]. In addition, maximality with respect to v impliesthat

λ0h(ξ ∗(t), x∗(t), u∗(t)) + 〈ψ(t), f (ξ∗(t), x∗(t), u∗(t))〉 +ψ0(t) = 0.

Thus, using the variable transform in (3.110), one obtains

−ψ0(t) = H(t, x∗(t), u∗(t),ψ(t), λ0), ∀ t ∈ [t∗0, T∗],and

−ψ0(t) = Ht(t, x∗(t), u∗(t),ψ(t), λ0)

+ 〈ρ(t), Rt(t, x∗(t), u∗(t))〉, ∀ t ∈ [t∗0, T∗],which establishes the envelope condition (3.51) in proposition 3.5.

This completes the proof of the PMP in proposition 3.5. n

3.7 Supplement 2: The Filippov Existence Theorem

Consider the existence of solutions to the general finite-horizon optimalcontrol problem (3.34)–(3.39) with U(t, x) as in condition B of section 3.4.Let D ⊂ R1+n be a nonempty connected open set, termed domain as insection 2.2.1, and denote its closure by D. As before, define

D0 = {t ∈ R : ∃ (t, x) ∈ D}as the projection of D onto the t-axis, and set

D(t) = {x ∈ Rn : (t, x) ∈ D},for all t ∈ R. For any (t, x) ∈ D, let U(t, x) ⊂ Rm be a nonempty control-constraint set, and let M = ⋃

(t,x)∈D{(t, x)} × U(t, x) be the set of allfeasible (t, x, u) in R1+n+m. Last, let

Page 149: Optimal Control Theory With Applications in Economics

136 Chapter 3

� = {ω ∈ R2(1+n) : K1(ω) ≥ 0, K2(ω) = 0}be the compact set of possible vectors of endpoint dataω = (t0, x0; T, xT),for which always t0 < T. The following result by Filippov (1962) guar-antees the existence of solutions to the finite-horizon optimal controlproblem under a few additional assumptions.

Proposition 3.8 (Filippov Existence Theorem) Let D be bounded, � ⊂D × D be closed. Assume that assumptions A1–A5 are satisfied andthat conditions B and C hold. Suppose further there exist an admissi-ble state-control trajectory (x(t), u(t)), t ∈ [t0, T], and endpoint data ω =(t0, x0; T, xT) such that (t0, x(t0); T, x(T)) ∈ �. If for almost all t thevectograms V(t, x) = f (t, x, U(t, x)), x ∈ D(t), are convex, then the opti-mal control problem (3.34)–(3.39) possesses an admissible solution(x∗, u∗,ω∗).33

Proof By condition B the constraint set U(t, x) is uniformly boundedfor (t, x) ∈ D. Thus, the set M is compact, for D is compact. By continuityof f the vectograms V(t, x) are therefore also compact, and they are allcontained in a certain ball in Rn.

For any admissible (x( · ), u( · ),ω), the endpoint vector (t0, x(t0); T, x(T))lies in�. By the Weierstrass theorem (proposition A.10), the continuousfunction K0(ω) attains its maximum m0 on �. Furthermore, since Mis compact, there exists M > 0 such that |t|, ‖x‖, ‖u‖, ‖f (t, x, u)‖, ‖h(t, x,u)‖ ≤ M for all (t, x, u) ∈ M. Thus, in particular D0 ⊂ [−M, M]. Considernow the augmented vectogram

V(t, x) = {(y0, y) ∈ R1+n : y0 ≤ h(t, x, u), y = f (t, x, u), u ∈ U(t, x)

}.

Note that y ∈ V(t, x) implies that ( − M, y) ∈ V(t, x). In addition, if (y0,y) ∈ V(t, x), then necessarily y0 ≤ M. The following problem is equiva-lent to the optimal control problem (3.34)–(3.39):

J(η,ω) =∫ T

t0

η(t) dt + K0(ω) −→ maxη(·),ω

, (3.111)

(η(t), x(t)) ∈ V(t, x(t)), x(t0) = x0, x(T) = xT , (3.112)

ω = (t0, x0; T, xT) ∈ �, (3.113)

t ∈ [t0, T], t0 < T. (3.114)

33. Given a control set U , the vectogram f (t, x, U ) = {f (t, x, u) : u ∈ U} corresponds to theset of all directions in which the system trajectory can proceed from (t, x).

Page 150: Optimal Control Theory With Applications in Economics

Optimal Control Theory 137

For any endpoint vector ω ∈ �, the choice (η(t),ω) with η(t) ≡ −M isfeasible, that is, it satisfies the constraints (3.111)–(3.114). Note furtherthat any feasible (η(t),ω) satisfies

η(t) ≤ h(t, x(t), u(t)) ≤ M, ∀ t ∈ [t0, T].By the choice of the constants M, m0 it is T − t0 ≤ 2M and K0(ω) ≤ m0,so that

J(η,ω) ≤ J(u,ω) ≤ 2M2 + m0.

If η(t) = h(t, x(t), u(t)) for almost all t ∈ [t0, T], then J(η,ω) = J(u,ω).Let J∗ = sup(u,ω) J(u,ω) and J∗ = sup(η,ω) J(η,ω) be the smallest upper

bounds for the attainable values of the objective functional in the equiva-lent problems (3.34)–(3.39) and (3.111)–(3.114), respectively. Both of thesebounds are finite; they can both be realized by a feasible solution andare in fact equal.

Let {(ηk(t),ωk)}∞k=0, t ∈ Ik = [tk0, Tk], be a maximizing sequence (with

ωk = (tk0, xk

0; Tk, xkT) for k ≥ 0), in the sense that J(ηk ,ωk) → J∗ as k → ∞,

and let {xk(t)}∞k=0, t ∈ Ik, be the corresponding sequence of state trajecto-ries. Since V(t, xk(t)) = f (t, xk(t), U(t, xk(t))), it is ‖xk(t)‖ ≤ M, for all t ∈ Ik

and all k ≥ 0. As a result, the xk are Lipschitz on Ik with the same Lip-schitz constant, and therefore also equicontinuous on Ik. In addition,(t, xk(t)) ∈ D and ωk ∈ �.

The Arzelà-Ascoli theorem (proposition A.5) implies that there exista subsequence {kj}∞j=0, a point ω = (t0, x0; T, xT) ∈ �, and a state trajec-

tory x(t), t ∈ [t0, T], such that xkj (t) → x(t) as j → ∞ uniformly on [t0, T],namely,

limj→∞

(|tkj0 − t0| + |Tkj − T| + sup

t∈R

‖xkj (t) − x(t)‖) = 0,

where xkj (t) and x(t) are extended outside their domains by setting thefunctions constant (e.g., for t ≥ T it is x(t) = xT). Since the sets D and �are closed, it is (t, x(t)) ∈ D and ω ∈ � for all t ∈ [t0, T]. In addition,x is Lipschitz and therefore absolutely continuous. It follows from anappropriate closure theorem (Cesari 1973) that there exists a Lebesgue-integrable function η(t), t ∈ [t0, T], such that

(η(t), x(t)) ∈ V(t, x(t)), ∀ t ∈ [t0, T],and

∫ T

t0

η(t) dt ≥ lim supj→∞

∫ Tkj

tkj0

ηkj (t) dt. (3.115)

Page 151: Optimal Control Theory With Applications in Economics

138 Chapter 3

Note also that by continuity of K0,

limj→∞

K0(ωkj ) = K0(ω). (3.116)

By combining (3.115) and (3.116) one can conclude that J(η,ω) = J∗. Nowrepresent the augmented vectogram V(t, x) in the form

V(t, x) = {(h(t, x, u) − v, f (t, x, u)) : (u, v) ∈ U(t, x) × R+

}.

By the implicit function theorem (proposition A.7), the system ofequations

η(t) = h(t, x(t), u(t)) − v(t),

x(t) = f (t, x(t), u(t)),

has a solution (u(t), v(t)) ∈ U(t, x(t)) × R+, for almost all t ∈ [t0, T]. Thetuple (u(t),ω) and the associated state trajectory x(t) are admissiblefor the optimal control problem (3.34)–(3.39). Since J∗ is the optimalvalue of problem (3.111)–(3.114), the function v(t) vanishes a.e. on[t0, T]. But this implies that J∗ = J(η,ω) = J(u,ω) = J∗, completing theproof. n

When the vectograms V(t, x) are nonconvex for some (t, x) ∈ D, theremay be no solution to the optimal control problem (3.34)–(3.39). Toillustrate this point, Filippov (1962) considered the following example.

Example 3.10 (Nonconvexity of Vectograms) Let f = ( f1, f2) with f1(t, x, u)≡ u2 − (x2)2 and f2(t, x, u) ≡ u, h(t, x, u) ≡ −1, K0(ω) ≡ 0,34 and U(t, x) ≡[0, 1]. Furthermore, assume that D = [0, 2] × [0, 1]2 and� = {(0, (0, 0); T,(1, 0)) : T ∈ [1, 2]}. All assumptions of proposition 3.8 are satisfied,except for the fact that V(t, x) is nonconvex in the right half-plane wherex is nonnegative (figure 3.6). The optimal control problem (3.34)–(3.39)with the above primitives has no solution. Note first that T > 1 forany solution and that any sequence {(uk(t),ωk)}∞k=2 with |uk(t)| = 1 a.e.and 1 < Tk < 1 + 1

k2−1must be a maximizing sequence such that J(uk,ωk)

approaches the optimal value J∗ = sup(u,ω) J(u,ω) as k → ∞. Yet, anysuch maximizing sequence implies a sequence of state trajectories, xk(t),for t ∈ [0, Tk] and k ≥ 2, which converges to the trajectory x(t) = (t, 0),t ∈ [0, 1], which is not feasible for any u(t) with values in [0, 1] becausex2(t) ≡ 0 �= u(t) when |u(t)| = 1. �

34. The Mayer problem with h(t, x, u) ≡ 0 and K0(ω) = T is equivalent.

Page 152: Optimal Control Theory With Applications in Economics

Optimal Control Theory 139

Figure 3.6System with nonconvex vectogram (see example 3.10).

Remark 3.20 (Sliding-Mode Solutions) When the vectograms V(t, x) arenot convex, it is still possible to guarantee the existence of solutionsvia proposition 3.8, provided that generalized solutions (or sliding-modesolutions) of the optimal control problem (3.34)–(3.39) are introduced assolutions to the following modified optimal control problem:

J(p, v,ω) =∫ T

t0

n+2∑l=1

pl(t)h(t, x(t), vl(t)) dt + K0(ω) −→ maxp(·),v(·),ω

,

x(t) =n+2∑l=1

pl(t)f (t, x(t), vl(t)), x(t0) = x0, x(T) = xT ,

K1(ω) ≥ 0, K2(ω) = 0,

(p(t), vl(t)) ∈ �n+2 × U(t, x(t)), l ∈ {1, . . . , n + 2}, ∀t,

t ∈ [t0, T], t0 < T,

where�n+2 = {π = (π1, . . . ,πn+2) ∈ Rn+2+ :

∑n+2l=1 πl = 1} denotes an (n +

2)-simplex, and where p = (p1, . . . , pn+2) and v = (v1, . . . , vn+2) are thecontrol variables, with values in Rn+2 and R(n+2)m, respectively. Themodified optimal control problem, with control u = (p, v), is in fact ageneral finite-horizon optimal control problem, as in section 3.4. Sinceits vectograms (see the proof of proposition 3.8) are convex (as convexhulls of the original V), the Filippov existence theorem guarantees thatthere is a generalized solution to the original problem. The intuition

Page 153: Optimal Control Theory With Applications in Economics

140 Chapter 3

for this sliding-mode solution is that it effectively partitions the systempayoffs and system dynamics into n + 2 pieces, which could be consid-ered as random realizations of systems with different controls (in whichcase the pl are viewed as probabilities). This allows the (original) sys-tem to be steered in the direction of any point in the convex hull of itsvectogram.35 �

3.8 Notes

The presentation of control systems is inspired by Anderson and Moore(1971) and Sontag (1998). Good introductory textbooks on optimalcontrol theory are Warga (1972), Ioffe and Tikhomirov (1979), Seier-stad and Sydsæter (1987), Kamien and Schwartz (1991), Sethi andThompson (2000), and Vinter (2000). Milyutin and Osmolovskii (1998)relate the finite-horizon optimal control problem to the classical cal-culus of variations and also discuss sufficient optimality conditions.The infinite-horizon optimal control problem was discussed by Carlsonet al. (1991), and more recently, with much additional insight, by Aseevand Kryazhimskii (2007) and Aseev (2009). Weber (2005a) considersan optimal advertising model (somewhat similar to the one in exam-ple 3.8) and shows asymptotic convergence of an optimal state trajectorythrough explicit considerations.

The HJB equation was formulated by Bellman (1957), who intro-duced the method of dynamic programming, which, in its discretizedversion, is very useful for computational purposes (Bertsekas 2007).For continuous-time systems, the HJB equation is a partial differentialequation that is often difficult (or effectively impossible) to solve, evennumerically. In addition, very simple, smooth optimal control prob-lems may have value functions that are not continuously differentiable,so for the HJB equation to remain valid it is necessary to use general-ized derivatives, giving rise to nonsmooth analysis (Clarke 1983; Clarkeet al. 1998). Vinter (1988) provides a related discussion about the linkbetween the HJB equation and the PMP.

L. S. Pontryagin developed the maximum principle together withhis students and assistants, V. G. Boltyanskii, R. V. Gamkrelidze,and E. F. Mishchenko (Pontryagin et al. 1962). Boltyanskii (1994) andGamkrelidze (1999) provide separate accounts of how the maximum

35. Carathéodory’s theorem states that for any subset S of Rn, any point in its convex hull

can be represented as a convex combination of n + 1 suitable points of S.

Page 154: Optimal Control Theory With Applications in Economics

Optimal Control Theory 141

principle was discovered. A key issue in its proof was resolved byBoltyanskii (1958), who adapted Weierstrass’s idea of needle variations(which take their name from the shape of the corresponding graphs inthe limit). The formulation here is adapted from the version for a prob-lem with state constraints given by Arutyunov (1999; 2000) based onearlier work by Dubovitskii and Milyutin (1965; 1981), Dikusar andMilyutin (1989), Afanas’ev et al. (1990), and Dmitruk (1993). Dorf-man (1969) provides an early economic interpretation of optimal controltheory in the context of capital investment and profit maximization.Comprehensive accounts of the neighboring field of variational analy-sis are given by Giaquinta and Hildebrandt (1996), Rockafellar and Wets(2004), and Mordukhovich (2006).

The existence of solutions to a general time-optimal control problemwas proved by Filippov (1962) and extended to the more general prob-lem by Cesari (1983, 313).36 As Boltyanski[i] et al. (1998) point out, witha suitable change of variables, a standard optimal control problem canactually be formulated as a time-optimal control problem.

3.9 Exercises

3.1 (Controllability and Golden Rule) (Weber 1997) Consider a modelfor determining a firm’s dynamic policy about what amount u1(t) tospend on advertising and what price u2(t) to charge for its homoge-neous product at any time t ≥ 0. The advertising effect x1 tracks theadvertising expenditure, and the firm’s installed base x2 increases whendemand D(x, u2) is larger than the number of productsβx2 that fail due toobsolescence. The evolution of the state variable x = (x1, x2) is describedby a system of ODEs,

x1 = −α1x1 + u1,

x2 = D(x, u2) −βx2,

where37

D(x, u2) = [1 − x2 − γu2]+ (α2x1 +α3x2)

denotes demand, and α1,α2,α3,β, γ are given positive constants.The control at time t ≥ 0 is u(t) = (u1(t), u2(t)) ∈ U = [0, u1] × [0, 1/γ ].36. A time-optimal control problem is an OCP of the form (3.34)–(3.39), where h = −1and K0 = 0.37. For any z ∈ R, the nonnegative part of z is denoted by [z]+ = max{0, z}.

Page 155: Optimal Control Theory With Applications in Economics

142 Chapter 3

Assume that the initial state x(0) = (x10, x20) ∈ (0, u1/α1) × (0, 1) isknown. The constant u1 > 0 is a given upper limit on advertisingexpenditure.

a. Show that, without any loss of generality, one can restrict attentionto the case where γ = 1.

b. Sketch phase diagrams for the cases where β < α3 and β ≥ α3. (Hint:The interesting controls to consider are those that drive the system at (orclose to) either maximum or minimum velocity (in terms of the right-hand side of the system equation) in the different directions.)

c. Determine a nontrivial compact set C ⊂ (0, 1) × (0, u) of controllablestates, which are such that they can be reached from any other statein that set in finite time. Be sure to show how one could steer thesystem from x to x for any x, x ∈ C. Explain what happens whenx(0) /∈ C.

d. Consider the problem of maximizing the firm’s discounted infinite-horizon profit,

J(u) =∫ ∞

0e−rt(u2(t)D(x(t), u2(t)) − cu1(t)) dt,

where r > 0 is a given discount rate and c > 0 is the unit cost of adver-tising, with respect to bounded measurable controls u = (u1, u2) defineda.e. on R+, with values in the compact control set U . Can you determinea state x = (x1, x2) such that if x(0) = x, it would be optimal for the firmto stay at that state forever? (Hint: Compare the system’s turnpike withthe golden rule; see footnote 21.)

e. Check if the equilibrium state x of part d is contained in set C ofpart c. Based on this, explain intuitively how to find and implement anoptimal policy. Try to verify your policy numerically for an examplewith (α1,α2,α3,β, γ , r, c) = (1, .05, .1, .6, 100, .1, .05).

3.2 (Exploitation of an Exhaustible Resource) Let x(0) = x0 > 0 be theinitial stock of an exhaustible (also known as nonrenewable or deple-table) resource. The utility (to society) of consuming the resourceat the nonnegative rate c(t) at time t ∈ [0, T] (for a given time hori-zon T > 0) is U(c(t)), where U : R+ → R is a utility function that is twicecontinuously differentiable, increasing, and strictly concave on R++. Forany bounded, measurable consumption path c : [0, T] → [0, c], boundedby the maximum extraction rate c > 0, the social welfare is

Page 156: Optimal Control Theory With Applications in Economics

Optimal Control Theory 143

W(c) =∫ T

0e−rtU(c(t)) dt,

where r > 0 is the social discount rate. The stock of the resource evolvesaccording to

x = −c,

provided that the feasibility constraint

c(t) ∈ [0, 1{x(t)≥0}c ]is satisfied a.e. on [0, T], where 1 is the indicator function.

a. Formulate the social planner’s dynamic welfare maximization prob-lem as an optimal control problem.38

b. Using the PMP, provide necessary optimality conditions that need tohold on an optimal state-control path (x∗, c∗). If ψ(t) is the (absolutelycontinuous) adjoint variable in the PMP, denote by ν(t) = ertψ(t), forall t ∈ [0, T], the current-value adjoint variable.

c. Let η = −c Ucc(c)/Uc(c) > 0 be the relative risk aversion (or the elas-ticity of the marginal utility of consumption). Using the conditionsin part b, prove the Hotelling rule39 that ν = rν on [0, T]. Explain itseconomic significance using intuitive arguments. Show also that

cc

= − rη

,

that is, the relative growth rate of consumption on an optimal resourceextraction path is proportional to the ratio of the discount rate and therelative risk aversion.

d. Find a welfare-maximizing policy c∗(t), t ∈ [0, T], when U(c) = ln (c).Compute the corresponding optimal state trajectory x∗(t), t ∈ [0, T].e. Is it possible to write the optimal policy in part (d) in terms of afeedback law μ, in the form c∗(t) = μ(t, x∗(t)), t ∈ [0, T]?f. Redo parts a–e when T → ∞. (For each one it is enough to note anddiscuss the key changes)

3.3 (Exploitation of a Renewable Resource) Let x(t) represent the sizeof an animal population that produces a useful by-product y(t) (e.g.,

38. Instead of letting the control constraint depend on the state, it may be convenient tointroduce a constraint on the state endpoint x(T) because x(T) ≤ x(t) for all t ∈ [0, T].39. See Hotelling (1931).

Page 157: Optimal Control Theory With Applications in Economics

144 Chapter 3

cows produce milk, bees produce honey) at time t ∈ [0, T], where T > 0is a given time horizon. The production of the by-product is governedby the production function y = F(x), where F : R+ → R is continuouslydifferentiable, increasing, strictly concave, and such that F(0) = 0. Afraction of u(t) ∈ [0, 1] of the by-product is extracted at time t, andthe remaining fraction 1 − u(t) is left with the animals, so that theirpopulation evolves according to

x = α(x − x) + (1 − u(t))F(x),

where x ≥ 0 is a given critical mass for the population to be able togrow, and α ≥ r is a given growth rate. Each unit of the by-product thatis extracted can be sold at a profit of 1. A firm is trying to maximize itsprofit,

J(u) =∫ T

0e−rtu(t)F(x(t)) dt,

where r > 0 is a given discount rate, subject to the sustainabilityconstraint

x(0) = x(T) = x0,

where x0 ≤ x, withαx0 + F(x0) > αx, is the given initial size of the animalpopulation.

a. Formulate the firm’s profit-maximization problem as an optimalcontrol problem.

b. Use the PMP to provide necessary optimality conditions. Provide aphase diagram of the Hamiltonian system of ODEs.

c. Characterize the optimal policy u∗(t), t ∈ [0, T], and show that ingeneral it is discontinuous.

d. Describe the optimal policy in words. How is this policy influencedby r and α?

e. For α = 1, r = .1, x0 = 10, x = 12, T = 2, and F(x) = √x, provide an

approximate numerical solution to the optimal control problem in part a,that is, plot the optimal state trajectory x∗(t) and the optimal controltrajectory u∗(t) for t ∈ [0, T].3.4 (Control of a Pandemic) Consider the outbreak of an infectiousdisease, which poses a public health hazard.40 At time t ∈ [0, T], where

40. This exercise is related to Sethi (1977); see also Sethi and Thompson (2000, 295–298).

Page 158: Optimal Control Theory With Applications in Economics

Optimal Control Theory 145

T > 0 is a given intervention horizon, the percentage of infected peo-ple in a given population is x(t) ∈ [0, 1]. Given a public treatmentpolicy u(t) ∈ [0, u] (with u > α a given maximum intervention leveldefined by the capacity of treatment facilities), the disease dynamicsare described by the initial value problem

x = α(1 − x)x − ux, x(0) = x0,

where α > 0 is a known infectivity parameter, and the initial spread ofthe disease x0 ∈ (0, 1) is known. A social planner would like to maximizethe social-welfare functional

J(u) = −∫ T

0e−rt (x(t) + cuκ(t)

)dt,

where r > 0 is the social discount rate, c > 0 denotes the interventioncost, and κ ≥ 1 describes the diseconomies when scaling up publictreatment efforts.

a. Formulate the social planner’s welfare maximization problem as anoptimal control problem.

b. Provide a set of necessary optimality conditions for κ ∈ {1, 2}.c. Characterize the optimal solutions for κ ∈ {1, 2}, and discuss thequalitative difference between the two solutions.

d. Discuss your findings and provide an intuitive description of theoptimal policy that a public official in charge of the health care systemwould understand.

e. What happens as the intervention horizon T goes to infinity?

f. Choose reasonable numerical values for α, c, r, u, x0, and plot theoptimal state and control trajectories, for κ ∈ {1, 2}.3.5 (Behavioral Investment Strategies) Consider an investor, who attime t ≥ 0 consumes at the rate c(t) ∈ (0, c], as long as his capital (bankbalance) x(t) is positive, where c > 0 is a fairly large spending limit. Ifthe investor’s capital becomes zero, consumption c(t) must be zero aswell. Given a discount rate r > 0, the investor’s policy is to maximizehis discounted utility

JT(c) =∫ T

0e−rt U(c(t)) dt,

Page 159: Optimal Control Theory With Applications in Economics

146 Chapter 3

where U(c(t)) = ln (c(t)) is the investor’s time-t utility of consuming atthe rate c(t), and T > 0 is a given planning horizon. The return on investedcapital is α > r, so the initial value problem

x = αx − c, x(0) = x0,

where x0 > 0 is his initial capital, describes the evolution of theinvestor’s bank balance.

Part 1: Optimal Consumption Plan

a. Assuming that the investor’s bank balance stays positive for all t ∈[0, T), formulate the investor’s optimal control problem and determinehis optimal T-horizon consumption plan c∗

T(t; x0), t ∈ [0, T].b. Determine the investor’s optimal infinite-horizon consumption planc∗∞(t; x0), t ∈ R+, when the planning horizon T → ∞.

Part 2: Myopic Receding-Horizon Policy

Assume that the investor is myopic, so that, given a finite planninghorizon T > 0 and an implementation horizon τ ∈ (0, T), he proceeds asfollows. For any implementation period k ≥ 0, the investor implementsthe consumption plan c∗

T(t − kτ ; xk), t ∈ Ik = [kτ , (k + 1)τ ], where xk isthe amount of capital available at time t = kτ . Let c∗

T,τ (t), t ∈ R+, be theinvestor’s resulting (T, τ )-receding-horizon consumption plan.

c. Discuss what practical reason or circumstance might be causing theinvestor to choose a receding-horizon consumption plan c∗

T,τ over anoptimal infinite-horizon consumption plan c∗∞. Draw a picture thatshows how the receding-horizon consumption plan is obtained fromthe infinite-horizon consumption plan.

d. For a given implementation horizon τ > 0, does T → ∞ implythat c∗

T,τ → c∗∞ pointwise? Explain.

e. Is it possible that c∗T,τ becomes periodic (but nonconstant)? If yes, try

to provide an example. If no, explain.

Part 3: Prescriptive Measures

f. If x∗∞(t; x0), t ∈ R+, denotes the state trajectory under the optimalinfinite-horizon consumption plan c∗∞(t; x0), t ∈ R+, find the long-runsteady state x∗∞ = limt→∞ x∗∞(t; x0).

g. Consider the following modified (T, τ )-receding-horizon consumptionplan c∗

T,τ (t), t ∈ R+, which is such that on each time interval Ik,k ≥ 0, the investor implements an optimal endpoint-constrained

Page 160: Optimal Control Theory With Applications in Economics

Optimal Control Theory 147

consumption plan c∗T(t − kτ ; xk) where (for any x0 > 0) the plan c∗

T(·; x0)solves the finite-horizon optimal control problem formulated in part a,subject to the additional state-endpoint constraint x(T) = x∗∞,41 andwhere xk is the amount of capital available at time t = κτ . Compare, inwords, the receding-horizon consumption plans c∗

T,τ and c∗T,τ . Does c∗

T,τfix some of the weaknesses of c∗

T,τ (e.g., those identified in part 2)?Explain.

3.6 (Optimal Consumption with Stochastic Lifetime) Consider an in-vestor, who at time t ≥ 0 consumes at the rate c(t) ∈ [0, c], as long ashis capital (bank balance) x(t) is positive, where c > 0 is a fairly largespending limit. If the investor’s capital becomes zero, consumption c(t)must be zero as well. Given a discount rate r > 0, the investor’s policyis to maximize his expected discounted utility

J(c) = E

[∫ T

0e−rt U(c(t)) dt

],

where U(c(t)) = ln (c(t)) is the investor’s time-t utility of consuming atthe rate c(t), and T ≥ 0 represents the investor’s random remaining life-time. The latter is exponentially distributed with probability densityfunction g(T) = λe−λT for all T ≥ 0, where λ > 0 is a given constant. Thereturn on invested capital is α > r, so the initial value problem

x = αx − c, x(0) = x0,

where x0 > 0 is his initial capital, describes the evolution of theinvestor’s bank balance.

a. Formulate the investor’s optimal consumption problem as a deter-ministic infinite-horizon optimal control problem.

b. Determine the investor’s optimal consumption plan c∗(t), t ≥ 0.

c. Compare your solution in part b to the optimal consumptionplan c∗

T(t), t ∈ [0, T], when T > 0 is perfectly known. How much is theinformation about T worth?

d. Compare your solution in part b to the optimal (deterministic)infinite-horizon consumption plan c∗∞(t), t ≥ 0.

e. What can you learn from parts b–d for your own financial manage-ment?

41. Under which conditions on the parameters is this feasible? x∗∞ is the steady stateobtained in exercise 3.5f; it remains fixed.

Page 161: Optimal Control Theory With Applications in Economics
Page 162: Optimal Control Theory With Applications in Economics

4 Game Theory

Of supreme importance in war isto attack the enemy’s strategy.

—Sun Tzu

4.1 Overview

The strategic interaction generated by the choices available to differentagents is modeled in the form of a game. Agame that evolves over severaltime periods is called a dynamic game, whereas a game that takes placein one single period is termed a static game. Depending on the informa-tion available to each agent, a game may be either of complete or incom-plete information. Figure 4.1 provides an overview of these main typesof games, which are employed for the exposition of the fundamentalconcepts of game theory in section 4.2.

Every game features a set of players, together with their action setsand their payoff (or utility) functions. The vector of all players’ actionsis called a strategy profile or an outcome. A given player’s payoff func-tion (or utility function) maps strategy profiles to real numbers (calledpayoffs). These payoffs represent this player’s preferences over theoutcomes.1

Game theory aims at providing predictions about the possible out-comes of a given game. A Nash equilibrium is a strategy profile (i.e., a

1. In economics, a player’s preferences over the set of outcomes A would define a preorderon that set, i.e., a binary relation " (“is preferred to”) which for any a, a, a ∈ A satisfiesthe following two properties: (1) a " a or a " a (completeness); and (2) a " a and a " aimplies that a " a (transitivity). If a " a and a " a, then the player is indifferent between aand a, which is denoted by a ∼ a. A utility function U : A → R represents these preferencesif for all a, a ∈ A: U(a) ≥ U(a) ⇔ a " a. One can show that as long as the upper contourset U (a) = {a ∈ A : a " a} and the lower contour set L(a) = {a ∈ A : a " a} are closed forall a ∈ A, there exists a continuous utility function that represents the preferences.

Page 163: Optimal Control Theory With Applications in Economics

150 Chapter 4

Figure 4.1
Classification of games.

                     Information
Timing      Complete           Incomplete
Static      Section 4.2.1      Section 4.2.2
Dynamic     Section 4.2.3      Section 4.2.4

vector of actions by the different players) such that no player wants to alter his own action unilaterally, given that all other players play according to this strategy profile. Reasonable outcomes of a game are generally expected to at least satisfy the Nash-equilibrium requirement. Yet, as games become more complex, perhaps because the players’ actions are implemented dynamically over time, there may exist many Nash equilibria, some of which are quite implausible. To see this, consider the following simple two-player bargaining game.

Example 4.1 (Ultimatum Bargaining) Two players, Ann and Bert, try to decide how to split a dollar. Ann proposes a fraction of the dollar to Bert. Bert then decides whether to accept or reject Ann’s offer. If the offer is accepted, then the proposed money split is implemented and the players’ payoffs realize accordingly. Otherwise both players receive a payoff of zero. For example, if Ann proposes an amount of $0.30 to Bert, then if Bert accepts the offer, Ann obtains $0.70 and Bert gets $0.30. If Bert rejects the offer, then both players get a zero payoff (and the dollar disappears). To apply the concept of Nash equilibrium, one needs to look for strategy profiles that would not provoke a unilateral deviation by either player. Note first that because Bert moves after Ann, he can observe her action, that is, her offer a ∈ [0, 1]. Bert’s strategy may be to accept the offer if and only if it reaches at least a certain threshold α ∈ [0, 1]. With Bert’s choice to accept denoted by b = 1 and to reject by b = 0, his strategy can be summarized by

b(a, α) = { 1 if a ≥ α,
            0 otherwise.

If Ann believes that Bert will implement this threshold strategy, then her best strategy is to propose a = α. Now, checking for a possible deviation by Bert, note that given Ann’s proposed amount α, it is best for


Bert to accept (because he would otherwise get a zero payoff, which is never strictly better than α). Therefore, for any α ∈ [0, 1], the strategy profile consisting of a = α and b(·, α) constitutes a Nash equilibrium. Thus, even for this fairly simple dynamic game there exists a continuum of Nash equilibria, one for each α ∈ [0, 1]. This analysis is not particularly useful for generating a prediction for the game’s outcome because any split of the dollar between Ann and Bert can be justified by one of the Nash equilibria. A way out of this dilemma is to refine the concept of Nash equilibrium by imposing an additional requirement. For this, note that because Ann moves before Bert, it is clearly optimal for Bert to accept any amount that Ann offers, even if it is zero, in which case Bert is indifferent between accepting or not.2 The reason Ann was willing to offer a positive amount (corresponding to positive α) is that she believed Bert’s threat of following through with his threshold strategy. But this threat is not credible, or as game theorists like to say, Bert’s threshold strategy is not subgame-perfect, because in the subgame that Bert plays (with himself) after Ann has made her offer, it is best for Bert to accept any offer. The concept of a subgame-perfect Nash equilibrium requires that a strategy profile induce a Nash equilibrium in each subgame. Provided that the game ends in finite time, one can obtain an equilibrium path by backward induction, starting at the end of the horizon.3 The only subgame-perfect Nash equilibrium of this bargaining game is therefore for Ann to propose zero and for Bert to accept her ruthless offer. □
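To make the continuum of equilibria concrete, the following Python sketch (not part of the original text; the discretized offer grid is an illustrative assumption) verifies numerically that, for every threshold α, the profile in which Ann offers a = α and Bert plays the threshold strategy b(·, α) admits no profitable unilateral deviation.

```python
offers = [i / 100 for i in range(101)]  # discretized grid of possible offers

def payoffs(a, alpha):
    """Ann offers a; Bert accepts iff a >= alpha. Returns (Ann, Bert) payoffs."""
    return (1 - a, a) if a >= alpha else (0.0, 0.0)

def is_nash(alpha):
    """Check the profile (a = alpha, threshold strategy with threshold alpha)."""
    u_ann = payoffs(alpha, alpha)[0]
    # Ann has no profitable deviation: offering more is wasteful, less is rejected.
    if any(payoffs(a, alpha)[0] > u_ann for a in offers):
        return False
    # Bert: accepting the equilibrium offer yields alpha >= 0, rejecting yields 0.
    return True

print(all(is_nash(alpha) for alpha in offers))  # True: one equilibrium per alpha
# Subgame perfection rules out all thresholds except alpha = 0, since accepting
# any observed offer a (payoff a) weakly dominates rejecting it (payoff 0).
```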

The preceding example illustrates the need for equilibrium refinements in dynamic games, even when both players have perfect knowledge about each other. But the information that players have about each other might be quite incomplete. For example, when a seller faces multiple potential buyers (or agents) with different valuations for an item, then these valuations are generally not known to her. They belong to the agents’ private information that the seller may try to extract. Assume that each buyer i’s valuation for the item is given by a nonnegative number θi, and suppose that the seller uses some type of auction mechanism to sell the item. Then buyer i’s bid bi for the item will depend on his private valuation θi. In other words, from the seller’s perspective and

2. A standard assumption in game theory is that in the case of indifference, a player does what the game theorist wants him to do, which is usually to play an equilibrium strategy.
3. The intuition is similar to the logic of the Hamilton-Jacobi-Bellman equation (see chapter 3), which contains its boundary condition at the end of the horizon and is therefore naturally solved from the end of the horizon, especially when discretizing the optimal control problem to a finite number of periods (see, e.g., Bertsekas 2007).


the perspective of any other bidder j ≠ i, buyer i’s strategy becomes a function of θi, so the corresponding Bayes-Nash equilibrium (BNE) of games with incomplete information requires the specification of each player’s actions as a function of his private information. The following example illustrates this notion.

Example 4.2 (Second-Price Auction) A seller tries to auction off an item to one of N ≥ 2 agents with unknown private valuations θ1, . . . , θN ∈ [0, 1].4 Assume that the agents’ valuations are, from the perspective of both the seller and the agents, independently and identically distributed. Agents submit their bids simultaneously, and the highest bidder wins the item.5 The winning bidder then pays an amount to the seller that is equal to the second-highest bid (corresponding to the highest losing bid). All other bidders obtain a zero payoff. The question is now, What would be a symmetric Bayes-Nash-equilibrium bidding strategy such that each bidder i ∈ {1, . . . , N} submits a bid bi = β(θi), where β(·) is the bidding function to be determined? If bidder i submits a bid bi strictly less than his private value θi, then there is a chance that some bidder j submits a bid bj ∈ (bi, θi), in which case bidder i would have been better off bidding his true valuation. Similarly, if bidder i submits a bid bi strictly greater than θi, then there is a chance that some agent j submits the highest competing bid bj ∈ (θi, bi), which would lead bidder i to win with a payment above his valuation, resulting in a negative payoff. Again, bidder i prefers to bid his true valuation θi. Consider now the situation in which all bidders other than bidder i use the bidding function β(θ) ≡ θ to determine their strategies. Then, by the logic just presented, it is best for bidder i to also select bi = θi. Therefore, the bidding function bi = β(θi) = θi, for all i ∈ {1, . . . , N}, determines a (symmetric) Bayes-Nash equilibrium of the second-price auction. Note that this mechanism leads to full information revelation in the sense that all agents directly reveal their private information to the seller. The price the seller pays for this information (relative to knowing everything for free) is equal to the difference between the highest and the second-highest bid. □

4. The problem of selling an item to a single potential buyer is considered in chapter 5.
5. In the case where several bidders submit the same winning bid, the item is allocated randomly among them, at equal odds.
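The dominant-strategy logic of the example can also be illustrated by simulation. The sketch below is not from the text; it assumes uniformly distributed valuations and estimates bidder 1’s expected payoff when he scales his bid away from his true valuation while all other bidders bid truthfully.

```python
import numpy as np

rng = np.random.default_rng(0)
N, draws = 4, 200_000  # number of bidders and simulation draws (illustrative)

def expected_payoff(scale):
    """Bidder 1 bids scale * theta1; bidders 2..N bid truthfully.
    The highest bidder wins and pays the second-highest bid."""
    theta = rng.uniform(0, 1, size=(draws, N))
    bids = theta.copy()
    bids[:, 0] = scale * theta[:, 0]
    winner = bids.argmax(axis=1)
    second_highest = np.sort(bids, axis=1)[:, -2]
    payoff1 = np.where(winner == 0, theta[:, 0] - second_highest, 0.0)
    return payoff1.mean()

for scale in (0.6, 0.8, 1.0, 1.2):
    print(f"bid = {scale:.1f} * valuation: E[payoff] ~ {expected_payoff(scale):.4f}")
# Truthful bidding (scale = 1.0) yields the (approximately) highest expected payoff.
```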


The problem of equilibrium multiplicity, discussed in example 4.1, is compounded in dynamic games where information is incomplete. Subgame perfection as a refinement has very little bite, because (proper) subgames can start only at points where all players have perfect information about the status of the game. When player A possesses private information, then player B, moving after A, does not know which type of player A he is dealing with and therefore tries to infer information from the history of the game about A’s type. The multiplicity arises because what happens on the equilibrium path may depend strongly on what happens off the equilibrium path, that is, on what the players expect would happen if they deviated from their equilibrium actions. The notion of equilibrium path becomes clear when reconsidering the ultimatum game in example 4.1. In that game, Bert’s strategy is a complete contingent plan, which specifies his action for every possible offer that Ann could make. On the equilibrium path she makes only one offer, and Bert will respond to precisely that offer. All of his other possible responses to Ann’s other possible offers are off the equilibrium path. Thus, in terms of Nash equilibrium, Bert’s anticipated off-equilibrium behavior may change Ann’s in-equilibrium behavior. Indeed, when Bert threatens to use a (not subgame-perfect) threshold strategy, Ann’s best response is to offer him exactly that threshold, because Ann then fears that Bert would reject lower offers. These off-equilibrium-path beliefs usually drive the multiplicity of equilibria in dynamic games of incomplete information, as can be seen in the following example.

Example 4.3 (Job Market Signaling) (Spence 1973) Consider a job applicant whose productivity in terms of his profit-generation ability is either θL = 1 or θH = 2. While the information about his productivity type θ ∈ {θL, θH} is private, the worker does have the option to acquire a publicly observable education level e at the cost C(e, θ) = e/θ, which has no influence on his productivity. With complete information, a firm considering to hire the applicant would offer a wage w(θ) = θ, compensating the worker exactly for his productivity.6 If the two types choose the same education level, so that the firm cannot distinguish between them, then the firm offers a wage of θ̄ = (θL + θH)/2 = 3/2, corresponding to

6. This implicitly assumes that enough firms are competing for workers, so that none is able to offer wages below a worker’s expected productivity and still expect to hire workers.


the worker’s expected productivity. The low-type worker would be willing to acquire an education level of up to ēL = 1/2 to attain this pooling equilibrium, whereas the high-type worker would be willing to get an education of up to ēH = 1 in order to end up at a separating equilibrium with different education levels. Note that in a separating equilibrium, the low-type worker would never acquire any education. The high type has to acquire at least

e̲H = 1/2 in order to discourage the low type from trying to pool by matching the high type’s education. What actually happens in equilibrium depends decisively on the firm’s interpretation of out-of-equilibrium actions. For example, it would be legitimate for the firm to interpret any worker with an education level off the equilibrium path as a low type. With these somewhat extreme out-of-equilibrium beliefs, it is possible to construct any pooling equilibrium where both worker types acquire education level e∗ ∈ [0, ēL], as well as any separating equilibrium where the workers acquire the education levels e∗L = 0 and e∗H ∈ [e̲H, ēH], respectively. □

Section 4.2 provides an introduction to games in discrete time with complete and incomplete information. Based on this body of classical game theory, continuous-time differential games are discussed in section 4.3. In a differential game all players’ strategies are functions of continuous time and their reactions may be instantaneous, unless there is an imperfect information structure which, for example, could include a delay in player i’s noticing player j’s actions. To deal with excessive equilibrium multiplicity in dynamic games where strategies can depend on the full history of past actions and events,7 when dealing with differential games one often requires equilibria to satisfy a Markov property in the sense that current actions can depend only on current states or, in other words, that all the relevant history of the game is included in the current state. A further simplification is to assume that all players condition their strategies on time rather than on the state, which leads to open-loop strategies. By contrast, a closed-loop strategy can be conditioned on all the available information, which usually includes the state variable. The refinement of subgame perfection in the context of differential games leads to the concept of a Markov-perfect equilibrium. The discussion also includes some examples of non-Markovian equilibria using, for example, trigger strategies, where a certain event (such as

7. See, e.g., the folk theorems in proposition 4.5 and remark 4.4 about equilibria in infinitely repeated games.


a player’s deviation from the equilibrium path) would cause a regime shift in the players’ behavior.

4.2 Fundamental Concepts8

4.2.1 Static Games of Complete Information
In the absence of any time dependence, a game Γ in normal form is fully specified by a player set N, an action set Ai for each player i ∈ N, and a payoff function Ui : A → R for each player i ∈ N, where A = ∏i∈N Ai is the set of (pure) strategy profiles a = (ai)i∈N. Thus,

Γ = (N, A, {Ui(·)}i∈N) (4.1)

fully describes a static game of complete information. The following two basic assumptions govern the players’ interaction in the game Γ:

• Rationality Each player i ∈ N chooses his action ai in the normal-form game Γ so as to maximize his payoff Ui(ai, a−i) given the other players’ strategy profile a−i = (aj)j∈N\{i}.
• Common knowledge Each player i ∈ N knows the rules of the game Γ (all of its elements) and knows that the other players know the rules of the game, and that they know that he knows the rules of the game, and that he knows that they know that he knows, and so on.9

The following simple but very important example illustrates the notation and describes how one might reach a prediction about the outcome of a game, both in terms of payoffs and in terms of the eventually implemented strategy profile.

Example 4.4 (Prisoner’s Dilemma) Let N = {1, 2} be a set of two prisoners that are under suspicion of having committed a crime together. During their separate but simultaneous interrogations each prisoner can choose either to cooperate (C), that is, to deny all charges, or to defect (D), that is, to admit all charges, incriminating the other prisoner. Each prisoner i ∈ N has an action set of the form Ai = {C, D}, so A = {C, D} × {C, D} = {(C, C), (C, D), (D, C), (D, D)} is the set of all strategy profiles. The prisoners’ payoffs are specified as follows:

8. Readers already familiar with the basics of game theory can skip directly to section 4.3 without loss in continuity.
9. When this generally infinite belief hierarchy is interrupted at a finite level, then the game will be of bounded rationality, which is beyond the scope of this book. Geanakoplos (1992) summarized the interesting consequences of the common-knowledge assumption in economics.


                    Prisoner 2
                    C           D
Prisoner 1    C    (1, 1)      (−1, 2)
              D    (2, −1)     (0, 0)

In this payoff matrix, the entry (−1, 2) for the strategy profile (C, D) means that U1(C, D) = −1 and U2(C, D) = 2. The other entries have analogous interpretations. It is easy to see that when fixing, say, prisoner 2’s action, prisoner 1 is better off choosing D instead of C, because U1(D, a2) > U1(C, a2) for all a2 ∈ A2. By symmetry prisoner 2 is also always best off to choose D, so the only reasonable prediction about the outcome of this game is that both prisoners will choose D, leading to the payoff vector (0, 0). This famous game is usually referred to as prisoner’s dilemma because both players, caught in their strategic interdependence, end up with a payoff vector worse than their socially optimal payoff vector of (1, 1) when they both cooperate. The key reason for the socially suboptimal result of this game is that when one prisoner decides to cooperate, the other prisoner invariably prefers to defect. □

As shown in example 4.4, it is useful to decompose a strategy profile a ∈ A into player i’s action ai and all other players’ strategy profile a−i = (aj)j∈N\{i}, so it is customary in game theory to write

a = (ai, a−i) = (aj)j∈N

for any i ∈ N. Using this notation, one can introduce John Nash’s notion of an equilibrium as the leading prediction about the outcome of the game Γ. A strategy profile a∗ = (ai∗)i∈N is a Nash equilibrium (NE) if

∀ i ∈ N : Ui(ai∗, a−i∗) ≥ Ui(ai, a−i∗), ∀ ai ∈ Ai. (4.2)

This means that at a Nash-equilibrium strategy profile a∗ each player i maximizes his own payoff given that all other players implement the strategy profile a−i∗. In other words, for any player i there is no strategy ai ∈ Ai such that he strictly prefers strategy profile (ai, a−i∗) to the Nash-equilibrium strategy profile a∗ = (ai∗, a−i∗). Another common way to express the meaning of a Nash equilibrium is to recognize that (4.2) is equivalent to the following simple statement:

No player wants to unilaterally deviate from a Nash-equilibrium strategy profile.


The following classic example demonstrates that Nash equilibria do not have to be unique.

Example 4.5 (Battle of the Sexes) Consider two players, Ann and Bert, each of whom can choose between two activities, “go dancing” (D) or “go to the movies” (M). Their payoffs are as follows:

              Bert
              D          M
Ann     D    (2, 1)     (0, 0)
        M    (0, 0)     (1, 2)

It is straightforward to verify that any strategy profile in which both players choose the same action is a Nash equilibrium of this coordination game. If players are allowed to randomize over their actions, an additional Nash equilibrium in mixed strategies can be identified. For each player i ∈ N = {Ann, Bert}, let

Δ(Ai) = {(π, 1 − π) : π = Prob(Player i plays D) ∈ [0, 1]}

denote an augmented strategy space such that pi = (pi_D, pi_M) ∈ Δ(Ai)

represents the probability distribution with which player i chooses the different actions in the augmented game

ΓΔ = ( N, ∏i∈N Δ(Ai), {Ui : ∏j∈N Δ(Aj) → R}i∈N ),

where player i’s expected payoff,

Ui(pi, p−i) = pi_D p−i_D Ui(D, D) + pi_M p−i_M Ui(M, M) = 3 pi_D p−i_D + (1 − pi_D − p−i_D) Ui(M, M),

takes into account the payoffs shown in the matrix and the fact that pi_M = 1 − pi_D. Given the other player’s strategy p−i, player i’s best-response correspondence is

BRi(p−i) = arg max_{pi∈Δ(Ai)} Ui(pi, p−i) = { {0} if 3 p−i_D − Ui(M, M) < 0,
                                              [0, 1] if 3 p−i_D − Ui(M, M) = 0,
                                              {1} otherwise,

expressed in terms of the probability pi_D of playing D.


In the context of Ann and Bert’s coordination game this means that player i’s best response is to do what the other player is sufficiently likely to do, unless the other player makes player i indifferent over all possible probability distributions in Δ(Ai). By definition, a Nash equilibrium p∗ = (pi∗, p−i∗) of ΓΔ is such that

pi∗ ∈ BRi(p−i∗), ∀ i ∈ N.

Continuing the previous argument, if both players make each other indifferent, by choosing pAnn_D = UBert(M, M)/3 = 2/3 and pBert_D = UAnn(M, M)/3 = 1/3, one obtains the Nash equilibrium p∗ = (pAnn∗, pBert∗) of ΓΔ, with

pAnn∗ = (2/3, 1/3) and pBert∗ = (1/3, 2/3).

Both players’ corresponding Nash-equilibrium payoffs, UAnn(p∗) = UBert(p∗) = 2/3, are less than the equilibrium payoffs under either of the two pure-strategy Nash equilibria of Γ. Note also that the latter equilibria reappear as Nash equilibria ((1, 0), (1, 0)) and ((0, 1), (0, 1)) of the augmented game ΓΔ, which therefore has three (mixed-strategy) Nash equilibria in total. □
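The indifference conditions behind this mixed-strategy equilibrium are easy to verify directly; the following sketch (an illustrative check, not part of the original text) computes each player’s expected payoff from both pure actions against the opponent’s equilibrium mixture.

```python
# Payoff matrices over the actions (D, M): U[a_Ann][a_Bert], indexed 0 = D, 1 = M.
U_ann = [[2, 0], [0, 1]]
U_bert = [[1, 0], [0, 2]]

p_ann_D, p_bert_D = 2 / 3, 1 / 3  # equilibrium probabilities of dancing

# Ann's expected payoffs from D and M against Bert's mixture:
ann_D = U_ann[0][0] * p_bert_D + U_ann[0][1] * (1 - p_bert_D)
ann_M = U_ann[1][0] * p_bert_D + U_ann[1][1] * (1 - p_bert_D)

# Bert's expected payoffs from D and M against Ann's mixture:
bert_D = U_bert[0][0] * p_ann_D + U_bert[1][0] * (1 - p_ann_D)
bert_M = U_bert[0][1] * p_ann_D + U_bert[1][1] * (1 - p_ann_D)

print(ann_D, ann_M)    # 2/3, 2/3: Ann is indifferent, so mixing is optimal
print(bert_D, bert_M)  # 2/3, 2/3: Bert is indifferent as well
```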

Based on the insights obtained in example 4.5, it is useful to extend the definition of a Nash equilibrium to allow for players’ randomizations over actions. For this, let Δ(Ai) be the space of all probability measures Pi defined on the standard probability space (Ai, Fi, P), where Fi is an appropriate σ-algebra over Ai,10 such that

∫_Ai dPi(ai) = 1.

A mixed-strategy Nash equilibrium of Γ is a (pure-strategy) Nash equilibrium of the augmented game

ΓΔ = ( N, ∏i∈N Δ(Ai), {Ui : ∏j∈N Δ(Aj) → R}i∈N ),

where for any P ∈ ∏i∈N �(Ai) player i’s expected payoff is

10. It is implicitly assumed that player i’s action set Ai is closed under the operations of union, intersection, and difference (i.e., it forms a ring). For details on measure theory and on how to slightly relax this assumption (to semirings), see, e.g., Kirillov and Gvishiani (1982).


Ui(P) = ∫_A Ui(a) dP(a).

It is clear that the distinction between mixed-strategy and pure-strategy Nash equilibria depends only on the viewpoint, since any mixed-strategy Nash equilibrium is defined in terms of a pure-strategy equilibrium of a suitably augmented game.

In actual games, it is sometimes difficult to interpret the meaning of a mixed-strategy equilibrium that places positive probability mass on more than one action, especially if the game is only played once. Yet, the main reason for introducing the possibility of randomization is to convexify the players’ action spaces in order to guarantee the existence of a Nash equilibrium, at least in mixed strategies. The following example shows that the existence of a Nash equilibrium in pure strategies cannot be taken for granted.

Example 4.6 (Matching Pennies) Two agents, 1 and 2, play a game where each player simultaneously chooses one side of a penny, either “heads” (H) or “tails” (T). Player 1 wins both pennies if the players have chosen matching sides; otherwise player 2 wins the pennies. The net payoffs in this zero-sum game are as follows:

                  Player 2
                  H           T
Player 1    H    (1, −1)     (−1, 1)
            T    (−1, 1)     (1, −1)

It is easy to see that there does not exist a pure-strategy equilibrium of this matching-pennies game. Indeed, player 1 would always like to imitate player 2’s strategy, whereas player 2 would then want to deviate and choose a different side, so unilateral deviations cannot be excluded from any strategy profile in A = {H, T} × {H, T}. The only mixed-strategy equilibrium is such that both players randomize so as to choose H and T with equal probabilities. □

Proposition 4.1 (Existence of a Nash Equilibrium) Let

Γ = ( N, A = ∏i∈N Ai, {Ui(·)}i∈N )


be a normal-form game, where the player set N ≠ ∅ is finite and each action set Ai ≠ ∅ is finite-dimensional, convex, and compact for i ∈ N. If in addition each player i’s payoff function Ui(ai, a−i) is continuous in a = (ai, a−i) and quasi-concave in ai, then Γ has a (pure-strategy) Nash equilibrium.

Proof Let BR : A ⇒ A be the set-valued best-response correspondence for Γ, defined by

BR(a) = (BRi(a−i))i∈N , ∀ a ∈ A.

A Nash equilibrium a∗ ∈ A is by definition a fixed point of BR(·), that is, it is such that

a∗ ∈ BR(a∗).

Since all players’ payoff functions are by assumption continuous and their (nonempty) action sets compact, by the Weierstrass theorem (proposition A.10 in appendix A) the image BR(a) is nonempty for any strategy profile a ∈ A. The Berge maximum theorem (proposition A.15) further implies that BR(a) is compact-valued and upper semicontinuous. Last, Ui(ai, a−i) is quasi-concave in ai, so for any player i ∈ N,

ai, a′i ∈ BRi(a−i) ⇒ θai + (1 − θ)a′i ∈ BRi(a−i), ∀ θ ∈ (0, 1),

whence for any a ∈ A:

a′, a″ ∈ BR(a) ⇒ θa′ + (1 − θ)a″ ∈ BR(a), ∀ θ ∈ (0, 1).

Thus, BR(·) is an upper semicontinuous mapping with convex and compact values in 2^A, where the set of strategy profiles is convex and compact. By the Kakutani fixed-point theorem (proposition A.17), there exists a point a∗ ∈ A such that a∗ ∈ BR(a∗). ∎

Corollary 4.1 (Existence of a Mixed-Strategy Nash Equilibrium) Any normal-form game Γ = ( N, A = ∏i∈N Ai, {Ui : A → R}i∈N ) with a finite player set N and a finite set of strategy profiles A has a mixed-strategy Nash equilibrium.

Proof The existence of a mixed-strategy Nash equilibrium of the normal-form game Γ is by definition equivalent to the existence of a (pure-strategy) Nash equilibrium of the augmented game ΓΔ, which satisfies the assumptions of proposition 4.1. ∎


Example 4.7 (Cournot Oligopoly) Consider N firms, each of which sells a differentiated product on a common market. Given the other firms’ strategy profile q−i, each firm i ∈ N = {1, . . . , N} simultaneously chooses its output quantity qi ≥ 0 so as to maximize its profits

πi(qi, q−i) = Pi(qi, q−i) qi − Ci(qi),

where the inverse demand curve Pi(qi, q−i) (continuous, decreasing in its first argument) describes the nonnegative price firm i can obtain for its product as a function of all firms’ output decisions, and Ci(qi) is a continuously differentiable, increasing, convex cost function such that Ci(0) = 0 and the Inada conditions Ci_qi(0) = 0 and Ci_qi(∞) = ∞ are satisfied.11 If firm i’s revenue Ri(qi, q−i) = Pi(qi, q−i) qi is quasi-concave in qi for all qi ≥ 0 and bounded for all (qi, q−i), then firm i’s profit function πi(qi, q−i) is quasi-concave, and thus by proposition 4.1 a Nash equilibrium of this Cournot oligopoly game does exist. The special case where Pi(qi, q−i) = 1 − Q with Q = q1 + · · · + qN and Ci(qi) = cqi

with c ∈ (0, 1) a constant marginal cost parameter yields that q∗ = (qi∗)i∈N, with qi∗ ≡ (1 − c)/(N + 1), is the unique Nash equilibrium of this game.12 □
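For this linear special case the equilibrium is also easy to recover numerically. The sketch below (not from the text; the parameter values are illustrative, and firms are updated one at a time, since fully simultaneous best-response updating can cycle here) converges to qi∗ = (1 − c)/(N + 1).

```python
N, c = 5, 0.2     # number of firms and marginal cost (illustrative values)
q = [0.0] * N     # initial output profile

def best_response(i, q):
    """Firm i's best response in the linear Cournot game with P = 1 - Q and
    Ci(qi) = c * qi: maximize (1 - qi - Q_rest - c) * qi, i.e.,
    qi = (1 - c - Q_rest) / 2, truncated at zero."""
    Q_rest = sum(q) - q[i]
    return max(0.0, (1.0 - c - Q_rest) / 2.0)

for _ in range(200):          # iterate best responses until convergence
    for i in range(N):        # sequential (Gauss-Seidel style) updating
        q[i] = best_response(i, q)

print(q[0], (1 - c) / (N + 1))  # both approximately 0.13333
```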

Remark 4.1 (Wilson’s Oddness Result) Wilson (1971) showed that almost any finite game (as in corollary 4.1) has an odd number of mixed-strategy Nash equilibria, in the sense that if a given game happens to have an even number of mixed-strategy Nash equilibria, then a small random perturbation of the players’ payoffs will produce with probability 1 a new game with an odd number of mixed-strategy Nash equilibria. For instance, after the two pure-strategy Nash equilibria in the battle-of-the-sexes game discussed in example 4.5 have been determined, the oddness theorem suggests that unless the game is singular there exists another Nash equilibrium, in mixed strategies. □

4.2.2 Static Games of Incomplete Information
In many real-world games the publicly available information about players’ preferences may be limited. For example, a bidder in an auction

11. Inada conditions, such as strict monotonicity and derivatives of either zero or infinity toward the interval boundaries, ensure that solutions to optimization problems are attained at the interior of their domains. From the Inada conditions and the properties of the revenue function, it is possible to conclude that firm i can restrict its attention to the convex compact action set Ai = [0, q̄i] for some appropriate positive constant q̄i, i ∈ N.
12. The Banach fixed-point theorem (proposition A.3) can sometimes be used to guarantee the existence of a unique Nash equilibrium (see example A.4).


may not know the other bidders’ payoff functions or budget constraints. In fact, he might not even be sure how many other bidders there are in the first place. In such games of incomplete information, one continues to maintain the assumptions of rationality and common knowledge as formulated in section 4.2.1. Yet, it is necessary to relax the degree to which information about the elements of the game (the players, their action sets, and their payoff functions) is available. For this, it is convenient to encapsulate all the not commonly available information about a player i as a point θi in a type space Θi, which is referred to as the player’s type. In general, the type space could be infinite-dimensional, but in most practically relevant situations the type space can be taken as a subset of a finite-dimensional Euclidean space. Before the game starts, ex ante, all players assume that the types are jointly distributed, with the cumulative distribution function (cdf) F(ϑ) = Prob(θ ≤ ϑ),13 where

θ = (θi)i∈N ∈ Θ = ∏i∈N Θi.

Assuming that each player i observes his own type θi, his beliefs about the other players’ types are given by the conditional distribution Fi(θ−i) = F(θ−i|θi), which is obtained using Bayesian updating. Because of the players’ use of information to perform Bayesian updates, a game with incomplete information is also commonly referred to as a Bayesian game. A Bayesian game in normal form is a collection

ΓB = (N, A, Θ, {Ui : A × Θ → R}i∈N, F : Θ → [0, 1]). (4.3)

Note that for simplicity the influence of the types is limited to the players’ payoff functions, even though it could in principle also figure in the explicit description of the player set and the action sets. A game with more general dependences can be rewritten in the current form, for instance, by including an additional player (referred to as Nature) whose type determines all the remaining components of the game and whose

13. In principle, it is possible to assume that each player i has a different joint distribution Fi(θ) of types in mind (given that this difference of opinion is publicly known; otherwise prior beliefs over prior beliefs are needed). However, since the subsequent arguments remain essentially unaffected, this added complexity is dropped. Aumann (1976) showed that in the absence of any strategic considerations, individuals sharing statistical information are in fact not able to disagree about their prior beliefs. Hence, assuming that all agents have the same beliefs about the joint type distribution amounts to requiring that all of them have initial access to the same public information and that any additional information that is obtained privately by agent i is part of his type θi.


action set is a singleton. The concept of a Nash equilibrium generalizes to Bayesian games as follows. A Bayes-Nash equilibrium of ΓB is a strategy profile α∗ = (αi∗)i∈N with αi∗ : Θi → Ai such that

αi∗(θi) ∈ arg max_{ai∈Ai} Ui(ai, α−i∗, θi), ∀ θi ∈ Θi, ∀ i ∈ N,

where player i’s expected utility conditional on his own type is given by

Ui(ai, α−i∗, θi) = E[Ui(ai, α−i∗(θ−i), θi, θ−i)|θi] = ∫_{Θ−i} Ui(ai, α−i∗(θ−i), θi, θ−i) dF(θ−i|θi),

where α−i∗(θ−i) = (αj∗(θj))j∈N\{i} and F(θ−i|θi) denotes a conditional distribution function.

Example 4.8 (First-Price Auction) Consider N ≥ 2 bidders participating in a first-price auction for a certain good. Each bidder i ∈ N = {1, . . . , N} has a private valuation θi ∈ Θi = [0, 1], and it is assumed that the good cannot be resold. Each bidder believes that the type vector θ = (θ1, . . . , θN) ∈ [0, 1]N is distributed according to the cdf F(θ) = ∏i∈N G(θi), where G : [0, 1] → [0, 1] is a continuous, increasing cdf; that is, the types can be viewed as an independently and identically distributed sample of the distribution G. Each bidder i simultaneously chooses a bid ai ∈ Ai = [0, 1]. Given the other bidders’ strategy profile α−i(θ−i), player i’s best-response correspondence is

BRi(θi) = arg max_{ai∈[0,1]} {(θi − ai) Prob(max_{j∈N\{i}} {αj(θj)} ≤ ai | θi)},

for all θi ∈ [0, 1]. Provided that each bidder j ≠ i follows the same symmetric bidding strategy αj(θj) ≡ β(θj), which is increasing in his type θj, it is possible to invert β and obtain that

BRi(θi) = arg max_{ai∈[0,1]} {(θi − ai) GN−1(β−1(ai))}, ∀ θi ∈ [0, 1].

A necessary optimality condition for bidder i’s bid ai∗ to lie in his best response BRi(θi) is that

0 = (d/dai)|_{ai=ai∗} {(θi − ai) GN−1(β−1(ai))}
= −GN−1(β−1(ai∗)) + (θi − ai∗)(N − 1) GN−2(β−1(ai∗)) g(β−1(ai∗))/β̇(β−1(ai∗)), (4.4)


using the inverse function theorem (proposition A.8), and where g is the positive probability density corresponding to the distribution G. In a symmetric Nash equilibrium, bidder i’s bid ai∗ will be equal to β(θi), so that β−1(ai∗) = β−1(β(θi)) = θi, and relation (4.4) becomes

0 = −GN−1(θi) + (N − 1)(θi − β(θi)) GN−2(θi) g(θi)/β̇(θi), ∀ θi ∈ [0, 1],

or equivalently, a linear ordinary differential equation (ODE) of the form

β̇(θi) + (N − 1) (g(θi)/G(θi)) β(θi) = (N − 1) θi (g(θi)/G(θi)), ∀ θi ∈ [0, 1].

With the insight that the only amount a bidder with zero valuation can bid in equilibrium is zero, which implies the initial condition β(0) = 0, the Cauchy formula yields the equilibrium bidding function14

β(θi) = θi − (∫_0^θi GN−1(ϑ) dϑ)/GN−1(θi), ∀ θi ∈ [0, 1], (4.5)

which fully describes the symmetric Bayes-Nash equilibrium, with αi∗(θi) ≡ β(θi). Note also that β(θi) is an increasing function, justifying ex post the initial monotonicity assumption that led to this solution. □
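Formula (4.5) is straightforward to evaluate numerically for any type distribution. For uniformly distributed valuations, G(θ) = θ, it reduces to β(θ) = (N − 1)θ/N; the sketch below (illustrative, not part of the original text) confirms this special case by numerical integration.

```python
import numpy as np

N = 3  # number of bidders (illustrative)

def beta(theta, G=lambda x: x, grid=100_000):
    """Equilibrium bid (4.5): beta(theta) = theta - (int_0^theta G^{N-1}) / G^{N-1}(theta)."""
    if theta == 0.0:
        return 0.0
    x = np.linspace(0.0, theta, grid)
    y = G(x) ** (N - 1)
    integral = ((y[:-1] + y[1:]) / 2.0 * np.diff(x)).sum()  # trapezoidal rule
    return theta - integral / G(theta) ** (N - 1)

for theta in (0.25, 0.5, 1.0):
    print(beta(theta), (N - 1) * theta / N)  # numerical value vs. closed form
```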

Example 4.9 (Cournot Oligopoly with Cost Uncertainty) Consider N ≥ 2 identical firms selling a homogeneous good in a common market. Each firm i ∈ N = {1, . . . , N} decides about its output qi ≥ 0, which costs C(qi, θi) = θi qi to produce, where the marginal cost parameter θi ∈ [0, 1] belongs to the firm’s private information. The vector θ = (θ1, . . . , θN) follows ex ante the symmetric joint distribution F(θ). Thus, each firm i, by observing its own cost information, is generally able to infer some information about the other firms’ costs as well. The market price P(Q) = 1 − Q depends only on the firms’ aggregate output Q = q1 + · · · + qN. Looking for a symmetric Bayes-Nash equilibrium,

14. Specifically, the Cauchy formula (2.9) gives

β(θi) = (∫_0^θi (N − 1) (ϑ g(ϑ)/G(ϑ)) exp[∫_0^ϑ (N − 1) (g(s)/G(s)) ds] dϑ) exp[−∫_0^θi (N − 1) (g(ϑ)/G(ϑ)) dϑ]
= (∫_0^θi (N − 1) ϑ g(ϑ) GN−2(ϑ) dϑ) (1/GN−1(θi)) = (∫_0^θi ϑ (d GN−1(ϑ)/dϑ) dϑ)/GN−1(θi),

for all θi ∈ [0, 1], which after an integration by parts simplifies to the expression in (4.5).


given that any firm j ≠ i follows the strategy qj∗ = α0(θj), firm i’s profit-maximization problem,

α0(θi) ∈ arg max_{qi∈[0,1]} ∫_{[0,1]N−1} (1 − qi − Σ_{j∈N\{i}} α0(θj) − θi) qi dF(θ−i|θi),

determines the missing function α0 : [0, 1] → R+. Indeed, the first-order necessary optimality condition yields that

0 = 1 − 2α0(θi) − (N − 1)ᾱ0(θi) − θi, ∀ θi ∈ [0, 1],

as long as α0(θi) ∈ (0, 1), where

ᾱ0(θi) = ∫_{[0,1]N−1} α0(θj) dF(θ−i|θi), ∀ θi ∈ [0, 1], ∀ j ∈ N\{i}.

Combining the last two relations, and using the abbreviation

θ̄ = E[θi] = ∫_{[0,1]N} θi dF(θ),

for all i ∈ N, one obtains

ᾱ0(θi) ≡ ((1 − θ̄)/2) ((1 − E[θj|θi])/(1 − θ̄) − (N − 1)/(N + 1)).

This yields the interior solution

α0(θi) = ((1 − θ̄)/2) ((1 − θi)/(1 − θ̄) − ((N − 1)/2) ((1 − E[θj|θi])/(1 − θ̄) − (N − 1)/(N + 1))), ∀ θi ∈ [0, 1],

provided that α0(θi) ∈ (0, 1) for almost all θi ∈ [0, 1], that is, at an interior solution.15 The Bayes-Nash equilibrium α∗ with αi∗ = α0 specializes to the symmetric Nash equilibrium in the corresponding game with complete information, when c = θ̄ ≡ E[θj|θi] ≡ θi ∈ (0, 1), which implies that α0(θi) ≡ (1 − c)/(N + 1), as in example 4.7. □
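As a quick consistency check on the complete-information limit (an illustrative computation, not from the text), evaluating the interior solution at θi = θ̄ = E[θj|θi] = c indeed collapses it to the Cournot quantity of example 4.7:

```python
N, c = 4, 0.3                      # illustrative parameter values
theta_i = theta_bar = E_cond = c   # degenerate (complete-information) case

# Interior solution from example 4.9, evaluated at the degenerate point:
alpha0 = (1 - theta_bar) / 2 * (
    (1 - theta_i) / (1 - theta_bar)
    - (N - 1) / 2 * ((1 - E_cond) / (1 - theta_bar) - (N - 1) / (N + 1))
)
print(alpha0, (1 - c) / (N + 1))   # both 0.14, matching example 4.7
```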

Proposition 4.2 (Existence of a Bayes-Nash Equilibrium) A Bayesian game ΓB of the form (4.3) with a finite player set N = {1, . . . , N} (with N ≥ 2), compact action sets A1, . . . , AN, compact finite-dimensional

15. To guarantee that the equilibrium strategy α0 takes only values at the interior of the action set with probability 1, it is necessary that E[θj|θi] be bounded from above by θ̄ + 2(1 − θ̄)/(N + 1).


type spaces Θ1, . . . , ΘN, and utility functions Ui(·, θ), which are continuous for all θ ∈ Θ, has a Bayes-Nash equilibrium (in behavioral (mixed) strategies).

Proof See Balder (1988) for a proof; he defines a behavioral (mixed) strategy for player i as a transition-probability measure between the spaces Θi and Ai. ∎

4.2.3 Dynamic Games of Complete Information
A dynamic game evolves over several time periods and can be described by an extensive form, which, depending on past actions and observations as well as on current and expected future payoffs, for each time period specifies if and how a player can choose an action. It is clear that because of the generality of this situation the study of a given dynamic game in extensive form can be very complicated. Since the purpose here is to motivate and provide background for the theory of differential games, attention is restricted to games with a fairly simple dynamic structure.

Extensive-Form Games An extensive-form description of a dynamic game specifies what players know in each stage of the game, when it is their turn to play, and what actions they can take. Before introducing the general elements of this description it is useful to consider a simple example.

Example 4.10 (Entry Game) Consider two firms, 1 and 2. Firm 1 is a potential entrant and at time t = 0 decides about entering a market (i.e., select the action e) or not (i.e., select the action ē). Firm 2 is an incumbent monopolist who can observe firm 1’s action, and at time t = 1 chooses to either start a price war and thus to fight the entrant (i.e., select the action f) or to accommodate firm 1 and not fight (i.e., select the action f̄). Assume that firm 1 obtains a zero payoff if it does not enter and a payoff of either −1 or 1 if it enters the market, depending on whether firm 2 decides to fight or not. Firm 2, on the other hand, obtains a payoff of 2 if firm 1 does not enter, and otherwise a payoff of −1 when fighting or a payoff of 1 when not fighting. The game tree in figure 4.2 depicts the sequence of events as well as the firms’ payoffs. At each node a firm makes a decision, until a terminal node with payoffs is reached. A firm’s strategy consists of a complete contingent plan for each of its decision nodes, no matter if that node is reached in equilibrium or not. For example, when firm 1 decides not to enter the market, then in reality


Figure 4.2
Game tree for the entry game: firm 1 chooses ē (don’t enter), yielding payoffs (0, 2), or e (enter); after entry, firm 2 chooses f (fight), yielding payoffs (−1, −1), or f̄ (don’t fight), yielding payoffs (1, 1).

there is nothing to decide for firm 2. Yet, in order for firm 1 to be able to come to the decision of not entering, it needs to form an expectation about what firm 2 would do if firm 1 decided to enter the market. The normal-form representation of the game in a payoff matrix is as follows:

                Firm 2
                f             f̄
Firm 1    e    (−1, −1)      (1, 1)
          ē    (0, 2)        (0, 2)

It is easy to verify that there are two pure-strategy Nash equilibria, (e, f̄) and (ē, f).16 While the first of these two Nash equilibria seems plausible, there clearly is a problem with the equilibrium where firm 1 decides not to enter the market based on firm 2’s threat of fighting. Indeed, given the intertemporal structure of the game, once firm 1 actually enters the market, it is better for firm 2 not to fight (leading to a payoff of 1 instead of −1). Thus, firm 2’s threat of fighting is not credible and should be eliminated. □

16. This game has no additional Nash equilibria, even in mixed strategies. Thus, with an even number of equilibria it is degenerate in view of Wilson’s oddness result (see remark 4.1). The degeneracy is created by firm 2’s indifference about its action when firm 1 does not enter the market.
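Both pure-strategy equilibria can be confirmed by brute-force enumeration of the normal form. In the sketch below (illustrative; the identifiers e_bar and f_bar stand for ē and f̄), every strategy profile is checked for profitable unilateral deviations.

```python
# Normal form of the entry game: firm 1 picks the row, firm 2 the column.
A1, A2 = ("e", "e_bar"), ("f", "f_bar")
U = {  # (firm 1, firm 2) payoffs for each strategy profile
    ("e", "f"): (-1, -1), ("e", "f_bar"): (1, 1),
    ("e_bar", "f"): (0, 2), ("e_bar", "f_bar"): (0, 2),
}

def is_nash(a1, a2):
    no_dev_1 = all(U[(b1, a2)][0] <= U[(a1, a2)][0] for b1 in A1)
    no_dev_2 = all(U[(a1, b2)][1] <= U[(a1, a2)][1] for b2 in A2)
    return no_dev_1 and no_dev_2

print([(a1, a2) for a1 in A1 for a2 in A2 if is_nash(a1, a2)])
# [('e', 'f_bar'), ('e_bar', 'f')]; only the first survives subgame perfection,
# since after entry firm 2 prefers f_bar (payoff 1) to f (payoff -1).
```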


This example shows that the concept of a Nash equilibrium is generally too weak for dynamic games because it may lead to noncredible threats or, more generally, to time inconsistencies. A noncredible threat arises because of a lack in commitment ability of the player that issues the threat. A well-known example with a surprising consequence of this lack in commitment ability is the following.

Example 4.11 (Intertemporal Pricing and the Coase Conjecture) A seller with market power is able to sell her goods at prices above marginal cost. While this seems to guarantee the seller a strictly positive net payoff, the following informal argument by Coase (1972) shows that this does not have to be the case when the seller can sell her goods at any point t ≥ 0 in continuous time.17 The intuition is that the monopolist at time t = 0 is competing with a copy of its own product that is sold at time t = Δ > 0. Of course, any consumer, when looking at the product today versus the product tomorrow, will agree that the product at time t = Δ is not quite as good as the product now at time t = 0.18 However, this difference in product quality vanishes when Δ tends to zero. Thus, as Δ → 0+ arbitrarily many copies of virtually the same product will be available in any fixed time interval, so the resulting perfect competition must drive the monopolist’s price down to marginal cost. Hence, in a perfect world with continuous money and time increments, a seller’s ability to adjust her prices over time is more of a curse than a blessing, because of the seller’s lack of commitment power. This Coase problem can be ameliorated by renting the product instead of selling it or by making binding promises about future production (e.g., by issuing a limited edition). Perishable products also increase the monopolist’s commitment power (e.g., when selling fresh milk), as do adjustment costs for price changes (e.g., due to the necessity of printing a new product catalogue). The ability to commit to a price path from the present into the future is a valuable asset for a seller; the question of commitment as a result of the consumers’ option of intertemporal arbitrage (i.e., they can choose between buying now or later) is important for any dynamic pricing strategy, at least when the available information is fairly complete. Note that the Coase problem is not significant in

17. The Coase conjecture was proved by Stokey (1981), Bulow (1982), and Gül et al. (1986) in varying degrees of generality.
18. The reason for this may be both time preference and quality preference. By waiting, the consumer incurs, on the one hand, an opportunity cost of not being able to use the product and, on the other hand, a cost due to a decay in the product’s quality (e.g., due to perishability or technological obsolescence).


situations when consumers are nonstrategic (i.e., not willing or able to wait). □

The previous two examples underline the importance of time-consistency issues. The lack of players’ ability to commit to their intertemporal strategic plans weakens the plausibility of Nash equilibria that rely on this commitment. The additional requirement of subgame perfection eliminates time inconsistencies. A subgame of a dynamic game is a game that arises after a certain time t ≥ 0 has passed in the original dynamic game, which up to that instant could have taken any arbitrary feasible path. The subgame therefore starts at a certain node in the extensive-form description of the game. A subgame-perfect Nash equilibrium is a strategy profile that when restricted to any subgame induces a Nash equilibrium of the subgame.

Example 4.12 (Entry Game, Continued) The only “proper” subgame, namely, a subgame that is not the game itself, is the subgame that starts at firm 2’s decision node (see figure 4.3). The strategy profile (ē, f) of the original game does not induce a Nash equilibrium of this subgame, since f does not maximize firm 2’s payoff, given that firm 1 decided to enter the market. Hence, the Nash equilibrium (ē, f), which contains the noncredible threat, is not subgame-perfect. □

It is now possible to formally describe a (finite) extensive-form game,

ΓE = (N, K, Z, A, H, a(·), h(·), ν(·), π(·), σ(·), {Ui : Z → R}i∈N),

where N = {1, . . . , N} is the set of N ≥ 1 players; K = {κ1, . . . , κK} is a set of K ≥ 1 nodes; Z ⊂ K is a set of terminal nodes; A is a set of all players’ actions; H ⊂ 2K is a set of information sets (usually a partition of the set of nonterminal nodes K\Z); a : K → A is a function that assigns to each node κ an action a(κ) that leads to it from its predecessor (with a(κ1) = ∅ for the initial node κ1); h : K\Z → H is a function that assigns each nonterminal node to an information set; ν : H → N is a player function that assigns to each information set a player whose turn it is to take an action; π : K → K is a predecessor function that specifies the (unique) predecessor of each node (with ∅ being the predecessor of the initial node κ1 ∈ K); σ : K ⇒ K is a (set-valued) successor function that specifies the set of successor nodes for each node (with σ(Z) = {∅}); and the payoff functions Ui specify for each terminal node z ∈ Z and each player i ∈ N a payoff Ui(z). To understand the meaning of all the elements of


the extensive-form description of a dynamic game, consider again the simple game in example 4.10.

Example 4.13 (Entry Game, Continued) An extensive-form representation ΓE of the entry game has the elements N = {1, 2}, K = {κ1, . . . , κ5}, Z = {κ3, κ4, κ5}, H = {{κ1}, {κ2}}, h(κi) = {κi}, and ν({κi}) = i for i ∈ {1, 2}. The functions a, π, σ are specified in table 4.1, and the nodes and players’ payoffs are depicted in figure 4.3. □

As becomes clear from example 4.10, even for simple dynamic games a full-scale specification of all the elements of its extensive-form representation is rather cumbersome. In most practical applications it is therefore simply omitted, and one relies on an informal description of the game together with a basic game tree such as the one shown in figure 4.2 for example 4.10. This game is termed a dynamic game with

Table 4.1
Extensive-Form Representation of the Entry Game

Node      κ1            κ2            κ3     κ4     κ5
a(κi)     ∅             e             ē      f      f̄
π(κi)     ∅             κ1            κ1     κ2     κ2
σ(κi)     {κ2, κ3}      {κ4, κ5}      ∅      ∅      ∅

Figure 4.3
Game tree with node specification for the entry game.


perfect information, since all the players’ information sets contain at most one node. If information sets sometimes contain more than one node, the game is called a dynamic game with imperfect information. Thus, while information in both types of games is complete, one can distinguish between cases where information is perfect and others where it is not. The role of information sets and imperfect information becomes clear when representing a simple static game of complete information in extensive form.

Example 4.14 (Battle of the Sexes in Extensive Form) Ann and Bert’s game in example 4.5 can be represented in extensive form as in figure 4.4. The information set containing two of Bert’s decision nodes means that at the time of taking a decision, Bert does not know what action Ann has decided to take. This dynamic game is therefore equivalent to the original simultaneous-move game. Note also that it is possible to switch Ann and Bert in the diagram and obtain an equivalent representation of the game; extensive-form representations are in general not unique. Finally, note that when information is perfect in the sense that all information sets are singletons, and Bert knows what Ann has decided to do (as in figure 4.4b), then Ann has a definite advantage in moving first because she is able to commit to going dancing, which then prompts Bert to go dancing as well, in the only subgame-perfect Nash equilibrium of

Figure 4.4
Battle-of-the-sexes game in extensive form: (a) without commitment (one information set), and (b) with commitment (two information sets).


this perfect-information, sequential-move version of the battle-of-the-sexes game. □

Proposition 4.3 (Existence of a Subgame-Perfect Nash Equilibrium) (Zermelo 1913) (1) Every finite extensive-form game (of perfect information) has a pure-strategy Nash equilibrium that can be derived using backward induction. (2) If no player has the same payoffs at two terminal nodes, then there is a unique subgame-perfect Nash equilibrium (which can be derived through backward induction).

Proof (Outline) Consider the dynamic game ΓE.
(1) Any subgame of ΓE can be represented in normal form, which by proposition 4.1 has a Nash equilibrium. Thus, by backward induction it is possible to find a Nash equilibrium of ΓE.
(2) If players are not indifferent between terminal nodes, then at each node there exists a strictly dominant choice of action; randomization between different actions is therefore never optimal. Hence, any subgame-perfect Nash equilibrium must be in pure strategies. Because of the strict dominance at each decision node, the subgame-perfect Nash equilibrium must also be unique. ∎

In order to verify that a given strategy profile (which for each player i ∈ N contains a mapping αi from all his information sets to feasible actions) of the dynamic game ΓE constitutes a subgame-perfect Nash equilibrium, the following result is of great practical significance.

Proposition 4.4 (One-Shot Deviation Principle) In an extensive-form game ΓE a strategy profile α∗ = (αi∗)i∈N is a subgame-perfect Nash equilibrium if and only if it satisfies the one-shot deviation condition: no player can gain by deviating from αi∗ at one single information set while conforming to it at all other information sets.

Proof (Outline) ⇒: The necessity of the one-shot deviation condition follows directly from the definition of a subgame-perfect Nash equilibrium. ⇐: Suppose the one-shot deviation condition is satisfied for α∗ but α∗ is not a subgame-perfect Nash equilibrium. Then there exists a decision node at which some player i has a better response than the one prescribed by αi∗. But then choosing that response and conforming to αi∗ thereafter would violate the one-shot deviation condition. Hence, the strategy profile α∗ must be a subgame-perfect Nash equilibrium. ∎


Example 4.15 (Centipede Game) Each of two players, 1 and 2, has a starting capital of one dollar. The players take turns (starting with player 1), choosing either “continue” (C) or “stop” (S). Upon a player’s selecting C, one dollar is transferred by an independent third party from that player’s capital to the other player’s capital, and one additional dollar is added to the other player’s capital. The game stops when one player chooses S or if both players’ payoffs are at least $100. Figure 4.5 shows the corresponding game tree. Now consider the strategy profile α, which is such that both players always choose C, and use the one-shot deviation principle to check if it constitutes a subgame-perfect Nash equilibrium. Since player 2 finds deviating in the last period, that is, playing (C, C, . . . , C, S) instead of (C, C, . . . , C), profitable, the one-shot deviation condition in proposition 4.4 is not satisfied and α cannot be a subgame-perfect Nash equilibrium. Via backward induction, it is easy to determine that the unique subgame-perfect Nash equilibrium is of the form [(S, S, . . . , S); (S, S, . . . , S)], namely, both agents choose S at each of their respective decision nodes. □
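Backward induction for the centipede game takes only a few lines of code. The sketch below is an illustrative reconstruction (not the book’s): the capital dynamics mirror figure 4.5, with the continuing player giving up one dollar while the opponent gains two.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def solve(c1, c2, mover):
    """Backward induction; returns (payoff of 1, payoff of 2, mover's action)."""
    if c1 >= 100 and c2 >= 100:
        return c1, c2, None                 # forced end of the game
    if mover == 1:
        cont = solve(c1 - 1, c2 + 2, 2)     # C: transfer $1, third party adds $1
    else:
        cont = solve(c1 + 2, c2 - 1, 1)
    if cont[mover - 1] > (c1, c2)[mover - 1]:
        return cont[0], cont[1], "C"        # continuing is strictly better
    return c1, c2, "S"                      # otherwise the mover stops

print(solve(1, 1, 1))  # (1, 1, 'S'): player 1 stops immediately, as in the text
```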

Remark 4.2 (Infinite-Horizon One-Shot Deviation Principle) The one-shot deviation principle in proposition 4.4 extends to infinite-horizon games, provided that they are continuous at infinity, that is, if two strategy profiles agree in the first T periods, then the absolute difference in each player’s payoff converges to zero as T goes to infinity. This condition is satisfied if the players’ stage-game payoffs are uniformly bounded, as long as the discount factor δi applied to player i’s payoffs between periods lies in (0, 1), for all i. □

Repeated Games The simplest way to extend the notion of a complete-information normal-form game (see section 4.2.1) to T > 1 discrete time periods is to repeat a one-period stage game Γ of the form (4.1) T times. The resulting repeated game ΓT is referred to as a supergame. At

Figure 4.5
Centipede game. The players alternate choosing C (continue) or S (stop); the terminal payoffs are (1, 1), (0, 3), (2, 2), . . . , (97, 100), (99, 99), (98, 101), and (100, 100) if play never stops.


each time t ∈ {0, . . . , T − 1} any player i ∈ N selects an action ai_t ∈ Ai. Player i can condition his action on the information si_t ∈ Si_t available to him, where Si_t is player i’s (measurable) observation (or sample) space at time t. A player’s observation si_t may include the players’ past action profiles (up to time t − 1, for t ≥ 1) and the realizations of certain publicly observable random events. The set-valued function that maps the available information to a probability distribution over player i’s observation space Si_t is called player i’s information structure.

For simplicity, it is assumed in the discussion of repeated games that each player’s information is complete. That is, at time t ∈ {0, . . . , T − 1}, player i’s observation space Si_t corresponds to the set Ht of possible time-t histories ht, where

H0 = ∅, Ht = Ht−1 × A, ∀ t ∈ {1, . . . , T − 1},

and

ht = (a0, . . . , at−1) ∈ Ht ⊂ H.

Thus, a player’s strategy αi is a mapping from the set H of all possible histories of the supergame, with

H = ⋃_{t=0}^{T−1} Ht,

to time-t actions in Ai.19 Let δ ∈ (0, 1) be a per-period discount factor, common to all players. For a given strategy profile α = (αi)i∈N, set u(0) = (ui(0))i∈N = α(∅) and u(t) = (ui(t))i∈N = α(ht), where ht = (ht−1, u(t − 1)), for t ∈ {1, . . . , T − 1}; then player i’s average payoff is given by

Ji_Avg(ui|u−i) = ((1 − δ)/(1 − δT+1)) Σ_{t=0}^T δt Ui(u(t)) → (1 − δ) Σ_{t=0}^∞ δt Ui(u(t)), as T → ∞.

The intuition is that when player i obtains a constant payoff Ui(u(t)) = c in each period t ≥ 0, then his average payoff will be equal to c as well. In this way one can directly compare a player’s average payoffs to his stage-game payoffs, irrespective of the discounting between periods. A Nash equilibrium of the supergame ΓT is a strategy profile α∗ = (αi∗)i∈N such that

19. A generalization of this definition that includes mixed-strategy profiles is straightforward and therefore omitted.


αi∗(ht) ∈ arg max_{ai∈Ai} {Ui(ai, α−i∗(ht)) + Vi(t, (ht, (ai, α−i∗(ht))))},

where

Vi(t, ht) = Σ_{s=t+1}^{T−1} δs−t Ui(a∗_s)

with

a∗_s = α∗(h∗_s), h∗_{s+1} = (h∗_s, α∗(h∗_s)), h∗_t = ht, s ∈ {t + 1, . . . , T − 1}.

Remark 4.3 (Augmented History) In some situations it is useful to include publicly observable realizations of random variables (e.g., a coin toss, or the number of currently observable sunspots) in the players’ observations because they may allow the players to coordinate their actions and thus effectively enlarge the set of attainable payoff profiles. Given time-t realizations of a random process ωt, let

h̃t = (a0, . . . , at−1; ω0, . . . , ωt−1) ∈ H̃t ⊂ H̃

be an augmented time-t history, and

H̃ = ⋃_{t=0}^{T−1} H̃t

be the set of all such augmented histories, where

H̃t = Ht × Ωt, ∀ t ∈ {1, . . . , T − 1}.

Analogous to the earlier definition, an augmented strategy profile is of the form α̃ = (α̃i)i∈N, where

α̃i : H̃ → Ai

for all i ∈ N. □

Example 4.16 (Finitely Repeated Prisoner’s Dilemma) Consider a T-fold repeated prisoner’s dilemma game with a stage-game payoff matrix as in example 4.4. To obtain a subgame-perfect Nash equilibrium of the supergame, by proposition 4.3 one can use backward induction starting at t = T. Clearly, in the last period both prisoners choose D, which then fixes the terminal payoffs at time t = T − 1, so that again both prisoners will choose D. Continuing this argument (e.g., by induction) yields that for any finite time horizon T, the unique subgame-perfect Nash


equilibrium of the supergame is for both players to play D (i.e., to defect) in all periods. □

For infinitely repeated games, backward induction cannot be used to obtain a subgame-perfect Nash equilibrium. Yet, by threatening a lower future payoff it may be possible to induce other players to deviate from a “myopic” stage-game Nash equilibrium. Depending on the threats used, different outcomes can be attained (see, e.g., proposition 4.5). Note that the game does not even have to be really infinite: a positive probability of continuation in each period is enough to yield an equivalent analysis. For instance, if the continuation probability p is constant across periods, then one may be able to consider δ̃ = δp instead of δ as the per-period discount factor over an infinite time horizon; this is often referred to as stochastic discounting.

Example 4.17 (Infinitely Repeated Prisoner’s Dilemma) Consider an infin-itely repeated prisoner’s dilemma, obtained by letting T in example 4.16go to infinity. The one-shot deviation principle (proposition 4.4 andremark 4.2) can be used to show that one subgame-perfect Nash equi-librium of the supergame is that both players choose D in every period.If the players can condition their strategies on histories, then othersubgame-perfect Nash equilibria are also possible. For example, as longas δ > 1/2, the following grim-trigger strategy profile (for all i ∈ N )constitutes a subgame-perfect Nash equilibrium:

• Player i chooses C in the first period.• Player i continues to choose C, as long as no player has deviated to Din any earlier period.• If the opponent chooses D, then player i plays D always (i.e., for therest of the game).

If both players conform to the grim-trigger strategy, then each player’saverage payoff is 1. To show that the preceding grim-trigger strategyprofile constitutes a subgame-perfect Nash equilibrium, consider a one-shot deviation in period t, which yields a payoff of

(1 − δ)(1 + δ+ · · · + δt−1 + 2δt + 0 + 0 + · · · ) = 1 − δt(2δ− 1) < 1,

as long as δ > 1/2. Now one must check that in the subgame in whichboth players play D neither has an incentive to deviate. But it hasalready been shown that the stationary strategy profile in which allplayers always play D is a subgame-perfect Nash equilibrium. Indeed,

Page 190: Optimal Control Theory With Applications in Economics

Game Theory 177

all individually rational payoff vectors can be implemented using agrim-trigger strategy (see proposition 4.5 and remark 4.4).20 �

The set of individually rational payoffs is

V = {(v1, . . . , vN) ∈ RN : ∃ a ∈ A s.t. Ui(a) ≥ vi ≥ v¯

i, ∀ i ∈ N },where

i = mina−i∈A−i

{maxai∈Ai

Ui(ai, a−i)}

is player i’s minmax payoff. Payoff vectors v such that each player i’s pay-off vi is strictly greater than his minmax payoff v

¯i are strictly individually

rational. Figure 4.6 illustrates the set V of all individually rational payoffvectors in the context of example 4.17. Furthermore, the set

R = {(v1, . . . , vN) : ∃ NE e∗ of � and ∃ a ∈ A s.t. Ui(a) ≥ vi ≥ Ui(e∗)} ⊆ V

(2,-1)

(-1,2)

(0,0)

(1,1)

Figure 4.6Convex hull of the stage-game payoff vectors, and set of individually rational payoffs.

20. Axelrod (1984) reports the results of an experiment where subjects (including promi-nent economists) were asked to specify a strategy for a repeated two-player prisoner’sdilemma game so as to maximize the average performance against all other submittedstrategies. The winning entry, by the mathematician Anatol Rapoport, was tit-for-tat, i.e.,a strategy that prescribes cooperation in the first period and from then on copies the oppo-nent’s previous-period action. While the strategy never wins, it tends to perform very wellon average. This was confirmed in a repeat experiment where the same entry won again.For more details, see Dixit and Nalebuff (1991, ch. 4).

Page 191: Optimal Control Theory With Applications in Economics

178 Chapter 4

is the set of Nash-reversion payoffs. The following Nash-reversionfolk theorem21 provides a simple implementation of equilibrium payoffvectors in R.

Proposition 4.5 (Nash-Reversion Folk Theorem) (Friedman 1971)For any Nash-reversion payoff π ∈ R there is a constant δ

¯∈ (0, 1) such

that for any common discount factor δ ∈ (δ¯, 1), there exists a subgame-

perfect Nash equilibrium of the supergame �∞(δ) with payoffs equalto π .

Proof (Outline) Consider a (possibly correlated) stage-game strategyprofile a such that v = (U1(a), . . . , UN(a)) ∈ R. The following strategyprofile induces a subgame-perfect Nash equilibrium in the supergame:

• Start playing ai and continue doing so as long as a was played in theprevious period.• If in the previous period at least one player deviated, then each playerplays a dominated Nash-equilibrium strategy profile e∗ for the rest ofthe game.

This strategy profile indeed constitutes a subgame-perfect Nash equi-librium, since

maxa∈A

Ui(a) + δUi(e∗)1 − δ

≤ Ui(a)1 − δ

as long as δ ∈ (0, 1) is large enough. The rest follows using the one-shotdeviation principle. n

Remark 4.4 (Subgame-Perfect Folk Theorem) (Fudenberg and Maskin1986) The set R of Nash-reversion payoffs is a subset of the set V ofall individually rational payoffs. The following folk theorem states thatall (strictly) individually rational payoffs can be implemented usingan appropriate subgame-perfect strategy profile, provided that theplayers are patient enough and provided that the set of individuallyrational payoffs is of full dimension. This is remarkable because theimplementable payoff vectors may be strictly smaller than the smallestNash-equilibrium payoff vector.

If dim(V) = N, then for any v = (v1, . . . , vN) ∈ V with vi > ¯vi there is a con-

stant ¯δ(v) ∈ (0, 1) such that for any δ ∈ (¯δ(v), 1), there exists a subgame-perfect

21. The term folk theorem stems from the fact that its content was known (i.e., it was partof “folk wisdom”) before a proof appeared in the literature.

Page 192: Optimal Control Theory With Applications in Economics

Game Theory 179

Nash equilibrium (in mixed strategies) of �∞(δ) with an (average) expectedpayoff of v.

The proof of this result is constructive and requires trigger strate-gies with finite-length punishment phases (including punishment for thefailure to punish), followed by an indefinite reward phase (with extrapayoffs for those who punished). Abreu et al. (1994) have shown thatthe dimensionality condition can be relaxed to dim(V) ≥ N − 1. �

Remark 4.5 (Equilibrium Multiplicity and Institutional Design)Kreps (1990, 95–128) noted that one of the major problems plaguinggame theory is the generic lack of predictive power due to equilibriummultiplicity. The preceding folk theorems underline this deficiency. Onthe other hand, the explicit equilibrium constructions in the proof of eachresult can provide valuable clues as to how to design self-enforcing insti-tutions, in the sense that no explicit contracts are needed for the players’repeated interaction, only a common expectation that a certain equilib-rium will be played. The latter view of institutions as a self-enforcingcommon set of expectations corresponds to a modern definition of insti-tutions by Aoki (2001). As a natural refinement for equilibria one can,for example, focus on equilibria that produce Pareto-optimal outcomes,which (by definition) cannot be improved upon for any player withoutmaking another player worse off. This refinement is often referred to asPareto perfection. �

Example 4.18 (Repeated Cournot Duopoly) Consider an infinitely re-peated Cournot duopoly, where two firms, 1 and 2, produce homoge-neous widgets in respective quantities q1 and q2. Firm i’s production costis C(qi) = cqi (with constant marginal cost, c > 0). The inverse marketdemand is specified as P(Q) = a − Q, where a > c and Q = q1 + q2.

(1) Duopoly. The unique Nash equilibrium of the stage game is givenby q1

c = q2c = (a − c)/3, yielding profits of π1

c = π2c = (a − c)2/9 for the

firms.(2) Monopoly. If the two firms merge, they can improve stage-game

profits by producing half of the monopoly quantity each, that is, theychoose q1

m = q2m = (a − c)/4 so as to obtain π1

m = π 2m = (a − c)2/8 > π c

i .Note that the monopoly outcome is Pareto-dominant (from the firms’point of view); however, without a contract, each firm could improveits profit unilaterally by deviating; in other words, it is not a Nashequilibrium of the stage game: the best response to monopoly quantitywould be BRi(q−i

m ) = ((a − c) − q−im )/2 = 3(a − c)/8 > qi

c > qim, leading to

Page 193: Optimal Control Theory With Applications in Economics

180 Chapter 4

deviation profits of π i = 9(a − c)2/64 > π im. Furthermore, a collusion is

possible in this game if both firms are patient enough, that is, if the firms’common discount factor δ is close enough to 1. Consider the followingNash-reversion strategy for firm i:

• Produce qim in the first period and continue doing so as long as the

observed outcome in the previous period is (q1m, q2

m).• If the outcome in the previous period is different from (q1

m, q2m), then

always choose qic thereafter.

Using the one-shot deviation principle (see proposition 4.4 andremark 4.2), it is straightforward to verify that this strategy profileconstitutes a subgame-perfect Nash equilibrium of the infinite-horizonsupergame. Indeed, the payoff difference from a deviation,

�i =(π i + δπ i

c

1 − δ

)− δπ i

m

1 − δ

=(

9(a − c)2

64+ δ(a − c)2

9(1 − δ)

)− (a − c)2

8(1 − δ)< 0 ⇔ δ >

917

is negative as long as δ is close enough to 1, since π im > π i

c . �

4.2.4 Dynamic Games of Incomplete InformationPreliminaries Recall that a game is of perfect information if each infor-mation set contains a single node; otherwise it is of imperfect informa-tion. A game is of complete information if all players know all relevantinformation about each other; otherwise it is of incomplete information.It turns out that games of imperfect information are sufficient to repre-sent all games provided one introduces an additional player, referred toas Nature, who selects the player types following a mixed strategy withprobability weights that implement the players’ beliefs.

Proposition 4.6 (Equivalence of Incomplete and Imperfect Information)(Harsanyi 1967) Any game of incomplete information can be rewrittenas a game of imperfect information.

Proof (Outline) Given an arbitrary game of incomplete information,one can introduce an additional player, called Nature (N0). Player N0

is the first to move, and her actions follow all other players’ beliefs: infact, N0 randomizes over the player types in �. Any move by Naturecorresponds to a particular type realization; however, players cannotobserve that move, and thus their respective information sets contain

Page 194: Optimal Control Theory With Applications in Economics

Game Theory 181

all possible nodes that N0’s choice could lead to. Clearly this is a gameof imperfect information, equivalent to the given game of incompleteinformation. n

In dynamic games with incomplete information, the concept ofBayesian perfection strengthens the Bayes-Nash equilibrium (see sec-tion 4.2.2) by requiring that players have beliefs about the probabilitythat each particular decision node has been reached in equilibrium. Abelief is thereby a probability distribution over the set of nodes in agiven information set. A strategy profile (together with a belief system)constitutes a perfect Bayesian equilibrium (PBE) of a game of incompleteinformation if the following four requirements are satisfied:

• At each information set, the player with the move must have a belief(a probability distribution) about which node in his information set hasbeen reached.• Given their beliefs, all players’ strategies (complete contingent plans)must be sequentially rational, that is, the actions taken at all informationsets by players with the move must be optimal.• On any equilibrium path (information sets reached with positive prob-ability in a given equilibrium), beliefs must be determined by Bayes’ ruleand the players’ equilibrium strategies.• Off any equilibrium path, beliefs are determined by Bayes’ rule andthe players’ equilibrium strategies, where possible.

Signaling Games When decision-relevant information is held pri-vately by individual agents, an uninformed decision maker (the princi-pal) may be able to elicit credible revelation of this private informationby designing an appropriate incentive-compatible screening mechanism(see chapter 5). For example, a sales manager may be able to elicit truth-ful revelation of a potential buyer’s willingness to pay by proposing tohim a menu of purchase contracts, indexed by different price-quality(or price-quantity) tuples. However, in cases when the decision makeris unable (or unwilling) to create such a mechanism, it may be in (atleast some of) the agents’ best interest to take the initiative and sendmessages to the principal in an attempt to credibly convey their privateinformation. When doing so, the parties engage in signaling.22

22. The term signaling in this context was coined by Michael Spence (1973). For this discov-ery he was awarded the 2001 Nobel Memorial Prize in Economics, together with GeorgeAkerlof and Joseph Stiglitz.

Page 195: Optimal Control Theory With Applications in Economics

182 Chapter 4

Despite the risk of somewhat confusing the issue, it can be notedthat in many practical problems private information is held by all par-ties, dissolving the fine distinction between signaling and screening. Forinstance, information about a product’s true quality might be held by asalesman, whereas each potential buyer best knows her own willingnessto pay for the product as a function of its quality. To maximize profitsthe salesman could attempt to design an optimal screening mechanism(see chapter 5), but then he might still be unable to sell some of hisproducts if he cannot credibly communicate quality information (seeexample 4.19). To avoid such market failure through adverse selection,the seller may attempt to actively signal the product’s true quality toconsumers, for instance, by offering a (limited) product warranty aspart of each sales contract.

This section focuses on the signaling issue and neglects the screeningissue in the trade situation with bilateral private information. Thus,attention is limited to situations where the only private information isheld by the party that engages in signaling.

Consider the following canonical two-stage signaling game. In thefirst stage a sender S sends a message s ∈ S to a receiver R. The sender’sprivate information can be summarized by S’s type θ ∈ �. R’s priorbeliefs about the distribution of types in the type space are commonknowledge and given by a probability distribution μ. In the secondstage, player R, after receiving the message, possibly updates her beliefsabout S’s type (resulting in posterior beliefs p(θ |s) contingent on theobserved s ∈ S) and takes an action a ∈ A. At this point, player Sobtains a utility payoff described by the function u : A × S ×� → R,and player R obtains a utility payoff given by v : A × S ×� → R. Forsimplicity it is assumed in this section that the action space A, the signalspace S, and the type space � are all finite.

Example 4.19 (Adverse Selection: Market for Lemons) Akerlof (1970) de-scribed a market for used cars, in which sellers S offer either one of twopossible car types, lemons (L for low quality) or peaches (H for high qual-ity). A car’s true quality (or type) θ ∈ {θL, θH} = � is observed only bythe seller, and any buyer thinks that with probability μ ∈ (0, 1) the carwill be a desirable peach. The buyers R have valuations v(θH) > v(θL)for the two goods, and the sellers have valuations u(θH) > u(θL) suchthat v(θ ) > u(θ ) for θ ∈ � (i.e., there exist gains from trade).23 In the

23. The buyer’s action set A is given by {“buy,” “don’t buy”}; gains from trade can berealized only when a car is sold.

Page 196: Optimal Control Theory With Applications in Economics

Game Theory 183

absence of signaling, that is, if there is no credible product informa-tion, both qualities are traded in a common market, and the expectedvalue of a good for a potential buyer becomes v = (1 −μ)v(θL) +μv(θH).If the sellers offering the high-quality goods have a private valuationu(θH) > v for these goods, there will be no more high-quality goodstraded, and the lemons take over the market completely. This yieldsa socially inefficient outcome as a result of the buyers’ adverse selec-tion in response to the sellers’ private quality information. Even thoughtransactions between agents may be mutually beneficial, only limitedtrade may occur as a result of private information that the sellers pos-sess about the quality of the good. If a seller can send a message s ∈ S(e.g., the length of warranty offered with the car) before the other partydecides about buying the car, then this may induce a separating equilib-rium, in which warranty is offered only with peaches, not lemons. Apooling equilibrium occurs if no warranty is offered at all (if expensivefor sellers), or if warranty is offered with both types of cars (if cheap forsellers). To demonstrate the potential effects of warranty, assume that aseller of a used car of quality θ ∈ {θL, θH} is able to offer warranty to thebuyer. Assume that providing such warranty services incurs an expectedcost of c(θ ) with c(θL) > c(θH) ≥ 0, that is, providing warranty servicesfor low-quality cars is more expensive in expectation than for high-quality cars. Warranty would entitle the buyer to a technical overhaulor a full refund of the purchase price (within a reasonable time interval)if the car were found by the buyer to be of lower quality than θH .24 Fur-ther assume that there are many identical car dealers engaging in pricecompetition; then in a separating equilibrium (in which only sellers ofhigh-quality cars offer warranty) the price charged for a high-qualitycar is pH = u(θH) + c(θH), and for a low-quality car it is pL = u(θL). Notethat the seller of a low-quality car, by deviating, could obtain a pay-off of pH − c(θL) instead of zero. Hence, a separating equilibrium inwhich only high-quality cars are offered with warranty exists if andonly if

pH − c(θL) = u(θH) − [c(θL) − c(θH)] ≤ 0 (4.6)

24. Note the implicit assumption that the seller has no reason to believe that the buyerwould engage in any negligent activity with the car before requesting a full refund. It isalso implicitly assumed that both parties agree on what constitutes a car of quality θH ,and that quality is observable to the owner within the warranty period.

Page 197: Optimal Control Theory With Applications in Economics

184 Chapter 4

and

v(θH) ≥ u(θH) + c(θH). (4.7)

Under the last two conditions market failure through adverse selec-tion can be prevented. Nevertheless, the unproductive investment c(θH)in the warranty services for high-quality cars is wasted and accountsfor the inefficiency generated by the information asymmetry in thissignaling game. If c(θH) = 0, then the separating equilibrium is effi-cient. If only (4.6) fails to hold, then low-quality cars can be boughtwithout risk at the higher price pH , and there thus exists a poolingequilibrium in which all cars are offered with warranty at price pH .If (4.7) fails to hold, then high-quality cars cannot be offered with war-ranty; depending on (4.6), low-quality cars might still be sold with awarranty. �

Definition 4.1 A (behavioral) strategy for player S of type θ ∈ � is afunction σ : S ×� → [0, 1], which assigns probability σ (s, θ ) to sendingmessage s ∈ S, where

∑s∈S

σ (s, θ ) = 1,

for all θ ∈ �. Similarly, a (behavioral) strategy for player R is a func-tion α : A × S → [0, 1], which assigns probability α(a, s) to action a ∈ Agiven that message s ∈ S has been received and which satisfies

∑a∈A

α(a, s) = 1,

for all s ∈ S.

Definition 4.2 (Bayes-Nash Equilibrium) The strategy profile (σ ,α)constitutes a Bayes-Nash equilibrium of the canonical signaling game if

σ (s, θ ) > 0 ⇒ s ∈ arg maxs∈S

{∑a∈A

α(a, s)u(a, s, θ )

}, (4.8)

and for each s ∈ S for which ν(s) = ∑θ∈� σ (s, θ )μ(θ ) > 0,

α(a, s) > 0 ⇒ a ∈ arg maxa∈A

{∑θ∈�

p(θ |s)v(a, s, θ )

}, (4.9)

Page 198: Optimal Control Theory With Applications in Economics

Game Theory 185

where p(θ |s) is obtained via Bayesian updating,

p(θ |s) = σ (s, θ )μ(θ )ν(s)

, (4.10)

whenever possible, that is, for all s ∈ S for which ν(s) > 0.

Definition 4.3 A Bayes-Nash equilibrium (σ ,α) of the signaling gameis called separating equilibrium if each type sends a different message.It is called pooling equilibrium if (σ ,α) is such that there is a single signals0 ∈ S sent by all types, namely, σ (s0, θ ) = 1 for all θ ∈ �. Otherwise itis called partially separating.

Since in a separating equilibrium each type sends different signals,for any θ ∈ � there exists a collection of pairwise disjoint sets Sθ ⊂ Ssuch that ∪θ∈�Sθ = S, such that

∑s∈Sθ σ (s, θ ) = 1.

In the definition of the Bayes-Nash equilibrum, the posterior beliefsystem p is not a part of the equilibrium. In addition, off the equilibriumpath, that is, for messages s ∈ S, for which ν(s) = 0, the receiver’s poste-rior beliefs p( · |s) are not pinned down by Bayesian updating (4.10). Sincethe freedom in choosing these out-of-equilibrium beliefs may result ina multitude of Bayes-Nash equilibria, it is useful to make the beliefsystem p, which specifies posterior beliefs for any s ∈ S, a part of theequilibrium concept.

Definition 4.4 (Perfect Bayesian Equilibrium) The tuple (σ ,α, p) is aperfect Bayesian equilibrium of the canonical signaling game if the strategyprofile (σ ,α) and the belief system p satisfy conditions (4.8)–(4.10).

Example 4.20 Consider a signaling game with type space� = {θ1, θ2},message space S = {s1, s2}, and action set A = {a1, a2} (figure 4.7).Player R’s prior beliefs about the type distribution are given by μk =Prob(θk) for k ∈ {1, 2} with μk ∈ [0, 1] and μ1 +μ2 = 1. If a sender oftype θk plays strategy {σ (si, θk)}i,k , then upon observing s ∈ S, player R’sposterior beliefs are given by

p(θk|si) = σ (si, θk)μk

σ (si, θ1)μ1 + σ (si, θ2)μ2,

for all i, k ∈ {1, 2}, provided that the denominator ν(si) = σ (si, θ1)μ1 +σ (si, θ2)μ2 is positive. If ν(si) = 0, then no restriction is imposed onplayer R’s posterior beliefs. First examine pooling equilibria, in which bothsender types send si in equilibrium, σ (si, θk) = 1 andσ (s−i, θk) = 0. In that

Page 199: Optimal Control Theory With Applications in Economics

186 Chapter 4

N0R R

Figure 4.7Signaling game with |A| = |S| = |�| = 2 (see example 4.20).

case player R’s posterior beliefs are only partly determined, since on theequilibrium path p(θk|si) = μk, whereas off the equilibrium path p(θk|s−i)cannot be pinned down by Bayesian updating. The reason for thelatter is that given the sender’s strategy, receiving a message s−i corre-sponds to a zero-probability event. If qk = p(θk|θ−i) denote the sender’soff-equilibrium-path beliefs, then naturally qk ∈ [0, 1] and q1 + q2 = 1.The receiver’s equilibrium strategy α off the equilibrium path is suchthat α(aj, s−i) > 0 implies

q1v(aj, s−i, θ1) + q2v(aj, s−i, θ2) ≥ q1v(a−j, s−i, θ1) + q2v(a−j, s−i, θ2),

for any j ∈ {1, 2}. In other words, an action can only be included inplayer R’s mixed-strategy profile if it maximizes her expected payoffconditional on having observed s−i. Similarly, on the equilibrium pathplayer R’s strategy α is such that α(aj, si) > 0 implies

μ1v(aj, si, θ1) +μ2v(aj, si, θ2) ≥ μ1v(a−j, si, θ1) +μ2v(a−j, si, θ2).

A strict preference, either on or off the equilibrium path, for an action aj

over a−j yields a pure-strategy equilibrium on the part of the receiver,which means that R puts all of the probability mass on aj. It is impor-tant to note that off the equilibrium path the optimal action generallydepends on R’s posterior beliefs q = (q1, q2), whence the usefulness ofincluding posterior beliefs in the description of the equilibrium. Whilethe Bayes-Nash equilibrium does not impose sequential rationality on

Page 200: Optimal Control Theory With Applications in Economics

Game Theory 187

the receiver (her actions do not have to be consistent with any off-equilibrium beliefs), the perfect Bayesian equilibrium requires that heraction be utility-maximizing conditional on having formed a belief. Todetermine all pooling equilibria, one then needs to verify that (4.8) holds,that is,

si(q) ∈ arg maxs∈{s1,s2}

{α(a1, s; q)u(a1, s, θk) +α(a2, s; q)u(a2, s, θk)

},

for all k ∈ {1, 2}.Now consider separating equilibria, in which different types send dif-

ferent messages. Without loss of generality assume that sender type θk

sends sk, that is, σ (sk, θk) = 1 and σ (s−k , θk) = 0. Then the receiver canperfectly infer the sender’s type from the observed message, so herposterior beliefs are given by p(θk|sk) = 1 and p(θk|s−k) = 0, at least aslong as she believes that both types can occur (μk ∈ (0, 1)).25 Player R’sequilibrium strategy α is now such that

α(aj, sk) > 0 ⇒ aj ∈ arg maxa∈{a1,a2}

v(a, sk, θk),

for all j ∈ {1, 2}. As before one needs to verify that (4.8) holds (notingthat σ (sk, θk) = 1), namely,

sk ∈ arg maxs∈{s1,s2}{α(a1, s)u(a1, s, θk) +α(a2, s)u(a2, s, θk)},

for k ∈ {1, 2}, to determine all separating equilibria. There may existpartially separating equilibria, in which σ (si, θk) ∈ (0, 1) for all i, k ∈{1, 2}. �

To deal with the potential multiplicity of signaling equilibria, consideran equilibrium refinement. Let

BR(�, s) =⋃

p:p(�|s)=1

arg maxa∈A

∑θ∈�

p(θ |s)v(a, s, θ )

be the receiver’s set of pure-strategy best responses conditional onobserving the message s which the receiver believes comes from a senderwith a type in � ⊆ �, implying that her posterior beliefs p are such

25. If the receiver believes that sender type θk never occurs (μk = 0), then her posteriorbeliefs upon observing sk cannot be determined by Bayesian updating, in which case,similar to the off-equilibrium-path reasoning in pooling equilibria, the receiver can havearbitrary posterior beliefs p(θl|sk) (for l ∈ {1, 2}) as to which type sent an “impossible”message.

Page 201: Optimal Control Theory With Applications in Economics

188 Chapter 4

that

p(�|s) =∑θ∈�

p(θ |s) = 1.

The set BR(�, s) ⊆ A contains all the receiver’s actions that might beoptimal conditional on having observed s and the receiver’s havingformed a belief which presumes the sender’s type θ to lie in the set �.Given a perfect Bayesian equilibrium (σ ,α, p), let

u∗(θ ) =∑

(a,s)∈A×Sα(a, s)σ (s, θ )u(a, s, θ )

be sender type θ ’s expected equilibrium payoffs. Hence, any sender oftype θ in the set

�s = {θ ∈ � : u∗(θ ) > maxa∈BR(�,s)

u(a, s, θ )}

would never send message s, since it would result in a payoff strictlyless than his expected equilibrium payoff u∗(θ ). The set � \ �s there-fore contains all the types that could reasonably be expected to send themessage s in equilibrium. Hence, any equilibrium in which a sender ofa type in � \ �s obtains, by sending the message s, an expected equilib-rium payoff strictly below the worst payoff he could rationally expectfrom the receiver (conditionally on her observation of s) seems coun-terintuitive. Based on this reasoning, Cho and Kreps (1987) introducedan intuitive criterion that can be used to eliminate signaling equilibriain which the sender could increase his equilibrium payoff by deviating(taking into account that the receiver cannot rationally play a strategythat is never a best response).

4.3 Differential Games

Let (t0, x0) be given initial data, consisting of an initial time t0 ∈ R

and an initial state x0 ∈ Rn, and let T > t0 be a (possibly infinite) timehorizon. A differential game (of complete information) in normal form isgiven by

�(t0, x0) = (N , {U i( · )}i∈N , {Ji(ui|μ−i( · ))}i∈N ),

Page 202: Optimal Control Theory With Applications in Economics

Game Theory 189

where N = {1, . . . , N} is a finite set of N ≥ 2 players, and where eachplayer i ∈ N chooses his control (or action) ui(t) = μi(t, x(t)) ∈ Rmi forall t ∈ [0, T) so as to maximize his objective functional

Ji(ui|μ−i) =∫ T

t0

hi(t, x(t), ui(t),μ−i(t, x(t))) dt,

subject to the control constraint

ui(t) ∈ U i(t, x(t)), ∀ t ∈ [t0, T),

and subject to the system equation in integral form (as in (2.17))

x(t) = x0 +∫ t

t0

f (s, x(s), ui(s),μ−i(s, x(s))) ds, ∀ t ∈ [t0, T),

given the other players’ strategy profile μ−i( · ). The reason for consider-ing the right-open interval [t0, T) instead of the closed interval [0, T]is to suggest with this notation that all the developments here alsoapply to the infinite-horizon case where T = ∞. Let m = m1 + · · · + mN .The functions f : R1+n+m → Rn and hi : R1+n+m → R are assumed to becontinuously differentiable, and the upper semicontinuous set-valuedmapping U i : R1+n ⇒ Rmi , with (t, x) �→ U i(t, x),26 is assumed to havenonempty, convex, and compact images, for all i ∈ N .

4.3.1 Markovian EquilibriaA strategy-profile μ∗( · ) = (μi∗( · ))i∈N is a (Markovian) Nash equilibriumof the differential game �(t0, x0) if there exists a state trajectory x∗(t), t ∈[0, T), such that for each i ∈ N the control ui∗(t) = μi(t, x∗(t)), t ∈ [t0, T),solves player i’s optimal control problem

J(ui|μ−i∗) =∫ T

t0

hi(t, x∗(t), ui(t),μ−i∗(t, x∗(t))) dt −→ maxui(·)

, (4.11)

x∗(t) = f (t, x∗(t), ui(t),μ−i∗(t, x∗(t))), x∗(t0) = x0, (4.12)

ui(t) ∈ U i(t, x∗(t)), ∀ t, (4.13)

t ∈ [t0, T], (4.14)

26. The control-constraint set U i(t, x) can usually be represented in the form

U i(t, x) = {ui = (u1i, u2i) ∈ L∞ : Ri(t, x, u1) ≥ 0, u2i ∈ U2i(t)},analogous to the relations (3.37)–(3.38) of the general optimal control problem discussedin section 3.4.

Page 203: Optimal Control Theory With Applications in Economics

190 Chapter 4

in the sense that (x∗(t), ui∗(t)), t ∈ [t0, T), is an optimal state-control trajec-tory of (4.11)–(4.14). The Nash-equilibrium strategy profile μ∗ is calledan open-loop Nash equilibrium if it is independent of the state; otherwiseit is called a closed-loop Nash equilibrium.

Remark 4.6 (Open-Loop vs. Closed-Loop Nash Equilibria) In an open-loopNash equilibrium any player i ∈ N commits to a control trajectory overthe entire time horizon, that is, ui∗(t) = μi∗(t), for a.a. t ∈ [t0, T). In aclosed-loop Nash equilibrium player i takes into account the fact that allother players use the state when determining their equilibrium strategyprofile u−i∗(t) = μ−i∗(t, x∗(t)). Because any open-loop Nash equilibriumcan be viewed as a closed-loop Nash equilibrium with trivial statedependence, it is evident that the class of closed-loop Nash equilibriais richer than the class of open-loop Nash equilibria. The key differ-ence between the two concepts is that a functional dependence on thestate of the other players’ equilibrium strategy profile prompts player ito take into account his actions’ effect on the state because of the antici-pated reactions to variations of the state (in addition to the direct payoffexternalities from the other players’ actions). Figure 4.8 provides someadditional intuition of open-loop versus closed-loop strategy profiles inthe differential game �(t0, x0). �

Remark 4.7 (Markovian Strategies) The term Markovian in the definitionof closed-loop (and open-loop) Nash equilibria refers to the fact that thestrategies do not depend on the history other than through the currentstate.27 In general, one can expect non-Markovian, that is, history-dependent Nash equilibria (see section 4.2.3). The fact that a strategyprofile is Markovian means that all relevant memory in the system iscarried by the current state; it does not mean that there is no memoryin the system. Thus, the system equation determines the way historyinfluences the players’ closed-loop strategies. �

Time Consistency and Subgame PerfectionAs in section 4.2.1, the differential game �(t, x) is a subgame of the differ-ential game �(t0, x0) if t ≥ t0 and there exists an admissible and feasiblestrategy profile such that the system can be controlled from (t0, x0) tothe event (t, x) (see remark 3.3). A Markovian Nash equilibrium μ∗

27. More generally, a stochastic process has the Markov property, named after the Russianmathematician Andrey Markov, if the conditional distribution of future states dependssolely on the current state.

Page 204: Optimal Control Theory With Applications in Economics

Game Theory 191

Other Players Other Players

System System

Player Player

(a) (b)

Figure 4.8Differential game with (a) open-loop and (b) closed-loop strategies.

of �(t0, x0) with associated (by proposition 2.3 unique) state trajectory

x∗(t) = x0 +∫ t

t0

f (s, x∗(s),μ∗(s, x∗(s))) ds, ∀ t ∈ [t0, T), (4.15)

is called time-consistent if the restriction μ∗|[t,T)×Rn is a Markovian Nashequilibrium of �(t, x∗(t)), for all t ∈ [t0, T). The Markovian Nash equilib-rium μ∗ of �(t0, x0) is called subgame-perfect if the restriction μ∗|[t,T)×Rn

is a Markovian Nash equilibrium of any subgame �(t, x) of �(t0, x0). Asubgame-perfect Markovian Nash equilibrium is also referred to as aMarkov-perfect Nash equilibrium. It is evident that Markov perfectionimplies time consistency. The converse is not true in general, becauseMarkov perfection requires an equilibrium to not rely on any non-credible threats off the equilibrium path as well as on the equilibriumpath, whereas time consistency imposes a condition only on the equi-librium path, not considering the situation after an albeit unexpecteddeviation.28

Proposition 4.7 (Time Consistency) Any Markovian Nash equilib-rium μ∗ of the differential game �(t0, x0) is time-consistent.

Proof The proof proceeds via contradiction. Ifμ∗ is not time-consistent,then there exists t ∈ (t0, T) such that μ∗|[t,T)×Rn is not a Markovian Nash

28. For the notion of noncredible threats, see example 4.10.

Page 205: Optimal Control Theory With Applications in Economics

192 Chapter 4

equilibrium of �(t, x∗(t)), where x∗(t), t ∈ [t, T), is as in (4.14). Hence,for some player i ∈ N an alternative strategy μi(t, x∗(t)) yields a higherpayoff thanμi∗(t, x∗(t)), where x∗ is the state trajectory under the strategyprofile (μi,μ−i∗). But this implies that μ∗ cannot be a Nash equilibrium,since player i could unilaterally improve his payoff by switching to μi

for all times t ≥ t. n

The following example, which builds on several examples in section35, illustrates the difference between open-loop and closed-loop Marko-vian equilibria as well as the difference between time consistency andsubgame perfection.

Example 4.21 (Joint Exploitation of an Exhaustible Resource) In the samesetting as in exercise 3.2, consider N ≥ 2 identical agents who, startingat time t = 0, exploit a nonrenewable resource of initial quantity x0 > 0,so that at given time t = T in the future none of the resource is left. Eachagent i ∈ N = {1, . . . , N} chooses a consumption rate ci(t) ∈ [0, c/N],where c > (r/ρ) x0/(1 − e(r/ρ)T) is a maximum allowable extraction rate.The evolution of the resource stock x(t) solves the initial value problem(IVP)

x(t) = −N∑

j=1

cj(t), x(0) = x0,

for all t ∈ [0, T], provided that the feasibility constraint

ci(t) ∈ [0, 1{x(t)≥0}c]is satisfied a.e. on [0, T], for all i ∈ N . Assume that each agent experiencesthe (instantaneous) utility U(y) = y1−ρ when consuming the resource ata nonnegative rate y, consistent with a constant relative risk aversion ofρ > 1 (see example 3.5). Agent i determines his consumption ci( · ) so asto maximize the discounted utility

J(ci) =∫ T

0e−rtU(ci(t)) dt,

where r > 0 is the discount rate. First determine the unique open-loop Nash-equilibrium strategy profile c∗. For this, introduce agent i’scurrent-value Hamiltonian

Page 206: Optimal Control Theory With Applications in Economics

Game Theory 193

Hi(t, x, c, ν i) = U(ci) − ν iN∑

j=1

cj,

where ν i is his (current-value) adjoint variable and c = (c1, . . . , cN) isa strategy profile. Let (x∗(t), c∗(t)), t ∈ [0, T], be an open-loop Nash-equilibrium state-control trajectory. As in example 3.4, using thePontryagin maximum principle (PMP),

ci∗(t) = c0e−(r/ρ)t =(x0

N

) (r/ρ)e−(r/ρ)t

1 − e−(r/ρ)T , ∀ t ∈ [0, T].

The constant c0 (the same for all agents) is determined by the endpointconstraint x∗(T) = 0, so that

x0 =N∑

j=1

∫ T

0cj∗(t) dt = Nc0

(r/ρ)(1 − e−(r/ρ)T),

resulting in the open-loop Nash-equilibrium trajectory

x∗(t) = x0e−(r/ρ)t − e−(r/ρ)T

1 − e−(r/ρ)T , ∀ t ∈ [0, T].

Thus, the open-loop Nash equilibrium leads to a socially optimalexploitation of the nonrenewable resource, just as in example 3.5. Thisequilibrium is also time-consistent, that is, if after time t the systemis started in the state x∗(t), then the corresponding open-loop Nashequilibrium on [t, T] is the same as the restriction of the open-loopNash equilibrium on [0, T] to the interval [t, T]. On the other hand, theopen-loop Nash equilibrium is not subgame-perfect, because after anunexpected deviation, for instance, from x∗(t) to some other state ξ ∈(0, x0), agents do not condition their subsequent consumption choiceson the new state ξ but only on the elapsed time t.

Now consider a closed-loop Nash equilibrium with an affine feedbacklaw of the form

μi(x) = αi +β ix, ∀ x ∈ [0, x0],where αi,β i are constants. Taking into account the symmetry, set α ≡ αi

and β ≡ β i, so agent i’s current-value Hamiltonian becomes

Hi(t, x, (ci,μ−i), νi) = U(ci) − νi(ci + (N − 1)(α+βx)).

The corresponding adjoint equation is then νi = (r + (N − 1)β

)ν i, so

Page 207: Optimal Control Theory With Applications in Economics

194 Chapter 4

ν i(t) = ν i0ert,

where r = r + (N − 1)β. Thus, agent i’s closed-loop Nash-equilibriumconsumption becomes

ci∗(t) =(x0

N

) (r/ρ)e−(r/ρ)t

1 − e−(r/ρ)T, ∀ t ∈ [0, T].

On the other hand, using the given feedback law, the closed-loop statetrajectory x∗(t), t ∈ [0, T], solves the IVP

x = −N (α+βx) , x(0) = x0,

so

x∗(t) = x0 − α

β(1 − e−Nβt), ∀ t ∈ [0, T].

The closed-loop state trajectory x∗(t), t ∈ [0, T], under the strategy pro-file c∗ = (c1, . . . , cN) is the same as the open-loop state trajectory afterreplacing r by r. Combining this with the previous expression for x∗

yields

α =(x0

N

) (r/ρ)1 − e−(r/ρ)T

and β = (r/ρ)N

,

where

= r

ρ− (1 − 1N

) .

The closed-loop Nash equilibrium is Markov-perfect, for its feedbacklaw is conditioned directly on the state. Also note that in the closed-loop Nash equilibrium the agents tend to exploit the resource faster thanwhen all players are committed up to time t = T. The lack of commit-ment in the closed-loop Nash equilibrium leads to an overexploitationof the resource compared to the welfare-maximizing solution that isimplemented by an open-loop equilibrium carrying with it the abil-ity to fully commit to a control trajectory at the beginning of the timeinterval. Figure 4.9 contrasts the open-loop and closed-loop equilibriumtrajectories. �

Example 4.22 (Linear-Quadratic Differential Game) Consider N ≥ 2 play-ers with quadratic objective functionals whose payoffs depend onthe evolution of a linear system. Given a time horizon T > 0, eachplayer i ∈ N = {1, . . . , N} chooses a control ui(t), t ∈ [0, T), in a convex,

Page 208: Optimal Control Theory With Applications in Economics

Game Theory 195

Open-Loop

Closed-Loop

Closed-Loop

Open-Loop

c

c

Figure 4.9State and control trajectories in an open-loop and a closed-loop Markovian Nashequilibrium of a joint exploitation game (see example 4.21).

compact subset U i of Rmi . It is assumed to be large enough to allow foran effectively unconstrained optimization. The evolution of the statevariable x(t) ∈ Rn is governed by the linear system equation

x(t) = A(t)x(t) +N∑

i=1

Bi(t)ui(t),

where the continuous bounded matrix functions A, Bi are such thatA(t) ∈ Rn×n and Bi(t) ∈ Rn×mi , for all t ∈ [0, T). Given the other players’strategy profile μ−i(t, x), player i solves the optimal control problem

J(ui|μ−i) =∫ T

0hi(t, x(t), ui(t),μ−i(t, x(t))) dt − e−rTx′(T)Kix(T) −→ max

ui(·),

x(t) = A(t)x(t) +N∑

i=1

Bi(t)ui(t), x(0) = x0,

ui(t) ∈ U i, ∀ t,

t ∈ [0, T],where

hi(t, x, u) = −e−rt

⎡⎣x′Ri(t)x +

N∑j=1

(uj)′Sij(t)uj

⎤⎦ ,

Page 209: Optimal Control Theory With Applications in Economics

196 Chapter 4

r ≥ 0 is a common discount rate, and R(t), Sij(t) are continuous boundedmatrix functions with values in Rn×n and Rmj×mj , respectively. The ter-minal cost matrix Ki ∈ Rn×n is symmetric positive definite. The setupis analogous to the linear-quadratic regulator problem in example 3.3.Using the PMP or the Hamilton-Jacobi-Bellman (HJB) equation it is pos-sible to explicitly determine open-loop and closed-loop equilibria of thislinear-quadratic differential game (see exercise 4.1). �

4.3.2 Non-Markovian EquilibriaThe discussion of repeated games in section 4.2.3 showed how importantthe players’ available information structure can be for the constructionof Nash equilibria, of which, according to the various folk theorems(e.g., proposition 4.5), there can in principle be many. Now consider theconcept of information structure, which for each player i defines whatis known at a given time t about the state of the system and the historyof the players’ actions. For any player i ∈ N , let

I i :{(t, u( · )) : u( · ) ∈ L∞([t0, T), Rm), t ∈ [t0, T)

} → S i

be a mapping from the set of available data to his observation space S i,called player i’s information structure (IS).29 The observation space S =S1 × · · · × SN is a subset of a finite-dimensional Euclidean space. The(combined) information structure I = I1 × · · · × IN is called causal (ornonanticipatory) if

I(t, u( · )) = I(t, u( · )∣∣[0,t) ), ∀ t ∈ [t0, T).

A causal information structure does not use any future information. Theinformation structure I is called regular if for any admissible u( · ), u( · ) ∈L∞([t0, T), Rm):

∫ T

t0

‖u(s) − u(s)‖ds = 0 ⇒ I(t, u( · )) = I(t, u( · )), ∀ t ∈ [t0, T).

A regular information structure produces therefore the same obser-vations for any two strategy profiles which are identical except on azero-measure set of time instances. Last, a (pure) strategy profile of adifferential game �I(t0, x0) with information structure I is a mapping μ =(μ1, . . . ,μN) : [t0, T) × S → Rm. A definition of a Nash equilibrium in

29. In contrast to the repeated games discussed in section 4.2.3, assume here (for sim-plicity) that the information structure maps to a stationary observation space, which thencorresponds essentially to the space of all histories in the earlier discussion.

Page 210: Optimal Control Theory With Applications in Economics

Game Theory 197

this context is completely analogous to the earlier definitions of Nashequilibrium, and is therefore omitted.

Example 4.23 (Information Structures) Let i ∈ {1, . . . , N}, N ≥ 2.

• Markovian IS I i(t, u( · )) ≡ x(t).• Delayed-state IS Given a delay δ ∈ (0, T − t0), consider

I i(t, u( · )) ≡{

x0 if t ∈ [t0, t0 + δ],x(t − δ) if t ∈ [t0 + δ, T).

This IS depends only on the past, that is, it is causal; it is also regular.However, when the delay δ is negative, then the IS Ii(t, u( · )), t ∈ [t0, T),becomes noncausal and continues to be regular.• Delayed-control IS Let δ ∈ (t0, T − t0) be a given delay, as in the lastexample. The IS I i(t, u( · )) = uj( max{t0, t − δ}) for some j ∈ {1, . . . , N} \{i} is causal but not regular. If the delay is negative, the IS is neithercausal nor regular.• Sampled-observation IS Given the time instances t1, . . . , tκ ∈ (t0, T)with tk−1 < tk for all k ∈ {2, . . . , κ}, consider I i(t, u( · )) ≡ {x(tk) : tk ≤ t}.This IS is causal and regular.

The next example provides an information structure that commonlyarises in hierarchical play with commitment. �

Example 4.24 (Stackelberg Leader-Follower Games) Consider a game withtwo players, 1 and 2, where player 1 in the role of the leader first com-mits to a control path u1(t), t ∈ [t0, T). Player 2, the follower, observesthis strategic preannouncement and chooses his payoff-maximizingresponse, u2(t), t ∈ [t0, T). The corresponding (anticipatory but regu-lar) information structure I = (I1, I2) is such that I1(t, u( · )) = {∅, u1( · )}and I2(t, u( · )) ∈ {u1( · ), u( · )}. In other words, the follower knows theleader’s entire control trajectory, while the leader chooses his strategywithout knowing the follower’s strategy.30 This generally creates a time-consistency problem, similar to the discussion of this phenomenon insection 4.2.3. �

Example 4.25 (Cournot-Stackelberg Duopoly) Building on the hierarchi-cal information structure in the last example, consider two identicalfirms, 1 and 2, with firm 1 as leader and firm 2 as follower. Eachfirm i ∈ {1, 2} chooses a production output ui ≥ 0 at the cost C(ui) =

30. Of course, the leader can anticipate the follower’s response, but he is unable to changehis own strategy intermittently; that is, the leader has no recourse.

Page 211: Optimal Control Theory With Applications in Economics

198 Chapter 4

(ui)2/2 on the infinite time interval [0, ∞), so as to maximize its objectivefunctional

Ji(ui|u−i) =∫ ∞

0e−rt[p(t)ui(t) − C(ui)] dt,

where r > 0 is a common discount factor. The price process p(t) isdetermined as solution of the IVP

p = 1 − p − u1(t) − u2(t), p(0) = p0,

for a given initial value p0 > 0. To determine an open-loop Nash equi-librium of this hierarchical game, one first solves the follower’s optimalcontrol problem given the leader’s control trajectory u1(t), t ≥ 0. Thecorresponding current-value Hamiltonian is

H1(p, u1, u2, ν2) = pu2 − (u2)2

2+ ν2(1 − p − u1 − u2),

where ν2 is the current-value adjoint variable. Using the PMP, themaximality condition yields u2 = p − ν2, and the adjoint equationbecomes

ν2 = (2 + r)ν2 − p(t), ∀ t ∈ [0, ∞).

The leader can now solve its own optimal control problem, taking intoaccount the anticipated actions by the follower. Firm 1’s correspondingcurrent-value Hamiltonian is

H2(p, u1, u2, ν1, ν2) = pu1 − (u1)2

2+ ν1

1 (1 − p − u1 − u2)

+ ν12 ((2 + r)ν2 − p),

where ν1 = (ν11 , ν1

2 ) is the leader’s adjoint variable. It is important tonote that the leader takes into account the evolution of the follower’sadjoint variable and thus works with an augmented state variable x =(p, ν2) instead of just with p which the follower uses. The PMP yields themaximality condition u1 = p − ν1

1 and the adjoint equations

ν11 = (3 + r)ν1

1 + ν12 − p(t),

ν12 = −ν1

1 − 2ν12 ,

for all t ∈ [0, ∞). Because the Hamiltonian system of adjoint equationsand state equation (for the price) amounts to a system of linear ODEs,

Page 212: Optimal Control Theory With Applications in Economics

Game Theory 199

it is possible to find an explicit solution; for computational details in asimilar setting, see exercise 4.2. The turnpike price,

p∗ = (5 + 2r)(2 + r)21 + 23r + 6r2 ,

is obtained by computing the equilibrium (p∗, ν1, ν2) of the Hamilto-nian system, which also yields the corresponding long-run equilibriumproduction levels,

u1∗ = p∗ − ν11 = 3 + 3r + r2

9 + 10r + 3r2 and u2∗ = p∗ − ν2 = 2 + 3r + r2

9 + 10r + 3r2 .

It is interesting to compare these last results to the turnpike price p∗ =(2 + r)/(4 + 3r) and the long-run production levels u1∗ = u2∗ = (1 + r)/(4 + 3r) in an open-loop Nash equilibrium without hierarchical play:

p∗ < p∗ and u2∗ < ui∗ < u1∗,

that is, the leader produces more than any of the firms in thesimultaneous-move equilibrium, which does not afford the leader thepossibility to anticipate the follower’s reaction completely. In the longrun, the resulting total output is higher and thus the market price lowerthan in the non-hierarchical open-loop Nash equilibrium. �

Trigger-Strategy Equilibria The relevance and intuition of trigger-strategy equilibria was first discussed in section 4.2.1. The generalintuition carries over to differential games, yet it is necessary to becareful about defining what exactly a deviation means. The reason isthat, for example, deviations from a given target strategy profile u(t),t ∈ [t0, T), at single time instances are not payoff-relevant and shouldtherefore not trigger any response.

Example 4.26 (Prisoner’s Dilemma as Differential Game) In what follows,a static two-agent prisoner’s dilemma game is generalized to a suit-able infinite-horizon differential game �, and subgame-perfect trigger-strategy equilibria of� are derived with cooperation (on the equilibriumpath), assuming that both players have a common positive discountrate r. In this it is assumed that each player experiences a detectionlag, and that he can condition his time-t action on the entire historyof play up to time t ≥ 0. Describe the set of (average) payoff vectorsthat can be implemented using such trigger-strategy equilibria, depend-ing on r. Following is the payoff matrix for a standard single-period

Page 213: Optimal Control Theory With Applications in Economics

200 Chapter 4

prisoner’s dilemma game, where ui ∈ [0, 1] is player i’s chosen proba-bility of cooperating (i.e., playing C), for i ∈ {1, 2} instead of defecting(i.e., playing D):

Player 2

(u2) (1 − u2)

C D

Player 1 (u1) C(1, 1) (−1, 2)

(1 − u1) D (2, −1) (0, 0)

Consider a dynamic version of the prisoner’s dilemma stage gameover an infinite time horizon,31 in which both players care about theirrespective average payoffs. This game can be written as a differentialgame by realizing that, given the strategy uj : R → [0, 1] of player j ∈{1, 2} (an essentially bounded, measurable function), player i ∈ {1, 2} \{ j} solves the (somewhat degenerate) optimal control problem

Ji(ui|uj) −→ maxui(·)

s.t. xi = 1 − uj, x(0) = 0,

ui(t) ∈ [0, 1], ∀ t ≥ 0,

where

Ji(ui|uj) = r∫ ∞

0e−rt(ui(t)uj(t) + 2uj(t)(1 − ui(t)) − ui(t)(1 − uj(t))) dt.

The state xi(t), t ≥ 0, measures the duration over which player j has, onaverage, not cooperated before time t. Though it is not directly relevantfor player i’s objective functional, keeping track of this state may allowplayer i to condition his strategy on player j’s cumulative behavior aspart of a Markov-perfect equilibrium.

Note first that if both players restrict attention to stationary strategyprofiles u(t) ≡ u0 = (ui

0, uj0) ∈ [0, 1]2, the payoffs Ji(ui

0|uj0) will be identical

to those shown in the preceding payoff matrix. Thus, the differential

31. Equivalently, consider a finite-horizon game in which a random length T of the timehorizon is exponentially distributed, so that its hazard rate is constant.

Page 214: Optimal Control Theory With Applications in Economics

Game Theory 201

game � is indeed a generalization of the static game, when players arefree to choose nonstationary strategy profiles. Now consider differenttypes of equilibria of �.

Open-Loop Equilibria The current-value Hamiltonian for player i’soptimal control problem is

Hi(t, x, u, νi) = uiuj + 2uj(1 − ui) − ui(1 − uj) + νii (1 − uj) + νi

j (1 − ui),

where x = (x1, x2) is the state of the system. The PMP yields the fol-lowing necessary optimality conditions for any optimal state-controltrajectory (x∗(t), ui∗(t)), t ≥ 0:

• Adjoint equation

ν i(t) = rνi(t), ∀ t ≥ 0.

• Transversality

e−rtν i(t) → 0 as t → ∞.

• Maximality

ui∗(t) ∈

⎧⎪⎨⎪⎩

{0} if ν ij (t) < −1

[0,1] if νij (t) = −1

{1} if νij (t) > −1

⎫⎪⎬⎪⎭ = arg max

ui∈[0,1]Hi(t, x∗(t), ui, uj∗(t), ν i(t)),

for all t ≥ 0.

The adjoint equation together with the transversality condition impliesthat ν i(t) ≡ 0, so the maximality condition entails that ui∗(t) ≡ 0 for i ∈{1, 2}. Thus, the unique open-loop Nash equilibrium of the differentialgame � is for both players to always defect.

Closed-Loop Equilibria Now examine closed-loop equilibria of theform μi∗(t, x) = 1 − [sgn(xi)

]+. The intuition behind this strategy is that

any significant (i.e., not on a zero-measure time interval) deviation fromcooperation by player j is noticed by player i, and subsequently pun-ished by player i’s playing ui = 0 (i.e., defect) for all future times. Indeed,if player j plays μj∗, then it is best for player i to play μi∗, since thatyields a payoff of 1, while any deviation from this strategy yields a pay-off of zero. Note also that this closed-loop equilibrium is Markovian andregular. It implements perpetual cooperation on the equilibrium path.

Trigger-Strategy Equilibria Assuming that each player experiences acommon positive detection lag δ, and that he can condition his time-t

Page 215: Optimal Control Theory With Applications in Economics

202 Chapter 4

action on the information I(t, u( · )) = {u(s) : 0 ≤ s ≤ [t − δ]+} up to timet ≥ 0, subgame-perfect trigger-strategy equilibria with cooperation onthe equilibrium path can be sustained by the threat of minmax pun-ishment (see example 4.17) off the equilibrium path. For this, let anyadmissible reference function u = (u1, u2) : R+ → [0, 1]2 be given, andconsider the strategy profile μ∗ = (μi∗,μj∗) with

ui(t) ≡ μi∗(t, I(t, u( · ))|u( · )) ={

1 if∫ [t−δ]+

0 ‖u(s) − u(s)‖ds = 0,0 otherwise.

Note that because the system does not explicitly depend on time, if itis optimal for player i to deviate on an interval [τ , τ + δ] (i.e., playing adeviating strategy ui) for some τ ≥ 0, then it is also optimal to deviateon the interval [0, δ]. Player i’s corresponding deviation payoff Ji

dev(u|uj)is bounded, as

0 ≤ Jidev(ui|uj) ≤ 2r

∫ δ

0e−rsds = 2

(1 − e−rδ) ≤ Ji(ui|uj) ≤ 2,

provided that

0 < δ ≤ 1r

min{

ln(

22 − Ji(ui|uj)

), ln

(2

2 − Jj(uj|ui)

)}≡ δu.

Note that u(t) = u(t), t ≥ 0, on the equilibrium path, so that it is possi-ble to implement any individually rational average payoff vector v =(vi, vj) ∈ V , where

V = {v ∈ R2++ : v ∈ co({0, ( − 1, 2), (2, −1), (1, 1)})}

is the intersection of the convex hull of the static payoff vectors and thepositive quadrant of R2. In other words, as long as the reference strategyprofile u = (ui, uj) is individually rational, so that

(Ji(ui|uj), Jj(uj|ui)) ∈ V ,

it can be implemented as a trigger-strategy equilibrium. In addition, onecan obviously implement u(t) ≡ 0, which corresponds to the standardprisoner’s dilemma outcome (and the open-loop equilibrium). �

4.4 Notes

The theory of games dates back to von Neumann (1928) and vonNeumann and Morgenstern (1944). Good introductory textbooks onnoncooperative game theory are by Fudenberg and Tirole (1991),

Page 216: Optimal Control Theory With Applications in Economics

Game Theory 203

Gibbons (1992), and for differential games, Basar and Olsder (1995),and Dockner et al. (2000). Mailath and Samuelson (2006) give an intro-duction to repeated games, including issues concerning asymmetricinformation.

Nash (1950) introduced the modern notion of equilibrium that iswidely used in noncooperative game theory. Radner and Rosen-thal (1982) provided sufficient conditions for the existence of pure-strategy Bayes-Nash equilibria. Milgrom and Weber (1985) relaxedthose conditions by allowing for distributional strategies. Their findingswere further generalized by Balder (1988), whose main result essen-tially corresponds to proposition 4.2. Time-consistency problems inleader-follower games such as economic planning were highlighted byKydland and Prescott (1977); for an overview of such issues in the con-text of nonrenewable resources, see Karp and Newbery (1993). Signalinggames were first discussed by Spence (1973) and appear in many eco-nomic contexts, for instance, advertising (Kihlstrom and Riordan 1984).

4.5 Exercises

4.1 (Linear-Quadratic Differential Game) Consider the differentialgame in example 4.22.

a. Determine the (unique) open-loop Nash equilibrium.

b. Determine a closed-loop Nash equilibrium.

c. Explain the difference between the two equilibria in parts a and b.

4.2 (Cournot Oligopoly) In a market where all N ≥ 2 firms are offeringhomogeneous products, the evolution of the “sticky” market price p(t)as a function of time t ≥ 0 is described by the IVP

p(t) = f (p(t), u1(t), . . . , uN(t)), p(0) = p0,

where the initial price p0 > 0 and the continuously differentiable excessdemand function f : R1+N → R are given, with

f (p, u1, . . . , uN) = α

(a − p −

N∑i=1

ui

).

The constant α > 0 is a given adjustment rate, and a > 0 representsthe known market potential. Each firm i ∈ {1, . . . , N} produces the out-put ui(t) ∈ U = [0, u], given the large capacity limit u > 0, resulting inthe production cost

Page 217: Optimal Control Theory With Applications in Economics

204 Chapter 4

C(ui) = cui + (ui)2

2,

where c ∈ [0, a) is a known constant. Given the other firms’ strategy pro-file u−i(t) = μ−i(t, p(t)), t ≥ 0, each firm i maximizes its infinite-horizondiscounted profit,

Ji(ui) =∫ ∞

0e−rt(p(t)ui(t) − C(ui(t))) dt,

where r > 0 is a common discount rate.

a. Formulate the differential game �(p0) in normal form.

b. Determine the unique Nash equilibrium u0 (together with an appro-priate initial price p0) for a static version of this Cournot game, whereeach firm i can choose only a constant production quantity ui

0 ∈ U andwhere the price p0 is adjusted only once, at time t = 0, and remainsconstant from then on.

c. Find a symmetric open-loop Nash equilibrium u∗(t), t ≥ 0, of �(p0),and compute the corresponding equilibrium turnpike (p∗, u∗), namely,the long-run equilibrium state-control tuple. Compare it to your solutionin exercise 4.2.a and explain the intuition behind your findings.

d. Find a symmetric Markov-perfect (closed-loop) Nash equilibriumμ∗(t, p), (t, p) ∈ R2+, of �(p0), and compare it to your results in part c.

e. How do your answers in parts b–d change as the market becomescompetitive, as N → ∞?

4.3 (Duopoly Pricing Game) Two firms in a common market are com-peting on price. At time t ≥ 0, firm i ∈ {1, 2} has a user base xi(t) ∈ [0, 1].Given the firms’ pricing strategy profile p(t) = (pi(t), pj(t)),32 t ≥ 0, thisuser base evolves according to the ODE

xi = xi(1 − xi − xj)[α(xi − xj) − (pi(t) − pj(t))], xi(0) = xi0,

where j ∈ {1, 2} \ {i}, and the initial user base xi0 ∈ (0, 1 − xj0) is given.Intuitively, firm i’s installed base increases if its price is smaller thanα(xi − xj) + pj, where the constant α ≥ 0 determines the importance ofthe difference in installed bases as brand premium. For simplicity,

32. It is without loss of generality to assume that p(t) ∈ [0, P]2, where P > 0 can beinterpreted as a (sufficiently large) maximum willingness to pay.

Page 218: Optimal Control Theory With Applications in Economics

Game Theory 205

assume that the firms are selling information goods at zero marginalcost. Hence, firm i’s profit is

Ji(pi|pj) =∫ ∞

0e−rtpi(t) [xi(t)]+ dt,

where r > 0 is a given common discount rate.

a. Formulate the differential game �(x0) in normal form.

b. Show that any admissible state trajectory x(t) of the game�(x0) movesalong the curve C(x0) = {(x1, x2) ∈ [0, 1]2 : x1x2 = x10x20}.c. Show that an open-loop state-control trajectory (x∗(t), p∗(t)) in thegame �(x0) is characterized as follows.• If x10 = x20, then p∗(t) ≡ 0 and x∗(t) ≡ x0.• If x∗

i0 > x∗j0, then pj∗(t) = 0, and (x∗

i (t), −x∗j (t)) increases along the

curve C(x0), converging to the stationary point

x = (xi, xj) =(

1 −√1 − 4xi0xj0

2,

1 +√1 − 4xi0xj0

2

).

d. Plot a typical Nash-equilibrium state-control trajectory.

e. Provide an intuitive interpretation of the Nash-equilibrium strat-egy profile of �(x0) determined earlier, and discuss the correspondingmanagerial conclusions. What features of the game are not realistic?

4.4 (Industrial Pollution) Consider a duopoly in which at time t ≥ 0each firm i ∈ {1, 2} produces a homogeneous output of qi(t) ∈ [0, yi],where yi ≥ 0 is its capacity limit. Given a total output of Q = q1 + q2,the market price is given by p(Q) = [1 − Q]+. The aggregate output Qcauses the emission of a stock pollutant x(t), which decays naturally atthe rate β > 0. Given the initial stock x(0) = x0 > 0, the evolution of thepollutant is described by the initial value problem

x(t) = Q(t) −βx(t), x(0) = x0,

for all t ≥ 0. The presence of the pollutant exerts an externality on bothfirms, as firm i’s total cost

Ci(x, qi) = ciqi + γ i x2

2

depends not only on its own production but also on the accumu-lated stock of pollution. The constant ci ∈ (0, 1) is a known marginal

Page 219: Optimal Control Theory With Applications in Economics

206 Chapter 4

production cost, and γ i ∈ {0, 1} indicates if firm i cares about pollution(γ i = 1) or not (γ i = 0). At time t, firm i’s total profit is

π i(x, q) = p(q1 + q2) · qi − Ci(x, qi),

where q = (q1, q2). At each time t ≥ 0, firm i chooses the capacity-expansion rate ui(t) ∈ [0, u] to invest in expanding its capacity y(t), whichevolves according to

yi(t) = ui(t) − δyi(t), y(0) = y0,

where δ > 0 is a depreciation rate, yi0 > 0 is firm i’s initial capacity level,and u > 0 is a (large) upper bound on the capacity-expansion rate. Thecost of expanding capacity at the rate ui is K(ui) = κ(ui)2/2, where κ > 0is the marginal expansion cost. Assume that both firms maximize theirtotal discounted payoffs, using the common discount rate r > 0.33 (Hint:Always check first if qi = yi in equilibrium.)

Part 1: Symmetric Environmental Impact

Both firms care about pollution, (γ 1, γ 2) = (1, 1).

a. Derive the solution to a static version of the duopoly game in whichall variables and their initial values are constants.

b. Formulate the differentiable game �(x0, y0) and derive an open-loopNash equilibrium.

c. Determine a closed-loop Nash equilibrium of �(x0, y0) in affinestrategies.

d. Compare the results you obtained in parts a–c.

Part 2: Asymmetric Environmental Impact

Only firm 1 cares about pollution, (γ 1, γ 2) = (1, 0).

e. Derive static, open-loop, and closed-loop equilibria of �(x0, y0), as inpart 1, and discuss how an asymmetric environmental impact changesthe outcome of the game compared to the situation with a symmetricenvironmental impact.

f. Discuss possible public-policy implications of your findings in parts1 and 2.

33. The cost K(ui) is sometimes referred to as internal adjustment cost.

Page 220: Optimal Control Theory With Applications in Economics

5 Mechanism Design

See first that the design is wise and just:that ascertained, pursue it resolutely;do not for one repulse forego the purposethat you resolved to effect.

—William Shakespeare

This chapter reviews the basics of static mechanism design in settingswhere a principal faces a single agent of uncertain type. The aim of theresulting screening contract is for the principal to obtain the agent’s typeinformation in order to avert adverse selection (see example 4.19), max-imizing her payoffs. Nonlinear pricing is discussed as an application ofoptimal control theory.

5.1 Motivation

Adecision maker may face a situation in which payoff-relevant informa-tion is held privately by another economic agent. For instance, supposethe decision maker is a sales manager. In a discussion with a potentialbuyer, she is thinking about the right price to announce. Naturally, theclient’s private value for a product is a piece of hidden information thatthe manager would love to know before announcing her price. It wouldprevent her from announcing a price that is too high, in which case therewould be no trade, or a price that is too low, in which case the buyeris left with surplus that the seller would rather pocket herself. In par-ticular, the decision maker would like to charge a person with a highervalue more for the product than a person with a lower value, providedshe could at least cover her actual marginal cost of furnishing the item.

Thus, the key question is, What incentive could an economic agenthave to reveal a piece of hidden information if an advantage could be

Page 221: Optimal Control Theory With Applications in Economics

208 Chapter 5

obtained by announcing something untruthful? More specifically, couldthe decision maker devise a mechanism that an agent might find attrac-tive enough to participate in (instead of ignoring the decision makerand doing something else) and that at the same time induces revela-tion of private information, such as the agent’s willingness to pay? Inorder for an agent to voluntarily disclose private information to a deci-sion maker (the principal), he would have to be offered a nonnegativeinformation rent, which he would be unable to obtain without revealinghis private information. An appropriate screening mechanism shouldmake it advantageous for any agent type (compared to his status quoor an outside option) to disclose his private information even whenthis information is likely to be used against him. The revelation principle(see proposition 5.1) guarantees that without loss of generality the prin-cipal can limit her search of appropriate mechanisms to those in whichany agent would find it optimal to announce his private informationtruthfully (direct mechanisms). An agent’s piece of private informationis commonly referred to as his type. The sales manager, before announc-ing a price for the product, thus wishes to know the type of the potentialbuyer, allowing her to infer his willingness to pay. As an example,if there are two or more agent types competing for the item, then atruth-revealing mechanism can be implemented using a second-priceauction (see example 4.2).1 The next section provides a solution for thesales manager’s mechanism design problem when the buyer’s privateinformation is binary (i.e., when there are only two possible types).

5.2 A Model with Two Types

Assume that a sales manager (seller) faces a buyer of type θL or θH ,whereby θL < θH . The buyer’s type θ ∈ {θL, θH} = � is related to hiswillingness to pay in the following way: if the seller (the principal)announces a price (or transfer) t for a product of quality x ∈ R,2 the(type-dependent) buyer’s utility is equal to zero if he does not buy (thusexercising his outside option), and it is

U(x, θ ) − t ≥ 0 (5.1)

if he does buy. The function U : X ×� → R (with X = [x¯, x] ⊂ R

and −∞ < x¯< x < ∞), assumed to be strictly increasing in (x, θ ) and

1. The basic ideas of mechanism design discussed in this chapter remain valid for thedesign of mechanisms with multiple agents, such as auctions.2. Instead of quality one can, equivalently, also think of quantity as instrument for thescreening mechanism (see example 5.2).

Page 222: Optimal Control Theory With Applications in Economics

Mechanism Design 209

concave in x, represents the buyer’s preferences.3 In order to design anappropriate screening mechanism that distinguishes between the twotypes, the seller needs a contracting device (i.e., an instrument), suchas a product characteristic that she is able to vary. The sales contractcould then specify the product characteristic, say, the product’s qualityx, for which the buyer would need to pay a price t = τ (x).4 The ideafor the design of a screening mechanism is that the seller proposes amenu of contracts containing variations of the instrument, from whichthe buyer is expected to select his most desirable one. Since there areonly two possible types, the seller needs at most two different contracts,indexed by the quality on offer, x ∈ {xL, xH}. The buyer, regardless oftype, cannot be forced to sign the sales contract: his participation is vol-untary. Thus, inequality (5.1) needs to be satisfied for any participatingtype θ ∈ {θL, θH}. Furthermore, at the price tH a buyer of type θH shouldprefer quality xH ,

U(xH , θH) − tH ≥ U(xL, θH) − tL, (5.2)

and at price tL a buyer of type L should prefer xL, so

U(xL, θL) − tL ≥ U(xH , θL) − tH . (5.3)

Assume that the unit cost for a product of quality x is c(x), where c :R → R is a strictly increasing continuous function. The contract-designproblem is to choose {(tL, xL), (tH , xH)} (with tL = τ (xL) and tH = τ (xH))so as to maximize the manager’s expected profit,

�(tL, xL, tH , xH) = (1 − p) [tL − c(xL)] + p [tH − c(xH)] , (5.4)

where p = Prob(θ = θH) = 1 − Prob(θ = θL) ∈ (0, 1) denotes the seller’sprior belief about the probability of being confronted with type θH asopposed to θL.5 The optimization problem,

max{(tL ,xL),(tH ,xH )}

�(tL, xL, tH , xH), (5.5)

is subject to the individual-rationality (or participation) constraint (5.1) aswell as the incentive-compatibility constraints (5.2) and (5.3). The general

3. It is assumed here that the buyer’s preferences are quasilinear in money. Refer tofootnote 1 in chapter 4 for the definition of the utility function as representation ofpreferences.4. It is important to note that the instrument needs to be contractable, i.e., observable by thebuyer and verifiable by a third party, so that a sales contract specifying a payment τ (x) fora quality x can be enforced by a benevolent court of law.5. Since the buyer’s type is unknown to the seller, she treats θ as a random variable withrealizations in the type space �.

Page 223: Optimal Control Theory With Applications in Economics

210 Chapter 5

solution to this mechanism design problem may be complicated anddepends on the form of U. Its solution is simplified when U has increas-ing differences in (x, θ ). In other words, assume that U(x, θH) − U(x, θL)is increasing in x, or equivalently, that

x ≥ x ⇒ U(x, θH) − U(x, θH) ≥ U(x, θL) − U(x, θL), (5.6)

for all x, x ∈ X . Condition (5.6) implies that the marginal gain from addi-tional quality is greater for type θH (the high type) than for type θL (thelow type). To further simplify the principal’s constrained optimizationproblem, one can show that the low type’s participation constraint isbinding. Indeed, if this were not the case, then U(xL, θL) − tL > 0 andthus,

U(xH , θH) − tH ≥ U(xL, θH) − tL ≥ U(xL, θL) − tL > 0,

which would allow the principal to increase prices for both the highand the low type because neither type’s participation constraint isbinding. As an additional consequence of this proof the individual-rationality constraint for the high type can be neglected, but is bindingfor the low type. This makes the principal’s problem substantially eas-ier. Another simplification is achieved by noting that the high type’sincentive-compatibility constraint (5.2) must be active. If this were nottrue, then

U(xH , θH) − tH > U(xL, θH) − tL ≥ U(xL, θL) − tL = 0,

whence it would be possible to increase tH without breaking (5.1) for thehigh type: a contradiction. Moreover, it is then possible to neglect (5.3),since incentive compatibility for the low type is implied by the factthat (5.2) is binding and the sorting condition (5.6) holds, tH − tL =U(xH , θH) − U(xL, θH) ≥ U(xH , θL) − U(xL, θL). The last inequality withθH > θL also implies that xH > xL. To summarize, one can thereforedrop the high type’s participation constraint (5.1) and the low type’sincentive-compatibility constraint (5.3) from the principal’s program,which is now constrained by the high type’s incentive-compatibilityconstraint,

tH − tL = U(xH , θH) − U(xL, θH), (5.7)

and the low type’s participation constraint,

U(xL, θL) = tL. (5.8)

Page 224: Optimal Control Theory With Applications in Economics

Mechanism Design 211

Equations (5.7) and (5.8) allow substituting tL and tH into the man-ager’s expected profit (5.4). With this, the contract-design problem (5.5),subject to (5.1)–(5.3), can be reformulated as an unconstrained optimiza-tion problem,

maxxL,xH∈X

{(1 − p)[U(xL, θL) − c(xL)]

+ p[U(xH , θH) − c(xH) − (U(xL, θH) − U(xL, θL))]}.The problem therefore decomposes into the two independent maximiza-tion problems,

x∗H ∈ arg max

xH∈X{U(xH , θH) − c(xH)} (5.9)

and

x∗L ∈ arg max

xL∈X

{U(xL, θL) − c(xL) − p

1 − p(U(xL, θH) − U(xL, θL))

}. (5.10)

From (5.9)–(5.10) the principal can determine t∗H and t∗L using (5.7)–(5.8). In order to confirm that indeed x∗

H > x∗L, as initially assumed,

first consider the first-best solution to the mechanism design problem,{(tFB

L , xFBL ),(tFB

H , xFBH )}, that is, the solution under full information. Indeed,

if the principal knows the type of the buyer, then

xFBj ∈ arg max

xj∈X{U(xj, θj) − c(xj)

}, j ∈ {L, H}, (5.11)

and

tFBj = U(xj, θj), j ∈ {L, H}. (5.12)

Comparing (5.9) and (5.11) yields that x∗H = xFB

H . In other words, even inthe presence of hidden information the high type will be provided with thefirst-best quality level. As a result of the supermodularity assumption (5.6)on U, the first-best solution xFB(θ ) is increasing in θ . Hence, θL < θH

implies that xFBL = xFB(θL) < xFB(θH) = xFB

H . In addition, supermodularityof U implies that for the low type the second-best solution x∗

L in (5.10) can-not exceed the first-best solution xFB

L in (5.11), since U(xL, θH) − U(xL, θL),a nonnegative function, increasing in xL, is subtracted from the first-bestmaximand in order to obtain the second-best solution (which thereforecannot be larger than the first-best solution). Hence, it is

x∗H = xFB

H > xFBL ≥ x∗

L. (5.13)

Page 225: Optimal Control Theory With Applications in Economics

212 Chapter 5

Thus, in a hidden-information environment the low type is furnishedwith an inefficient quality level compared to the first-best. Moreover,the low type is left with zero surplus (since t∗L = U(x∗

L, θL)), whereas thehigh type enjoys a positive information rent (from (5.7) and (5.12)),

t∗H = tFBH − (U(x∗

L, θH) − U(x∗L, θL)

)︸ ︷︷ ︸Information Rent

. (5.14)

The mere possibility that a low type exists thus exerts a positive external-ity on the high type, whereas the net surplus of the low type remainsunchanged (and equal to zero) when moving from the principal’sfirst-best to her second-best solution (figure 5.1).

If the principal’s prior belief is such that she thinks the high typeis very likely (i.e., p is close enough to 1), then she adopts a shutdownsolution, in which she effectively stops supplying the good to the lowtype. Let x

¯= min X be the lowest quality level (provided at cost c(x

¯)).

InformationRent

Figure 5.1First-best and second-best solution of the model with two types.

Page 226: Optimal Control Theory With Applications in Economics

Mechanism Design 213

Then (5.10) implies that for

p ≥ U(x¯, θL) − c(x

¯)

U(x¯, θH) − c(x

¯)

≡ p0 (5.15)

the principal shuts down (charges zero price for no product or a cost-less minimum-quality product), that is, she starts selling exclusively tothe high type. In that case, the high type’s information rent collapsesto zero, as he is unable to derive a positive externality from the nowvalueless (for the principal) low type. The following example clarifiessome of the general notions introduced in this section for a widely usedparametrization of the two-type model.

Example 5.1 Assume that the consumer’s private value for the prod-uct is proportional to both his type θ and the product’s quality (orquantity) x,6 so that U(x, θ ) = θx, whereby θ ∈ {θL, θH} with θH > θL > 0and x ∈ [0, x] with some (large enough) maximum achievable qualitylevel x. The cost of a product of quality x is assumed to be quadratic,c(x) = γ x2/2 for some positive constant γ ≥ θH/x (so that x ≥ θH/γ ).Note first that U(x, θ ) exhibits increasing differences in (x, θ ), sinceU(x, θH) − U(x, θL) = x(θH − θL) is increasing in x so that condition (5.6)is indeed satisfied. Assuming a principal’s prior p ∈ (0, 1) of the sameform as before, the general results obtained earlier can be used to get

x∗H = xFB

H = θH/γ

and

x∗L = 1

γ

[θL − p

1 − p(θH − θL)

]+< xFB

whereby xFBj = θj/γ for j ∈ {L, H}. Let

p0 = θL

θH

denote the threshold probability for the high type as in (5.15): for p ≥p0, it is x∗

L = 0. In other words, if the high type is more likely than p0,then the principal offers a single product of efficient high quality while

6. In this formulation, the type parameter θ can be interpreted as the marginal utility ofquality (or quantity), θ = Ux(x, θ ). The higher the type for a given quality level, the higherthe marginal utility for extra quality. The underlying heuristic is that “power users” areoften able to capitalize more on quality improvements (or larger quantities, e.g., morebandwidth) than less sophisticated, occasional users.

Page 227: Optimal Control Theory With Applications in Economics

214 Chapter 5

0 1 0 1

InformationRent

Figure 5.2Comparison of first-best and second-best solutions in terms of expected profit (�FB vs. �∗)and expected welfare (WFB vs. W∗) (see example 5.1).

the low type is excluded from the market. The corresponding second-best prices are given by p∗

L = U(x∗L, θL) = x∗

LθL and p∗H = p∗

L + (U(x∗H , θH) −

U(x∗L, θH)) = p∗

L + x∗H(θH − θL)/(1 − p) (Figure 5.2), whence the principal’s

expected profit under this optimal screening mechanism becomes

�∗ = �(t∗L, x∗L, t∗H , x∗

H) =

⎧⎪⎪⎨⎪⎪⎩θ2

L + pθ2H − 2pθLθH

2γ (1 − p)if p ≤ p0

pθ2H

2γotherwise.

By contrast, the first-best profit is

�FB = �(tFBL , xFB

L , tFBH , xFB

H ) = 12γ

(θ2L + p(θ2

H − θ2L )).

In the absence of type uncertainty, for p ∈ {0, 1}, it is �∗ = �FB. Withuncertainty, for p ∈ (0, 1), it is �∗ < �FB. Figure 5.3 shows how �∗ and�FB differ as a function of p. Now consider the social welfare (i.e., the sumof the buyer’s and seller’s surplus in expectation) as a function of p. In theabsence of hidden type information, the seller is able to appropriate allthe surplus in an efficient manner, and thus first-best expected welfare,WFB, equals first-best expected profit, �FB. The second-best expectedwelfare, W∗ = �∗ + p(U(x∗

H , θH) − t∗H), is not necessarily monotonic in p:as p increases the seller is able to appropriate more information rent fromthe high type, while at the same time losing revenue from the low type,for which she continues decreasing quality (as a function of p) until theshutdown point p0 is reached, at which all low types are excluded from

Page 228: Optimal Control Theory With Applications in Economics

Mechanism Design 215

0 1 0 1

Welfare

Figure 5.3Comparison of first-best and second-best solutions in terms of expected profit (�FB vs. �∗)and expected welfare (WFB vs. W∗) (see example 5.1).

the market. From then on, high types are charged the efficient price, andsecond-best welfare linearly approaches the first-best for p → 1−. �

5.3 The Screening Problem

Now consider the screening problem in a more abstract mechanismdesign setting. A principal faces one agent of unknown type θ ∈ � =[0, 1] and can offer him a contract (t, x), where x ∈ R is a consump-tion input for the agent provided by the principal and t ∈ R denotesa monetary transfer from the agent to the principal. Assume that for allθ ∈ � an agent’s preference order over allocations (t, x) can be obtainedby evaluating U(t, x, θ ), where U : R2 ×� → R is a sufficiently smoothutility function that is increasing in x, θ , concave in x, and decreasingin t.

Assumption 5.1 Ut < 0 < Ux, Uθ ; Uxx ≤ 0.

The principal typically controls the agent’s choice of x; however,as part of the mechanism the principal can commit to a certain set ofrules for making the allocation decision (see footnote 9). Her priorbeliefs about the distribution of agents on � are described in terms ofa cumulative distribution function F : � → [0, 1]. Before the principalprovides the agent’s consumption input, the agent decides about hisparticipation in the mechanism, and he can send a message m ∈ M tothe principal, whereby the message space M is a (measurable) set spec-ified by the principal. For instance, if the principal is a sales manager,

Page 229: Optimal Control Theory With Applications in Economics

216 Chapter 5

as before, the set M might contain the different products on offer.7

For simplicity assume that the message space contains a null messageof the type “I would not like to participate.” Allowing the agent notto participate means that the principal needs to consider the agent’sindividual-rationality constraint when designing her mechanism.

Definition 5.1 A mechanism M = (M, a) consists of a (compact) mes-sage space M �= ∅ and an allocation function a : M → R2 that assignsan allocation a(m) = (t, x)(m) ∈ R2 to any message m ∈ M.

Now consider a (dynamic) game, in which the principal proposesa mechanism M to an agent. The game usually has three periods. Inthe first period, the principal commits to M = (M, a) and the agentdecides whether to participate in the mechanism. In case of nonpartici-pation,8 both the agent and the principal obtain zero payoffs from theiroutside options, and the game ends. Otherwise, in the second period,the agent selects a message m ∈ M so as to maximize his utility (whichresults in his incentive-compatibility constraint). In the third period,the allocation a(m) specified by the mechanism is implemented, and thegame ends. The relevant equilibrium concept for this dynamic gameunder incomplete information is either the Bayes-Nash equilibrium orthe perfect Bayesian equilibrium (see section 4.2.3).

Let the principal’s preferences over allocations (t, x) for an agentof type θ be represented by a (sufficiently smooth) utility functionV : R2 ×� → R. The problem of finding a mechanism M that (in expec-tation) maximizes the principal’s utility can be greatly simplified usingthe revelation principle, which is essentially due to Gibbard (1973),Green and Laffont (1979), and Myerson (1979). It is presented here in asimplified one-agent version.

Proposition 5.1 (Revelation Principle) If for a given mechanism M =(M, a) an agent of type θ ∈ � finds it optimal to send a messagem∗(θ ), then there exists a direct revelation mechanism Md = (�, ad) suchthat ad(θ ) = a(m∗(θ )), and the agent finds it optimal to report his typetruthfully under Md.

7. To obtain more general results, assume that M contains all possible (probabilistic)convex combinations over its elements, so that m ∈ M in fact represents a probabilitydistribution over M.8. The participation decision may also take place in the second period when the messagespace contains a message to this effect.

Page 230: Optimal Control Theory With Applications in Economics

Mechanism Design 217

Filter(IncentiveCompati-

bility)

Mechanism

Figure 5.4Direct mechanism Md and the revelation principle.

Proof The proof is trivial. Since under mechanism M = (M, a) theagent finds it optimal to report m∗(θ ), it needs to be the case that

m∗(θ ) ∈ arg maxm∈M

U(a(m), θ ) (5.16)

for θ ∈ �. If the principal sets ad(θ ) = a(m∗(θ )), then clearly it is optimalfor the agent to send md(θ ) = θ in the new mechanism Md, which istherefore a direct revelation mechanism. n

The revelation principle implies that in her search for an optimalmechanism the principal can limit herself without loss of generalityto direct (i.e., truth-telling) mechanisms (Figure 5.4). The fact that theprincipal is able to commit to a certain mechanism is essential forthe revelation principle to work. Commitment to a mechanism allowsthe principal to promise to the agent that indeed a revelation mechanismis applied so that truthful messages are incentive-compatible.9

Definition 5.2 A direct mechanism (�, a) is implementable if the allo-cation function a : � → R2 satisfies the agent’s incentive-compatibility(or truth-telling) constraint, namely, if

U(t(θ ), x(θ ), θ ) ≥ U(t(θ ), x(θ ), θ ), (5.17)

for all θ , θ ∈ �.

9. In environments with renegotiation, where the principal is unable to commit to a rev-elation mechanism, the revelation principle fails to apply, and it may become optimalfor the principal to select an indirect mechanism. Indirect mechanisms can also help toensure that allocations are unique (strong implementation); by contrast, the revelationprinciple ensures only that a truthful action is among the agent’s most preferred actions(weak implementation). For more details see Palfrey and Srivastava (1993).

Page 231: Optimal Control Theory With Applications in Economics

218 Chapter 5

Note that the direct mechanisms in proposition 5.1 are implementable.Relation (5.17) is just a restatement of (5.16). The following analysisrestricts attention to differentiable mechanisms, that is, mechanismsin which the allocation function a = (t, x) is differentiable (and all therelevant sets are convex).

Assumption 5.2 (Sorting/Spence-Mirrlees Condition) The marginalrate of substitution between the agent’s consumption input (the good)and money is monotonic in the agent’s type,

∂θ

(−Ux(t, x, θ )

Ut(t, x, θ )

)≥ 0. (5.18)

For quasilinear preferences, which can be represented by a net util-ity function of the form U(x, θ ) − t (instead of U(t, x, θ )), the sortingcondition (5.18) amounts to requiring increasing differences in (x, θ ).In that case, inequality (5.18) reduces to (5.6) or, given differentiabil-ity, to the familiar supermodularity condition Uxθ ≥ 0. The followingresult provides a useful characterization of an implementable directmechanism.

Proposition 5.2 (Implementation Theorem) The direct mechanism (�,a) with a = (t, x) : � → R2 twice differentiable is implementable if andonly if for all θ ∈ �,

Ux(t(θ ), x(θ ), θ ) x(θ ) + Ut(t(θ ), x(θ ), θ ) t(θ ) = 0 (5.19)

and

x(θ ) ≥ 0. (5.20)

Proof ⇒: Consider an agent of type θ ∈ � and a direct mechanism(�, a). In choosing his message m(θ ), the agent solves

m(θ ) ∈ arg maxθ∈�

U(t(θ ), x(θ ), θ ),

for which the first-order necessary optimality condition can be written as

Ux(t(θ ), x(θ ), θ ) x(θ ) + Ut(t(θ ), x(θ ), θ ) t(θ ) = 0. (5.21)

Hence, any direct mechanism must necessarily satisfy (5.21) for θ =θ , namely, equation (5.19) for all θ ∈ �. The necessary optimalitycondition (5.21) becomes sufficient if, in addition the correspondingsecond-order condition,

Uxx(x)2 + 2Uxtxt + Utt(t)2 + Uxx + Utt ≤ 0, (5.22)

Page 232: Optimal Control Theory With Applications in Economics

Mechanism Design 219

is satisfied at a θ that solves (5.21). At a truth-telling optimum, relation(5.22) needs to be satisfied for θ = θ . Differentiating equation (5.19) withrespect to θ (note that it holds for all θ ∈ �) yields(Uxxx + Uxtt + Uxθ

)x + Uxx + (Uxtx + Uttt + Utθ

)t + Utt = 0,

so the second-order condition (5.22) becomes

Uxθ x + Utθ t ≥ 0,

or equivalently, using the fact that by (5.19) t = −Uxx/Ut,

Ut x∂

∂θ

(Ux

Ut

)≥ 0.

Since by assumption 5.1 the agent’s utility decreases in his transfer to theprincipal, Ut < 0, one obtains by assumption 5.2 that necessarily x > 0on �. ⇐: In order to demonstrate that (5.19) and (5.20) are sufficientfor the direct mechanism (�, (t, x)) to be implementable, one must showthat (5.17) in definition 5.2 holds for all θ , θ ∈ �. If one sets

U(θ , θ ) = U(t(θ ), x(θ ), θ ),

then the first-order and second-order optimality conditions can bewritten in the form U1(θ , θ ) = 0 and U11(θ , θ ) ≤ 0.10 If θ ≤ θ , then

U(θ , θ ) − U(θ , θ ) =∫ θ

θ

U1(ϑ , θ ) dϑ (5.23)

=∫ θ

θ

Ut(t(ϑ), x(ϑ), θ )(

Ux(t(ϑ), x(ϑ), θ )Ut(t(ϑ), x(ϑ), θ )

x(ϑ) + t(ϑ))

dϑ .

From (5.19) and (5.20), together with assumption 5.2, one obtains thatfor ϑ ≤ θ ,

0 = Ux(t(ϑ), x(ϑ),ϑ)Ut(t(ϑ), x(ϑ),ϑ)

x(ϑ) + t(ϑ) ≥ Ux(t(ϑ), x(ϑ), θ )Ut(t(ϑ), x(ϑ), θ )

x(ϑ) + t(ϑ).

By assumption 5.1, Ut < 0, so that the right-hand side of (5.23) is non-negative, U(θ , θ ) ≥ U(θ , θ ). If θ > θ , then (5.23) still holds. With theintegration bounds reversed, assumptions 5.1 and 5.2 lead to the sameconclusion, namely, that the right-hand side of (5.23) is nonnegative. In

10. Uj (resp. Ujj) denotes the partial derivative of U (resp. Uj) with respect to its firstargument.

Page 233: Optimal Control Theory With Applications in Economics

220 Chapter 5

other words, the direct mechanism (�, (t, x)) is by (5.17) implementable,which completes the proof. n

The implementation theorem directly implies a representation of allimplementable direct mechanisms, which the principal can use to findan optimal screening mechanism:

1. Choose an arbitrary nondecreasing schedule x(θ ) for the agent’sconsumption good.

2. For the x( · ) in the last step, find a transfer schedule t(θ ) − t0 by solv-ing the differential equation (5.19). Note that this transfer schedule isdetermined only up to a constant t0. The constant t0 can be chosen so asto satisfy the agent’s participation constraint,

U(t(θ ), x(θ ), θ ) ≥ 0, ∀ θ ∈ [θ0, 1], (5.24)

where θ ∈ [0, 1] is the lowest participating type.11

3. Invert the schedule in step 1, that is, find ϕ(x) = {θ ∈ � : x(θ ) = x},which may be set-valued. The (unit) price schedule as a function of theconsumption choice, τ (x) ∈ t(ϕ(x)), can then be written in the form

τ (x) ={

t(θ ) for some θ ∈ ϕ(x) �= ∅,∞ if ϕ(x) = ∅.

(5.25)

This transformation is sometimes referred to as the taxation principle.Note that whenever ϕ(x) is set-valued, different types obtain the sameprice for the same consumption choice. This is called bunching, sincedifferent types are batched together. Bunching occurs for neighboringtypes when x(θ ) = 0 on an interval of positive length (see remark 5.1).

5.4 Nonlinear Pricing

Consider again the sales manager’s decision problem (see section 5.1),but this time the manager (principal) assumes that potential buyers havetypes θ distributed in the continuous type space� = [0, 1] with differen-tiable cumulative distribution function F : � → [0, 1] (and density f =F). The principal wishes to find an optimal allocation function (t, x) :� → R2 so as to maximize her expected payoff,

11. Assumption 5.1 implies that if type θ ∈ (0, 1) decides to participate in the principal’smechanism, then (as a consequence of Uθ > 0) all types in � larger than θ also decide toparticipate. Thus, the set of all participating types must be of the form �0 = [θ0, 1] ⊂ �

for some θ0 ∈ [0, 1].

Page 234: Optimal Control Theory With Applications in Economics

Mechanism Design 221

∫�0

V(t(θ ), x(θ ), θ ) dF(θ ), (5.26)

subject to the implementability conditions (5.19) and (5.20) in propo-sition 5.2 and to the participation constraint (5.24), where �0 = [θ0, 1]is the set of participating types. The principal’s payoff function V :R2 ×� → R is assumed continuously differentiable and satisfies thefollowing assumption, which ensures that the principal likes money,dislikes providing the attribute (at an increasing rate), and for any typedoes not mind providing the zero bundle, that is, a zero attribute at zeroprice (assuming that her outside payoff is zero).

Assumption 5.3 Vxx, Vx < 0 < Vt; V(0, θ ) ≥ 0, ∀ θ ∈ �.

With the control u(t) = x(t), to find a screening mechanism that maxi-mizes her expected payoff (5.26), subject to (5.19),(5.20), and (5.24), theprincipal can solve the optimal control problem (OCP)

J(u) =∫ 1

θ0

V(t(θ ), x(θ ), θ )f (θ ) dθ −→ maxu(·),(t0,x0,θ0)

, (5.27)

t(θ ) = − Ux(t(θ ), x(θ ), θ )Ut(t(θ ), x(θ ), θ )

u(θ ), t(θ0) = t0, (5.28)

x(θ ) = u(θ ), x(θ0) = x0, (5.29)

0 ≤ U(t0, x0, θ0), (5.30)

u(θ ) ∈ [0, u], ∀ θ , (5.31)

θ ∈ [θ0, 1], (5.32)

where u > 0 is a large (finite but otherwise arbitrary) control constraint.Problem (5.27)–(5.32) is a general OCP of the form (3.35)–(3.39), withassumptions A1–A5 and conditions S, B, and C (see section 3.4) satisfied.Let

H(t, x, θ , u,ψ) = V(t, x, θ ) f (θ ) − Ux(t, x, θ )Ut(t, x, θ )

ψtu +ψxu

be the corresponding Hamiltonian, where ψ = (ψt,ψx) is the adjointvariable. Thus, the Pontryagin maximum principle (PMP) in proposi-tion 3.5 can be used to formulate necessary optimality conditions for theprincipal’s mechanism design problem.

Page 235: Optimal Control Theory With Applications in Economics

222 Chapter 5

Proposition 5.3 (Optimal Screening Contract) Let assumptions 5.1–5.3 be satisfied, and let (t∗(θ ), x∗(θ ), t∗0, x∗

0, θ∗0 ), θ ∈ [θ∗

0 , 1], be an optimalsolution to the principal’s screening problem (5.27)–(5.32), with u∗(θ ) ≡x∗(θ ). Then there exist a multiplier λ ∈ R and an absolutely continuousfunction ψ = (ψt,ψx) : [θ∗

0 , 1] → R2 such that the following optimalityconditions are satisfied.

• Adjoint equation

ψt(θ ) = ∂

∂tUx(t∗(θ ), x∗(θ ), θ )Ut(t∗(θ ), x∗(θ ), θ )

ψt(θ )u∗(θ ) − Vt(t∗(θ ), x∗(θ ), θ ) f (θ ), (5.33)

ψx(θ ) = ∂

∂xUx(t∗(θ ), x∗(θ ), θ )Ut(t∗(θ ), x∗(θ ), θ )

ψt(θ )u∗(θ ) − Vx(t∗(θ ), x∗(θ ), θ ) f (θ ), (5.34)

for all θ ∈ [θ∗0 , 1].

• Transversality

ψ(θ∗0 ) = −λU(t,x)(t∗0, x∗

0, θ∗0 ) and ψ(1) = 0. (5.35)

• Maximality

∀ θ ∈ [θ∗0 , 1] : u∗(θ ) �= 0 ⇒ ψx(θ ) = Ux(t∗(θ ), x∗(θ ), θ )

Ut(t∗(θ ), x∗(θ ), θ )ψt(θ ). (5.36)

• Endpoint optimality

λ ≥ 0, λU(t∗0, x∗0, θ∗

0 ) = 0, (5.37)

λUθ (t∗0, x∗0, θ∗

0 ) = V(t∗0, x∗0, θ∗

0 ) f (θ∗0 ). (5.38)

• Nontriviality

|λ| + ‖ψ(θ )‖ �= 0, ∀ θ ∈ [θ∗0 , 1]. (5.39)

• Envelope condition

V(t∗(θ ), x∗(θ ), θ ) f (θ ) = −∫ 1

θ

Hθ (t∗(s), x∗(s), s, u∗(s),ψ(s)) ds, (5.40)

for all θ ∈ [θ∗0 , 1].

Proof The conditions obtain by applying the PMP in proposition 3.5 tothe OCP (5.27)–(5.32). n

Page 236: Optimal Control Theory With Applications in Economics

Mechanism Design 223

Remark 5.1 (Quasilinear Payoffs) If both the agent’s and the principal’spayoff functions are quasilinear in money, that is, if

U(t, x, θ ) = U(x, θ ) − t and V(t, x, θ ) = V(x, θ ) + t,

for some appropriate functions U, V (so that all assumptions on U, Vremain satisfied), then some optimality conditions in proposition 5.3can be simplified. For example, (5.33) and (5.35) directly yield

ψt(θ ) = 1 − F(θ ), ∀ θ ∈ [θ∗0 , 1]. (5.41)

The maximality condition (5.36), together with assumption 5.1, thenimplies that

ψx(θ ) = −(1 − F(θ )) Ux(x∗(θ ), θ ) ≤ 0, ∀ θ ∈ [θ∗0 , 1]. (5.42)

Assuming λ �= 0, one obtains from the endpoint condition (5.37)

t∗0 = U(x∗0, θ∗

0 ), (5.43)

that is, the lowest participating type obtains zero surplus. From theprevious relation, the transversality condition (5.35), and the endpoint-optimality condition (5.38) it can be concluded that

U(x∗0, θ∗

0 ) + V(x∗0, θ∗

0 ) − 1 − F(θ∗0 )

f (θ∗0 )

Uθ (x∗0, θ∗

0 ) = 0. (5.44)

The term on the left-hand side of (5.44) is the virtual surplus evaluatedfor the lowest participating type. Combining (5.41) and (5.42) with themaximality condition (5.36) in proposition 5.3 yields

u∗(θ ) �= 0 ⇒ Ux(x∗(θ ), θ ) + Vx(x∗(θ ), θ )

− 1 − F(θ )f (θ )

Uxθ (x∗(θ ), θ ) = 0, (5.45)

which is consistent with maximizing the virtual surplus

S(x, θ ) = U(x, θ ) + V(x, θ ) − 1 − F(θ )f (θ )

Uθ (x, θ ) (5.46)

with respect to x, where the term W = U + V = U + V correspondsto the actual surplus in the system, and the term ((1 − F)/f )Uθ repre-sents the social loss due to the asymmetric information. Note also that

Page 237: Optimal Control Theory With Applications in Economics

224 Chapter 5

Figure 5.5Ironing of the optimal attribute schedule and bunching of types.

S = U + V − hcUθ , where

hc(θ ) ≡ 1 − F(θ )f (θ )

is the complementary (or inverse) hazard rate. As in the two-type caseof section 5.2, the lowest type ends up without any surplus (full sur-plus extraction at the bottom), and the highest type obtains an efficientattribute, maximizing W(x, 1) (no distortion at the top), since the socialloss vanishes for θ = 1 (by virtue of the fact that hc(1) = 0). Last, it isimportant to ask what happens when the lower control bound becomesbinding, that is, when u∗(θ ) = x∗(θ ) = 0. In that case, the now constantattribute schedule implies that an interval of types is treated in the sameway by the principal (the types obtain the same attribute in return for thesame transfer). This is called bunching (figure 5.5). The correspondingironing procedure dates to Mussa and Rosen (1978). �

The following example illustrates how to construct an optimal nonlinearpricing scheme using an optimal screening contract.

Example 5.2 (Nonlinear Pricing) Consider the sales manager’s problemof finding an optimal pricing scheme τ (x) for selling a quantity x ≥ 0of her product, when her cost of providing that amount to an agentis C(x) = cx2/2, where c > 0 is a given cost parameter. When the agent

Page 238: Optimal Control Theory With Applications in Economics

Mechanism Design 225

buys the quantity x at the price t, his net utility is

U(t, x, θ ) = θx − t,

where type θ ∈ � = [0, 1] belongs to the agent’s private information. Themanager (principal), with net payoff V(t, x, θ ) = t − C(x), believes thatthe different agent types are uniformly distributed on�, so that F(θ ) ≡ θ .Since both the principal’s and the agent’s net payoffs are quasilinearin money, one can use the conditions in remark 5.1 to determine theoptimal price-quantity schedule (t∗, x∗) : � → R2. Indeed, the virtualsurplus in (5.46) becomes

S(x, θ ) = θx − cx2/2 − (1 − θ )x,

so, using (5.45),

∀ θ ∈ [θ∗0 , 1] : x∗(θ ) > 0 ⇒ x∗(θ ) = [2θ − 1]+

c∈ arg max

x≥0S(x, θ ).

Relations (5.43) and (5.44), together with the adjoint equation (5.34) andtransversality condition (5.35), can be used to determine the endpointdata (t∗0, x∗

0, θ∗0 ) (and the Lagrange multiplier λ). Indeed, (5.34) and (5.35)

are equivalent to

t∗0 − θ∗0 x∗

0 = (2θ∗0 − 1)x∗

0 − c(x∗0)2/2 = 0. (5.47)

From (5.34)–(5.35) and the endpoint-optimality conditions (5.37)–(5.38)one obtains λ = (t∗0 − c(x∗

0)2/2)/x∗0 and

ψx(θ∗0 ) = −(1 − θ∗

0 )θ∗0 = − t∗0 − c(x∗

0)2/2x∗

0θ∗

0 . (5.48)

But (5.47)–(5.48) imply that (t∗0, x∗0, θ∗

0 ) = 0, so by the state equation (5.28)it is

t∗(θ ) =∫ θ

0ϑ x∗(ϑ) dϑ = 1

c

[θ2 − 1

4

]+

, ∀ θ ∈ [0, 1].

Using the taxation principle (5.25) to eliminate the type parameterfrom the optimal price-quantity schedule (t∗(θ ), x∗(θ )), θ ∈ �, the salesmanager’s optimal nonlinear pricing scheme therefore becomes

τ ∗(x) ={

(2x + cx2)/4 if x ∈ [0, 1/c],∞ otherwise.

Page 239: Optimal Control Theory With Applications in Economics

226 Chapter 5

The corresponding second-best payoffs for the agent and the principalare

U(t∗(θ ), x∗(θ ), θ ) ≡ ([θ − (1/2)]+)2

cand

V(t∗(θ ), x∗(θ ), θ ) ≡ [8θ − 4θ2 − 3]+4c

.

The total second-best welfare and virtual surplus are therefore

W(x∗(θ ), θ ) ≡ [2θ − 1]+2c

and S(x∗(θ ), θ ) ≡ 2([θ − (1/2)]+)2

c,

respectively, where W(x, θ ) = θx − cx2/2 and S(x, θ ) = (1 − θ )x. As in thetwo-type model in section 5.2, one obtains full surplus extraction atthe bottom of the type space (for all θ ∈ [0, 1/2]) and no distortion at thetop because the highest type obtains the welfare-maximizing qualitywithout any social loss (for θ = 1: x∗(1) = 1/c ∈ arg maxx≥0 W(x, θ )).12

5.5 Notes

The implementation theorem is ascribed to Mirrlees (1971); the versionpresented here is by Guesnerie and Laffont (1984). The treatment ofmechanism design in sections 5.3 and 5.4 is inspired by Laffont (1989).Lancaster (1966) first realized, “The good, per se, does not give utilityto the consumer; it possesses characteristics, and these characteristicsgive rise to utility . . . . In general a good will possess more than onecharacteristic, and many characteristics will be shared by more thanone good” (65).

These Lancasterian characteristics are referred to as product attri-butes, and naturally products contain a number of different such attri-butes which, facing a heterogeneous consumer base of unknown types,

12. If the principal maximizes social surplus instead of her profits, then (by replac-ing V with U + V in all optimality conditions) the first-best price-quantity scheduleis (tFB(θ ), xFB(θ )) = (3[θ2 − (1/9)]+/(2c), [3θ − 1]+/c), leading to the nonlinear pricingscheme τFB(x) = (2x + cx2)/6 for x ∈ [0, 2/c] and τFB(x) = ∞ otherwise. This notion offirst-best retains the agent’s autonomy, so the optimization is carried out subject to theincentive-compatibility constraints as formulated in the implementation theorem (propo-sition 5.2). It is interesting that the first-best leads to an overprovision of quantity forhigh types, compared to a full-information command-and-control solution, which is alsosometimes meant by the term first-best (for θ = 1: xFB(1) = 2/c > 1/c).

Page 240: Optimal Control Theory With Applications in Economics

Mechanism Design 227

allows a monopolist (the principal) to screen the agents. The productattributes can be used as instruments in the screening process. Thescreening problem was first examined as such by Stiglitz (1975) (in thecontext of mitigating adverse selection in a job market). Using mul-tiple instruments to screen consumers of one-dimensional type wasconsidered by Matthews and Moore (1987), and the inverse case of asingle instrument (price) given consumers of multidimensional typesby Laffont, Maskin, and Rochet (1987), among others. This line ofwork on nonlinear pricing originates with Mussa and Rosen (1978),based on methods developed earlier by Mirrlees (1971) in the con-text of optimal income taxation; they treated the case for consumersof a single characteristic and single vertical-attribute products. Wil-son (1993) and Armstrong (1996) provided generalizations for fullynonlinear pricing models in the multiproduct case. A multidimen-sional screening model generalizing these approaches was advancedby Rochet and Choné (1998). Rochet and Stole (2003) provided anexcellent overview of recent results. Weber (2005b) generalized thescreening problem to allow for externalities between different agenttypes.

For a more general overview of mechanism design, see Hurwicz andReiter (2006). Williams (2008) focused on mechanisms with differen-tiable allocation functions using methods from differential geometry.

5.6 Exercises

5.1 (Screening) Assume that you are the product manager for a com-pany that produces digital cameras. You have identified two consumertypes θ ∈ � = {θL, θH}, “amateurs” (θL) and “professionals” (θH), whereθH > θL > 0. Based on the results of a detailed survey of the two groupsyou find that the choice behavior of a type-θ consumer can be repre-sented approximately by the utility function

U(t, x, θ ) = θ (1 − (1 − x)2)/2 − t,

where t is the price of a camera and x is an internal (scalar) quality indexthat you have developed, which orders the different camera models ofthe company. Aconsumer’s utility is zero if he does not buy. Assume thatthe production cost of a digital camera of quality x is C(x) = cx, wherec ∈ (0, 1). The survey also showed that the proportion of professionalsamong all consumers is equal to μ ∈ (0, 1).

Page 241: Optimal Control Theory With Applications in Economics

228 Chapter 5

a. If your company has production capacity for only a single cameramodel, which can only be sold at a single price, determine the profit-maximizing price and quality, (tm, xm), of that product. Under whatconditions can you avoid shutdown, so that both amateurs and pro-fessionals would end up buying this product? What are the company’sprofits, �m?

b. If you could perfectly distinguish amateurs from professionals, whatwould be the profit-maximizing menu of products, {(ti, xi)}i∈{L,H}, tooffer? Determine the associated first-best level of profits, �.

c. If you cannot distinguish between consumer types, what is theoptimal second-best menu of products, {(t∗i , x∗

i )}i∈{L,H}? Determine yourcompany’s second-best profits �∗ in that case.

d. Determine the optimal nonlinear menu of products,

{(t∗(θ ), x∗(θ ))}θ∈�,

for the case where � = [0, 1] and all types are equally likely.

5.2 (Nonlinear Pricing with Congestion Externality) Consider a con-tinuum of agents (or agent types), indexed by θ ∈ � = [0, 1] anddistributed with the continuous probability density f (θ ) = F(θ ) > 0(where F is the associated cumulative distribution function). Eachagent θ likes to consume bandwidth x(θ ) ∈ [0, 1]; and he also cares aboutthe aggregate bandwidth consumption

y =∫ 1

0x(θ ) dF(θ ).

If agent θ has to pay t(θ ) for his bandwidth consumption, then his netutility is

U(t, x, y, θ ) = θx(1 −αy) − t,

whereα ∈ [0, 1] describes the degree to which the agent is affected by thecongestion externality. Any agent can also choose to consume nothing,in which case his net utility is zero. A principal13 would like to constructan optimal nonlinear screening contract (t, x) : � → R × [0, 1] so as tomaximize her expected profit

13. Assume that the principal knows only the distribution of agent types but cannotdistinguish them other than by offering them a menu of options in the form of a screeningcontract.

Page 242: Optimal Control Theory With Applications in Economics

Mechanism Design 229

V(t, x) =∫ 1

0

(t(θ ) − c(x(θ ))2

2

)dF(θ ),

where c > 1 is a cost parameter.

a. Formulate the principal’s mechanism design problem as an OCP.

b. Find the optimal screening contract (t∗(θ ), x∗(θ )), θ ∈ �.

c. Using the taxation principle, convert the optimal screening contractof part b into an optimal price-quantity schedule τ ∗(x), x ∈ [0, 1], thatthe principal can actually advertise.

d. Analyze and interpret the dependence of your solution on α ∈ [0, 1].What is the effect of the congestion externality?

Page 243: Optimal Control Theory With Applications in Economics
Page 244: Optimal Control Theory With Applications in Economics

Appendix A: Mathematical Review

This appendix provides a loose collection of definitions and key resultsfrom mathematics that are used in the main text. In terms of notation,∃ is often used for “there exist(s),” ∀ for “for all,” and ∀ for “for almostall.” Also, “a.e.” stands for “almost everywhere” and “a.a.” for “almostall” or “almost always,” typically leaving out a set of Lebesgue measurezero from consideration when saying “almost.” The abbreviation “s.t.”is sometimes used instead of “subject to.” The set R is the set of allreal numbers; C = R + iR is the set of all complex numbers (where i =√−1);1 R+ = [0, ∞) is the set of all nonnegative real numbers; and R++ =(0, ∞) is the set of all positive real numbers. Consider two vectors, x, y ∈Rn, where n ≥ 2 is an integer, so that x = (x1, . . . , xn) and y = (y1, . . . , yn).Then x ≥ y if xj ≥ yj for all j ∈ {1, . . . , n}; x > y if x ≥ y and at least onecomponent of x is strictly greater than the corresponding componentof y; and x � y if xj > yj for all j ∈ {1, . . . , n}. For example, Rn+ = {x ∈Rn : x ≥ 0}, Rn++ = {x ∈ Rn : x � 0}, and Rn+ \ {0} = {x ∈ Rn : x > 0}.

A.1 Algebra

Consider the n-dimensional Euclidean space Rn. The scalar product oftwo vectors x = (x1, . . . , xn) and y = (y1, . . . , yn) in Rn is defined as

〈x, y〉 =n∑

i=1

xiyi.

With the scalar-product notation, the Euclidean norm ‖x‖ of x canbe expressed as ‖x‖ = √〈x, x〉. The vectors x1, . . . , xk ∈ Rn are said tobe linearly dependent if there exists a nonzero vector λ = (λ1, . . . , λk)

1. One of the most beautiful relations in mathematics is Euler’s identity: eiπ + 1 = 0.

Page 245: Optimal Control Theory With Applications in Economics

232 Appendix A

such that∑k

j=1 λjxj = 0; otherwise, the vectors are called linearlyindependent.

An m × n matrix A = [aij]m,ni,j=1 is a rectangular array of real numbers,

arranged in m rows and n columns. If the number of rows equals thenumber of columns (i.e., m = n), then the matrix A is square. The sumof two m × n matrices A = [aij]m,n

i,j=1 and B = [bij]m,ni,j=1 is given by A + B =

[aij + bij]m,ni,j=1 and is obtained by summing the corresponding elements

in A and B. The product of a matrix A with a scalar α ∈ R is given byαA = [α aij]m,n

i,j=1. The product of that matrix with an n × l matrix

B = [bij]n,li,j=1 is given by the m × l matrix AB = [∑n

k=1 aikbkj]m,l

i,j=1.2 Forconvenience, the index notation is dropped when the context is clear.The transpose of the m × n matrix A = [aij] is the n × m matrix A′ = [a′

ij]with a′

ij = aji for all i, j. If A is square (i.e., m = n) and A′ = A, it is calledsymmetric. The square matrix A has an inverse, denoted by A−1, if theproduct AA−1 is equal to the identity matrix I, which is square and suchthat all its entries are zero except for the diagonal entries, which are equalto 1. Such an inverse exists if and only if A = [aij]n

i,j=1 is nonsingular, thatis, if all its row vectors (aij)n

j=1, i ∈ {1, . . . , n}, are linearly independent. Inother words, the square matrix is nonsingular if and only if the equa-tion Ax = 0 (where the vector x = (x1, . . . , xn)′ is interpreted as an n × 1matrix) has the unique solution x = 0.

The rank of an m × n matrix A is equal to the maximum number of itsrow vectors that are linearly independent. The matrix is called of full rankif its rank is maximal, that is, equal to min{m, n}. Thus, a square matrix Ais nonsingular if and only if it is of full rank. An alternative criterioncan be formulated in terms of the determinant, which is defined by therecursive Laplace expansion rule, for any fixed j ∈ {1, . . . , n},

det A =n∑

i=1

aijAij

(=

n∑i=1

ajiAji

),

where Aij is the (sub-)determinant of matrix A after its ith row and its jthcolumn have been removed. The determinant of a 1 × 1 matrix is equalto its only element. Then, the square matrix A is nonsingular if and onlyif its determinant is nonzero.

Right-multiplying the row vector x = (x1, . . . , xn)′ with the n × nmatrix A yields the linear transformation Ax. Of particular interest are the

2. Matrix multiplication is not commutative.

Page 246: Optimal Control Theory With Applications in Economics

Appendix A 233

eigenvectors v = (v1, . . . , vn) �= 0, for which there exists a scalar λ (whichis generally a complex number) such that Av = λv. That means that thelinear transformation Av of an eigenvector leaves the vector essentiallyunchanged, except for a possible expansion (|λ| ≥ 1) or contraction(|λ| ≤ 1) of v (together with a possible 180-degree rotation). Accordingly,each eigenvalue λ of A is such that Av = λv for some eigenvector v. Thelast relation is equivalent to the matrix equation (A − λI)v = 0, whichhas a nonzero solution v if and only if the matrix A − λI is singular, sothat the characteristic equation,

det (A − λI) = 0,

is satisfied. The eigenvalues of A are the solutions of the last equa-tion, which correspond to the roots of an nth degree polynomial. A rootof an nth order polynomial p(λ) = α0 +α1λ+α2λ

2 + · · · +αnλn is such

that p(λ) = 0.

Proposition A.1 (Fundamental Theorem of Algebra) Letα = (α1, . . . ,αn) ∈ Rn \ {0}. Then the polynomial p(λ) = ∑n

i=k αkλk has

exactly n (complex) roots, λ1, . . . , λn ∈ C.

Proof See Hungerford (1974, 265–267). n

Thus, the eigenvalues of A are in general complex numbers (i.e.,elements of C). Note also that A is singular if and only if one of itseigenvalues is zero. If A is nonsingular, with eigenvalues λ1, . . . , λn,then the eigenvalues of its inverse A−1 are given by 1/λ1, . . . , 1/λn,whereas the eigenvalues of its transpose A′ are the same as those of A.

A symmetric square matrix A is called positive semidefinite if

x′Ax ≥ 0, ∀ x ∈ Rn \ {0}.If the previous inequality is strict, then A is called positive definite. Sim-ilarly, A is called negative (semi)definite if −A is positive (semi)definite.A symmetric matrix has real eigenvalues; if the matrix is also positive(semi)definite, then its eigenvalues are positive (resp., nonnegative).

A.2 Normed Vector Spaces

A set S is a collection of elements, which can be numbers, actions, out-comes, or any other objects. The set of nonnegative integers, {0, 1, 2, . . .},is denoted by N. A field F = (S, +, ·) is a set S together with the binary

Page 247: Optimal Control Theory With Applications in Economics

234 Appendix A

operations of addition, + : S × S → S, and multiplication, · : S × S →S, with the following field properties for all a, b, c ∈ S:

• Commutativity. a + b = b + a and a · b = b · a.• Associativity. (a + b) + c = a + (b + c) and (a · b) · c = a · (b · c).• Distributivity. a · (b + c) = a · b + a · c.• Additive identity. There is a zero element, 0 ∈ S, such that a + 0 = a,independent of which a ∈ S is chosen.• Additive inverse. There exists (in S) an additive inverse of a, denotedby −a, such that a + ( − a) = 0.• Multiplicative identity. There is a one element, 1 ∈ S, such that a · 1 = a,independent of which a ∈ S is chosen.• Multiplicative inverse. There exists (in S) a multiplicative inverse ofa �= 0, denoted by 1/a, such that a · (1/a) = 1.• Closure. a + b ∈ S and a · b ∈ S.

The set R of all real numbers and the set C of all complex numberstogether with the standard addition and multiplication are fields. Forsimplicity R and C are often used instead of (R, +, ·) and (C, +, ·). Avector space or linear space (over a field F = (S, +, ·) is a set X of objects,called vectors, together with the binary operations of (vector) addition,+ : X × X → X , and (scalar) multiplication, · : S × X → X , such that thefollowing vector-space properties are satisfied:

• The vector addition is commutative and associative.• There is a zero vector (referred to as origin), 0 ∈ X , such that x + 0 = xfor all x ∈ X .• For any x ∈ X there is an additive inverse −x ∈ X such that x + (−x) = 0.• For any scalars a, b ∈ S and all vectors x, y ∈ X it is a · (b · x) = (a · b) · x(associativity) as well as a · (x + y) = a · x + b · y and (a + b) · x = a · x + b · x(distributivity).• If 1 is an additive identity in S, then 1 · x = x for all x ∈ X .

In this book attention is restricted to the fields F ∈ {R, C} of the realand complex numbers. “Vector space X ” usually means a vectorspace (X , +, ·) over the fields (R, +, ·) or (C, +, ·); in the latter case itmight be referred to an “complex vector space X .”

Example A.1 As prominent examples of vector spaces consider first theset Rn of real n-vectors (for any given integer n ≥ 1), second the set of all

Page 248: Optimal Control Theory With Applications in Economics

Appendix A 235

sequences {x0, x1, . . .} = {xk}∞k=0 formed of elements xk ∈ Rn, and thirdthe set C0([a, b], Rn) of continuous functions f : [a, b] → Rn for given realnumbers a, b with a < b.3 �

Anorm on a vector space X is a real-valued function ‖ · ‖ on X such thatfor all vectors x, y ∈ X and any scalar a ∈ F the following norm propertieshold:

• Positive definiteness. ‖x‖ ≥ 0, and ‖x‖ = 0 if and only if x = 0.• Positive homogeneity. ‖ax‖ = |a|‖x‖.• Triangular inequality. ‖x + y‖ ≤ ‖x‖ + ‖y‖.

The ordered pair (X , ‖ · ‖) is termed a normed vector space (or normedlinear space). The set Y ⊆ X is a (linear) subspace of X (or more precisely,of (X , ‖ · ‖)) if x, y ∈ Y ⇒ ax + by ∈ Y , for any scalars a and b.

Example A.2 (1) "np = (Fn, ‖ · ‖p) is a normed vector space, where for

any number p with 1 ≤ p ≤ ∞ the p-norm on the vector space Fn isdefined by

‖a‖p ={ (∑n

i=1 |ai|p)1/p if 1 ≤ p < ∞,

max{|a1|, . . . , |an|} if p = ∞,

for all a = (a1, . . . , an) ∈ Fn.(2) The collection of all sequences x = {xk}∞k=0 with xk ∈ Fn, k ≥ 0, for

which with p-norm

‖x‖p ={ (∑n

i=1 ‖xk‖p)1/p if 1 ≤ p < ∞,

sup{‖xk‖ : k ∈ N} if p = ∞,

}< ∞

is a normed vector space (referred to as "p).(3) Let η be a positive measure on a σ -algebra � of subsets of a

nonempty measurable set S. For any p with 1 ≤ p ≤ ∞ the set Lp(S,�, η)of Lebesgue-integrable functions f , which are such that

‖x‖p ={ (∫

S ‖f (x)‖pdη(x))1/p if 1 ≤ p < ∞

inf{c > 0 : η({x ∈ S : ‖f (x)‖ > c}) = 0} if p = ∞

}< ∞,

is a normed vector space.

3. The concept of a function is introduced in section A.3.


(4) Let Ω ⊂ Rn be a nonempty compact (i.e., closed and bounded) set. The space C0(Ω, Rn) of continuous functions f : Ω → Rn with the maximum norm

‖f‖∞ = max_{x∈Ω} ‖f(x)‖,

where ‖ · ‖ is a suitable norm on Rn, is a normed vector space. ♦
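For readers who wish to experiment, the following short Python sketch (illustrative only; it is not part of the original text) computes the p-norms of example A.2(1) for a sample vector and checks the norm-equivalence inequalities discussed in remark A.1 below.

import numpy as np

a = np.array([3.0, -4.0, 12.0])  # a sample vector in R^3

norm_1 = np.sum(np.abs(a))              # ||a||_1 = |a_1| + ... + |a_n|
norm_2 = np.sqrt(np.sum(np.abs(a)**2))  # ||a||_2 (Euclidean norm)
norm_inf = np.max(np.abs(a))            # ||a||_inf = max_i |a_i|

print(norm_1, norm_2, norm_inf)  # 19.0 13.0 12.0

# Norm equivalence on F^n (remark A.1), here with explicit constants:
assert norm_inf <= norm_2 <= norm_1 <= len(a) * norm_inf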

Remark A.1 (Norm Equivalence) One can show that all norms on Fn are equivalent in the sense that for any two norms ‖ · ‖ and | · | on Fn there exist positive constants α, β ∈ R++ such that α‖x‖ ≤ |x| ≤ β‖x‖ for all x ∈ Fn. ♦

A sequence x = {xk}∞k=0 of elements of a normed vector space X converges to a limit x̄, denoted by lim_{k→∞} xk = x̄, if for any ε > 0 there exists an integer N = N(ε) > 0 such that

k ≥ N ⇒ ‖xk − x̄‖ ≤ ε.

It follows immediately from this definition that the limit of a sequence (if it exists) must be unique. If any convergent sequence {xk}∞k=0 ⊂ X has a limit that lies in X, then X is called closed. The closure X̄ of a set X is the union of X and all its accumulation points, that is, the limits of any convergent (sub)sequences {xk}∞k=0 ⊂ X. The set X is open if for any x ∈ X there exists ε > 0 such that the ε-ball Bε(x) = {x̃ : ‖x̃ − x‖ < ε} is a subset of X.

Let X be a nonempty subset of a normed vector space. A family of open sets is an open cover of X if their union contains X. The set X is called compact if from any open cover of X one can select a finite number of sets (i.e., a finite subcover) that is also an open cover of X. It turns out that in a finite-dimensional vector space this definition leads to the following characterization of compactness:

X is compact ⇔ X is closed and bounded.4

Compact sets are important for the construction of optimal solutions because every sequence of elements of such sets has a convergent subsequence.

Proposition A.2 (Bolzano-Weierstrass Theorem) Every sequence x = {xk}∞k=0 in a compact subset X of a normed vector space has a convergent subsequence (with limit in X).

4. The set X is bounded if there exists M > 0 such that ‖x‖ ≤ M for all x ∈ X .


Proof Let S = ⋃_{k=0}^{∞} {xk} ⊂ X be the range of the sequence x. If S is finite, then there exist indices k1 < k2 < k3 < · · · and y ∈ S such that x_{k1} = x_{k2} = x_{k3} = · · · = y. The subsequence {x_{kj}}∞j=1 converges to y ∈ X. If S is infinite, then it contains a limit point ȳ.5 Indeed, if it did not contain a limit point, then each y ∈ S would be an isolated point of S (see footnote 6), that is, there would exist a collection of open balls, {By}_{y∈S}, which covers S and which is such that By ∩ S = {y} for all y ∈ S. But then X cannot be compact, since S ⊂ X and an open cover of X can be found that does not have a finite subcover of X. It is therefore enough to select a subsequence {x_{kj}}∞j=1 of x such that k1 < k2 < · · · and x_{kj} ∈ {y ∈ S : ‖y − ȳ‖ < 1/j} for all j ≥ 1, which implies that x_{kj} → ȳ as j → ∞, completing the proof. ∎

The sequence {xk}∞k=0 is a Cauchy sequence if for any ε > 0 there exists an integer K = K(ε) > 0 such that

k, l ≥ K ⇒ ‖xk − xl‖ ≤ ε.

Note that any Cauchy sequence is bounded, since for ε = 1 there exists K > 0 such that ‖xk − xK‖ ≤ 1 for all k ≥ K, which implies that

‖xk‖ = ‖xk − xK + xK‖ ≤ ‖xk − xK‖ + ‖xK‖ ≤ 1 + ‖xK‖,

for all k ≥ K. On the other hand, any convergent sequence x is a Cauchy sequence, since (using the notation of the earlier definition of limit)

k, l ≥ K(ε) = N(ε/2) ⇒ ‖xk − xl‖ ≤ ‖xk − x̄ + x̄ − xl‖ ≤ ‖xk − x̄‖ + ‖x̄ − xl‖ ≤ ε.

A normed vector space X in which the converse is always true, that is, in which every Cauchy sequence converges, is called complete. A complete normed vector space is also referred to as a Banach space.

Example A.3 Let 1 ≤ p ≤ ∞.

(1) The normed vector space ℓnp is complete. Indeed, if {ak}∞k=0 is a Cauchy sequence, then it is bounded. By the Bolzano-Weierstrass theorem any bounded sequence in Fn contains a convergent subsequence {a_{kj}}∞j=0, the limit of which is denoted by ā. Thus, for any ε > 0 there exists N(ε) > 0 such that ‖a_{kj} − ā‖ ≤ ε for all j ≥ N(ε). This in turn implies that the original sequence {ak}∞k=0 converges to ā, since by construction

k ≥ k0 = max{K(ε/2), N(ε/2)} ⇒ ‖ak − ā‖ ≤ ‖ak − a_{k0}‖ + ‖a_{k0} − ā‖ ≤ ε/2 + ε/2 = ε.

Because of the norm equivalence in Fn (see remark A.1) it is unimportant which norm ‖ · ‖ is considered.

5. A point ȳ is a limit point of S if for any ε > 0 the set {x ∈ S : ‖x − ȳ‖ < ε} contains a point of S different from ȳ.

(2) One can as well show (see, e.g., Luenberger 1969, 35–37) that the other spaces introduced in example A.2 (namely, ℓp, Lp(S, Σ, η), and C0(Ω, Rn)) are also complete. ♦

Proposition A.3 (Banach Fixed-Point Theorem) (Banach 1922) Let Ω be a closed subset of a Banach space X, and let f : Ω → Ω be a contraction mapping in the sense that

‖f(x) − f(x̃)‖ ≤ K‖x − x̃‖, ∀ x, x̃ ∈ Ω,

for some K ∈ [0, 1). Then there exists a unique x* ∈ Ω for which

f(x*) = x*.

Moreover, the fixed point x* of f can be obtained by the method of successive approximation, so that starting from any x0 ∈ Ω and setting x_{k+1} = f(xk) for all k ≥ 0 implies that

lim_{k→∞} xk = lim_{k→∞} f^k(x0) = x*.

Proof Let x0 ∈ Ω and let x_{k+1} = f(xk) for all k ≥ 0. Then the sequence {xk}∞k=0 ⊂ Ω is a Cauchy sequence, which can be seen as follows. For any k ≥ 0 the difference of two subsequent elements of the sequence is bounded by a fraction of the difference of the first two elements, since

‖x_{k+1} − x_k‖ = ‖f(x_k) − f(x_{k−1})‖ ≤ K‖x_k − x_{k−1}‖ ≤ · · · ≤ K^k ‖x1 − x0‖.

Thus, for any l ≥ 1 it is

‖x_{k+l} − x_k‖ ≤ ‖x_{k+l} − x_{k+l−1}‖ + ‖x_{k+l−1} − x_{k+l−2}‖ + · · · + ‖x_{k+1} − x_k‖
≤ (K^{k+l−1} + K^{k+l−2} + · · · + K^k) ‖x1 − x0‖
≤ K^k ∑_{κ=0}^{∞} K^κ ‖x1 − x0‖ = (K^k/(1 − K)) ‖x1 − x0‖ → 0 as k → ∞.


Hence, by completeness of the Banach space X, it is lim_{k→∞} xk = x* ∈ X. In addition, since Ω is by assumption closed, it is also x* ∈ Ω. Note also that because

‖f(x*) − x*‖ ≤ ‖f(x*) − xk‖ + ‖xk − x*‖ ≤ K‖x* − x_{k−1}‖ + ‖xk − x*‖ → 0

as k → ∞, one can conclude that f(x*) = x*, that is, x* is indeed a fixed point of the mapping f. Now if x̃* is another fixed point, then

‖x̃* − x*‖ = ‖f(x̃*) − f(x*)‖ ≤ K‖x̃* − x*‖,

and necessarily x̃* = x*, since K < 1. This implies uniqueness of the fixed point and concludes the proof. ∎

The Banach fixed-point theorem is sometimes also referred to as the contraction mapping principle. It is used in chapter 2 to establish the existence and uniqueness of the solution to a well-posed initial value problem (IVP) (see proposition 2.3). The following example illustrates how the contraction mapping principle can be used to establish the existence and uniqueness of a Nash equilibrium in a game of complete information.

Example A.4 (Uniqueness of a Nash Equilibrium) Consider a two-player static game of complete information in the standard normal-form representation Γ (see section 4.2.1), with action sets A^i = [0, 1] and twice continuously differentiable payoff functions U^i : [0, 1]² → [0, 1] for i ∈ {1, 2}. If 0 < r^i(a^{−i}) < 1 is a best response for player i, it satisfies

U^i_{a^i}(r^i(a^{−i}), a^{−i}) = 0.

Differentiation with respect to a^{−i} yields

dr^i(a^{−i})/da^{−i} = − U^i_{a^i a^{−i}}(r^i(a^{−i}), a^{−i}) / U^i_{a^i a^i}(r^i(a^{−i}), a^{−i}).

Set r(a) = (r^1(a^2), r^2(a^1)); then any fixed point a* = (a^{1*}, a^{2*}) of r is a Nash equilibrium. Note that r maps the set of strategy profiles A = [0, 1]² into itself. Let a, ã be two strategy profiles. Then by the mean-value theorem (see proposition A.14),

‖r(a) − r(ã)‖1 = |r^1(a^2) − r^1(ã^2)| + |r^2(a^1) − r^2(ã^1)|
≤ max{L1, L2}(|a^1 − ã^1| + |a^2 − ã^2|) = max{L1, L2}‖a − ã‖1,


Figure A.1 Uniqueness of a Nash equilibrium by virtue of the Banach fixed-point theorem (see example A.4).

where ‖ · ‖1 is the ℓ1-norm on the vector space R2 (see example A.2), and

Li = max_{(a^i, a^{−i}) ∈ A} | U^i_{a^i a^{−i}}(a^i, a^{−i}) / U^i_{a^i a^i}(a^i, a^{−i}) |, ∀ i ∈ {1, 2}.

Thus, if L1, L2 < 1, then r is a contraction mapping, and by the Banach fixed-point theorem there exists a unique Nash equilibrium a* = r(a*), that is, a fixed point of r, and this equilibrium can be found starting at an arbitrary strategy profile a ∈ A by successive approximation, that is, by iterative application of r: r(r(r(· · · r(a) · · ·))) → a* for any a ∈ A. Figure A.1 illustrates the successive-approximation procedure. Note that this example can be generalized to games with N > 2 players. ♦
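To make the successive-approximation procedure of example A.4 concrete, here is a minimal Python sketch with hypothetical linear best responses r^1, r^2 (slopes L1 = 0.3 and L2 = 0.5, so that r is a contraction on A = [0, 1]²); the coefficients are invented for illustration and are not from the text.

import numpy as np

# Hypothetical linear best responses (e.g., arising from quadratic payoffs
# U^i(a) = -(a^i - c_i - L_i a^{-i})^2); slopes L_1 = 0.3, L_2 = 0.5 < 1,
# so r is a contraction on A = [0,1]^2 in the l1-norm.
def r(a):
    return np.array([0.4 + 0.3 * a[1],   # r^1(a^2)
                     0.2 + 0.5 * a[0]])  # r^2(a^1)

a = np.array([0.0, 0.0])  # arbitrary initial strategy profile
for _ in range(100):
    a_next = r(a)
    if np.sum(np.abs(a_next - a)) < 1e-12:  # l1-distance between iterates
        break
    a = a_next

print(a)  # fixed point a* = r(a*) ~ (0.5412, 0.4706): the Nash equilibrium

The geometric convergence rate is governed by the contraction constant max{L1, L2}, exactly as in the proof of proposition A.3.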

A.3 Analysis

Let X and Y be nonempty subsets of a normed vector space. A relation that maps each element x ∈ X to an element f(x) = y ∈ Y is called a function, denoted by f : X → Y. The sets X and Y are then called the domain and co-domain of f, respectively. If X is a normed vector space containing functions, then f is often referred to as a functional (or operator). The function f is continuous at a point x̄ ∈ X if for any ε > 0 there exists a δ > 0 such that6

∀ x ∈ X : ‖x − x̄‖ < δ ⇒ ‖f(x) − f(x̄)‖ < ε.

The function is continuous (on X) if it is continuous at any point of X. Note that sums, products, and compositions of continuous functions are also continuous.7 A useful alternative characterization of continuity is as follows: a function f : X → Y is continuous on its domain X if and only if the preimage of any open subset G ⊂ Y, denoted by f⁻¹(G) ≡ {x ∈ X : f(x) ∈ G}, is open. Continuous transformations, which are continuous mappings, are important in topology. A closed curve in R2, for example, is a continuous transformation of a unit circle. A weaker notion than continuity is measurability. The function f is measurable if the preimage of any measurable set is a measurable set.8 The measurable function f is essentially bounded on the measurable set G if there exists a constant M > 0 such that {x ∈ G : ‖f(x)‖ > M} is of (Lebesgue) measure zero.

The following is a classic result in topology, which is used in the proof of the Poincaré-Bendixson theorem (see proposition 2.13).

Proposition A.4 (Jordan Curve Theorem) (Jordan 1909) Every simple (i.e., self-intersection-free) closed curve in R2 divides the plane into two disjoint pieces, the inside and the outside.

Proof See Armstrong (1983, 112–114).9 ∎

The next result characterizes the existence of a convergent sequence of functions, which are all defined on a compact set. It is therefore similar to the Bolzano-Weierstrass theorem (see proposition A.2) for sequences defined on compact sets.

6. The definition of continuity is automatically satisfied at an isolated point of X, i.e., at a point x̄ ∈ X for which there exists ε > 0 such that {x ∈ X : ‖x − x̄‖ < ε} = {x̄}. Thus, any function is continuous on a finite set.
7. The same holds true for quotients f/g of continuous real-valued functions f and g as long as g never vanishes.
8. For a brief introduction to measure theory and Lebesgue integration, see, e.g., Kirillov and Gvishiani (1982).
9. Guillemin and Pollack (1974, 85–89) provide an interesting outline for the proof of the Jordan-Brouwer separation theorem, which can be viewed as a generalization of the Jordan curve theorem to higher dimensions.


Proposition A.5 (Arzelà-Ascoli Theorem) Let X be a compact subset of a normed vector space, and let F be a family of functions f : X → Rn. Then every sequence {fk}∞k=0 ⊂ F contains a uniformly convergent subsequence if and only if F is uniformly bounded and equicontinuous.10

Proof See, for example, Zorich (2004, II, 398–399). ∎

If x = (x1, . . . , xn) is an interior point of X ⊂ Rn, then the function f is (partially) differentiable with respect to (the ith coordinate) xi if the limit, called partial derivative,

f_{xi}(x) ≡ lim_{δ→0} ( f(x + δei) − f(x) ) / δ ( = ∂f(x)/∂xi ),

exists, where ei ∈ Rn is the ith Euclidean unit vector (such that 〈x, ei〉 = xi). The function is called differentiable at x if its partial derivatives with respect to all coordinates exist, and it is called differentiable (on an open set X) if it is differentiable at any point x ∈ X. The Jacobian (matrix) of f = (f1, . . . , fm)′ : X → Rm is the matrix of partial derivatives,

fx(x) = [ ∂fi(x)/∂xj ], i ∈ {1, . . . , m}, j ∈ {1, . . . , n}.

If the Jacobian is itself (componentwise) differentiable, then the tensor fxx(x) of second derivatives is called the Hessian. In particular, for m = 1 it is

fxx(x) = [ ∂²f(x)/(∂xi ∂xj) ], i, j ∈ {1, . . . , n}.

This Hessian matrix is symmetric if it is continuous, a statement also known as Schwarz's theorem (or Clairaut's theorem).

Example A.5 (Total Derivative) The total derivative of a differentiable function F : R → Rn is the differential of F(t) with respect to the independent variable t: it is given by Ft(t) = lim_{δ→0} ( F(t + δ) − F(t) ) / δ, analogous to relation (2.1) in chapter 2, and it is usually denoted by Ḟ(t). ♦

10. The family of functions is uniformly bounded (on X) if the union of images ⋃_{f∈F} f(X) is bounded. It is equicontinuous (on X) if for any ε > 0 there exists δ > 0 such that ‖x̃ − x‖ < δ ⇒ ‖f(x̃) − f(x)‖ < ε, for all f ∈ F and all x, x̃ ∈ X. A sequence of functions {fk}∞k=0 (all with the same domain and co-domain) converges uniformly to the limit function f (with the same domain and co-domain), denoted by fk ⇉ f, if for any ε > 0 there exists K > 0 such that ‖fk − f‖ < ε for all k ≥ K on the entire domain.


The inverse operation of differentiation is integration.11 Let t0, T with t0 < T be given real numbers. In order for a function F : [t0, T] → Rn to be representable as an integral of its derivative Ḟ = f, it is not necessary that f be continuous, only that it be integrable.

Proposition A.6 (Fundamental Theorem of Calculus) If the real-valued function F is equal to an antiderivative of the function f a.e. on the interval [t0, T], namely,

Ḟ(t) = f(t), for a.e. t ∈ [t0, T],

and f is essentially bounded, then

F(T) − F(t0) = ∫_{t0}^{T} f(t) dt.

Remark A.2 (Absolute Continuity) A function F : [t0, T] → Rn is absolutely continuous (on [t0, T]) if for any ε > 0 there exists δ > 0 such that for any pairwise disjoint intervals Ik ⊂ [t0, T], k ∈ {1, 2, . . . , N} (allowing for countably many intervals, so that possibly N = ∞), it is

∑_{k=1}^{N} diam Ik < δ ⇒ ∑_{k=1}^{N} diam F(Ik) < ε,

where the diameter of a set S is defined as the largest Euclidean distance between two points in S, diam S = sup{‖x − x̃‖ : x, x̃ ∈ S}. Intuitively, a function is therefore absolutely continuous if the images F(Ik) together stay small whenever the preimages Ik stay small. One can show that an absolutely continuous function F is differentiable a.e., with Ḟ = f a.e., and

F(t) = F(t0) + ∫_{t0}^{t} f(s) ds, ∀ t ∈ [t0, T].

The set of all absolutely continuous functions defined on [t0, T] is given by the Sobolev space W^{1,∞}[t0, T]. The total derivatives of functions in W^{1,∞}[t0, T] are then elements of the normed vector space of essentially bounded integrable functions, L∞[t0, T] (see example A.2). ♦

11. A brief summary of the theory of Lebesgue integration (or, alternatively, measure theory) is provided by Kirillov and Gvishiani (1982, ch. 2); for more detail, see, e.g., Lang (1993).


Continuity and differentiability of functions at a point can be sufficient for local solvability of implicit equations. The following is a key result in nonlinear analysis.

Proposition A.7 (Implicit Function Theorem) Let X, Y, Z be normed vector spaces (e.g., Rm, Rn, and Rl), such that Y is also complete. Suppose that (x̄, ȳ) ∈ X × Y, and let

Bε(x̄, ȳ) = {(x, y) ∈ X × Y : ‖x − x̄‖ + ‖y − ȳ‖ < ε}

define an ε-neighborhood W = Bε(x̄, ȳ) of the point (x̄, ȳ), for some ε > 0. Assume that the function F : W → Z is such that F(x̄, ȳ) = 0, F(·) is continuous at (x̄, ȳ), and F is differentiable on W with a derivative that is continuous at (x̄, ȳ). If the linear mapping ŷ ↦ z = Fy(x̄, ȳ) ŷ is invertible, then there exist an open set U × V, which contains (x̄, ȳ), and a function f : U → V, such that (1) U × V ⊂ W; (2) ∀ (x, y) ∈ U × V : F(x, y) = 0 ⇔ y = f(x); (3) ȳ = f(x̄); and (4) f(·) is continuous at x̄.

Proof See, for example, Zorich (2004, vol. 2, 97–99). ∎

Remark A.3 (Implicit Smoothness) It is notable that the smoothness properties of the original function F in proposition A.7 at the point (x̄, ȳ) imply the same properties of the implicit function f at x̄ (where f(x̄) = ȳ). For example, if F is continuous (resp., r-times continuously differentiable) in a neighborhood of (x̄, ȳ), then f is continuous (resp., r-times continuously differentiable) in a neighborhood of x̄. ♦

The implicit function theorem can be used to prove the following result, which is useful in many practical applications that involve implicit differentiation.

Proposition A.8 (Inverse Function Theorem) Let X, Y be normed vector spaces, such that Y is complete. Assume that G ⊂ Y is an open set that contains the point ȳ ∈ Y, and that g : G → X is a differentiable function such that the derivative gy is continuous at ȳ. If the linear transformation ŷ ↦ gy(ȳ) ŷ is invertible, then there exist open sets U and V with (g(ȳ), ȳ) ∈ U × V, such that (1) U × V ⊂ X × Y and the restriction of g to V (i.e., the function g : V → U) is bijective,12 and (2) its inverse f : U → V is continuous on U and differentiable at x̄ = g(ȳ), with fx(x̄) = (gy(ȳ))⁻¹.

12. The function g : V → U is called bijective if it is both injective, i.e., y ≠ ỹ implies that g(y) ≠ g(ỹ), and surjective, i.e., its image g(V) is equal to its co-domain U.


Analogous to the statements in remark A.3, local smoothness properties of g (e.g., continuity, r-fold differentiability) imply the corresponding local smoothness properties of f. This section concludes by providing the proof of a well-known inequality13 that is used in chapter 2 (see the proof of proposition 2.5).

Proposition A.9 (Gronwall-Bellman Inequality) Let α, β, x : [t0, T] → R be continuous functions such that β(t) ≥ 0 for all t ∈ [t0, T], given some t0, T ∈ R with t0 < T. If

x(t) ≤ α(t) + ∫_{t0}^{t} β(s) x(s) ds, ∀ t ∈ [t0, T], (A.1)

then also

x(t) ≤ α(t) + ∫_{t0}^{t} α(s) β(s) exp[ ∫_{s}^{t} β(θ) dθ ] ds, ∀ t ∈ [t0, T]. (A.2)

Proof Let y(t) = ∫_{t0}^{t} β(s) x(s) ds. Then by assumption

Δ(t) ≡ α(t) + y(t) − x(t) ≥ 0. (A.3)

Moreover, ẏ(t) = β(t) x(t) = β(t)( α(t) + y(t) − Δ(t) ), yielding a linear IVP (with variable coefficients),

ẏ − β(t) y = (α(t) − Δ(t)) β(t), y(t0) = 0.

Thus, the Cauchy formula in proposition 2.1 and inequality (A.3) together imply that

y(t) = ∫_{t0}^{t} (α(s) − Δ(s)) β(s) exp[ ∫_{s}^{t} β(θ) dθ ] ds ≤ ∫_{t0}^{t} α(s) β(s) exp[ ∫_{s}^{t} β(θ) dθ ] ds, (A.4)

for all t ∈ [t0, T]. By (A.1) it is x(t) ≤ α(t) + y(t), which, using inequality (A.4), implies (A.2). ∎

Remark A.4 (Simplified Gronwall-Bellman Inequality) (1) The statement of the inequalities in proposition A.9 simplifies when α(t) ≡ α0 ∈ R, in which case

x(t) ≤ α0 + ∫_{t0}^{t} β(s) x(s) ds, ∀ t ∈ [t0, T],

implies that

x(t) ≤ α0 exp[ ∫_{t0}^{t} β(θ) dθ ], ∀ t ∈ [t0, T].

(2) If in addition β(t) ≡ β0 ≥ 0, then

x(t) ≤ α0 + β0 ∫_{t0}^{t} x(s) ds, ∀ t ∈ [t0, T],

implies that

x(t) ≤ α0 exp[β0(t − t0)], ∀ t ∈ [t0, T]. ♦

13. This result is ascribed to Gronwall (1919) and Bellman (1953).

A.4 Optimization

In neoclassical economics, optimal choice usually corresponds to an agent's selecting a decision x from a set X ⊂ Rn of feasible actions so as to maximize his objective function f : X → R. This amounts to solving

max_{x∈X} f(x). (A.5)

The following result provides simple conditions that guarantee the existence of solutions to this maximization problem.

Proposition A.10 (Weierstrass Theorem) Let X ⊂ Rn be a compact set. Any continuous function f : X → R takes on its extrema (i.e., its minimum and maximum) in X, that is, there exist constants m, M ∈ R and points x̲, x̄ ∈ X such that

m = min{f(x) : x ∈ X} = f(x̲) and M = max{f(x) : x ∈ X} = f(x̄).

Proof See Bertsekas (1995, 540–541). ∎

If the objective function f is differentiable at an extremum, then there is a simple (first-order) necessary optimality condition.14

14. If f is concave and the domain X is convex, then by the Rademacher theorem (Magaril-Il'yaev and Tikhomirov 2003, 160) f is almost everywhere differentiable, and its set-valued subdifferential ∂f(x) exists at any point x. The Fermat condition at a point x where f is not differentiable becomes 0 ∈ ∂f(x). For example, when n = 1 and f(x) = −|x|, then f attains its maximum at x = 0, and 0 ∈ ∂f(0) = [−1, 1].


Figure A.2 The continuous function f : [x0, x5] → R achieves its minimum m at the boundary point x0 and its maximum M at the inner point x4. The points x1, x2, x3, and x4 are critical points of f satisfying the condition in Fermat's lemma. The points x1 and x2 are local extrema.

Proposition A.11 (Fermat's Lemma) If the real-valued function f : Rn → R is differentiable at an interior extremum x, then its first derivative vanishes at that point, fx(x) = 0.

Proof Since the function f is by assumption differentiable at the interior extremum x, it is

f(x + Δ) − f(x) = 〈fx(x), Δ〉 + 〈ρ(x; Δ), Δ〉 = 〈fx(x) + ρ(x; Δ), Δ〉,

where ρ(x; Δ) → 0 as Δ → 0. If fx(x) ≠ 0, then for small ‖Δ‖ the left-hand side of the last relation is sign-definite (since x is a local extremum), whereas the scalar product on the right-hand side can take on either sign, depending on the orientation of Δ, which yields a contradiction. Hence, necessarily fx(x) = 0. ∎

Figure A.2 illustrates the notion of extrema for a real-valued function on a compact interval. The extrema m and M, guaranteed to exist by the Weierstrass theorem, are taken on at the boundary and in the interior of the domain, respectively. The next two propositions are auxiliary results used to establish the mean-value theorem for differentiable real-valued functions.


Proposition A.12 (Rolle's Theorem) Let f : [a, b] → R be a continuous real-valued function, differentiable on the open interval (a, b), where −∞ < a < b < ∞. Suppose further that f(a) = f(b). Then there exists a point x̂ ∈ (a, b) such that fx(x̂) = 0.

Proof Since f is continuous on the compact set Ω = [a, b], by the Weierstrass theorem there exist points x̲, x̄ ∈ Ω such that f(x̲) and f(x̄) are the extreme values of f on Ω. If f(x̲) = f(x̄), then f must be constant on Ω, so that any x̂ ∈ (a, b) satisfies fx(x̂) = 0. If f(x̲) ≠ f(x̄), then x̲ or x̄ must lie in (a, b), since f(a) = f(b). That point is denoted by x̂, and Fermat's lemma implies that fx(x̂) = 0. ∎

Proposition A.13 (Lagrange's Theorem) Let f : [a, b] → R be a continuous real-valued function, differentiable on the open interval (a, b), where −∞ < a < b < ∞. Then there exists a point x̂ ∈ (a, b) such that

f(b) − f(a) = fx(x̂)(b − a).

Proof The function f̃, with

f̃(x) = f(x) − ( (f(b) − f(a))/(b − a) ) (x − a)

for all x ∈ [a, b], satisfies the assumptions of Rolle's theorem. Hence, there exists a point x̂ ∈ (a, b) such that

f̃x(x̂) = fx(x̂) − (f(b) − f(a))/(b − a) = 0,

which completes the argument. ∎

Proposition A.14 (Mean-Value Theorem) Let f : D → R be a differentiable real-valued function, defined on the domain D ⊂ Rn. Assume that a closed line segment with endpoints x and x̃ lies in D, i.e., θx + (1 − θ)x̃ ∈ D for all θ ∈ [0, 1]. Then there exists λ ∈ (0, 1) such that

f(x) − f(x̃) = 〈fx(λx + (1 − λ)x̃), x − x̃〉.

Proof The function F : [0, 1] → R, with

F(θ) = f(θx + (1 − θ)x̃)

for all θ ∈ [0, 1], satisfies the assumptions of Lagrange's theorem. Thus, there exists λ ∈ (0, 1) such that

f(x) − f(x̃) = F(1) − F(0) = Fθ(λ) = 〈fx(λx + (1 − λ)x̃), x − x̃〉,

which concludes the proof. ∎


Consider the choice problem discussed at the outset of this section, but let now the continuously differentiable objective function and the choice set depend on an exogenous parameter, leading to the following parameterized optimization problem:

max_{x∈X(p)} f(x, p), (A.6)

where p ∈ Rm is the problem parameter and

X(p) = {x ∈ Rn : g(x, p) ≤ 0} (A.7)

is the parameterized choice set, with g : Rn+m → Rk a continuously differentiable constraint function. The resulting maximized objective function (also termed value function) is

F(p) = max_{x∈X(p)} f(x, p). (A.8)

Some of the most important insights in economics derive from an analysis of the comparative statics (changes) of solutions with respect to parameter movements. To quantify the change of the value function F(p) with respect to p, recall the standard Lagrangian formalism for solving constrained optimization problems. In the Lagrangian

L(x, p, λ) = f(x, p) − 〈λ, g(x, p)〉, (A.9)

λ ∈ Rk is the Lagrange multiplier (similar to the adjoint variable in the Hamiltonian framework in chapter 3). The idea for solving the constrained maximization problem (A.6) is to relax the constraint and charge for any violation of the constraints using the vector λ of shadow prices. The Karush-Kuhn-Tucker (necessary optimality) conditions15 become

Lx(x, p, λ) = fx(x, p) − λ′gx(x, p) = 0, (A.10)

together with the complementary slackness condition

gj(x, p) < 0 ⇒ λj = 0, ∀ j ∈ {1, . . . , k}. (A.11)

Assuming a differentiable solution x(p), λ(p), one obtains

Fp(p) = fx(x(p), p) xp(p) + fp(x(p), p). (A.12)

15. For the conditions to hold, the maximizer x(p) needs to satisfy some constraint qualification, e.g., that the active constraints are positively linearly independent (Mangasarian-Fromovitz conditions). This type of regularity condition is used in assumptions A4 and A5 (see section 3.4) for the endpoint constraints and the state-control constraints of the general finite-horizon optimal control problem.


From the first-order necessary optimality condition (A.10),

fx(x(p), p) = ∑_{j=1}^{k} λj(p) gj,x(x(p), p). (A.13)

The complementary slackness condition (A.11), on the other hand, implies that 〈λ(p), g(x(p), p)〉 = 0. Differentiating this relation with respect to p yields

∑_{j=1}^{k} [ λj,p(p) gj(x(p), p) + λj(p) ( gj,x(x(p), p) xp(p) + gj,p(x(p), p) ) ] = 0,

so that, using (A.13),

fx(x(p), p) xp(p) = − ∑_{j=1}^{k} [ λj,p(p) gj(x(p), p) + λj(p) gj,p(x(p), p) ].

Hence, since λj,p(p) gj(x(p), p) = 0 for each j (a constraint is either active, with gj = 0, or has a locally vanishing multiplier), the expression (A.12) becomes

Fp(p) = fp(x(p), p) − ∑_{j=1}^{k} λj(p) gj,p(x(p), p) = Lp(x(p), p, λ(p)).

This proves the envelope theorem, which addresses changes of the value function with respect to parameter variations.16

Proposition A.15 (Envelope Theorem) Let x(p) (with Lagrange multiplier λ(p)) be a differentiable solution to the parameterized optimization problem (A.6), and let F(p) in (A.8) be the corresponding value function. Then Fp(p) = Lp(x(p), p, λ(p)).

In the special case where no constraint is binding at a solution x(p) of (A.6), the envelope theorem simply states that Fp(p) = fp(x(p), p); that is, at the optimal solution the slope of the value function with respect to a change in the parameter p is equal to the slope of the objective function with respect to p.
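A quick numerical sanity check of the envelope theorem can be coded in a few lines. The following Python sketch uses a hypothetical problem, f(x, p) = −x² + px with the binding constraint g(x, p) = x − p/4 ≤ 0 (all names and values invented for illustration), and compares a finite-difference estimate of Fp(p) with Lp(x(p), p, λ(p)).

import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical parameterized problem: maximize f(x,p) = -x^2 + p*x subject to
# g(x,p) = x - p/4 <= 0; for p > 0 the constraint binds, with x(p) = p/4 and
# multiplier lambda(p) = p/2 from the condition L_x = -2x + p - lambda = 0.
def F(p):  # value function F(p), computed by bounded scalar maximization
    res = minimize_scalar(lambda x: -(-x**2 + p * x), bounds=(-10.0, p / 4),
                          method="bounded", options={"xatol": 1e-12})
    return -res.fun

p, eps = 2.0, 1e-4
dF_dp = (F(p + eps) - F(p - eps)) / (2 * eps)  # finite-difference derivative

x_p, lam = p / 4, p / 2
L_p = x_p - lam * (-0.25)  # L_p = f_p - lambda * g_p, with f_p = x, g_p = -1/4

print(dF_dp, L_p)  # both approximately 3*p/8 = 0.75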

In general, the solution to the constrained optimization problem (A.6) is set-valued for each parameter p. The behavior of the solution set

X*(p) = arg max_{x∈X(p)} f(x, p)

as a function of p cannot be described using the standard continuity and smoothness concepts.

16. A more general version of this theorem was proposed by Milgrom and Segal (2002).

A set-valued function ϕ : X ⇒ Y maps elements of the normed vector space X to subsets of the normed vector space Y. The function is called upper semicontinuous at a point x̄ ∈ X if for each open set B with ϕ(x̄) ⊂ B there exists a neighborhood G(x̄) such that

x ∈ G(x̄) ⇒ ϕ(x) ⊂ B.

The following result establishes the regularity of the solution to (A.6) and its value function F(p) in (A.8) with respect to changes in the parameter p.

Proposition A.16 (Berge Maximum Theorem) (Berge 1959) Assume that the functions f and g in (A.6)–(A.7) are continuous. Then the set of solutions X*(p) is nonempty, upper semicontinuous, and compact-valued. Furthermore, the value function F(p) in (A.8) is continuous.

Proof See Berge (1959; pp. 115–117 in 1963 English translation). ∎

The Berge maximum theorem is a useful tool to verify the plausibility of solutions to parameterized optimization problems. First, it states that the solution set X*(p) cannot really make jumps but, by its upper semicontinuity, can add distant solutions only via indifference. For example, if X*(p) ⊆ {0, 1} for all p, then along any path from p̲ to p̄ in the parameter space with X*(p̲) = {0} and X*(p̄) = {1} there will be at least one point p̃ along the way where X*(p̃) = {0, 1} and the agent is indifferent between the two actions. The Berge maximum theorem is used together with the following well-known fixed-point theorem to establish the existence of a Nash equilibrium of a finite game in proposition 4.1.

Proposition A.17 (Kakutani Fixed-Point Theorem) (Kakutani 1941) Let S ≠ ∅ be a convex compact subset of Rn (where n ≥ 1). Let F : S ⇒ S be an upper semicontinuous set-valued mapping with the property that F(x) is nonempty and convex for all x ∈ S. Then F has a fixed point.

Proof See Aubin (1998, 154). ∎

A.5 Notes

This mathematical review has established (with a few exceptions) only those results used in the main text. No effort has been made to provide any kind of completeness. A good source for linear algebra is Strang (2009), and for more advanced algebra, Hungerford (1974). There are many classic texts on analysis and calculus, such as Rudin (1976) and Apostol (1974). A modern treatment is provided by Zorich (2004) in two volumes. More advanced topics of nonlinear analysis and functional analysis are discussed in Kolmogorov and Fomin (1957), Dunford and Schwartz (1958), Aubin and Ekeland (1984), Aubin (1998), and Schechter (2004). Border (1985) covers the application of fixed-point theorems in economics, while Granas and Dugundji (2003) provide a comprehensive overview of mathematical fixed-point theory. For an introduction to optimization, see Luenberger (1969), Bertsekas (1995), and Brinkhuis and Tikhomirov (2005). More powerful results in optimization are obtained when the considered problems are convex (see, e.g., Boyd and Vandenberghe 2004).


Appendix B: Solutions to Exercises

B.1 Numerical Methods

The optimal control problems considered in this book assume that an explicit model of a dynamic system is available. The exercises at the end of chapters 2–5 provide examples of how structural insights can be obtained from analytical solutions. In many practical situations, because of the complexity of the problem at hand, it will not be possible to obtain such analytical solutions, in which case one has to rely on numerical approximations for specific parametrizations of the optimal control problem.

There are two main classes of methods to solve optimal control problems: indirect and direct. An indirect method relies on the necessary optimality conditions provided by the Pontryagin maximum principle (PMP) to generate solution candidates, for example, by solving a two-point boundary value problem or by forward-simulating the Hamiltonian system, consisting of the system equation and the adjoint equation, for different initial values of the adjoint variable in the direction of steepest ascent of the objective functional. A direct method approximates trajectories by using functions that are indexed by finite-dimensional parameters and in this way converts the infinite-dimensional optimal control problem into a finite-dimensional optimization problem (of finding the best parameter values), which can be solved using commercially available nonlinear programming packages. Figure B.1 provides an overview.

Indirect Methods Given that the vast majority of the analytical solu-tions to optimal control problems in this book are obtained using thenecessary conditions provided by the PMP, it is natural to expect thatthere exist successful implementations of this approach. Ignoring any


Figure B.1 Overview of numerical solution methods for optimal control problems: indirect methods (multipoint boundary value problems solved by shooting algorithms; forward simulation via gradient algorithms) and direct methods (spectral methods and primal-dual algorithms; consistent approximation via collocation and iterative integration).

constraints, the maximum principle in proposition 3.4 for the simplifiedfinite-horizon optimal control problem leads to a (two-point) boundaryvalue problem (BVP), that is, an ordinary differential equation (ODE) thatspecifies endpoint conditions at both ends of the time interval [t0, T].The split boundary conditions make the solution of a BVP much moredifficult than the integration of a standard initial value problem (IVP),especially when the time interval is large. To make things concrete, con-sider the simple optimal control problem (3.9)–(3.12) in section 3.3.1,and assume that there is a unique feedback law u = χ (t, x,ψ) such thatthe maximality condition

H(t, x,χ (t, x,ψ),ψ) = maxu∈U

H(t, x, u,ψ)

is satisfied. Then the Hamiltonian system (3.10)–(3.11) and (3.26)–(3.27)is a two-point BVP that can be written in the form

x(t) = Hψ (t, x(t),ψ(t)), x(t0) = x0, (B.1)

ψ(t) = −Hx(t, x(t),ψ(t)), ψ(T) = 0, (B.2)

where

H(t, x,ψ) ≡ H(t, x,χ (t, x,ψ),ψ).

A numerical shooting algorithm tries to solve the BVP (B.1)–(B.2) by relaxing one of the boundary conditions and converting it to an initial condition instead, for instance, by imposing ψ(t0) = α instead of ψ(T) = 0, yielding the IVP


ẋ(t) = Ĥψ(t, x(t), ψ(t)), x(t0) = x0, (B.3)
ψ̇(t) = −Ĥx(t, x(t), ψ(t)), ψ(t0) = α, (B.4)

with (x(t, α), ψ(t, α)), t ∈ [t0, T], as solution trajectories for different values of the parameter α. Provided the BVP (B.1)–(B.2) possesses a solution, it may be obtained by finding ᾱ such that1

ψ(T, ᾱ) = 0.

The underlying reason that shooting methods work for well-posed systems is the continuous dependence of solutions to ODEs on the initial conditions (see proposition 2.5 and footnote 25 in section 3.6.1). The main drawback of using the BVP (B.1)–(B.2) to determine solution candidates for an optimal control problem is that success invariably depends on the quality of the initial guess for the parameter α (which has the same dimensionality as the co-state ψ) and on the norm of the discretization grid. For more details on shooting methods for solving two-point BVPs, see, for example, Roberts and Shipman (1972), Fraser-Andrews (1996), and Sim et al. (2000).
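A minimal shooting sketch in Python may help fix ideas. The linear-quadratic instance below (maximize −½∫₀ᵀ(x² + u²) dt subject to ẋ = u, x(0) = x0) is a standard illustrative example and is not taken from the text; the maximality condition gives u = χ(t, x, ψ) = ψ, so the Hamiltonian system (B.3)–(B.4) reads ẋ = ψ, ψ̇ = x, and the boundary condition ψ(T, ᾱ) = 0 is solved for ᾱ by root bracketing.

import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

t0, T, x0 = 0.0, 2.0, 1.0  # illustrative problem data

def hamiltonian_system(t, y):
    x, psi = y
    return [psi, x]  # (B.3)-(B.4) for this example: x' = H_psi, psi' = -H_x

def psi_at_T(alpha):
    # forward-simulate the IVP with the guess psi(t0) = alpha, return psi(T)
    sol = solve_ivp(hamiltonian_system, (t0, T), [x0, alpha],
                    rtol=1e-10, atol=1e-12)
    return sol.y[1, -1]

# Shooting: find alpha_bar with psi(T, alpha_bar) = 0 by bracketing the root
alpha_bar = brentq(psi_at_T, -10.0, 10.0)

print(alpha_bar, -x0 * np.tanh(T - t0))  # numerical vs. closed-form value

For this instance the closed-form answer ᾱ = −x0 tanh(T − t0) is available, so the shooting output can be checked directly.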

Consider now an alternative to a regular shooting method, which is useful especially in cases where |T − t0| is large, so that small perturbations of the parameter α from the unknown value ᾱ can cause a large deviation of ψ(T, α) from ψ(T, ᾱ) = 0. For any intermediate time τ ∈ [t0, T], the parameterized IVP (B.3)–(B.4) can be forward-simulated on the interval [t0, τ] by selecting the trajectory (x(t, α), ψ(t, α)) (i.e., selecting the parameter α = α(τ)) that maximizes z(τ, α), which is obtained from the solution trajectory z(t, α), t ∈ [t0, τ], of the IVP

ż(t) = ĥ(t, x(t, α), ψ(t, α)), z(t0) = 0,

where ĥ(t, x, ψ) ≡ h(t, x, χ(t, x, ψ)). In this setting, the function z(τ, α) represents an approximation of the value of the objective functional when the optimal control problem is truncated to the interval [t0, τ]. Figure B.2 provides the intuition of this method, where at each τ ∈ (t0, T) the parameter α is selected based on the largest attainable value of the objective function at t = τ. By Bellman's principle of optimality the concatenation of optimal policies for all τ describes a solution to the optimal control problem over the entire time interval [t0, T]. For more details on this gradient method of steepest ascent (or steepest descent for the minimization problems commonly considered in engineering) see, for instance, Lee and Markus (1967, app. A). Indirect methods may lack robustness because a numerical solution of the Hamiltonian BVP may be possible only if the initial guess of the parameter α is sufficiently close to its actual value. Note also that the necessary optimality conditions provided by the PMP are satisfied by state-control trajectories that maximize (resp. minimize) the objective functional, that are only local extrema, or that (as saddle points) represent no extrema at all.

1. Similarly, it is possible to consider the terminal condition x(T) = β instead of the initial condition x(t0) = x0, determine trajectories (x(t, β), ψ(t, β)), and find β̄ such that x(t0, β̄) = x0.

Figure B.2 Forward simulation of the Hamiltonian system (B.3)–(B.4) with parameter α selected based on steepest ascent of z(t, α).

Direct Methods In contrast to the indirect methods discussed earlier, direct methods attempt to maximize the objective functional directly, subject to the system equation and the state-control constraints. For this, a direct method converts the (by its very nature) infinite-dimensional optimal control problem to a finite-dimensional nonlinear programming problem. The class of conceptual algorithms achieves this by using finite-dimensional families of admissible controls (e.g., piecewise constant functions), but typically retains the use of infinite-dimensional operations such as integration and differentiation.2 These operations are usually fraught with numerical errors, which may limit the accuracy of the methods.3 Discretization techniques that approximate the original problem arbitrarily closely using sufficiently many discretization points are referred to as consistent approximation methods. So-called Galerkin methods are based on selecting appropriate collocation points (or knots), at which a discretized version of the optimal control problem is solved, usually with respect to a finite set Z of basis functions u(t; z), z ∈ Z. The optimal control problem is then transcribed in terms of the finite-dimensional parameter vector z ∈ Z as a nonlinear optimization problem of the form

F(z) → max_z, (B.5)
G(z) = 0, (B.6)
H(z) ≤ 0, (B.7)

where the functions G and H encapsulate the state equation, endpoint constraints, and state-control constraints of the optimal control problem. The mix of ODEs and algebraic equations constraining the path leads to differential algebraic equations (DAEs), which can be solved using pseudospectral methods (transforming the problem into a different space, for example, by using the Laplace transform) or iterative integration techniques. An implementation of a collocation method based on a Runge-Kutta integration method for piecewise-polynomial functions was realized as a MATLAB toolbox, Recursive Integration Optimal Trajectory Solver (RIOTS), by Schwartz et al. (1997). Implementations of pseudospectral methods are provided by the MATLAB toolboxes PROPT by TOMLAB (Rutquist and Edvall 2009) and DIDO (Ross and Fahroo 2003).

2. For references to such methods, see, e.g., Schwartz (1996).
3. Numerical errors can sometimes be avoided using more sophisticated methods, such as the algorithmic differentiation technique implemented by TOMLAB in the optimal control toolbox PROPT for MATLAB.
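For comparison with the indirect approach, the following Python sketch transcribes the same linear-quadratic instance used in the shooting example above into the finite-dimensional form (B.5)–(B.7), with a piecewise constant control, explicit Euler integration, and a general-purpose NLP solver; it is an illustration only, not a substitute for the specialized toolboxes just mentioned.

import numpy as np
from scipy.optimize import minimize

t0, T, x0, N = 0.0, 2.0, 1.0, 50  # same illustrative LQ instance as above
dt = (T - t0) / N

def neg_objective(z):  # z = piecewise constant control values, cf. (B.5)
    x, J = x0, 0.0
    for u in z:
        J += -0.5 * (x**2 + u**2) * dt  # left-endpoint quadrature of payoff
        x += u * dt                     # explicit Euler step for x' = u
    return -J

bounds = [(-1.0, 1.0)] * N  # box constraints play the role of H(z) <= 0 (B.7)
res = minimize(neg_objective, np.zeros(N), bounds=bounds, method="L-BFGS-B")

print(-res.fun)   # approximate optimal value of the objective functional
print(res.x[:5])  # first few values of the discretized control

Refining the grid (larger N) and using a higher-order integration rule would make this a consistent approximation in the sense described above.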


Remark B.1 (Approximate Dynamic Programming) In a discrete-time framework and using the Hamilton-Jacobi-Bellman (HJB) equation (or the Bellman equation for infinite-horizon problems), there exist approximation methods usually referred to as approximate dynamic programming techniques. For more details, see, for example, Bertsekas (1996), Sutton and Barto (1998), and Powell (2007). ♦

B.2 Ordinary Differential Equations

2.1 (Growth Models)

a. (Generalized Logistic Growth) The ODE ẋ = f(t, x) can be rewritten as a Bernoulli ODE of the form

(d/dt)(x/x̄) − αγ (x/x̄) + αγ (x/x̄)^{(1/γ)+1} = 0,

so that the standard substitution ϕ = −γ(x/x̄)^{−1/γ} leads to the linear first-order ODE

ϕ̇ + αϕ = −αγ.

Taking into account the initial condition ϕ(t0) = −γ(x0/x̄)^{−1/γ}, the Cauchy formula yields

ϕ(t) = −γ (x0/x̄)^{−1/γ} e^{−α(t−t0)} − αγ ∫_{t0}^{t} e^{−α(t−s)} ds
= −γ [ (x0/x̄)^{−1/γ} e^{−α(t−t0)} + (1 − e^{−α(t−t0)}) ],

so that

x(t) = x̄ / [ 1 + ( (x̄^{1/γ} − x0^{1/γ}) / x0^{1/γ} ) e^{−α(t−t0)} ]^{γ}, t ≥ t0,

solves the original ODE with initial condition x(t0) = x0 > 0.
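The closed-form solution can be checked numerically. The following Python sketch (with arbitrary illustrative parameter values) integrates the generalized logistic ODE ẋ = αγ(1 − (x/x̄)^{1/γ})x, written out explicitly at the start of part b, and compares the result with the formula above.

import numpy as np
from scipy.integrate import solve_ivp

alpha, gamma, x_bar, t0, x0 = 0.5, 2.0, 10.0, 0.0, 1.0  # illustrative values

def rhs(t, x):  # generalized logistic ODE (cf. part b)
    return alpha * gamma * (1.0 - (x / x_bar)**(1.0 / gamma)) * x

def closed_form(t):  # the solution formula derived above
    c = (x_bar**(1.0 / gamma) - x0**(1.0 / gamma)) / x0**(1.0 / gamma)
    return x_bar / (1.0 + c * np.exp(-alpha * (t - t0)))**gamma

t_grid = np.linspace(t0, 20.0, 41)
sol = solve_ivp(rhs, (t0, 20.0), [x0], t_eval=t_grid, rtol=1e-10, atol=1e-12)

print(np.max(np.abs(sol.y[0] - closed_form(t_grid))))  # near machine precision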

b. (Gompertz Growth) Taking the limit for γ → ∞ on the right-hand side of the ODE for generalized logistic growth in part a, one obtains, using l'Hôpital's rule,4

lim_{γ→∞} αγ (1 − (x/x̄)^{1/γ}) x = αx lim_{γ→∞} ( 1 − (x/x̄)^{1/γ} ) / (1/γ)
= αx lim_{γ→∞} ( (1/γ²)(x/x̄)^{1/γ} ln(x/x̄) ) / ( −1/γ² ) = αx ln(x̄/x).

Thus, Gompertz growth can be viewed as a limit of generalized logistic growth, for γ → ∞. The ODE for Gompertz growth is separable, so that by direct integration

ln( ln(x̄/x) ) − ln( ln(x̄/x0) ) = ∫_{x0}^{x} dξ / ( ξ ln(ξ/x̄) ) = −α(t − t0),

for all t ≥ t0, and therefore

x(t) = x̄ (x0/x̄)^{exp(−α(t−t0))}, t ≥ t0,

solves the IVP for Gompertz growth with initial condition x(t0) = x0 > 0.

4. In its simplest form, l'Hôpital's rule states, for two real-valued functions f, g : R → R and given t̄ ∈ R, that if lim_{t→t̄} f(t) = lim_{t→t̄} g(t) ∈ {0, ±∞}, then lim_{t→t̄} f(t)/g(t) = lim_{t→t̄} ḟ(t)/ġ(t), provided that the functions f(·) and g(·) are differentiable at t = t̄.

c. (Bass Diffusion) Using the separability of f(t, x), it is5

( x̄/(αx̄ + β) ) ln( ( (αx + β)/(αx0 + β) ) · ( (x̄ − x0)/(x̄ − x) ) ) = ∫_{x0}^{x} dξ / ( (1 − ξ/x̄)(αξ + β) ) = ∫_{t0}^{t} ρ(s) ds ≡ R(t),

or equivalently,

(αx + β)/(x̄ − x) = ( (αx0 + β)/(x̄ − x0) ) exp[ (α + β/x̄) R(t) ].

Hence,

x(t) = x̄ − (αx̄ + β) / ( α + ( (αx0 + β)/(x̄ − x0) ) exp[ (α + β/x̄) R(t) ] ), t ≥ t0,

solves the Bass diffusion IVP with initial condition x(t0) = x0, provided that 0 < x0 < x̄.

5. The integral on the left-hand side is computed by a partial fraction expansion.

d. (Estimation of Growth Models) The preceding models are fitted to the total shipments of compact discs in the United States from 1985 to 2008


(see table B.1). At time tk = 1985 + k, k ∈ {0, 1, . . . , 23}, the cumulative number xk = x(tk) of compact discs shipped (net, after returns) changes by yk ≈ ẋ(tk) = f(tk, xk) + εk, where yk is the number of units shipped in year tk and εk is the residual at time tk. The model parameters are determined (e.g., using a numerical spreadsheet solver) so as to minimize the sum of squared residuals,

SSR = ∑_{k=0}^{23} εk² = ∑_{k=0}^{23} ( yk − f(tk, xk) )².

Table B.2 reports the parameter values together with estimates for the standard error, SE = √(SSR/d), where the number of degrees of freedom d is equal to the number of data points (24), minus the number of free model parameters, minus 1. The corresponding diffusion curves are depicted in figure B.3.
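The least-squares fit can be reproduced approximately in Python. The sketch below fits the Bass model with ρ ≡ 1, i.e., yk ≈ (1 − xk/x̄)(αxk + β), to the data of table B.1; the convention that xk cumulates shipments through year tk − 1 is an assumption on the discretization, so the estimates need not match table B.2 exactly.

import numpy as np
from scipy.optimize import least_squares

# Annual shipments y_k (millions), 1985-2008, from table B.1
y = np.array([22.6, 37.6, 62.4, 103.7, 172.4, 286.5, 343.8, 412.6,
              495.4, 662.1, 722.9, 778.9, 753.1, 847.0, 938.9, 942.5,
              881.9, 803.3, 745.9, 720.5, 696.0, 616.0, 511.1, 384.7])
x = np.concatenate(([0.0], np.cumsum(y)[:-1]))  # assumed cumulative x_k

def residuals(theta):  # epsilon_k = y_k - f(t_k, x_k) for the Bass model
    a, b, x_bar = theta
    return y - (1.0 - x / x_bar) * (a * x + b)

res = least_squares(residuals, x0=[0.2, 80.0, 15000.0])  # guess near table B.2
SSR = np.sum(res.fun**2)
SE = np.sqrt(SSR / (24 - 3 - 1))  # d = 24 data points - 3 parameters - 1

print(res.x, SE)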

2.2 (Population Dynamics with Competitive Exclusion)

a. Consider a linear transformation of the form

τ = γt, x1(τ) = ξ1(t)/A1, x2(τ) = ξ2(t)/A2,

which, when substituted in the original system of ODEs, yields

Table B.1 Total Shipments of Compact Discs in the United States, 1985–2008 (millions)

Year   1985   1986   1987   1988   1989   1990   1991   1992
Units  22.6   37.6   62.4  103.7  172.4  286.5  343.8  412.6

Year   1993   1994   1995   1996   1997   1998   1999   2000
Units 495.4  662.1  722.9  778.9  753.1  847.0  938.9  942.5

Year   2001   2002   2003   2004   2005   2006   2007   2008
Units 881.9  803.3  745.9  720.5  696.0  616.0  511.1  384.7

Sources: Recording Industry Association of America; Kurian and Chernov (2007, 323).

Table B.2 Model Parameters That Minimize SSR in Exercise 2.1d

Model                         Parameters                              SE
Generalized logistic growth   α = 0.1792, γ = 2.9059, x̄ = 15915.8    40.78
Gompertz growth               α = 0.1494, x̄ = 16073.7                73.16
Bass diffusion                α = 0.2379, β = 77.9227, x̄ = 14892.6   49.48


Figure B.3 Approximation of (a) shipments, and (b) cumulative shipments, of compact discs using different growth models (generalized logistic growth, Gompertz growth, and Bass diffusion, plotted against the data points).


γA1 ẋ1(τ) = a1A1x1(τ) ( 1 − (A1/ξ̄1) x1(τ) − (A2b12/ξ̄1) x2(τ) ),
γA2 ẋ2(τ) = a2A2x2(τ) ( 1 − (A2/ξ̄2) x2(τ) − (A1b21/ξ̄2) x1(τ) ).

Hence, for γ = a1, A1 = ξ̄1, and A2 = ξ̄2, one obtains the desired system of ODEs, with β12 = (ξ̄2/ξ̄1) b12, β21 = (ξ̄1/ξ̄2) b21, and α = a2/a1. The coefficient β12 determines how x2 affects the growth or decay of x1, depending on the sign of 1 − x10 − β12x20. Similarly, α and αβ21 determine the relative growth or decay of x2 as a function of x2 and x1, respectively, based on the sign of 1 − x20 − β21x10.

b. The steady states (or equilibria) of the system are the roots of the system function f(x). Indeed, it is f(x̄) = 0 if and only if

x̄ ∈ { (0, 0), (0, 1), (1, 0), ( (1 − β12)/(1 − β12β21), (1 − β21)/(1 − β12β21) ) }.

From this it can be concluded that there is a (unique) equilibrium in the strictly positive quadrant R2++ if (provided that β12β21 ≠ 1)

min{ (1 − β12)(1 − β12β21), (1 − β21)(1 − β12β21) } > 0.

Figure B.4 depicts the corresponding values of β12 and β21.

c. Using the linearization criterion one can examine, based on the eigenvalues of the Jacobian matrix fx(x̄), the local stability properties of any equilibrium point x̄ ∈ R2+ (as determined in part b). Indeed, it is

fx(x) = ( 1 − 2x1 − β12x2    −β12x1
          −αβ21x2            α(1 − 2x2 − β21x1) ).

Thus

• x̄ = 0 is unstable, since

fx(x̄) = ( 1  0
          0  α )

has the positive eigenvalues λ1 = 1 and λ2 = α.

• x̄ = (0, 1) is stable if β12 > 1, since

fx(x̄) = ( 1 − β12   0
          −αβ21    −α )


Figure B.4 Parameter region (shaded) that guarantees a positive equilibrium for the system of exercise 2.2.

has the negative eigenvalues λ1 = −α and λ2 = 1 − β12.

• x̄ = (1, 0) is stable if β21 > 1, since

fx(x̄) = ( −1   −β12
          0    α(1 − β21) )

has the negative eigenvalues λ1 = −1 and λ2 = α(1 − β21).

It was shown in part b that when β12, β21 are positive, x̄ = ( (1 − β12)/(1 − β12β21), (1 − β21)/(1 − β12β21) ) exists as a positive equilibrium if and only if (β12 − 1)(β21 − 1) > 0, that is, if both parameters are either greater or smaller than 1. The equilibrium x̄ is stable if the eigenvalues of fx(x̄),

λi ∈ { ( a ± √(a² − 4αb) ) / ( 2(1 − β12β21) ) }, i ∈ {1, 2},

where a = β12 − 1 + α(β21 − 1) and b = (1 − β12β21)(β12 − 1)(β21 − 1), have negative real parts (i.e., for β12, β21 < 1).

d. Figure B.5 depicts phase diagrams for the cases where either β12, β21 > 1 or 0 < β12, β21 < 1. In the unstable case a separatrix divides the state space into initial (nonequilibrium) states from which the system converges either to exclusion of the first population with limit state (0, 1) or to exclusion of the second population with limit state (1, 0). Similarly, in

Figure B.5 Phase diagrams for the system of exercise 2.2d: (a) stable case where β12β21 < 1, and (b) unstable case where β12β21 > 1 (with separatrix).


the absence of a positive steady state, that is, when (β12 − 1)(β21 − 1) < 0, the system trajectory x(t) converges either to (0, 1) (for β12 > 1) or to (1, 0) (for β21 > 1) as t → ∞.

e. The nature of the population coexistence (in the sense of competition versus cooperation) is determined by the existence of a stable positive steady state. As seen in parts c and d, coexistence is possible if 0 < β12, β21 < 1. As a practical example, one can think of the state variables x1 and x2 as the installed bases of two firms that produce two perfectly substitutable products. For almost all initial values, either the first or the second firm will monopolize the market.

2.3 (Predator-Prey Dynamics with Limit Cycle)

a. As in exercise 2.2, a steady state x̄ of the system is such that f(x̄) = 0, which is equivalent to

x̄ ∈ { (0, 0), (1, 0), ( (a − √b)/2, (a − √b)/2 ), ( (a + √b)/2, (a + √b)/2 ) },

where a = 1 − β − δ and b = a² + 4β. Note that since β, δ > 0, only the last solution is positive.

b. Using the linearization criterion, one can determine the stability of the positive steady state by examining the eigenvalues λ1, λ2 of the system matrix

fx(x̄) = − ( (1 − δx̄1/(x̄1 + β)²) x̄1    δx̄1/(x̄1 + β)
            −α                         α ).

This matrix is asymptotically stable if Re(λ1), Re(λ2) < 0, that is, if

trace(fx(x̄)) = λ1 + λ2 = −(1 − δx̄1/(x̄1 + β)²) x̄1 − α < 0,

and

det(fx(x̄)) = λ1λ2 = α (1 + βδ/(x̄1 + β)²) x̄1 > 0.

The second inequality is always satisfied, which means that both eigenvalues always have the same sign. Thus, in the generic case where the eigenvalues are nonzero, the system is either asymptotically stable or unstable, depending on whether the first inequality holds or not. Figure B.6 shows the (α, β, δ)-region for which fx(x̄) is asymptotically stable. In this region, it is δ ≥ 1/2.


Figure B.6 Parameter region that guarantees asymptotic stability for the unique positive equilibrium of exercise 2.3b (the (α, β, δ)-space splits into an asymptotically stable region and an unstable region).

c. Consider a parameter tuple (α, β, δ) such that the system is unstable, as determined in part b, for instance, α = β = 1/5 and δ = 2 (see figure B.6). Given the system function f(x) = (f1(x), f2(x)), the two nullclines are given by the sets of states x for which f1(x) = 0 and f2(x) = 0, respectively. These nullclines divide the state space into four regions where the direction of the state trajectories is known. Note that the positive equilibrium x̄ is located at the intersection of these nullclines. Since both eigenvalues of the system have positive real parts, no trajectory starting at a point x0 different from x̄ can converge to x̄. Thus, if there is a nonempty compact invariant set Ω ⊂ R2++, such that any initial condition x(0) = x0 ∈ Ω implies that x(t) = φ(t, x0) ∈ Ω for all t ≥ 0, then by the Poincaré-Bendixson theorem the set Ω must contain a nontrivial limit cycle.6 Figure B.7 shows that the invariant set Ω does indeed exist, and a formal proof (which would consist in explicitly checking the direction of the vector field that generates the flow of the system at the boundary of Ω) can be omitted.

6. By lemma 2.4 the positive limit set L+x0 of x0 is nonempty, compact, and invariant.


Figure B.7 Invariant set Ω and (stable) limit cycle for the system of exercise 2.3c when the positive equilibrium x̄ is unstable.

d. In the Lotka-Volterra model (see example 2.8), the system trajectory starts at a limit cycle for any given initial condition, whereas in this model the system trajectory approaches a certain limit cycle as t → ∞. Thus, in the Lotka-Volterra model, a slight change in the initial conditions results in a consistently different trajectory, whereas the model considered here exhibits a greater robustness with respect to the initial condition, because of the convergence of trajectories to a zero-measure set of limit points.

2.4 (Predator-Prey Dynamics and Allee Effect)

Part 1: System Description

a. Consider, for example, the functions

g(x1) = exp[ −α(x̂1 − x1)² ] and h(x2) = √(1 + βx2),

which satisfy the given assumptions for any desired x̂1 > 0, as long as 0 < α ≤ ᾱ = 1/(2(x̂1)²) and β > 0. The classical Lotka-Volterra predator-prey system obtains by setting α = β = 0.


The right-hand sides of the two ODEs are zero if x2 = g(x1) and x1 = h(x2), respectively. These relations describe the nullclines of the system, the intersection of which determines the equilibria of the system (see part 2). Depending on the particular parametrization of the functions g(·) and h(·), the nullclines may intersect in R2++ either before or after x̂1. For example, using the parametrization with x̂1 = 5 and α = ᾱ = 1/50, we have that for β = 1/10 the only positive equilibrium is at x̄ ≈ (1.0359, 0.7303), and for β = 1 it is x̄ ≈ (1.3280, 0.7636).

Figure B.8 depicts the phase diagrams for β ∈ {1/10, 1}. In contrast to the classical Lotka-Volterra system, depending on the parameter values one may obtain either a stable (for β = 1) or an unstable (for β = 1/10) positive steady state.

b. Since g(x1) is strictly increasing on [0, x̂1], the relative growth rate of x1 (i.e., g(x1) − x2) increases with the population size x1 and decreases as x1 passes x̂1, finally approaching zero. This phenomenon is known as the (weak) Allee effect and does not appear in the classical Lotka-Volterra model, where g(x1) ≡ 1 is constant.

Part 2: Stability Analysis

c. From figure B.8 it is apparent that the origin is not a stable equilibrium. In order to prove this claim formally, consider the linearized system

ẋ = fx(0) x = ( g(0)   0
               0      −h(0) ) x.

The system matrix fx(0) has the eigenvalues g(0) and −h(0), the first of which is by assumption positive. This implies that the origin is unstable.

d. If x̄ = (x̄1, x̄2) ∈ R2++ is an equilibrium, then it must be a positive solution of

x̄2 = g(x̄1),
x̄1 = h(x̄2).

Concerning existence, note first that h(·) is by assumption increasing and concave, so that its (increasing and convex) inverse h⁻¹(·) exists. Thus, x̄2 = h⁻¹(x̄1), and a solution can be obtained, provided that

h⁻¹(x̄1) = g(x̄1).


Figure B.8 Predator-prey dynamics of exercise 2.4: (a) unstable equilibrium (β = 0.1), and (b) stable equilibrium (β = 1).


But the previous equation possesses a solution x̄1 > h(0), since h⁻¹(h(0)) = 0 < g(x1) for all x1 ≥ h(0), and at the same time h⁻¹(x1) is increasing while lim_{x1→∞} g(x1) = 0 by assumption. Hence, a positive equilibrium x̄ ≫ 0 exists. Concerning uniqueness, note that if the smallest solution satisfies x̄1 ≤ x̂1, then since g(x1) is concave by assumption, the function h(g(x1)) is increasing and concave. Therefore the fixed-point problem x1 = h(g(x1)) = (h ∘ g)(x1) has a unique solution on [h(0), x̂1] (e.g., by the Banach fixed-point theorem (proposition A.3), since the concave function h ∘ g is a contraction mapping on the compact domain [h(0), x̂1]). On the other hand, if the smallest solution satisfies x̄1 > x̂1, there cannot be any further solutions, because h⁻¹(x1) is increasing and g(x1) decreasing for x1 > x̂1.

The linearization criterion provides sufficient conditions for the stability of the positive equilibrium x̄. The linearized system is of the form

ẋ = fx(x̄) x = ( x̄1 g′(x̄1)   −x̄1
                x̄2          −x̄2 h′(x̄2) ) x.

The system matrix fx(x̄) is Hurwitz if (using the equilibrium relations)

g′(x̄1)/g(x̄1) < h′(x̄2)/h(x̄2) and g′(x̄1) h′(x̄2) < 1,

which is sufficient for the (local asymptotic) stability of the nonlinear system ẋ = f(x) at x = x̄. Note that both inequalities are satisfied if x̄1 > x̂1. On the other hand, if

max{ g′(x̄1)/g(x̄1) − h′(x̄2)/h(x̄2), g′(x̄1) h′(x̄2) − 1 } > 0,

then one of the eigenvalues of fx(x̄) has a positive real part, which implies by the linearization criterion that x̄ must be an unstable equilibrium of the nonlinear system ẋ = f(x).

Remark B.2 For the particular parametrization of g and h in exercise 2.4a, we obtain that g′(x1)/g(x1) = 2α(x̂1 − x1) and h′(x2)/h(x2) = β/(2(1 + βx2)). When α = 1/50 and x̂1 = 5, the conditions yield instability of the unique positive equilibrium for β = 1/5 and stability for β ∈ {1, 100}. ♦

Part 3: Equilibrium Perturbations and Threshold Behavior

e. As in part d, the fact that x̄1 > x̂1 implies that the positive equilibrium x̄ is stable. Furthermore, since x0 > x̄, we have ẋ1 < 0, and the system will start going back toward x̄1. Depending on the value of x2, the system may or may not converge to x̄ without passing by x̂1. In the situation where the state trajectory passes over x̂1, which happens when ‖x0 − x̄‖ is large enough, x1 starts to decrease on [0, x̂1]. This will continue until x2 becomes less than g(x1), which makes ẋ1 positive. Therefore, passing x̂1, the state trajectory will cause the system to orbit around the steady state until it converges to x̄. When the state trajectory does not pass x̂1, it converges to x̄ immediately.

f. If the conditions given in part d are satisfied, that is, x is a stablesteady state, and if ‖x0 − x‖ is small enough, by the definition of a stablesteady state, the state trajectory will converge to x.7 However, if x is notstable, since X = R2++ is an invariant set for the system, there exists anonempty compact invariant subset � of X that does not include anysteady state. Consequently, by the Poincaré-Bendixson theorem thereexists a limit cycle to which the state trajectory converges.

Part 4: Interpretation and Verification

g. One can easily show that the relative growth rate of population x1

increases for x1 < x1 and decreases for x1 > x1. Furthermore, the largerthe population x2, the slower the growth rate of x1. On the other hand, therelative growth rate of population x2 increases with x1. Consequently,one can interpret x1 as a platform for the growth of x2. For example,if x1 represents the number of highways in a state and x2 denotes thenumber of cars in that state, then the more highways, the greater thenumber of cars. As the number of highways increases, the need foradditional highways will diminish. This reflects the property of g(x1)which decreases when x1 > x1. Moreover, having a large number of carsincreases the highway depreciation through overuse, which justifies thefactor −x2 in the relative growth rate of x1.

h. See parts a and d.

B.3 Optimal Control Theory

3.1 (Controllability and Golden Rule)

a. Since one can always consider the control γu2 instead of u2, it ispossible to set γ = 1, without any loss of generality. Moreover, usingthe substitution

u2 = γu2

1 − x2∈ [0, 1],

7. If the perturbation is large, the state trajectory may not converge to x. In this case, it ispossible to have a limit cycle to which the state trajectory converges.

Page 285: Optimal Control Theory With Applications in Economics

272 Appendix B

the system dynamics remain unchanged, since

D(x, u2) = max{0, 1 − x2 − γu2}(α2x1 +α3x2)

= (1 − x2)(α2x1 +α3x2)(1 − u2) ≡ D(x2, u2).

Remark B.3 The objective functional in exercise 3.1d with the trans-formed control variable u = (u1, u2), where u1 = u1, becomes

J(u) = γ

∫ ∞

0e−rt((1 − x2(t)) u2(t) D(x, u2(t)) − c u1(t)) dt,

where c = c/γ . �

b. With the new control variable u ∈ U = [0, u1] × [0, 1] as in part a, thesystem is described by the ODE

x =[

−α1x1 + u1

D(x, u2) −βx2

]≡ f (x, u), (B.8)

with initial state x0 = (x10, x20) ∈ (0, u1/α1) × (0, 1). The components v1,v2 of the vector field on the right-hand side of (B.8) are containedin [v1 min, v1 max] and [v2 min, v2 max], respectively, where v1 min = −α1x1,v1 max = −α1x1 + u1, v2 min = −βx2, v2 max = (1 − x2)(α2x1 +α3x2) −βx2.The bounding box [v1 min, v1 max] × [v2 min, v2 max] describes the vectogramf (x, U). Thus, taking into account the possibility of control, the dynamicsof the system are captured by the differential inclusion

x ∈ f (x, U).

Given a set-valued vector field on the right-hand side, useful phasediagrams are those at the corner points of the bounding box, notablywhere f ∈ {(v1 min, v2 min), (v1 max, v2 max)}. Figures B.9a and B.9b depictphase diagrams for the cases where β < α2 and β ≥ α3, respectively.

c. Let x1 = u1/α1. To determine a set C of controllable states, first notethat since x1 ∈ [v1 min, v1 max] = [−α1x,α1x + u1], the advertising effect x1

can be steered from any x10 ∈ (0, x1) to any x1 ∈ (0, x1) in finite time,independent of x2 and u2. One can do so in the fastest manner using aconstant control u1 ∈ {0, u1} such that

(x1 − x10

)u1 ≥ 0, which (using the

Cauchy formula) yields the trajectory

x1(t) = x10e−α1t + u1

α1(1 − e−α1t), ∀ t ∈ [0, T1],

Page 286: Optimal Control Theory With Applications in Economics

Appendix B 273

(a)

(b)

Figure B.9Phase diagram of the system of exercise 3.1 with parameters (α1,α2,α3,β, γ ) =(1, .05, .1, .6): (a) u(t) ≡ (0, 0); (b) u(t) ≡ (u1, 0); (c) u(t) ≡ (0, 1); and (d) u(t) ≡ (u1, 1).

where T1 = ( 1α1

) ln(α1x10−u1α1x1−u1

). Regarding the controllability of x2,

because of the negative drift term −βx2 the set of installed-base values x2

that can be reached from a given state x20 ∈ (0, 1) in finite time is limited.Indeed,

v2 max ≥ 0 ⇔ (1 − x2) (α2x1 +α3x2) ≥ βx2 ⇔ x2 ≤ x2(x1),

where

x2(x1) = α3 −α2x1 −β +√(α3 −α2x1 −β)2 + 4α2α3x1

2α3.

Page 287: Optimal Control Theory With Applications in Economics

274 Appendix B

(d)

(c)

Figure B.9(continued)

Thus, if the target state x = (x1, x2) is such that x2 ∈ (0, x2(x1)), it is pos-sible to steer the system first to the state (x10, x2) in finite time T2 byapplying the control u = (αx10, 0), keeping the marketing effect constantand charging zero price, and then to keep installed base at the targetlevel x2 by using u2 = βx2(1 − x2)−1(α2x1 +α3x2)−1 while adjusting themarketing effect from x10 to x1 using a constant marketing effort u1

(either zero or u1) as described earlier.Hence, it is possible to choose Cε = {(x1, x2) ∈ R2+ : ε ≤ x1 ≤ x1 − ε,

ε ≤ x2 ≤ x2(x1) − ε} as C, given any sufficiently small ε > 0 (figure B.10).

Page 288: Optimal Control Theory With Applications in Economics

Appendix B 275

(a) (b)

Figure B.10Set of controllable states C: (a) when β < α3, and (b) when β ≥ α3.

Moreover, the (open) set of states that can be reached from an initialstate x0 = (x10, x20) ∈ (0, x1) × (0, x2(x10))

R = {(x1, x2) ∈ R2+ : 0 < x1 < x1, 0 < x2 < x2(x1)}.

When the initial state x(0) = x0 � 0 is not in C, then the trajectory willenter the invariant set R in finite time and remain there forever. Notealso that limε→0+ Cε = R.

d. Clearly, the system equation (B.8) implies that the set X = [0, x1] ×[0, 1] is a compact invariant subset of the state space R+ × [0, 1]. There-fore, intuitively, the optimal solution for the infinite-horizon profit-maximization problem, if it exists, should approach either a limit cycleor a steady state. The dynamically optimal steady-state solution, givenits existence, should satisfy the necessary conditions given by the PMP.The current-value Hamiltonian is

H(t, x, u, ν) = (1 − x2)u2D(x, u2) − cu1

+ ν1( −α1x1 + u1) + ν2(D(x, u2) −βx2),

which is maximized with respect to u ∈ U by

u∗1 ∈

⎧⎪⎪⎨⎪⎪⎩

{u1} if ν1 > c,

[0, u1] if ν1 = c,

{0} otherwise,

Page 289: Optimal Control Theory With Applications in Economics

276 Appendix B

and

u∗2 = max

{0, min

{1,

12

(1 − ν2

1 − x2

)}}.

The adjoint equations are

ν1 = (r +α1)ν1 −α2(1 − x2)(1 − u2)ν2

− α2(1 − x2)2(1 − u2)u2,

ν2 = ((α2x1 + 2α3x2 −α3)(1 − u2) + r +β)ν2

+ (1 − x2) (2α2x1 + 3α3x2 −α3) (1 − u2)u2.

Provided that the dynamically optimal steady-state solution is interior,by the maximality of the Hamiltonian it is ν1 = c and u2 = − ν2

2(1−x2) .

From x = 0 one can conclude u1 = α1x1 and D(x, u2) = βx2, so

α2x1 = 2β −α3(1 − x2 + ν2)1 − x2 + ν2

x2,

and by ν = 0,

0 = (r +α1)c − α2

4(1 − x2 + ν2)2,

0 = (r +β)ν2 − α3

4(1 − x2 + ν2)2 +βx2,

and thus, after back-substitution, the dynamically optimal steady state(turnpike)

x1 = x2

[β√

α2c(r +α1)− α3

α2

]+

,

x2 = min

⎧⎪⎨⎪⎩1,

⎡⎢⎣ (r +β)(1 − 2

√c(r+α1)α2

) + c(r +α1)α3α2

r + 2β

⎤⎥⎦

+

⎫⎪⎬⎪⎭ .

For (α1,α2,α3,β, r, c) = (1, .05, .1, .6, .1, .0005) it is x = (x1, x2) ≈ (47.93,.4264).

e. The turnpike state x = (x1, x2) in part d is contained in R and there-fore also in Cε for small enough ε > 0, since

(1 − x2)(α2x1 +α3x2) > (1 − x2)(α2x1 +α3x2)(1 − u2) = βx2,

Page 290: Optimal Control Theory With Applications in Economics

Appendix B 277

as long as the firm charges a positive price u2 > 0 (using the argument ofpart c). The system exhibits a turnpike in the sense that optimal station-ary growth is achieved when the system starts at x0 = x and then neverleaves that state. Thus, if x0 �= x, it is (at least intuitively) optimal for thedecision maker to steer the system to the dynamically optimal steadystate x as profitably as possible. Even if the decision maker is ignorantabout the optimal trajectory, he can obtain close-to-optimal profits bysteering the system to x quickly as described for part c. Under that strat-egy, for a small x0, the decision maker would offer the product initiallyfor free to increase the customer base up to the desired level (penetrationpricing) and then compensate for product obsolescence or depreciation(i.e., the term −βx2 in the system equation (B.8)) and for the eventuallypositive price by sufficient advertising (branding).

3.2 (Exploitation of an Exhaustible Resource)

a. To deal with the state-dependent control constraint,

c(t) ∈ [0, 1{x(t)≥0}c],which expresses the requirement that consumption at time t ∈ [0, T] canbe positive only if the capital stock x(t) at that time is positive, it is usefulto introduce the endpoint constraint x(T) = 0, where T ∈ (0, T] is free.The social planner’s optimal control problem is therefore

J(c) =∫ T

0e−rtU(c(t)) dt −→ max

c(·),T

s.t. x = −c,

x(0) = x0, x(T) = 0,

c(t) ∈ [0, c], ∀ t ∈ [0, T],T ∈ (0, T].b. The current-value Hamiltonian associated with the social planner’soptimal control problem in part a is

H(t, x, c, ν) = U(c) − νc,

where ν : [0, T] → R is the (absolutely continuous) current-value adjointvariable. Given an optimal terminal time T ∈ [0, T] and an optimal

Page 291: Optimal Control Theory With Applications in Economics

278 Appendix B

admissible state-control trajectory (x∗(t), c∗(t)), t ∈ [0, T∗], the PMPyields the following necessary optimality conditions:

• Adjoint equation

ν(t) = rν(t) − Hx(t, x∗(t), c∗(t), ν(t)) = rν(t), ∀ t ∈ [0, T∗],so ν(t) = ν0ert, t ∈ [0, T∗], for some constant ν0 = ν(0).

• Maximality

c∗(t) = min{U−1c (ν0ert), c} ∈ arg max

c∈[0,c]H(t, x∗(t), c, ν(t)),

for all t ∈ [0, T∗].• Time optimality

H(T∗, x∗(T∗), c∗(T∗), ν(T∗)) − λ = 0,

where λ ≥ 0 is a constant such that λ(T − T∗) = 0.

c. For any given t ∈ [0, T∗] the optimal consumption c∗(t) in part b isobtained (as long as it is interior) from the condition

0 = Hc(t, x∗(t), c, ν(t)) = Uc(c) − ν(t).

Differentiating this condition for c = c∗(t) with respect to t yields

0 = Ucc(c∗(t)) c∗(t) − ν(t)

= Ucc(c∗(t)) c∗(t) − rν(t)

= Ucc(c∗(t)) c∗(t) − rUc(c∗(t)),

for all t ∈ [0, T∗], which is equivalent to the Hotelling rule. The economicinterpretation of the rule is that the relative growth c/c of consumptionc over time is negative. The absolute value of this negative relativeconsumption growth is proportional to the discount rate r and inverselyproportional to society’s relative risk aversion (another name for η). Thismeans that when society becomes more risk-averse, its consumptionshould be more even over time (often termed consumption smoothing).On the other hand, if society is very impatient, that is, when the socialdiscount rate r is very large, then most of the consumption should takeplace early on.

d. For U(c) = ln (c), one obtains from the maximality condition in part bthat

c∗(t) = min{e−rt/ν0, c},

Page 292: Optimal Control Theory With Applications in Economics

Appendix B 279

for all t ∈ [0, T∗]. Using both state-endpoint constraints (for x∗(0)and x∗(T∗)), it is therefore

x∗(T∗) = x0 −∫ T∗

0c∗(t) dt = x0 −

∫ T∗

0min{e−rt/ν0, c} dt

= x0 − cts −∫ T∗

ts

e−rt

ν0dt

= 0,

or equivalently,

x0 = cts + 1rν0

(e−rts − e−rT∗) ≡ cts + c

r(1 − e−r(T∗−ts)), (B.9)

where ts = min{T∗, [−(1/r) ln (ν0c)]+} ∈ [0, T∗] is a switching time. Thereare two cases to consider:

1. T∗ = T. Under this condition the time-optimality condition given inpart (b) is not active. Consequently, the optimal switching time is givenby (B.9) when T∗ is replaced by T.8

2. T∗ < T. Under this condition, consider two possibilities: (i) c∗(T∗) = c.

In this situation, ts=0 and T∗ = 1r ln

(c

c−x0r

). (ii) c∗(T∗) �= c. In this situ-

ation, the switching time can be determined using the time-optimalitycondition in part b:

U(c∗(T∗)) − ν(T∗)c∗(T∗) = ln (cer(T∗−ts)) − 1 = 0,

which implies that ts = T∗ − 1r ln

( ec

). Therefore,

ts = xc

− e − 1re

, T∗ = ts + 1r

ln( e

c

).

e. Since the Hamiltonian is independent of the state x, the feedback-control law μ(t, x) cannot depend on x. Thus, μ(t, x) = c∗(t). That is, nonew information or rule can be obtained for the decision maker aboutthe optimal spending policy as a function of the remaining wealth.

f. When T → ∞, the constraint T ≤ T becomes inactive and everythingelse in the social planner’s optimal control problem remains the same.In other words, only the second case of part d should be considered.

8. In this solution, it is assumed that x0 < cT. If x0 ≥ cT, i.e., the initial amount of theresource is more than what can be consumed at the maximum extraction rate, then theoptimal solution is given by c = c.

Page 293: Optimal Control Theory With Applications in Economics

280 Appendix B

3.3 (Exploitation of a Renewable Resource)

a. The firm’s profit-maximization problem can be written as the follow-ing optimal control problem:

J(u) =∫ T

0e−rtu(t)F(x(t)) dt −→ max

u(·)

s.t. x = α(x − x) + (1 − u(t))F(x),

x(0) = x(T) = x0,

u(t) ∈ [0, 1].Remark B.4 One could remove x from the formulation by introducingthe new state variable ξ = x − x; however, this is not without loss ofgenerality, since G(ξ ) = F(ξ + x) is generally equal to zero at ξ = 0. �

b. The current-value Hamiltonian is

H(t, x, u, ν) = uF(x) + ν(α(x − x) + (1 − u)F(x)

).

Given an optimal state-control trajectory (x∗(t), u∗(t)), t ∈ [0, T], the PMPyields the following necessary optimality conditions:

• Adjoint equation

ν(t) = −(α− r)ν(t) − (u∗(t) + (1 − u∗(t))ν(t))

Fx(x∗(t)),

for all t ∈ [0, T].• Maximality

u∗(t) ∈ arg maxu∈[0,1]

H(t, x∗(t), u, ν(t)),

so

u∗(t) ∈

⎧⎪⎪⎨⎪⎪⎩

{1} if ν(t) > 1,

[0, 1] if ν(t) = 1,

{0} otherwise,

for all t ∈ [0, T].Consider now the corresponding Hamiltonian system,

x = −α(x − x),

ν = − (α− r) ν− Fx(x),

Page 294: Optimal Control Theory With Applications in Economics

Appendix B 281

Figure B.11Optimal state and control trajectories for exploitation of renewable resource.

for ν > 1, and

x = −α(x − x) + F(x),

ν = − (α− r) ν− νFx(x),

for ν < 1.9 A phase diagram is given in figure B.11.10

c. Because the Hamiltonian is linear in the control variable, the opti-mal policy u∗(t) becomes discontinuous whenever the function ν(t) − 1changes signs on the interval (0, T). An interior solution u∗(t) ∈ (0, 1) ona time interval of nonzero measure implies that ν(t) = 1 on that interval,and thus ν(t) = 0 there. But the adjoint equation in part b would thenyield

−(α− r) − Fx = 0,

which is an impossibility for α ≥ r, since Fx > 0 by assumption. Thus,u∗(t) ∈ {0, 1} a.e. on [0, T]. Furthermore, since ν < 0 in the Hamiltoniansystem, independent of the control, it is not possible to have more thanone control switch. Such a switch, say, at time ts ∈ (0, T), must move thecontrol from u∗(t−s ) = 1 to u∗(t+s ) = 0.

Consider now the state trajectory x∗(t) = φ1(t, x0), t ∈ [0, ts], when thecontrol u∗(t) = 1 is applied and the system is exploited. The Cauchyformula yields that

x∗(t) = x − (x − x0)eαt, ∀ t ∈ [0, ts],9. The singular case where ν = 1 is eliminated in part c.10. The state x

¯in figure B.11 is the lower bound for initial states at which the ecological

system can survive on its own without dying out; it is determined as the solution of αx¯+

F(x¯) = αx.

Page 295: Optimal Control Theory With Applications in Economics

282 Appendix B

so x = φ1(ts, x0) = x − (x − x0)eαts is the size of the animal population atthe end of the exploitation phase [0, ts], or equivalently, at the beginningof the regeneration phase [ts, T]. The sustainability condition x∗(T) = x0

can be satisfied if and only if φ0(T − ts, x) = x0, where φ0 is the flowof the state equation when a zero control is applied. Clearly, if onedefines

G(x) =∫ x

x0

dξF(ξ ) −α(x − ξ )

,

which is an increasing function in x, then

x = φ0(ts − T, x0) = G−1(ts − T).

Hence, the switching time ts ∈ [0, T] is determined as the solution of

G−1(ts − T) = x − (x − x0)eαts ,

provided that a solution to this equation exists.11

d. Given the switching time ts ∈ (0, T), the optimal policy, as character-ized in part c, can be described in words as follows. Exploit the systemfully for t ∈ [0, ts], harvesting all the population surplus F(x), and sub-sequently regenerate the population to its original state by ceasing toharvest for t ∈ (ts, T]. Note that since ts is independent of r, the optimalstate-control trajectory (x∗(t), u∗(t)), t ∈ [0, T], is independent of r. Theeffect of an increase in α on the switching time can be found by implic-itly differentiating the equation that characterizes the switching time.Indeed,

−∂ts

∂α

1Gx(x)

= −(x − x0)eαts

(ts +α

∂ts

∂α

),

so

∂ts

∂α= (x − x0)eαtsGx(x)

1 − (x − x0)eαtsGx(x).

That is, at least for small α, the switching time is increasing in α. Thismakes intuitive sense, as the length of the exploitation phase can increasewhen the population exhibits faster dynamics.

e. Let F(x) = √x. Then

11. If there is no such solution, then the firm’s problem does not have a solution; thesystem is not sufficiently controllable.

Page 296: Optimal Control Theory With Applications in Economics

Appendix B 283

G(x) =∫ x

x0

dξ√ξ −α(x − ξ )

=∫ √

x

√x0

2zdzz −α(x − z2)

= 2a

∫ √x

√x0

[1 + a

1 + a + 2αz− 1 − a

1 − a + 2αz

]dz

= 1αa

((1 + a) ln

[1 + a + 2α

√x

1 + a + 2α√

x0

]− (1 − a) ln

[1 − a + 2α

√x

1 − a + 2α√

x0

]),

where a = √1 + 4α2x, and the switching time ts solves the fixed-point

problem

ts = T + G(x − (x − x0)eαts ).

For (α, x0, x, 2) = (1, 10, 12, 2), it is ts ≈ 0.3551. For t ∈ [0, ts], it is x∗(t) =12 − 2et, so x = x∗(ts) ≈ 9.1472. The optimal control is u∗(t) = 1{t≤ts}(figure B.12).

3.4 (Control of a Pandemic)

a. The social planner’s welfare maximization problem is given by

J(u) = −∫ T

0e−rt (x(t) + cuκ(t)

)dt −→ max

u(·)

s.t. u(t) ∈ [0, u],x = α(1 − x)x − ux, x(0) = x0,

x(t) ≥ 0,

where the finite horizon T > 0, the discount rate r > 0, the initialstate x0 ∈ (0, 1), and the constants α, c, u > 0 are given.

b. The current-value Hamiltonian associated with the social planner’soptimal control problem12 in part a is

H(t, x, u, ν) = −x − cuκ + ν(α(1 − x)x − ux

),

where ν : [0, T] → R is the (absolutely continuous) current-valueadjoint variable. Given an optimal admissible state-control trajectory(x∗(t), u∗(t)

), t ∈ [0, T], the PMP yields the following necessary opti-

mality conditions:For κ = 1,

12. In this problem, assume that u > α.

Page 297: Optimal Control Theory With Applications in Economics

284 Appendix B

Figure B.12Phase diagram of Hamiltonian system.

• Adjoint equation

ν = (r + u∗ −α+ 2αx∗)ν+ 1.

• Maximality

u∗(t) ∈⎧⎨⎩

{0} if ν(t) > −c/x∗(t),[0, u] if ν(t) = −c/x∗(t),{u} otherwise,

for (almost) all t ∈ [0, T].• Transversality

ν(T) = 0.

For κ = 2,

• Adjoint equation

ν = (r + u∗ −α+ 2αx∗)ν+ 1.

• Maximality

u∗(t) = max{

0, min{

u, −ν(t)x∗(t)2c

}},

Page 298: Optimal Control Theory With Applications in Economics

Appendix B 285

for (almost) all t ∈ [0, T].• Transversality

ν(T) = 0.

c. Let κ = 1. Since −c/x∗(T) < 0, and since by the transversality condi-tion, it is ν(T) = 0, it follows from the maximality condition that u∗(t) = 0for t ∈ [ts, T] where ts ∈ [0, T) is a switching time. By the system equationit is therefore

x∗(t) = xs

xs + (1 − xs)e−α(t−ts) , ∀ t ∈ [ts, T],

where xs = x∗(ts). Using the Cauchy formula, the adjoint equation,together with the transversality condition, therefore yields that

ν(t) = −∫ T

texp

[−∫ s

t

(r −α(1 − 2x∗(ς ))

)dς]

ds

= −∫ T

t

(xs + (1 − xs)e−α(t−ts)

xs + (1 − xs)e−α(s−ts)

)2

e−(α+r)(s−t)ds,

for all t ∈ [ts, T]. By the definition of the switching time, at t = ts, it isx∗(ts) = xs and

− cxs

= ν(ts) = −∫ T

ts

e−(α+r)(s−ts) ds(xs + (1 − xs)e−α(s−ts))2 ,

which directly relates xs and ts. Now examine the possibility of a singularcontrol arc, where ν(t) = −c/x∗(t) on a time interval. In that case, thesystem equation yields that u∗ = α(1 − x∗) − (x∗/x∗), whence the adjointequation takes the form

ν = rν+αx∗ν− x∗

x∗ ν+ 1 = rν+ ν+ 1 −αc,

using the fact that νx∗ = −c implies νx∗ + νx∗ = 0 on the relevant timeinterval. Thus, necessarily

ν(t) = −1 −αcr

= const.

on such a singular arc, which in turn also implies that

x∗(t) = cr1 −αc

= const. and u∗(t) = α

(1 − cr

1 −αc

)= const.

Page 299: Optimal Control Theory With Applications in Economics

286 Appendix B

Thus, for (α+ r)c > 1, no singular arc can exist. For (α+ r)c ≤ 1, it isin principle possible to have a singular arc (provided that u is largeenough), which would then correspond to a temporary turnpike of thesystem, as state, co-state (adjoint variable), and control are all constantin this scenario.

If, starting at t = 0, the regulator finds it optimal to apply the constantcontrol u, the state of the system evolves according to

x∗(t) = x0(1 − u

α

)x0 + ((1 − u

α

)− x0)

e−α(

1− uα

)t.

Provided that u ≥ α− r, this implies via the adjoint equation that ν(t)increases, thus approaching the switching point ν(ts) = −c/x∗(ts) frombelow. Thus, there can be at most one switching point, with possiblya singular control arc along which the system effectively stays still untilswitching to the zero control at the end of the trajectory. Let ts ≤ ts be thetime at which the system reaches the switching point or singular controlarc. Then

ν(ts) = − cxs

= −∫ T

ts

e−(α+r)(s−ts) ds(xs + (1 − xs)e−α(s−ts))2 = ν(ts),

and

xs = x0(1 − u

α

)x0 + ((1 − u

α

)− x0)

e−α(

1− uα

)ts

.

Omitting a detailed proof, it is now fairly straightforward to piece thesocially optimal policy together. Assume that x ∈ (0, 1) is the turnpikestate. Then, as long as the time horizon T is long enough, it is optimalto steer the system as fast as possible to xs = x by either applying zeroor maximal control, and to stay on the turnpike as long as necessary tomake sure that the adjoint variable can move from −c/x to 0 using a zerocontrol. If the time horizon is too short to reach the turnpike, then ts = ts,and ts is determined as solution of(

x0 + ((1 − uα

)− x0)

e−α(

1− uα

)ts)

c

x0(1 − u

α

) =∫ T

ts

e−(α+r)(s−ts) ds(xs + (1 − xs)e−α(s−ts)

)2 ,

provided that it is positive. Otherwise, it is optimal for the regulator notto intervene at all, that is, to select u∗(t) ≡ 0. One can see that the latter

Page 300: Optimal Control Theory With Applications in Economics

Appendix B 287

can happen if the intervention cost c is too large, that is, if (consideringthe previous equation for xs = x0 and ts = 0)

c >∫ T

0

x0 e−(α+r)s ds(x0 + (1 − x0)e−αs)2 ,

independent of u.Let κ = 2. Assuming that the intervention limit u is not binding on

the optimal trajectory, the Hamiltonian system becomes

x∗ = α(1 − x∗)x∗ + νx∗

2c,

ν = (r −α+ 2αx∗)ν− x∗ν2

2c+ 1.

This system of ODEs is highly nonlinear, so one can proceed via a qual-itative analysis. The nullclines for x = ν = 0 are described, for x ∈ (0, 1)and ν < 0, by

ν = −2αc(1 − x) and x = 1 − (α− r)ν(−ν)

(2α− ν

2c

) .

Both relations are satisfied at a turnpike (x, ν), of which (because the null-cline for ν = 0 does not intersect the axes) there can be none, one, or twovalues. The first of these can be neglected as an uninteresting measure-zero case because it is not robust to parameter perturbations. Considerthe normal case when there are two possible turnpikes. An examinationof the phase diagram reveals that only the one with the larger x-valuecan be reached along monotonic trajectories13 of the Hamiltonian systemin (x, ν)-space. If x0 < x, the optimal trajectory (x∗, ν) first approachesthe turnpike before increasing to the point (xT , 0) toward the end ofthe horizon. The terminal state xT is implied by the solution to theHamiltonian system of ODEs, together with the two-point boundaryconditions x(0) = x0 and ν(T) = 0. For x0 > x the optimal state trajec-tory x∗ will generally be nonmonotonic in t (as long as T is finite), firstapproaching x and then ending up at x∗(T) = xT .

13. When the time horizon T → ∞, the optimal policy becomes independent of time, andthe principle of optimality (for this one-dimensional system) therefore implies that theoptimal trajectories must be monotonic. (Otherwise, at a given state it would be sometimesoptimal to go up and sometimes optimal to go down.)

Page 301: Optimal Control Theory With Applications in Economics

288 Appendix B

For both κ = 1 and κ = 2, the optimal policy is such that xT > x, thatis, toward the end of the horizon it is optimal for the social planner toreduce the intervention effort and let the epidemic spread beyond thelong-run steady state. In contrast to the linear-cost case, when κ = 2 theregulator will exercise some intervention effort. In the linear-cost case,the regulator stops all action at the end of any finite time horizon.

d. The dynamically optimal steady state can be be understood as thespread of the epidemic, at which it would be cost-efficient to stabi-lize the population using a steady healthcare intervention effort. Inthe case κ = 1, where the solution can be given explicitly, the steadystate x = cr/(1 −αc) increases in the infectivity α, the cost c, and thediscount rate r. The results are qualitatively equivalent for κ = 2.

The longer the planning horizon T, the more important this steadystate becomes as a goal for public policy, and the implemented health-care measures can be considered permanent. In the given model,attempting to eradicate the disease completely is never optimal becausedoing so requires that u be unbounded. Last, if the intervention bound uis too small, it may not be possible to reach a dynamically optimal steadystate; yet the regulator should try to come as close as possible before(toward the end of the horizon) ceasing or dampening the interventioneffort.

e. See also the discussion in part d. It becomes optimal to steer thesystem in the least-cost way to its turnpike and keep it there indefinitely.

f. Figure B.13 shows the optimal state-control trajectories for somenumerical examples. To avoid the numerical problems associated witha singular solution, the case where κ = 1 is approximated by settingκ = 1.05 instead. This explains why the trajectories do not exactly attainthe turnpike computed in part c.

3.5 (Behavioral Investment Strategies)

Part 1: Optimal Consumption Plan

a. The investor’s optimal control problem is

JT(c) =∫ T

0e−rt ln (c(t)) dt −→ max

c(.)

s.t. x = αx − c,

x(0) = x0, x(t) ≥ 0,

c(t) ∈ [0, c], ∀ t ∈ [0, T].

Page 302: Optimal Control Theory With Applications in Economics

0 1 2 3 4 5

1

0.8

0.6

0.4

0.2

0

1

0.8

0.6

0.4

0.2

0

1

0.8

0.6

0.4

0.2

0

Disease Population

0 1 2 3 4 50

2

4

6

8

10Health Intervention

0 1 2 3 4 5

Disease Population

0 1 2 3 4 5

Health Intervention

(a)

(b)

Figure B.13Optimal disease-population and health-intervention trajectories for (α, c, r, u) =(5, .15, .2, 10) and x0 ∈ {5%, 50%} (a) when κ = 1.05, and (b) when κ = 2.

Page 303: Optimal Control Theory With Applications in Economics

290 Appendix B

Note that the state constraint x(t) ≥ 0 for t ∈ [0, T) is automaticallysatisfied, since otherwise the investor’s objective would diverge tominus infinity because the log does not allow a zero consumption overnonzero-measure time intervals. Hence, the state constraint x(t) ≥ 0can be written equivalently as the state endpoint inequality constraintx(T) ≥ 0.

The current-value Hamiltonian associated with the optimal controlproblem is then given by

H(t, x, c, ν) = ln (c) + ν(αx − c),

where ν is the current-value adjoint variable. Given an optimal admis-sible state-control trajectory (x∗(t), c∗(t)), t ∈ [0, T], the PMP yields thefollowing necessary optimality conditions:• Adjoint equation

ν(t) = −(α− r)ν(t), ∀ t ∈ [0, T].• Transversality

ν(T) ≥ 0.

• Maximality

c∗(t) = min{

1ν(t)

, c}

∈ arg maxc∈[0,c]

H(t, x∗(t), c, ν(t)),

for all t ∈ [0, T].From the adjoint equation and the transversality condition,

ν(t) = ν(T)e−(α−r)(t−T) ≥ 0, ∀ t ∈ [0, T].Clearly, the capital stock x∗(T) at the end of the horizon can be positiveonly if it is not possible to exhaust all resources by consuming at themaximum rate c on the entire time interval [0, T], namely,

x∗(T) > 0 ⇔ x0 > x0 ≡ cα

(1 − e−αT)

⇔ c∗(t) ≡ c.

When the initial capital stock x0 is limited, that is, when it does notexceed x0 = (c/α)(1 − e−αT), then necessarily x∗(T) = 0. By the Cauchyformula it is

Page 304: Optimal Control Theory With Applications in Economics

Appendix B 291

x∗(t) = x0eαt −∫ t

0eα(t−s)c∗(s) ds

= x0eαt +∫ t

0eα(t−s) min{c0e(α−r)s, c} ds,

where c0 = c∗(0) = ν(T)e(α−r)T is the initial consumption rate, so

x∗(T) = x0eαT −∫ T

0eα(T−s) min{c0e(α−r)s, c} ds = 0.

If the spending limit c is large enough, it will never constrain the agent’sconsumption, so

x∗(T) =(

x0 − c0

r(1 − e−rT)

)eαT = 0,

and thus c0 = rx0/(1 − e−rT). Hence,14

x0 ≤ x¯0 ≡ c

r

(1 − e−rT) e−(α−r)T

if and only if

c∗(t) = rx0e(α−r)t

1 − e−rT= c

(x0

x¯0

)e−(α−r)(T−t),

for all t ∈ [0, T]. Finally, if the initial capital stock x0 lies between x¯0

and x0, then there exists a switching time ts ∈ (0, T] such that c0e(α−r)ts =c, and

x∗(T) =(

x0 − c0

r(1 − e−rts ) − c

α(e−αts − e−αT)

)eαT = 0.

Consequently, the initial consumption level c0 is determined implicitlyby

x0 = c0

r

(1 −

(c0

c

) rα−r)

+ cα

((c0

c

) αα−r − e−αT

), (*)

or equivalently, by

αρ

1 − ρ

(x0

c+ e−αT

α

)= 1

1 − ρ

(c0

c

)−(c0

c

) 11−ρ

,

14. One can show that α > r implies that x0 > x¯0, for all T > 0.

Page 305: Optimal Control Theory With Applications in Economics

292 Appendix B

where ρ = r/α ∈ (0, 1). To summarize, the investor’s optimal T-horizonconsumption plan, as a function of his initial capital x0, the investmentreturn α, and his opportunity cost of capital r, is given by

c∗T(t) =

⎧⎪⎪⎨⎪⎪⎩

rx0e(α−r)t

1 − e−rTif x0 ≤ x

¯0,

c if x0 > x0,min{c0e(α−r)t, c} otherwise,

∀ t ∈ [0, T],

where c0 = c0(α, r, x0) is determined by (*).

b. Building on the finite-horizon solution in part a, note first that asT → ∞, the thresholds x

¯0 → 0+ and x0 → c/α. In addition, since ln (c(t))is bounded from above by ln (c), the difference between the finite-horizon objective JT(c) and the corresponding infinite-horizon objec-tive J∞(c) goes to zero as T → ∞ for any feasible consumption plan c(t) ∈[0, c], t ≥ 0. One can therefore expect pointwise convergence of the finite-horizon consumption plan c∗

T(t), t ≥ 0, to the optimal infinite-horizonconsumption plan c∗∞(t), t ≥ 0, as T → ∞. The consumption plan

c∗∞(t) = lim

T→∞c∗

T(t) ={

c if x0 > c/α,min{c0e(α−r)t, c} otherwise,

for all t ≥ 0, where c0 satisfies

rx0

c=(

1 −(c0

c

) rα−r)(c0

c

)+ rα

(c0

c

) αα−r

,

is indeed a solution to the infinite-horizon optimal consumption prob-lem, since there can be no policy that can produce a strictly higher valueof the objective function.15 Note that under the optimal infinite-horizonconsumption plan it becomes eventually optimal to consume at thespending limit c.

Part 2: Myopic Receding-Horizon Policy

c. Periodic updating with limited implementation may be due to uncer-tainty or limited commitment ability. For example, an elected officialmay not be able to implement a policy over a horizon that exceeds thelength of his mandate. Such receding-horizon decision making mayalso result from periodic reporting and decision-making schedules in

15. If for some ε > 0 there is a policy c such that J∞(c) ≥ J∞(c∗∞) + ε, then that immediatelyproduces a contradiction for large enough horizons, T >

∣∣ln (ε/| ln (c)|)∣∣, because then thevalue of the infinite-horizon objective J∞(c) must be less than ε away from the optimalfinite-horizon objective JT (c∗

T ).

Page 306: Optimal Control Theory With Applications in Economics

Appendix B 293

organizations. Figure B.14 depicts how, given a planning horizon Tand an implementation horizon τ , the receding-horizon policy c∗

T,τis obtained as concatenation of partially implemented finite-horizonpolicies ck(t) = c∗

T(t − kτ ) for t ∈ Ik = [kτ , (k + 1)τ ], k ∈ N.

d. Yes. When T → ∞, the investor’s optimal infinite-horizon consump-tion policy does depend only on his current capital and not on the timethat is left to consume. Indeed, the solution to the infinite-horizon opti-mal consumption problem does not depend on when it starts in absolutetime, but only on the investor’s initial capital x0 (and, of course, on theparameters α, r, which are assumed fixed). By construction, the limitingreceding-horizon policy is equal to the infinite-horizon optimal policy,

limT→∞

c∗T,τ (t) = c∗

∞(t), ∀ t ≥ 0.

e. Yes. Given an initial capital x0 below the threshold x0 = cα

(1 − e−αT),and a planning horizon T > ( 1

r ) ln(

αα−r

), the optimal T-horizon state

trajectory is initially upward-sloping, since

x∗(t)x0

= αe−rt − e−rT

1 − e−rTeαt − r

e(α−r)t

1 − e−rT= eαt

1 − e−rT[(α− r)e−rt −αe−rT]

in a right-neighborhood of t = 0. If τ is the time such that x∗(τ ) =x∗(0) = x0, then the corresponding receding-horizon state-controlpath (x∗

T,τ (t), u∗T,τ (t)) is τ -periodic. Figure B.15b shows numerical

examples with and without periodicity, for (α, r, c, x0) = (.2, .1, 4, 15)and (T, τ ) ∈ {(10, 6), (10, 2.7)}.

0 1 2 3 4 5 6 7 8 9

...

...

Time

Receding-Horizon Policy

Figure B.14Receding-horizon decision making with (T, τ ) = (5, 1).

Page 307: Optimal Control Theory With Applications in Economics

294 Appendix B

0 2 4 6 8 10 12 14 16 189

10

11

12

13

14

15

16

Time

Wea

lth

0 2 4 6 8 10 12 14 16 180

1

2

3

4

5

Time

Co

nsu

mp

tio

n

(a)

Figure B.15(a) Receding-horizon consumption plan with nonmonotonic state-control trajectory andcapital exhaustion (limt→∞ x∗

T,τ (t) = 0). (b) Receding-horizon consumption plan withperiodic state-control trajectory.

Part 3: Prescriptive Measures

f. Somewhat surprisingly, a long-run steady state x∗∞ does not alwaysexist. Indeed, if x0 > c/α, then the investor’s capital continues togrow despite his consuming at the spending limit c at all times t ≥ 0.When x0 ≤ c/α, the findings in part b imply that x∗∞ = c/α. Yet, becauseconvergence to the steady state along an optimal state trajectory ispossible only from below, it is not stable.

Page 308: Optimal Control Theory With Applications in Economics

Appendix B 295

0 2 4 6 8 10 1214.95

15

15.05

15. 1

15.15

15. 2

15.25

15. 3

Time

Wea

lth

0 2 4 6 8 10 12

3.5

3.4

3.3

3.2

3.1

3

2.9

2.8

2.7

2.6

Time

Co

nsu

mp

tio

n

(b)

Figure B.15(continued)

g. The only initial states x0 for which the modified (T, τ )-receding-horizon consumption plan with the additional endpoint constraint isfeasible are such

x0 ≤ x∗∞ = c/α,

implying solutions identical to the corresponding optimal infinite-horizon plans. By setting consumption to zero, the long-run steady stateis reached in the fastest possible way, which bounds the planning hori-zons T that can be used to implement the modified receding-horizon

Page 309: Optimal Control Theory With Applications in Economics

296 Appendix B

policy from below, requiring that

T ≥ T¯

≡ 1α

ln(

cαx0

).

Note also that under the modified receding-horizon consumption planit is not possible for the investor to consume at the spending limit towardthe end of the planning horizon, since that would not allow the capitalto grow to the long-run steady state. Thus, the optimal consumptionplan c∗

T(t), t ∈ [0, T], is interior, in the sense that

c∗T(t) = c0e(α−r)t ∈ (0, c), ∀ t ∈ [0, T].

The constant c0 = c∗T(0) is determined by the state-endpoint constraint,

so (analogous to part a)

x∗(T) =(

x0 − c0

r(1 − e−rT)

)eαT = c

α,

whence

c0 = r1 − e−rT

(x0 − c

αe−αT

)< c0,

and

c∗T(t) = re(α−r)t

1 − e−rT

(x0 − c

αe−αT

), ∀ t ∈ [0, T].

One can think of the modified finite-horizon consumption plan as aregular finite-horizon consumption plan where initial capital x0 hasbeen reduced (virtually) by the amount (c/α)e−αT (< x0), that is,

c∗T(t; x0) = c∗

T(t; x0 − (c/α)e−αT), ∀ t ∈ [0, T].As a consequence, it is not possible that within the planning horizonthe trajectory becomes downward-sloping. Hence, under the modifiedreceding-horizon consumption plan c∗

T,τ (t), t ≥ 0, there cannot be anyperiodic cyclical consumption behavior. Because of the monotonicity ofthe state trajectory with the state-endpoint constraint, there is conver-gence to the dynamically optimal steady state for any implementationhorizon τ ∈ (0, T), a significant improvement, which comes at the priceof a more moderate consumption, effectively excluding consumptionlevels at the spending limit.

3.6 (Optimal Consumption with Stochastic Lifetime)

Page 310: Optimal Control Theory With Applications in Economics

Appendix B 297

a. Given any admissible control trajectory c(t) > ε, t ≥ 0, for some ε ∈(0, min{αx0, c}),16 the investor’s objective functional is (using integrationby parts)

J(c) =∫ ∞

0g(T)

(∫ T

0e−rtU(c(t)) dt

)dT

=[

G(T)∫ T

0e−rtU(c(t)) dt

]∞

0−∫ ∞

0G(T)e−rTU(c(T)) dT,

where G(T) = ∫ T0 g(τ ) dτ = λ

∫ T0 e−λτdτ = 1 − e−λT is the cumulative

distribution function for the Poisson random variable T. Thus,

J(c) =∫ ∞

0e−rtU(c(t)) dt −

∫ ∞

0

(1 − e−λT) e−rTU(c(T)) dT

=∫ ∞

0e−(λ+r)tU(c(t)) dt,

since the fact that U([ε, c]) is compact implies that the function U(c(t))is bounded (whence all integrals in the last relation converge). One cantherefore rewrite the investor’s stochastic optimal consumption prob-lem (as formulated in part a) as the following infinite-horizon optimalcontrol problem.

J(c) =∫ ∞

0e−(λ+r)t ln (c(t)) dt −→ max

c(·)

s.t. x(t) = αx(t) − c(t),

x(0) = x0, x(t) ≥ 0,

c(t) ∈ [0, c],∀ t ≥ 0.

b. The solution to the optimal control problem in part a can beobtained by approximating the infinite-horizon optimal control prob-lem by a sequence of finite-horizon optimal control problems of theform

16. The constant ε > 0 needs to be chosen small enough so that the initial capital x0 is notexhausted in finite time.

Page 311: Optimal Control Theory With Applications in Economics

298 Appendix B

JTk (c) =∫ Tk

0e−(λ+r)t ln (c(t)) dt −→ max

c(·)

s.t. x(t) = αx(t) − c(t),

x(0) = x0, x(t) ≥ 0,

c(t) ∈ [0, c],∀ t ∈ [0, Tk],for k ∈ N, where 0 < Tk < Tk+1 such that Tk → ∞ as k → ∞, and thentaking the limit for k → ∞ in the optimal consumption policy cTk (t),t ∈ [0, Tk] of the kth finite-horizon problem.

First formulate the deterministic finite-horizon optimal control prob-lem. Let Tk = T > 0 and set ρ = λ+ r. Note that, depending on the valueof λ, it is possible that α > ρ or α ≤ ρ. The corresponding current-valueHamiltonian is

H(x, c, ν) = ln (c) + ν(αx − c),

and the PMP yields the following necessary optimality conditions foran optimal state-control trajectory (x∗

T(t), c∗T(t)), t ∈ [0, T]:

• Adjoint equation

ν(t) = −(α− ρ)ν(t), ∀ t ∈ [0, T].• Transversality

ν(T) ≥ 0.

• Maximality

c∗T(t) = min

{1ν(t)

, c}

∈ arg maxc∈[0,c]

H(x∗T(t), c, ν(t)), ∀ t ∈ [0, T].

The adjoint equation and the transversality condition yield that

ν(t) = ν(T)e−(α−ρ)(t−T) ≥ 0, ∀ t ∈ [0, T].The capital stock x∗(T) at t = T can be positive only if it is not possible toexhaust all resources by consuming at the maximum rate c on the entiretime interval [0, T], namely,

x∗(T) > 0 ⇔ x0 > x0 ≡ cα

(1 − e−αT) ⇔ c∗(t) ≡ c.

Page 312: Optimal Control Theory With Applications in Economics

Appendix B 299

When the initial capital stock x0 is limited, that is, when it does notexceed x0 = (c/α)

(1 − e−αT

), then necessarily x∗(T) = 0. By the Cauchy

formula it is

x∗(t) = x0eαt −∫ t

0eα(t−s)c∗(s) ds

= x0eαt +∫ t

0eα(t−s) min{c0e(α−ρ)s, c} ds,

where c0 = c∗(0) = e(α−ρ)T/ν(T) is the (possibly fictitious)17 initial con-sumption rate. Thus,

x∗(T) = x0eαT −∫ T

0eα(T−s) min{c0e(α−ρ)s, c} ds = 0.

If the spending limit c is large enough, it will never constrain theinvestor’s consumption, so

x∗(T) =(

x0 − c0

ρ(1 − e−ρT)

)eαT = 0,

and thus c0 = ρx0/(1 − e−ρT). Hence,18

x0 ≤ x¯0 ≡ c

ρ(1 − e−ρT)e−([α−ρ]+)T ⇔

c∗(t) = ρx0e(α−ρ)t

1 − e−ρT= c

(x0

x¯0

)e−([α−ρ]+)Te(α−ρ)t, ∀ t ∈ [0, T].

Last, if the initial capital stock x0 lies between x¯0 and x0, then there

exist a switching time ts ∈ (0, T] and a constant c0 such that c0e(α−ρ)ts = c,and

x∗(T) ={

(x0 − c0ρ

(1 − e−ρts) − cα

(e−αts − e−αT))eαT if α > ρ

(x0 − cα

(1 − e−αts) − cρ

(e−ρts − e−ρT))eαT if α ≤ ρ

}

= 0.

The (possibly fictitious) initial consumption level c0 is determinedimplicitly by the equation

17. It is possible that c0 > c when α < ρ.18. One can show that x0 > x

¯0, for all T > 0.

Page 313: Optimal Control Theory With Applications in Economics

300 Appendix B

x0 =⎧⎨⎩

c0ρ

(1 − ( c0

c

) ρα−ρ)

+ cα

(( c0c

) αα−ρ − e−αT

)if α > ρ,

(1 − ( c0

c

) αρ−α)

+ cρ

(( c0c

) ρρ−α − e−ρT

)otherwise.

(**)

Hence, the investor’s optimal T-horizon consumption plan, as a func-tion of his initial capital x0, the investment return α, and his (effective)opportunity cost of capital ρ, is given by

c∗T(t) =

⎧⎪⎪⎨⎪⎪⎩ρx0e(α−ρ)t

1 − e−ρTif x0 ≤ x

¯0,

c if x0 > x0,min{c0e(α−ρ)t, c} otherwise,

∀ t ∈ [0, T],

where c0 = c0(α, ρ, x0) is determined by (**).Now the deterministic infinite-horizon optimal control problem can

be formulated. Take the limit in the foregoing optimal finite-horizonconsumption plan.19 For this, note first that as T → ∞, the thresholdsx¯0 → (c/ρ)1{α≤ρ} and x0 → c/α. The consumption plan

c∗(t) = limT→∞

c∗T(t) =

⎧⎨⎩ρx0e(α−ρ)t if x0 ≤ (c/ρ)1{α≤ρ},c if x0 > c/α,min{c0e(α−ρ)t, c} otherwise,

for all t ≥ 0, where c0 satisfies (**), is the solution to the infinite-horizonoptimal consumption problem and thus also to the investor’s originalstochastic optimal consumption problem.

c. When T is perfectly known, it is possible to set λ = 0 and then obtainfrom part b the optimal T-horizon consumption plan,

c∗T(t) =

⎧⎪⎪⎨⎪⎪⎩

rx0e(α−r)t

1 − e−rTif x0 ≤ c

r (1 − e−rT)e−([α−r]+)T ,

c if x0 >cα

(1 − e−αT),min{c0e(α−r)t, c} otherwise,

for all t ∈ [0, T], where c0 = c0(α, r, x0) is determined by (**) for ρ = r.Unless the initial capital is large, namely, when x0 > x0 = c

α

(1 − e−αT

),

19. Since ln (c(t)) is bounded from above by ln (c), the difference between the finite-horizonobjective JT (c)andthecorrespondinginfinite-horizonobjective J∞(c)goes tozeroasT → ∞for any feasible consumption plan c(t) ∈ [0, c], t ≥ 0. One can therefore expect pointwiseconvergence of the optimal finite-horizon consumption plan c∗

T (t), t ≥ 0, to the optimalinfinite-horizon consumption plan c∗(t), t ≥ 0, as T → ∞, since there is no policy that canproduce a strictly higher value of the objective function. Indeed, if for some ε > 0 thereexists a policy c such that J(c) ≥ J(c∗∞) + ε, one obtains a contradiction for large enoughT >

∣∣ln (ε/| ln (c)|)∣∣ because then the value of the infinite-horizon objective J∞(c) mustbe less than ε away from the optimal finite-horizon objective JT (c∗

T ).

Page 314: Optimal Control Theory With Applications in Economics

Appendix B 301

the optimal finite-horizon policy uses up the entire capital within T. Theinfinite-horizon policy, on the other hand, ensures that the investor’sbank balance stays positive.

The expected value of perfect information (EVPI) about the length ofthe planning horizon is the expected difference of the investor’s optimalpayoffs when using a policy c∗

T(t), t ∈ [0, T], that is perfectly adaptedto the planning-horizon realization T versus using the general-purposepolicy c∗(t), t ≥ 0, in the case where the horizon is unknown. In otherwords, EVPI = E[ JT(c∗

T)] − J(c∗), or equivalently,

EVPI =∫ ∞

0g(T)

(∫ T

0e−rt (U(c∗

T(t)) − U(c∗(t)))

dt)

dT.

Using similar techniques as in (ii), the expected value of perfectinformation (which must be nonnegative) becomes

EVPI =∫ ∞

0e−rt

(ln

c∗∞(t)c∗(t)

− (1 − e−λt) lnc∗

t (t)c∗(t)

)dt −

(Ts − 1 − e−λTs

λ

),

where Ts is a switching time such that c∗T(t) depends nontrivially on

the horizon T for T < Ts.

d. The optimal (deterministic) infinite-horizon consumption plan c∗∞(t),t ≥ 0, is

c∗∞(t) =

{c if x0 > c/α,min{c0e(α−r)t, c} otherwise,

∀ t ≥ 0.

Consider first the case where ρ < α and the initial capital is small, sothat x0 < c/α. From (*), c0 is implicitly given by

x0 =(

− cρ

)(c0

c

) αα−ρ + c0

ρ.

Calculating the first derivative of c0 with respect to ρ using this equationshows that c0 is increasing in ρ. Consequently, the initial optimal con-sumption rate in the stochastic optimal consumption problem exceedsthe optimal deterministic infinite-horizon consumption rate. Note thatin this situation both c∗(t) and c∗∞(t) will reach c at the switchingtime

ts = 1α− ρ

ln(

cc0

).

Page 315: Optimal Control Theory With Applications in Economics

302 Appendix B

When α ≤ ρ, that is, when the expected lifetime is relatively short, theoptimal consumption c∗(t) is a nonincreasing function, in contrast toc∗∞(t), which is nondecreasing.

e. On the one hand, when the personal lifetime is not known, it is best tospend more conservatively than when the horizon is perfectly known,since in the latter case, nothing has to be left at the end (unless initialcapital is so large that it is essentially impossible to consume). Yet, onthe other hand, while c∗∞ is always nondecreasing (as a consequence ofthe assumption that α > r), it is possible (when α ≤ ρ = λ+ r) that c∗

is decreasing. For a given α, r this happens when λ is large (when theexpected lifetime is short). This means that when the remaining lifetimeis unknown but likely to be short, it becomes optimal to consume a lot inthe present and the near future, and when—against expectations—lifeturns out to be longer, to reduce consumption later, which is exactly theopposite philosophy of the deterministic infinite-horizon policy whenexcess return α− r is positive. To interpret both in the same context, onecan think of α− ρ as the effective excess return and thus of λ as a priceof lifetime risk.

B.4 Game Theory

4.1 (Linear-Quadratic Differential Game)

a. As in the analysis of the linear-quadratic regulator in example 3.3,consider an HJB equation for each player i ∈ {1, . . . , N},rVi(t, x) − Vi

t(t, x) =

maxui∈U i

⎧⎨⎩−x′Ri(t)x −

N∑j=1

(uj)′Sij(t)uj + 〈Vix(t, x), A(t)x +

N∑j=1

Bj(t)uj〉⎫⎬⎭ .

Analogous to earlier developments assume that the value function isquadratic, of the form

Vi(t, x) = −x′Qi(t)x,

where Qi(t) a continuously differentiable matrix function with symmet-ric positive definite values in Rn×n for all t ∈ [0, T]. Substituting the valuefunction into the HJB equation yields a linear feedback law,

μi(t, x) = −(Sii(t))−1(Bi(t))′Qi(t)x,

Page 316: Optimal Control Theory With Applications in Economics

Appendix B 303

for all i ∈ {1, . . . , N}. The matrix functions Q1, . . . , QN satisfy a system ofRiccati differential equations,

− Ri(t) = Qi − rQi + QiA(t) + A′(t)Qi

+N∑

j=1

(2Qi − Q j)B j(t)(S jj(t))−1Sij(t)(S jj(t))−1(B j(t))′Q j,

for all t ∈ [0, T], with endpoint conditions

Qi(T) = Ki, i ∈ {1, . . . , N}.The equilibrium state trajectory x∗(t), t ∈ [0, T], is then obtained assolution to the linear IVP

x =⎡⎣A(t) −

N∑j=1

Bj(t)(Sjj(t))−1(Bj(t))′Qj(t)

⎤⎦ x, x(0) = x0.

The (unique) open-loop Nash-equilibrium strategy profile is thereforeu∗ = (u1∗, . . . , uN∗), with

ui∗(t) = μi(t, x∗(t)) = −(Sii(t))−1(Bi(t))′Qi(t)x∗(t), ∀ t ∈ [0, T],for all i ∈ {1, . . . , N}.b. Assume that each player i ∈ {1, . . . , N} uses a linear feedback law ofthe form

μi(t, x) = Mi(t)x,

where the matrix function Mi is continuously differentiable with valuesin Rm×n. Instead of the HJB equation, which is useful for the compu-tation of open-loop Nash equilibria, an application of the PMP (seeproposition 3.5), proves more productive. Substituting the other players’feedback laws in player i’s objective functional and the state equation,player i’s current-value Hamiltonian has the form

Hi(t, x, ui, ν i) = − x′⎛⎝Ri +

N∑j �=i

(Mj)′SijMj

⎞⎠ x − (ui)′Siiui

+ 〈ν i, Ax +∑j �=i

B jMjx + Biui〉,

Page 317: Optimal Control Theory With Applications in Economics

304 Appendix B

where ν i is the current-value adjoint variable. The maximality conditionyields that ui∗ = (Sii)−1(Bi)′(νi/2), so the adjoint equation becomes

ν i = rν i + 2

⎛⎝Ri +

N∑j �=i

(Mj)′SijMj

⎞⎠ x −

⎛⎝A +

∑j �=i

B jMj

⎞⎠ νi,

with transversality condition

ν i(T) = −2Kix(T).

To solve the adjoint equation, use the intuition from the solution forpart a, and assume that ν i = −2Qix, building on the conceptual prox-imity of the HJB equation and the PMP (see section 3.3). This guess forthe relation between ν i is arrived at by first assuming that player i’svalue function is quadratic, of the form Vi(t, x) = −x′Qi(t)x, as inpart a, and then setting νi = Vi

x = −2Qix, where Qi(t) is a continu-ously differentiable matrix function with symmetric positive definitevalues in Rn×n. The adjoint equation, together with the transversalitycondition, is solved by ν i = −2Qix (so Mi = −(Sii)−1(Bi)′Qi), providedthat the functions Q1, . . . , QN satisfy the system of Riccati differentialequations

− Ri(t) = Qi − rQi + QiA(t) + A′(t)Qi

+N∑

j=1

(Qi − Qj)B j(t)(Sjj(t))−1Sij(Sjj(t))−1(B j(t))′Q j,

for all t ∈ [0, T], with endpoint conditions

Qi(T) = Ki, i ∈ {1, . . . , N}.The corresponding closed-loop equilibrium state-control trajectory(x∗(t), u∗(t)), t ∈ [0, T], is then obtained as for part a.

c. The fundamental difference between the open-loop equilibrium inpart a and the closed-loop equilibrium in part b is that in the open-loop formulation each player i considers the other players’ strategiesonly as a function of time, whereas in the closed-loop formulation thecontrol law for the other players appears in player i’s objective andin his considerations about the evolution of the state (see remark 4.6).Consequently, the Riccati differential equations in parts a and b differby the term QiB j(t)(Sjj(t))−1Sij(Sjj(t))−1(B j(t))′Qj, although the structuralproperties (such as linearity) of the equilibria under the two solutionconcepts are essentially the same.

Page 318: Optimal Control Theory With Applications in Economics

Appendix B 305

4.2 (Cournot Oligopoly)

a. The differential game�(x0) consists of a set of players N = {1, . . . , N},a set of objective functionals, {Ji(ui)}N

i=1, where Ji(ui) (for i ∈ N ) is speci-fied in the problem, a description of the evolution of the state p(t) fromits known initial value p0, given by the IVP

p = f (p, u1, . . . , uN), p(0) = p0,

and the control constraints ui ∈ U = [0, u], where u > 0 is given.

b. In the stationary Cournot game, in which the price p0 and all outputstrategies ui

0 ∈ U = [0, u], i ∈ N , stay constant, the equilibrium price p0

is determined by

p0 = a −N∑

i=1

ui0.

Given the other firms’ stationary strategy profile u−i0 ∈ [0, u]N−1, each

firm i ∈ N determines its best-response correspondence

BRi(u−i0 ) = arg max

ui∈[0,u]

⎧⎨⎩⎛⎝a − ui −

∑j �=i

uj0

⎞⎠ ui − C(ui)

⎫⎬⎭

=⎧⎨⎩min

⎧⎨⎩u,

13

⎡⎣a − c −

∑j �=i

uj0

⎤⎦

+

⎫⎬⎭⎫⎬⎭ .

At the stationary Nash-equilibrium strategy profile u0 = (u10, . . . , uN

0 ) oneobtains

ui0 ∈ BRi(u−i

0 ), ∀ i ∈ N .

Using the symmetry yields

ui0 = min

{u,

a − cN + 2

}∈ [0, u], ∀ i ∈ N .

The corresponding stationary equilibrium price is

p0 = c + (2a/N)1 + (2/N)

.

Note that p0 → c as N → ∞.

c. Given the other players’ strategy profile u−i(t), t ≥ 0, player i ∈ Nsolves the optimal control problem

Page 319: Optimal Control Theory With Applications in Economics

306 Appendix B

Ji(ui) =∫ ∞

0e−rt(p(t)ui(t) − C(ui(t))) dt −→ max

ui(·)

s.t. p = α

⎛⎝a − p − ui −

∑j �=i

uj

⎞⎠ ,

p(0) = p0,

ui(t) ∈ [0, u], ∀ t ≥ 0.

The current-value Hamiltonian of player i’s optimal control problem isgiven by

Hi(t, p, ui, νi) = pui − cui − (ui)2

2+α

⎛⎝a − p − ui −

∑j �=i

uj(t)

⎞⎠ ν i,

where νi denotes the current-value adjoint variable. Applying the PMPyields the following necessary optimality conditions for any optimalstate-control trajectory (p∗(t), ui∗(t)), t ≥ 0:

• Adjoint equation

ν i(t) = (α+ r) ν i(t) − ui∗(t), ∀ t ≥ 0.

• Transversality

e−rtν i(t) → 0, as t → ∞.

• Maximality

ui∗(t) = p∗(t) −αν i(t) − c ∈ arg maxui∈[0,u]

Hi(t, p∗(t), ui, νi(t)),

for all t ≥ 0.

From the maximality condition one finds that ν i = (p∗ − ui∗ − c

)/α, and

by differentiating with respect to time, ν i = (p∗ − ui∗) /α. Using the

adjoint equation and the state equation, one can eliminate the adjointvariable and obtain that in equilibrium

ui∗(t) = α

⎛⎝a − p∗(t) −

∑j �=i

uj∗(t)

⎞⎠− (α+ r)

(p∗(t) − c − ui∗(t)

),

for all t ≥ 0 and all i ∈ N . Together with the state equation, theseODEs determine the evolution of the optimal state-control trajectory.

Page 320: Optimal Control Theory With Applications in Economics

Appendix B 307

Restricting attention to a symmetric solution (which by uniqueness ofsolutions to ODEs is also the only solution), one obtains a system of twolinear ODEs (with constant coefficients),

p(t) = α(a − p(t) − Ny(t)),

y(t) = αa + (α+ r)c − (2α+ r)p(t) − ((N − 2)α− r)y(t),

where ui∗ = y for all i ∈ N ; any remaining superscripts have beendropped for convenience. Note first that the system described by theseODEs has the unique equilibrium

(p, y) =(

c + 2α+rα+r

( aN

)1 + 2α+r

α+r

( 1N

) ,a − c

N + 2 − rα+r

),

and can therefore be written in the form[py

]= A

[p − py − y

],

with the system matrix

A = −[

α αN2α+ r α(N − 2) − r

].

Thus, given an initial value (p0, y0), the Cauchy formula yields that[p(t)y(t)

]=[

py

]+[

p0 − py0 − y

]eAt, ∀ t ≥ 0.

There are two remaining problems. First, the determinant of A isnegative,

det (A) = λ1λ2 = −(N + 1)αr − (N + 2)α2 < 0,

so the (real) eigenvalues λ1 < 0 < λ2 of A,

λ1,2 = r − (N − 1)α2

± 12

√((N − 1)α+ r)2 + 8αr + 4(N + 2)α2,

must have different signs. This means that the system is unstable andcannot be expected to converge to the steady state (p, y) computed ear-lier, as long as (p0, y0) �= (p, y). Second, the initial value y0 is not knownand has to be determined from the transversality condition. The first dif-ficulty can be addressed by realizing that the equilibrium (p, y), though

Page 321: Optimal Control Theory With Applications in Economics

308 Appendix B

unstable, is in fact a saddle point, so there are trajectories (in the direc-tion of eigenvectors associated with the negative eigenvalue λ1) thatconverge to it. The second difficulty is resolved by choosing y0 suchthat (p0 − p, y0 − y) becomes an eigenvector associated with the negativeeigenvalue λ1, namely,20[

p0 − py0 − y

]⊥ A − λ1I,

where I is the 2 × 2 identity matrix. The last relation is equivalentwith

(A − λ1I)[

p0 − py0 − y

]= 0,

which in turn implies that

y0 = y + p0 − p2αN

((N − 3)α− r

+√

((N − 1)α+ r)2 + 8αr + 4(N + 2)α2).

Note that the convergence of (p(t), y(t)) to the steady state (p, y) is com-patible with the transversality condition, which can be used as analternative justification for the choice of the initial value y0. This yields

p(t) = p + (p0 − p)eλ1t,

and

y(t) = y + (y0 − y)

eλ1t.

Noting that p∗(t) = p(t) and ui∗(t) = y(t), the equilibrium turn-pike

(p∗, u∗) can be directly compared to the static solution in part a,

(p∗, ui∗

)=(

c + 2α+rα+r

( aN

)1 + 2α+r

α+r

( 1N

) ,a − c

N + 2 − rα+r

)

=(

p0 − rα+r

( 1N+2

)1 − r

α+r

( 1N+2

) ,u0

1 − rα+r

( 1N

))

,

for all i ∈ N , provided that the control bound u is large enough (andthus nonbinding). While the dynamic steady-state production is always

20. I denotes “is orthogonal to.”

Page 322: Optimal Control Theory With Applications in Economics

Appendix B 309

larger than for the static solution, the same holds true for price only ifthe static solution p0 is large (greater than 1).

The equilibrium price monotonically decreases if p0 > p and increasesotherwise, converging to the steady state p. Firm i’s equilibrium produc-tion output ui∗(t) either decreases or increases, depending on the signof y0 − ui∗, converging to the steady state ui∗ as t → ∞.

Remark B.5 Instead of solving the preceding system of first-orderODEs, it is possible instead to transform this system into a singlesecond-order ODE for price,

p(t) + σ p(t) + ρp(t) = θ ,

where σ = (N − 1)α− r, ρ = −α((N + 2)α+ (N + 1)r), θ = −aα2 −α(a +cN)(α+ r). Using the substitution ξ = Bp − R, this inhomogeneoussecond-order linear differential equation with constant coefficients canbe transformed to the corresponding homogeneous equation,

ξ (t) + σ ξ (t) + ρξ (t) = 0,

with solution

ξ (t) = C1eλ1t + C2eλ2t,

where λ1,2 are the roots of the characteristic equation

λ2 + σλ+ ρ = 0 ⇔ λ1,2 = −σ ±√σ 2 − 4ρ2

,

which are identical to the eigenvalues of A determined earlier. Note thatsince ρ < 0, the roots λ1, λ2 are always real and distinct. Then,

p(t) = θ

ρ+ C1eλ1t + C2eλ2t,

where C1 = C1/ρ and C2 = C2/ρ. The control becomes

ui∗(t) = aρ− θ

ρN− C1

N

(1 + λ1

N

)eλ1t − C2

N

(1 + λ2

N

)eλ2t,

and the adjoint variable is

νi(t) = (N + 1)θ − aρNαρ

− c + C1

(N + 1 + λ1

N

)eλ1t + C2

(N + 1 + λ2

N

)eλ2t.

The transversality condition implies that

Page 323: Optimal Control Theory With Applications in Economics

310 Appendix B

e−rt(

(N + 1)θ − aρNαρ

− c)

C1

(N + 1 + λ1

n

)e(λ1−r)t

+ C2

(N + 1 + λ2

N

)e(λ2−r)t → 0, as t → ∞.

Note that λ1 < 0 (λ1 is the smallest root of the characteristic equation).Hence, e(λ1−r)t → 0 as t → ∞. The sign of λ2 − r is analyzed as

λ2 − r = −(α(N − 1) + r) +√(α(N − 1) + r)2 + 4α2(N + 2) + 8αr2

> 0.

Therefore, the transversality condition holds if and only if C2 = 0. Theinitial condition p(0) = p0 yields

C1 = p0 − θ

ρ,

where θ/ρ is equal to the steady state p∗. The resulting equilibriumprice,

p(t) = θ

ρ+(

p0 − θ

ρ

)eλ1t = p∗ + (p0 − p∗) eλ1t,

corresponds to the earlier solution. The equilibrium production outputbecomes

ui∗(t) = 1N

(a − θ

ρ

)− 1

N

(p0 − θ

ρ

)(1 + λ1

N

)eλ1t

= a − p∗

N+(

a − p0

N− a − p∗

N

)(1 + λ1

N

)eλ1t

= ui∗ +(

a − p0

N− ui∗

)(1 + λ1

N

)eλ1t

= ui∗ + (y0 − ui∗)eλ1t,

for all i ∈ N , where

y0 = a − p0

N+ λ1

N

(a − p0

N− ui∗

)

is firm i’s initial production as previously determined. �

d. First compute the feedback law corresponding to the open-loop Nashequilibrium determined for part c. By eliminating time from the relations

Page 324: Optimal Control Theory With Applications in Economics

Appendix B 311

for p∗(t) and ui∗(t), it is

ui − ui∗

y0 − ui∗ = p − p∗

p0 − p∗

for any (p, ui) on the equilibrium path. The corresponding feedbacklaw,

μi∗(t, p) = ui∗ + (y0 − ui∗)p − p∗

p0 − p∗ , ∀ i ∈ N

is affine in the price p and independent of time. To determine firm i’s bestresponse, assume that all other firms have affine production strategiesof the form

μ(p) = γ1 + γ2p,

where γ1, γ2 are appropriate constants. In that case, player i ∈ N solvesthe optimal control problem

Ji(ui) =∫ ∞

0e−rt(p(t)ui(t) − C(ui(t))) dt −→ max

ui(·)

s.t. p = α(a − p − ui − (N − 1)(γ1 + γ2p)),

p(0) = p0,

ui(t) ∈ [0, u], ∀ t ≥ 0.

The corresponding current-value Hamiltonian is

Hi(t, p, ui, ν i) = pui − cui − (ui)2

2

+ α(a − p − ui − (N − 1)(γ1 + γ2p))νi,

where νi denotes the current-value adjoint variable. Applying the PMPyields the following necessary optimality conditions for any optimalstate-control trajectory (p∗(t), ui∗(t)), t ≥ 0:

• Adjoint equation

ν i(t) = (α+α(N − 1)γ2 + r)ν i(t) − ui∗(t), ∀ t ≥ 0.

• Transversality

e−rtν i(t) → 0, as t → ∞.

Page 325: Optimal Control Theory With Applications in Economics

312 Appendix B

• Maximality

ui∗(t) = p∗(t) −αν i(t) − c ∈ arg maxui∈[0,u]

Hi(t, p∗(t), ui, νi(t)),

for all t ≥ 0.Substituting the firms’ symmetric closed-loop strategy profile in thestate equation, together with the initial condition p(0) = p0, yields

p∗(t) = a − Nγ1

1 + Nγ2+(

p0 − a − Nγ1

1 + Nγ2

)e−α(1+Nγ2)t, ∀ t ≥ 0.

The maximality condition implies that

νi = p∗ − ui∗ − cα

= (1 − γ2)p∗ − γ1 − cα

.

Differentiating this relation with respect to time and using the stateequation as well as the adjoint equation one obtains

1 − γ2

αp = (1 − γ2)(a − p∗ − N(γ1 + γ2p∗))

= (α+α(N − 1)γ2 + r)(1 − γ2)p∗ − γ1 − c

α− γ1 − γ2p∗.

The last equation holds on the entire (nonconstant) price path p∗(t), t ≥ 0,if and only if

γ2

1 − γ2− (2N − 1)γ2 − 2 = r

α,

and

(1 − γ2)(a − Nγ1) + c + γ1

α

(α+ r +α(N − 1)γ2

)+ γ1 = 0.

Thus,

γ2 = 2Nα− r − 4α±√(2Nα+ r)2 + 4α(α+ r)2α(2N − 1)

,

and

γ1 = −α(a + c) − rc +α(a − Nc + c)γ2

−α(N − 2) + r +α(2N − 1)γ2.

Note that because of the transversality condition and the explicit solu-tion for p∗(t), one must have that 1 + Nγ2 ≥ 0. Consequently, thenegative solution for γ2 will not be acceptable, since

Page 326: Optimal Control Theory With Applications in Economics

Appendix B 313

1 + Nγ2 = 1 + N2Nα− r − 4α−√(2Nα+ r)2 + 4α(α+ r)

2α(2N − 1)

< 1 − N2α+ r

(2N − 1)α< 0.

Using the explicit solution for p∗(t) and u∗(t) = γ1 + γ2p∗(t), the long-runclosed-loop state-control tuple (p∗

c , u∗c ) is given by

p∗c = a − Nγ1

1 + Nγ2,

and

u∗c = γ1 + γ2

a − Nγ1

1 + Nγ2.

e. When the market becomes competitive, that is, as N → ∞, in all thesolutions (static, open-loop, and closed-loop) the price converges to cand aggregate production approaches a − c.

4.3 (Duopoly Pricing Game)

a. Given a strategy p j(t), t ≥ 0, for firm j ∈ {1, 2}, firm i �= j solves theoptimal control problem

Ji(pi|pj) =∫ ∞

0e−rtpi(t)[xi(t)(1 − xi(t) − xj(t))

× (α(xi(t) − xj(t)) − (pi(t) − pj(t)))]dt −→ maxpi(·)

s.t. xi = xi(1 − xi − xj)[α(xi − xj) − (pi(t) − p j(t))],xi(0) = xi0,

pi(t) ∈ [0, P], ∀ t ≥ 0,

where the constant P > 0 is a (sufficiently large) maximum willingnessto pay.

b. From the state equations for the evolution of x(t) = (xi(t), xj(t)), t ≥ 0,it is possible to eliminate time, since

dx2

dx1= x2(1 − x1 − x2)[α(x1 − x2) − (p1 − p2)]

x1(1 − x1 − x2)[α(x2 − x1) − (p2 − p1)] = −x2

x1,

so that by direct integration

Page 327: Optimal Control Theory With Applications in Economics

314 Appendix B

ln(

x2

x20

)=∫ x2

x20

dξ2

ξ2= −

∫ x1

x10

dξ1

ξ1= − ln

(x1

x10

),

which implies that

x1x2 = x10x20

along any admissible state trajectory. Thus, any admissible state tra-jectory x(t) of the game �(x0) moves along the curve C(x0) = {(x1, x2) ∈[0, 1]2 : x1x2 = x10x20}.c. The current-value Hamiltonian for firm i’s optimal control problem is

Hi(t, x, p, ν i) = (1 − xi − xj)(α(xi − xj) − (pi − pj))

× (xi(pi + ν ii ) − xjν

ij ),

where ν i = (νii , ν

ij ) denotes the current-value adjoint variable asso-

ciated with the state x = (xi, xj). The PMP yields the followingnecessary optimality conditions for any equilibrium state-control tra-jectory (x∗(t), p∗(t)), t ≥ 0:

• Adjoint equation

ν ii = rν i

i − (pi∗ + ν ii )[(1 − 2x∗

i − x∗j )(α(x∗

i − x∗j )

− (pi∗ − p j∗)) +αx∗i (1 − x∗

i − x∗j )]

− (p j∗ + νij )x

∗j [(p j − pi) −α(1 − 2x∗

i )],ν i

j = rν ij − (pi + νi

j )x∗i [(pi − p j) −α(1 − 2x∗

j )]− (p j + νi

j )[(1 − x∗i − 2x∗

j )(α(x∗j − x∗

i ))

− (p j − pi) +αx∗j (1 − x∗

i − x∗j )].

• Transversality

e−rtν i(t) → 0, as t → ∞.

• Maximality

pi∗ = 12[x∗

i (p j∗ − νii +α(x∗

i − x∗j )) − x∗

j (p j∗ + νij )].

To examine the qualitative behavior of the optimal state-control trajecto-ries it is useful to first examine any turnpikes that the model may have.Note first that

Page 328: Optimal Control Theory With Applications in Economics

Appendix B 315

xi = 0 ⇒⎧⎨⎩

either xi = 0,or xi + xj = 1,or α(xi − xj) = pi − pj.

Since x0 > 0 by assumption, the first point, xi = 0, does not belong tothe curve C(x0) determined for part b and therefore cannot be a steadystate. The second option implies two equilibria where the line xi + xj = 1intersects the curve C(x0). The third option,

α(xi − xj) = pi − pj, (B.10)

implies, together with the maximality condition, that

xi(pi + ν ii ) = −xj(pj + νi

j ). (B.11)

In addition, νii = νi

j = 0 implies that

(pi + ν ii )[(1 − 2xi − xj)(α(xi − xj) − (pi − p j) +αxi(1 − xi − xj))]

= rνii − (p j + ν i

j )xj[(p j − pi) −α(1 − 2xi)], (B.12)

and

(p j + νij )[(1 − xi − 2xj)(α(xj − xi)) − (p j − pi) +αxj(1 − xi − xj)]

= rνij − (pi + νi

j )xi[(pi − p j) −α(1 − 2xj)], (B.13)

so that combining (B.10)–(B.13) yields νii = νi

j = 0, and

pi = α(xi − xj)xj

xj + xi, pj = α(xj − xi)xj

xj + xi,

for all i, j ∈ {1, 2} with i �= j. Since the same holds for firm j, one canconclude that pi = pj = 0, and xi = xj. Taking into account the fact thatthe steady state lies on the curve C(x0), one obtains that

x = (xi, xj) =(

1 −√1 − 4xi0xj0

2,

1 +√1 − 4xi0xj0

2

).

Thus, the model has three stationary points: two of them are given by theintersections of C(x0) with the line xi + xj = 1, and the third stationarypoint lies at the intersection of C(x0) with the line xi = xj (figure B.16).The dynamics are such that optimal state trajectory lies on the curveC(x0).

For x10 = x20, the initial state coincides with a stationary point, so thesystem will remain at this state forever.

Page 329: Optimal Control Theory With Applications in Economics

316 Appendix B

Figure B.16Equilibrium phase trajectories.

Consider a small neighborhood of the stationary point x with xj > xi,and assume that the system starts in this neighborhood (at a point x0 �= xon C(x0)). Then xi < 0 and xj > 0, so the system moves along the curvefrom the third stationary point toward this first stationary point (seefigure B.16).

d. For a typical Nash-equilibrium state trajectory x∗(t), t ≥ 0, seefigure B.16.

e. When firm i starts out with a higher installed base, then it is bestfor firm j to charge nothing for its product, since it cannot prevent theerosion of its customer base. Charging zero is the best that firm i cando in order to have positive sales in the future (which, at least in thestandard open-loop equilibrium, will not happen). On the other hand,firm i is able to charge a positive price and still gain new customers,slowly spending its brand premium xi − xj (i.e., the difference betweenits installed base and firm j’s installed base).

Remark B.6 Using trigger-strategy equilibria it is possible to obtainperiodic equilibria in which the firms “pump” customers back andforth along C(x0), alternating between approaching neighborhoods ofstationary points. �

4.4 (Industrial Pollution)It is convenient to consider throughout the general case where

(γ 1, γ 2) ∈ {0, 1}2.

a. In the static version of the duopoly game the state of the system isnot evolving, so necessarily

Page 330: Optimal Control Theory With Applications in Economics

Appendix B 317

q10 + q2

0 = βx0

and

ui0 = δyi0

for all i ∈ {1, 2}, where (x0, yi0) and (qi0, ui

0) correspond to the stationaryvalues of the state and control variables. Given firm j’s stationary strat-egy (qj

0, uj0), firm i ∈ {1, 2} \ {j} chooses a (set of) best response(s) that

maximizes its total payoff,

BRi(q j0) = arg max

qi0∈[0,u/δ]

{(1 − qi

0 − qj0)qi

0 − ciqi0 − γ i (qi

0 + qj0)2

2β2 − κδ2(qi

0)2

2

},

where the fact that qi0 = yi0 is used because any overcapacity is not in

firm i’s best interest. Carrying out the straightforward maximizationyields

BRi(qj0) =

⎧⎪⎨⎪⎩min

⎧⎪⎨⎪⎩[1 − ci −

(1 + γ i

β2

)qj

0

]+

2 + γ i

β2 + κδ2,

⎫⎪⎬⎪⎭⎫⎪⎬⎪⎭ .

At a Nash equilibrium (qi∗0 , qj∗

0 ) of the static duopoly game, qi∗0 ∈ BRi(qj∗

0 ),or equivalently,

qi∗0 = min

⎧⎪⎨⎪⎩⎡⎣ (1 − ci)

(2 + γ j

β2 + κδ2)

− (1 − cj)(

1 + γ i

β2

)(

2 + γ i

β2 + κδ2) (

2 + γ j

β2 + κδ2)

−(

1 + γ i

β2

) (1 + γ j

β2

)⎤⎦

+

,

1 − ci

2 + γ i

β2 + κδ2

⎫⎬⎭ ,

provided that the (by assumption large) upper bound u on the capacity-expansion rate (here only used for maintenance) is not binding. Firm i’sequilibrium expansion rate is ui∗

0 = δqi∗0 . In addition, at an interior Nash

equilibrium the pollution level is

x∗0 = qi∗

0 + qj∗0

β

= 1β

(1 + κδ2)(2 − ci − cj)(2 + γ i

β2 + κδ2) (

2 + γ j

β2 + κδ2)

−(

1 + γ i

β2

) (1 + γ j

β2

) ,

and firm i’s production capacity is

Page 331: Optimal Control Theory With Applications in Economics

318 Appendix B

y∗i0 = qi∗

0 .

b. Given an admissible initial state (x0, y0), the differential game�(x0, y0)is defined by player i’s optimal control problem, given player j’s strat-egy. To keep the solution simple, assume that y0 is sufficiently small sothat qi = yi, that is, the firm produces at its full capacity.

J(ui|uj) −→ maxui(·)

s.t. x(t) = yi(t) + yj(t) −βx(t), x(0) = x0,yi(t) = ui(t) − δyi(t), yi(0) = yi

0,yj(t) = uj(t) − δyj(t), yj(0) = yj

0,

ui(t) ∈ [0, u], ∀ t ≥ 0,

where

J(ui|uj) =∫ ∞

0e−rt

((1 − yi(t) − yj(t) − ci)yi(t) − κ(ui(t))2 + γ ix2(t)

2

)dt.

The current-value Hamiltonian corresponding to player i’s optimalcontrol problem is

Hi(x, y, u, νi) = (1 − yi − yj − ci)yi − κ(ui)2 + γ ix2(t)2

+ νix(yi + yj −βx) + ν i

i (ui − δyi) + νi

j (uj − δyj),

where νi = (ν ix, ν i

i , νij ) is the current-value adjoint variable associated

with the state (x, yi, yj). The PMP yields the following necessary opti-mality conditions for an open-loop Nash-equilibrium state-controltrajectory (x∗(t), y∗(t), u∗(t)), t ≥ 0:

• Adjoint equation

ν ix(t) = (r +β)ν i

x(t) + γ ix∗(t),

νii (t) = (r + δ)νi

i (t) − νix(t) + 2y∗

i (t) + y∗j (t) − (1 − ci),

ν ij (t) = (r + δ)νi

j (t) − νix(t) + y∗

i (t),

for all t ≥ 0.

• Transversality

limt→∞ e−rt(νi

x(t), νii (t), ν

ij (t)) = 0.

Page 332: Optimal Control Theory With Applications in Economics

Appendix B 319

• Maximality

ui∗(t) ∈ arg maxui∈[0,u]

Hi(x∗(t), y∗(t), (ui, uj∗(t)), ν i(t)),

for all t ≥ 0, so

ui∗(t) = min

{ν i

i (t)κ

, u

}, ∀ t ≥ 0.

The resulting combined linear Hamiltonian system for both players(which includes the maximality condition) is

z = Az − b,

where

z =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

x∗

y∗i

y∗j

ν ixν i

iν i

j

νj

x

νj

i

νj

j

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

, b =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

0000

1 − ci

00

1 − cj

0

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

,

and

A =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

−β 1 1 0 0 0 0 0 00 −δ 0 0 1/κ 0 0 0 00 0 −δ 0 0 0 0 0 1/κγ i 0 0 r +β 0 0 0 0 00 2 1 0 r + δ 0 0 0 00 1 0 −1 0 r + δ 0 0 0γ j 0 0 0 0 0 r +β 0 00 0 1 0 0 0 −1 r + δ 00 1 2 0 0 0 0 0 r + δ

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

.

The equilibrium turnpike (x∗, y∗, u∗, ν i, ν j)′ = z of the Hamiltonian sys-tem is determined by

z = A−1b.

Page 333: Optimal Control Theory With Applications in Economics

320 Appendix B

Omitting the expressions for the turnpike values of the adjoint variablesone obtains

x∗ = 2 − ci − cj

(3 + δκr + δ2κ)β,

y∗i = 1 − 2ci + cj − δ(δ+ r)(1 − ci)κ

3 + 4δκr + δ2(κ2r2 + 4κ) + 2δ3κ2r + δ4κ2 ,

y∗j = 1 − 2cj + ci − δ(δ+ r)(1 − cj)κ

3 + 4δκr + δ2(κ2r2 + 4κ) + 2δ3κ2r + δ4κ2 ,

and thus ui∗ = δy∗i and uj∗ = δy∗

j . The eigenvaluesλ1, . . . , λ9 of the systemmatrix A are all real and such that

λi ∈{

−β, r +β, r + δ,κr ± √

κ2r2 + 4κ2rδ+ 4κ + 4δ2κ2

2κ,

κr ± √κ2r2 + 4κ2rδ+ 12κ + 4δ2κ2

};

except for r +β and r + δ the multiplicity of all eigenvalues is 1.Since some of the eigenvalues are positive and some are negative, theturnpike z, which is an equilibrium of the linear system

z = Az − b = A(z − z),

is a saddle point. This insight is consistent with the transversality con-dition and provides conditions for obtaining the initial values for theadjoint variables. Indeed, the initial values (including the six missingones) have to be chosen such that the starting vector z0 is orthogonal toall eigenvectors associated with the six positive eigenvalues. The cor-responding six conditions provide the six missing components of theinitial value z(0) = z0 (those for the adjoint variable), and then

z(t) = z + (z0 − z)eAt, (B.14)

which converges to z as t → ∞. Explicitly, using the eigenvaluedecomposition of A, equation (B.14) for [x∗(t), y∗

i (t), y∗j (t)]′ becomes⎡

⎢⎢⎣x∗(t) − x∗

y∗i (t) − y∗

i

y∗j (t) − y∗

j

⎤⎥⎥⎦ =

⎡⎢⎢⎣

0 − (2β+r)(r+δ+β)γ j

2κ(λ3+δ)(λ3+β)

−1κ(λ2+δ) 0 1

κ(λ3+δ)1

κ(λ2+δ) 0 1κ(λ3+δ)

⎤⎥⎥⎦⎡⎢⎢⎣ρ1eλ1t

ρ2eλ2t

ρ3eλ3t

⎤⎥⎥⎦ ,

Page 334: Optimal Control Theory With Applications in Economics

Appendix B 321

where

ρ1 = 12κ(λ2 + δ)(yj0 − y∗

j − (yi0 − y∗i )),

ρ2 =γ2(yi0 − y∗

i ) − λ3(x0 − x∗) + (yj0 − y∗j ) −β(x0 − x∗)

(λ3 +β)(rδ+ r2 + 2βδ+ 2β2) + 3βr(λ3 +β2),

ρ3 = 12κ(λ3 + δ)λ3(yi0 − y∗

i + yj0 − y∗j ),

and

λ1 = κr − √κ2r2 + 4κ2rδ+ 4κ + 4δ2κ2

2κ,

λ2 = −β,

λ3 = κr − √κ2r2 + 4κ2rδ+ 12κ + 4δ2κ2

2κ.

c. It is now possible to derive a closed-loop Nash equilibrium of�(x0, y0)in strategies that are affine in the players’ capacities, such that

ui∗(t) = μi(x∗(t), y∗(t)) = αi0 −αi

xx(t) −αii y

∗i (t) −αi

jy∗j (t),

where αi0,αi

x,αii ,α

ij are appropriate coefficients. Firm i’s corresponding

current-value Hamiltonian becomes

Hi(x, y, ui, ν i) = (1 − yi − yj − ci)yi − κ(ui)2 + γ ix2(t)2

+ νix(yi + yj −βx) + ν i

i (ui − δyi)

+ ν ij (μ

j(x, y) − δyj),

where, as in part b, ν i = (ν ix, ν i

i , νij ) is the current-value adjoint vari-

able associated with the state (x, yi, yj). The PMP yields the followingnecessary optimality conditions for a closed-loop Nash-equilibriumstate-control trajectory (x∗(t), y∗(t), u∗(t)), t ≥ 0:

• Adjoint equation

ν ix(t) = (r +β)ν i

x(t) + γ ix∗(t) +αjxν

ij (t),

ν ii (t) = (r + δ)ν i

i (t) − νix(t) + 2y∗

i (t) + y∗j (t) +α

jiν

ij (t) − (1 − ci),

ν ij (t) = (r + δ+α

jj )ν

ij (t) − ν i

x(t) + y∗i (t),

Page 335: Optimal Control Theory With Applications in Economics

322 Appendix B

for all t ≥ 0.

• Transversality

limt→∞ e−rt(νi

x(t), νii (t), ν

ij (t)) = 0.

• Maximality

ui∗(t) ∈ arg maxui∈[0,u]

Hi(x∗(t), y∗(t), ui, ν i(t)),

for all t ≥ 0, so that

ui∗(t) = min

{ν i

i (t)κ

, u

}, ∀ t ≥ 0.

The combined linear Hamiltonian system for both players (includingthe maximality condition) becomes

z = Az − b,

where z and b are as in part b and

A =

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

−β 1 1 0 0 0 0 0 00 −δ 0 0 1/κ 0 0 0 00 0 −δ 0 0 0 0 0 1/κγ i 0 0 r +β 0 α

jx 0 0 0

0 2 1 0 r + δ αji 0 0 0

0 1 0 −1 0 r + δ+αjj 0 0 0

γ j 0 0 0 0 0 r +β αix 0

0 0 1 0 0 0 −1 r + δ+αii 0

0 1 2 0 0 0 0 αij r + δ

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

.

The equilibrium turnpike (x∗, y∗, u∗, ν i, ν j)′ = z of the Hamiltonian sys-tem is determined by

z = A−1b.

The detailed analysis of the closed-loop equilibrium turns out to becomplicated but can be sketched as follows. It is convenient to usethe Laplace transform because instead of a linear system of ODEsit allows one to consider an equivalent algebraic system. Taking theLaplace transform of the initial value problem z = Az − b, z(0) = z0

yields

sZ − z0 = AZ − b,

Page 336: Optimal Control Theory With Applications in Economics

Appendix B 323

where Z(s) = L[z](s) = ∫∞0 e−stz(t) dt for s ∈ C is the Laplace transform

of z( · ). Hence,

Z = (A − sI)−1(b − z0),

which yields that

Nii (s) ≡ L[ν i

i ](s) = [0, 0, 0, 0, 1, 0, 0, 0, 0](A − sI)−1(b − z0)

= L[κui∗](s) = κL[αi0 +αi

xx +αiiyi +αi

j yj](s),

or equivalently,

αi0

s+ [αi

x,αii ,α

ij , 0, −1/κ , 0, 0, 0, 0](A − sI)−1(b − z0) = 0,

for all s �= 0 (as long as convergent). An analogous relation is obtainedfor Nj

j (s) = κL[αj0 +α

jxx +α

ji yi +α

jjyj](s):

αj0

s+ [αj

x,αji ,α

jj , 0, 0, 0, 0, 0, −1/κ](A − sI)−1(b − z0) = 0,

for all s �= 0 (as long as convergent). If one can determine α (possiblytogether with the missing initial conditions in z0), then a linear closed-loop equilibrium exists. Yet, the corresponding algebraic calculationsturn out to be complicated and are therefore omitted.

d. To illustrate the differences between the static and open-loop Nashequilibria, compare the respective steady states in the symmetric casewith δ = 0, as discussed in part c. In that case the static solution becomes

x∗0 = 2(1 − c)

3β + (1/β), y∗

i0 = 1 − c3 + (1/β2)

,

and the open-loop solution is given by

x∗∣∣open-loop

= 2(1 − c)3β

, y∗i

∣∣open-loop

= 1 − c3

.

Because of the time consistency of the open-loop equilibrium comparedto the stationary solution, which is not a Nash equilibrium, output islarger and thus pollution levels higher in the long run.

e. The static and open-loop equilibria were derived for the general casewith (γ 1, γ 2) ∈ {0, 1}2 in parts a and b, respectively. It is important tonote that the long-run steady state in the open-loop equilibrium is infact independent of γ 1, γ 2.

Page 337: Optimal Control Theory With Applications in Economics

324 Appendix B

f. When firms lack the ability to commit to their production paths, theaggregate production increases. While this is generally good for mar-kets, as prices become more competitive, decreasing a deadweight lossin the economy, the drawback is that pollution levels are also increasing.In order to reduce publicly harmful stock pollution levels, a regulatorneeds to find a way to force firms to internalize at least a portion of thesocial costs of pollution. This could be achieved by a tax on the firms’output or a cap-and-trade market, for instance.

B.5 Mechanism Design

5.1 (Screening)

a. Assume without any loss of generality that the total mass of con-sumers is 1, divided into masses μ of type-H consumers and 1 −μ oftype-L consumers. In order to generate a positive utility for consumers,(1 − x)2 must be less than 1, which implies that x ∈ [0, 2]. Further observethat since customers care only about the quality through (1 − x)2 andthat a higher quality is more costly to the producer than a lower one, theonly economically meaningful qualities are x ∈ [0, 1]. If a single cameramodel is produced, there are two cases: either all consumers buy it, oronly type-H consumers buy it. In the former case, for a given quality xm,set the price so that type-L consumers get zero utility, that is, tm = θL(1 −(1 − xm)2)/2. Since the consumers’ mass sums to 1, the profit becomes

θL(1 − (1 − xm)2)

2− cxm;

it is maximized at

xm = 1 − c/θL.

The corresponding price and profit are

tm = θL

2

[1 − c2

θ2L

]

and

�m = (θL − c)2

2θL.

In the second case, to sell only to type-H consumers, raise the price totm = θH(1 − (1 − xm)2)/2, setting the utility of the high-type consumersto zero. The profit is then (since there is a mass μ of type-H consumers)

Page 338: Optimal Control Theory With Applications in Economics

Appendix B 325

μ

[θH

(1 − (1 − xm)2)2

− cxm]

,

which is maximized at

xm = 1 − c/θH .

The corresponding price and profit are

tm = θH

2

[1 − c2

θ2H

]and

�m = μ(θH − c)2

2θH.

In particular, shutdown is avoided as long as

μ ≤ θH

θL

(θL − cθH − c

)2

.

b. If one can perfectly distinguish consumers types, set their respec-tive utilities to zero by letting ti = θi(1 − (1 − xi)2)/2, and optimize withrespect to xi. In part a, this corresponds to

xL = 1 − c/θL, tL = θL

[1 − c2

θ2L

],

xH = 1 − c/θH , tH = θH

[1 − c2

θ2H

].

The resulting profit is

� = μ(θH − c)2

2θH+ (1 −μ)

(θL − c)2

2θL,

clearly higher than the profit derived earlier in part a.

c. Observe that the utility function u(t, x, θ ) = θ (1 − (1 − x)2)/2 − tsatisfies the Spence-Mirrlees condition (5.6), since ∂2u

∂x∂θ = θ (1 − x) ≥ 0.Therefore (see section 5.3), at the optimal contract,

• the individual-rationality constraint of type L is binding,

t∗L = θL(1 − (1 − x∗

L)2)2

; (IR-L)

Page 339: Optimal Control Theory With Applications in Economics

326 Appendix B

• the incentive-compatibility constraint of type H is binding,

θH(1 − (1 − x∗

H)2)2

− t∗H = θH(1 − (1 − x∗

L)2)2

− t∗L; (IC-H)

• x∗H = xH (where xH was determined in part b);

• the individual-rationality constraint of type H and the incentive-compatibility constraint of type-L consumers can be ignored.

Moreover,

x∗L ∈ arg max

xL∈[0,1]

{u(xL, θL) − cxL − μ

1 −μ

(u(xL, θH) − u(xL, θL)

)},

which in the current setting yields

x∗L = 1 − c

θL − μ

1−μ (θH − θL).

Observe that the no-shutdown condition μ ≤ θL−cθH−c is equivalent to

μ

1−μ (θH − θL) < θL − c, which has two consequences. First, the denom-inator, θL − μ

1−μ (θH − θL), in the expression for x∗L is nonnegative. This,

together with the fact that μ

1−μ (θH − θL) is nonnegative, implies thatx∗

L ≤ xL, since xL = 1 − cθL

. This result was expected, given the resultsin section 5.3. Second, the denominator is in fact greater than c, whichimplies that x∗

L is nonnegative, so that under the no-shutdown condi-tion of part a, screening implies that both types of consumers will be ableto buy cameras. From x∗

L, the prices can be easily computed. t∗L is givenby (IR-L):

t∗L = θL

2

(1 − c2

[θL − μ

1−μ (θH − θL)]2

).

Similarly, t∗H is given by (IC-H):

t∗H = θH1 − (1 − xH)2

2− (θH − θL)(1 − (1 − x∗

L)2)

= tH − (θH − θL)(1 − (1 − x∗L)2) < tH .

Therefore, type-H consumers receive an information rent, since they arecharged a lower price. The second-best profit is

�∗ = μ(t∗H − cx∗H) + (1 −μ)(t∗L − cx∗

L),

which, after some computation, can be rewritten as

Page 340: Optimal Control Theory With Applications in Economics

Appendix B 327

�∗ = μ{(tH − cxH) − [(θH − θL)(1 − (1 − x∗L)2)]}

+ (1 −μ){(tL − cxL) − [(xL − x∗L)(c + θL(xL − x∗

L))]},(B.15)

from which it is clear that �∗ ≤ �. It can also verified that �∗ ≥ �m =(θL − c), the last equality holding when both consumer types can buycameras. From (B.15),

�∗ − (θL − c) = μ(t∗H − cx∗H) −μ(θL − c)

− (1 −μ)((xL − x∗L)(c + θL(xL − x∗

L)))

= (1 −μ)[β(1 − xL)2 − c

θL −β(c + θL(xL − x∗

L))]

,

(B.16)

where β = μ

1−μ (θH − θL). Using the fact that 1 − x∗L = c

θL+ cβ

θL−β and

xL − x∗L = cβ

θL−β , expression (B.16) is nonnegative if and only if

(cθL

+ cβθL −β

)2

− cθL −β

(c + θL

cβθL −β

)≥ 0,

or equivalently, after simplifications,

(1 − θL)(θL −β) ≥ 0,

which is true, since β ≤ θL − c. Thus, it has been verified that �∗ ≥ �m.When μ >

θL−cθH−c , the market for low-type consumers shuts down, and

the profits �∗ and �m are equal, with value μ(θH − c).

d. Referring to section 5.3,

�(x, θ ) = S(x, θ ) − uθ (x, θ )(1 − F(θ ))/f (θ ),

where f (θ ) ≡ 1, F(θ ) ≡ θ ,

S(x, θ ) = θ1 − (1 − x)2

2− cx,

and

uθ (x, θ ) = 1 − (1 − x)2

2.

Therefore, x∗(θ ) maximizes

(2θ − 1)1 − (1 − x2)

2− cx,

which yields

Page 341: Optimal Control Theory With Applications in Economics

328 Appendix B

x∗(θ ) ={

1 − c/(2θ − 1) if θ ≥ (1 + c)/2,0 otherwise.

With this, the optimal nonlinear pricing schedule becomes

t∗(θ ) =∫ θ

(1+c)/2

(ux(x∗(ϑ),ϑ)

dx∗(ϑ)dϑ

)dϑ

={

14 + c

2 −(

4θ−14(2θ−1)2

)c2 if θ ≥ (1 + c)/2,

0 otherwise.

Note that for c ≥ 1 a complete shutdown would occur, with the firm notproviding any product to any type.

5.2 (Nonlinear Pricing with Congestion Externalities)

a. By the revelation principle, the principal can restrict attention to directrevelation mechanisms M = (�, ρ) with message space � (equal to thetype space) and allocation function ρ = (t, x) : � → R × [0, 1], whichsatisfy the incentive-compatibility constraint

θ ∈ arg maxθ∈�

U(t(θ ), x(θ ), y, θ ) = arg maxθ∈�

{θx(θ )(1 −αy) − t(θ )}, (B.17)

for all θ ∈ �, where

y =∫ 1

0x(θ ) dF(θ ).

The first-order condition corresponding to agent θ ’s problem of sendinghis optimal message in (B.17) is

Ut(t(θ ), x(θ ), y, θ )t(θ ) + Ux(t(θ ), x(θ ), y, θ )x(θ ) = − t(θ ) + θ (1 −αy)x(θ )

= 0,

for all θ ∈ �. As shown in Weber (2005b), this first-order conditionand the single-crossing condition,

(θ − θ )(Uθ (t(θ ), x(θ ), y, θ ) − Uθ (t(θ ), x(θ ), y, θ ))

= (θ − θ )(x(θ ) − x(θ ))(1 −αy) ≥ 0,

for all θ , θ ∈ �, together are equivalent to the incentive-compatibilityconstraint (B.17). The single-crossing condition can be simplified to(θ − θ )(x(θ ) − x(θ )) ≥ 0 for all θ , θ ∈ � (since in general αy < 1), whichis equivalent to x(θ ) being nondecreasing on the type space�. Thus, the

Page 342: Optimal Control Theory With Applications in Economics

Appendix B 329

principal’s mechanism design problem can be formulated equivalentlyin terms of the following optimal control problem:

J(u) =∫ 1

0

(t(θ ) − c(x(θ ))2

2

)f (θ ) dθ −→ max

u(·)

s.t. t = θ (1 −αy)u, t(0) = 0,

x = u, x(0) = 0,

z = xf (θ ), (z(0), z(1)) = (0, y),

u(θ ) ∈ [0, u],∀ θ ∈ [0, 1],where u > 0 is a given (large) control bound.

b. The Hamiltonian for the optimal control problem formulated inpart a. is

H(θ , t, x, z, u,ψ) =(

t − cx2

2

)f (θ ) +ψtθ (1 −αy)u +ψxu +ψzxf (θ ),

where ψ = (ψt,ψx,ψz) is the adjoint variable associated with thestate (t, x, z) of the system. The PMP provides the following nec-essary optimality conditions, satisfied by an optimal state-controltrajectory (t∗(θ ), x∗(θ ), z∗(θ )), θ ∈ [0, 1]:• Adjoint equation

−ψt(θ ) = f (θ ),

−ψx(θ ) = (ψz(θ ) − cx∗(θ )

)f (θ ),

−ψz(θ ) = 0,

for all θ ∈ [0, 1].• Transversality

(ψt(1),ψx(1),ψz(1)) = 0.

• Maximality

u∗(θ ) ∈ arg maxu∈[0,u]

{(ψt(θ )θ (1 −αy) +ψx(θ ))u},

for (almost) all θ ∈ [0, 1].From the adjoint equation and transversality condition it follows im-mediately that

Page 343: Optimal Control Theory With Applications in Economics

330 Appendix B

ψt(θ ) = 1 − F(θ ), ψx(θ ) = −c∫ 1

θ

x∗(ϑ) dF(ϑ), ψz(θ ) = 0,

for all θ ∈ [0, 1]. The maximality condition yields that

u∗(θ ) ∈⎧⎨⎩

{u} if (1 −αy)θ ψt(θ ) +ψx(θ ) > 0,[0, u] if (1 −αy)θ ψt(θ ) +ψx(θ ) = 0,{0} otherwise,

for all θ ∈ [0, 1]. Because of the linearity of the Hamiltonian in thecontrol u, first examine the possibility of a singular control, that is,the situation where (1 −αy)θψt(θ ) +ψx(θ ) ≡ 0 on an interval [θ0, 1], forsome θ0 ∈ [0, 1) that represents the lowest participating type. This isequivalent to

(1 −αy)θ (1 − F(θ )) = c∫ 1

θ

x∗(ϑ) dF(ϑ), ∀ θ ∈ [θ0, 1].

Differentiating with respect to θ yields that

x∗(θ ) = 1 −αyc

(θ − 1 − F(θ )

f (θ )

), ∀ θ ∈ [θ0, 1].

The latter satisfies the control constraint x∗(θ ) = u∗(θ ) ≥ 0 as long as thefunction

θ − 1 − F(θ )f (θ )

is nondecreasing, which is a standard distributional assumption inmechanism design. For simplicity, this is assumed here.21 In addition,x∗(θ ) ≥ 0 is needed, so (omitting some of the details),

x∗(θ ) = 1 −αyc

[θ − 1 − F(θ )

f (θ )

]+

={

1−αyc

(θ − 1−F(θ )

f (θ )

)if θ ∈ [θ0, 1],

0 otherwise,

21. For example, this assumption is satisfied by any distribution F that has a nonde-creasing hazard rate, such as the normal, exponential, or uniform distributions. Whenthis assumption is not satisfied, then based on the maximality condition, it becomesnecessary to set u(θ ) on one (or several) type intervals, which is usually referred to asironing in the mechanism design literature. Ironing leads to bunching of types, i.e., dif-ferent types are pooled together, and all obtain the same bundle for the same price. Forbunched types, the principal does not obtain full revelation; the types are not separatedby a revenue-maximizing screening mechanism.

Page 344: Optimal Control Theory With Applications in Economics

Appendix B 331

for all θ ∈ [0, 1]. The missing constants θ0 and y are such that

θ0 − 1 − F(θ0)f (θ0)

= 0,

and (because y = ∫ 10 x∗(ϑ) dF(ϑ))

y =∫ 1θ0

(θ − 1−F(θ )

f (θ )

)dF(θ )

c +α∫ 1θ0

(θ − 1−F(θ )

f (θ )

)dF(θ )

.

In addition,

t∗(θ ) = (1 −αy)∫ θ

θ0

ϑ x∗(ϑ) dϑ

= (1 −αy)2

c

∫ θ

θ0

ϑ

(1 − d

dϑ1 − F(ϑ)

f (ϑ)

)dϑ .

Remark B.7 If the agent types are distributed uniformly on � = [0, 1],so that F(θ ) ≡ θ , then θ0 = 1/2, and

y =∫ 1

1/2 (2θ − 1) dθ

c +α∫ 1

1/2 (2θ − 1) dθ= 1α+ 4c

,

whence

x∗(θ ) = [2θ − 1]+c + (α/4)

and t∗(θ ) = c[θ2 − (1/4)

]+

(c + (α/4))2 ,

for all θ ∈ [0, 1]. �

c. The taxation principle states that given an optimal screening con-tract (t∗(θ ), x∗(θ )), θ ∈ �, the optimal price-quantity schedule τ ∗(x),x ∈ [0, 1], can be found as follows:

τ ∗(x) ={

t∗(θ ) if ∃ θ ∈ � s.t. x∗(θ ) = x,∞ otherwise.

Given that x∗(θ ) is nondecreasing on�, letϕ(x) be its (set-valued) inversesuch that

θ ∈ ϕ(x∗(θ )), ∀ θ ∈ �.

By incentive compatibility,

∀ x ∈ x∗(�) : θ , θ ∈ ϕ(x) ⇒ t∗(θ ) = t∗(θ ).

Page 345: Optimal Control Theory With Applications in Economics

332 Appendix B

Hence, the optimal price-quantity schedule τ ∗(x) is single-valued, and

τ ∗(x) ∈ t∗(ϕ(x)), ∀ x ∈ x∗(�),

whereas τ ∗(x) = ∞ for all x /∈ x∗(�).

Remark B.8 A uniform type distribution (see remark B.7) results inx∗(�) = [0, 1/(c +α/4)] and

ϕ(x) ={{ 1

2

(1 + (c + α

4

)x)}

if 0 < x ≤ 1c+(α/4) ,

[0, 12 ] if x = 0.

This implies that

τ ∗(x) ={

c/4(c+(α/4))2

((1 + (c + α

4

)x)2 − 1

)if 0 < x ≤ 1

c+(α/4) ,

∞ otherwise.

d. Clearly, an increase in all agents’ sensitivity α to the congestion exter-nality decreases any type θ ’s consumption x∗(θ ;α) as well as the totaltransfer t∗(θ ;α) that this type pays to the principal. Somewhat less obvi-ous is the fact that for a given bandwidth, the optimal price-quantityschedule τ ∗(x;α) is also decreasing inα (which can be shown by straight-forward differentiation, at least for the simple example discussed in thepreceding remarks).

Page 346: Optimal Control Theory With Applications in Economics

Appendix C: Intellectual Heritage

Page 347: Optimal Control Theory With Applications in Economics

Figure C.1Source: Mathematics Genealogy Project, http://genealogy.math.ndsu.nodak.edu.

Page 348: Optimal Control Theory With Applications in Economics

References

Abreu, D., P. K. Dutta, and L. Smith. 1994. The Folk Theorem for Repeated Games: A NEUCondition. Econometrica 62 (2): 939–948.

Acemoglu, D. 2009. Introduction to Modern Economic Growth. Princeton, N.J.: PrincetonUniversity Press.

Afanas’ev, A. P., V. V. Dikusar, A. A. Milyutin, and S. V. Chukanov. 1990. NecessaryConditions in Optimal Control. Moscow: Nauka. In Russian.

Aghion, P., and P. Howitt. 2009. The Economics of Growth. Cambridge, Mass.: MIT Press.

Akerlof, G. A. 1970. The Market for “Lemons”: Quality Uncertainty and the MarketMechanism. Quarterly Journal of Economics 84 (3): 488–500.

Aleksandrov, A. D., A. N. Komogorov, and M. A., Lavrentyev. 1969. Mathematics: Its Con-tent, Methods, and Meaning. Cambridge, Mass.: MIT Press. Reprinted: Dover Publications,Mineola, N.Y., 1999.

Allee, W. C. 1931. Animal Aggregations: A Study in General Sociology. Chicago: Universityof Chicago Press.

———. 1938. The Social Life of Animals. London: Heinemann.

Ampère, A.-M. 1843. Essai sur la Philosophie des Sciences. Seconde Partie. Paris: Bachelier.

Anderson, B.D.O., and J. B. Moore. 1971. Linear Optimal Control. Englewood Cliffs, N.J.:Prentice-Hall.

Aoki, M. 2001. Toward a Comparative Institutional Analysis. Cambridge, Mass.: MIT Press.

Apostol, T. M. 1974. Mathematical Analysis. 2d ed. Reading, Mass.: Addison Wesley.

Armstrong, M. 1996. Multiproduct Nonlinear Pricing. Econometrica 64 (1): 51–75.

Armstrong, M. A. 1983. Basic Topology. New York: Springer.

Arnold, V. I. 1973. Ordinary Differential Equations. Cambridge, Mass.: MIT Press.

———. 1988. Geometrical Methods in the Theory of Ordinary Differential Equations. 2d ed.New York: Springer.

———. 1989. Mathematical Methods of Classical Mechanics. New York: Springer.

———. 1992. Catastrophe Theory. 3d ed. New York: Springer.

Page 349: Optimal Control Theory With Applications in Economics

336 References

———. 2000. “Zhestkie” i “Miagkie” Matematicheskie Modeli (“Rigid” and “Flexible” Modelsin Mathematics). Moscow: Mcnmo.

Arnold, V. I., V. S. Afrajmovich, Y. S. Il’yashenko, and L. P. Shil’nikov. 1999. BifurcationTheory and Catastrophe Theory. New York: Springer.

Arnold, V. I., and Y. S. Il’yashenko. 1988. Ordinary Differential Equations. In Encyclopediaof Mathematical Sciences, Vol. 1, ed. D. V. Anosov and V. I. Arnold, 1–148.

Arrow, K. J. 1968. Applications of Control Theory to Economic Growth. In Mathematics ofthe Decision Sciences. Part 2, ed. G. B. Dantzig and A. F. Veinott, 85–119. Lectures in AppliedMathematics, Vol. 12. Providence, R.I.: American Mathematical Society.

Arrow, K. J., and M. Kurz. 1970a. Optimal Growth with Irreversible Investment in aRamsey Model. Econometrica 38 (2): 331–344.

———. 1970b. Public Investment, the Rate of Return, and Optimal Fiscal Policy. Baltimore,Md.: Johns Hopkins Press.

Arutyunov, A. V. 1999. Pontryagin’s Maximum Principle in Optimal Control Theory.Journal of Mathematical Sciences 94 (3): 1311–1365.

———. 2000. Optimality Conditions: Abnormal and Degenerate Problems. Boston: Kluwer.

Aseev, S. M. 1999. Methods of Regularization in Nonsmooth Problems of DynamicOptimization. Journal of Mathematical Sciences 94 (3): 1366–1393.

———. 2009. Infinite-Horizon Optimal Control with Applications in Growth Theory.Lecture Notes. Moscow: Moscow State University; MAKS Press.

Aseev, S. M., and A. V. Kryazhimskii. 2007. The Pontryagin Maximum Principle andOptimal Economic Growth Problems. Proceedings of the Steklov Institute of Mathematics 257(1): 1–255.

Aubin, J.-P. 1998. Optima and Equilibria. 2d ed. New York: Springer.

Aumann, R. J. 1976. Agreeing to Disagree. Annals of Statistics 4 (6): 1236–1239.

Axelrod, R. 1984. The Evolution of Cooperation. New York: Basic Books.

Balder, E. J. 1988. Generalized Equilibrium Results for Games with Incomplete Informa-tion. Mathematics of Operations Research 13 (2): 265–276.

Banach, S. 1922. Sur les Opérations dans les Ensembles Abstraits et leur Application auxÉquations Intégrales. Fundamenta Mathematicae 3: 133–181.

Barbashin, E. A., and N. N. Krasovskii. 1952. On the Stability of Motion as a Whole.Doklady Akademii Nauk SSSR 86 (3): 453–456. In Russian.

Barrow-Green, J. 1997. Poincaré and the Three-Body Problem. History of Mathematics, Vol. 11.Providence, R.I.: American Mathematical Society.

Basar, T., and G. J. Olsder. 1995. Dynamic Noncooperative Game Theory. 2d ed. New York:Academic Press.

Bass, F. M. 1969. A New Product Growth Model for Consumer Durables. ManagementScience 15 (5): 215–227.

Page 350: Optimal Control Theory With Applications in Economics

References 337

Bass, F. M., T. V. Krishnan, and D. C. Jain. 1994. Why the Bass Model Fits without DecisionVariables. Marketing Science 13 (3): 203–223.

Bellman, R. E. 1953. Stability Theory of Differential Equations. New York: McGraw-Hill.

———. 1957. Dynamic Programming. Princeton, N.J.: Princeton University Press.

Bendixson, I. O. 1901. Sur les Courbes Définies par des Équations Différentielles. ActaMathematica 24 (1): 1–88.

Berge, C. 1959. Espaces Topologiques et Fonctions Multivoques. Paris: Dunod. English trans-lation: Topological Spaces, trans. E. M. Patterson. Edinburgh: Oliver and Boyd, 1963.Reprinted: Dover Publications, Mineola, N.Y., 1997.

Bertsekas, D. P. 1995. Nonlinear Programming. Belmont, Mass.: Athena Scientific.

———. 1996. Neuro-Dynamic Programming. Belmont, Mass.: Athena Scientific.

———. 2007. Dynamic Programming and Optimal Control. 2 vols. Belmont, Mass.: AthenaScientific.

Besicovitch, A. S. 1928. On Kakeya’s Problem and a Similar One. Mathematische Zeitschrift27 (1): 312–320.

Blåsjö, V. 2005. The Isoperimetric Problem. American Mathematical Monthly 112 (6): 526–566.

Bluman, G. W., and S. Kumei. 1989. Symmetries and Differential Equations. New York:Springer.

Boltyanskii, V. G. 1958. The Maximum Principle in the Theory of Optimal Processes.Doklady Akademii Nauk SSSR 119 (6): 1070–1073. In Russian.

———. 1994. The Maximum Principle: How It Came to Be. Report No. 526, TechnicalUniversity Munich, Germany. Also in Boltyanski[i], Martini, and Soltan 1998, 204–230.

Boltyanski[i], V. G., H. Martini, and P. S. Soltan. 1998. Geometric Methods and OptimizationProblems. New York: Springer.

Border, K. C. 1985. Fixed Point Theorems with Applications to Economics and Game Theory.Cambridge: Cambridge University Press.

Boyd, S., and L. Vandenberghe. 2004. Convex Optimization. Cambridge: CambridgeUniversity Press.

Brinkhuis, J., and V. Tikhomirov. 2005. Optimization: Insights and Applications. Princeton,N.J.: Princeton University Press.

Bronshtein, I. N., K. A. Semendyayev, G. Musiol, and H. Muehlig. 2004. Handbook ofMathematics. 4th ed. New York: Springer.

Buckingham, E. 1914. On Physically Similar Systems: Illustrations of the Use ofDimensional Equations. Physical Review 4 (4): 345–376.

Bulow, J. 1982. Durable Goods Monopolists. Journal of Political Economy 90 (2): 314–332.

Byrne, O. 1847. The First Six Books of the Elements of Euclid. London: Pickering.

Cajori, F. 1919. A History of the Conceptions of Limits and Fluxions in Great Britain. London:Open Court Publishing.

Page 351: Optimal Control Theory With Applications in Economics

338 References

Campbell, D. M., and J. C. Higgins, eds. 1984. Mathematics: People, Problems, Results. 3vols. Belmont, Calif.: Wadsworth.

Cantor, M. B. 1907. Vorlesungen über Geschichte der Mathematik. Erster Band. 3d ed. Leipzig:B.G. Teubner.

Carlson, D. A., A. B. Haurie, and A. Leizarowitz. 1991. Infinite Horizon Optimal Control. 2ded. New York: Springer.

Cass, D. 1965. Optimum Growth in anAggregative Model of CapitalAccumulation. Reviewof Economic Studies 32 (3): 233–240.

Cauchy, A.-L. 1824/1913. Oeuvres Complètes. Deuxième Série. Tome XI. Académie desSciences, Ministère de l’Instruction Publique. Paris: Gauthiers-Villars.

Cesari, L. 1973. Closure Theorems for Orientor Fields. Bulletin of the American MathematicalSociety 79 (4): 684–689.

———. 1983. Optimization Theory and Applications: Problems with Ordinary DifferentialEquations. New York: Springer.

Cetaev, N. G. 1934. Un Théoreme sur l’Instabilité. Doklady Akademii Nauk SSSR 2: 529–534.

Cho, I.-K., and D. M. Kreps. 1987. Signaling Games and Stable Equilibria. Quarterly Journalof Economics 102 (2): 179–222.

Clarke, F. H. 1983. Optimization and Nonsmooth Analysis. New York: Wiley Interscience.

Clarke, F. H., Y. S. Ledyaev, R. J. Stern, and P. R. Wolenski. 1998. Nonsmooth Analysis andControl Theory. New York: Springer.

Coase, R. H. 1972. Durability and Monopoly. Journal of Law and Economics 15 (1): 143–149.

Coddington, E. A., and N. Levinson. 1955. Theory of Ordinary Differential Equations. NewYork: McGraw-Hill.

Dikusar, V. V., and A. A. Milyutin. 1989. Qualitative and Numerical Methods in MaximumPrinciple. Moscow: Nauka. In Russian.

Dixit, A. K., and B. J. Nalebuff. 1991. Thinking Strategically. New York: Norton.

Dmitruk, A. V. 1993. Maximum Principle for a General Optimal Control Problem with Stateand Regular Mixed Constraints. Computational Mathematics and Modeling 4 (4): 364–377.

Dockner, E., S. Jørgensen, N. Van Long, and G. Sorger. 2000. Differential Games in Economicsand Management Science. Cambridge: Cambridge University Press.

Dorfman, R. 1969. An Economic Interpretation of Optimal Control Theory. AmericanEconomic Review 59 (5): 817–831.

Douglas, J. 1931. Solution of the Problem of Plateau. Transactions of the AmericanMathematical Society 33 (1): 263–321.

Dubovitskii, A.Y., andA.A. Milyutin. 1965. Extremum Problems in the Presence of Restric-tions. Zhurnal Vychislitel’noi Matematiki i Matematicheskoi Fiziki 5 (3): 395–453. In Russian.English translation: U.S.S.R. Computational Mathematics and Mathematical Physics 5 (3):1–80.

Page 352: Optimal Control Theory With Applications in Economics

References 339

———. 1981. Theory of the Maximum Principle. In Methods of the Theory of ExtremalProblems in Economics, ed. V. L. Levin, 138–177. Moscow: Nauka. In Russian.

Dunford, N., and J. T. Schwartz. 1958. Linear Operators. Part I: General Theory. New York:Wiley Interscience.

Euler, L. 1744. Methodus Inveniendi Lineas Curvas Maximi Minimive Proprietate Gaudentes,Sive Solutio Problematis Isoperimetrici Latissimo Sensu Accepti. Geneva: M.M. Bousquet.

Feinstein, C. D., and D. G. Luenberger. 1981. Analysis of the Asymptotic Behavior ofOptimal Control Trajectories: The Implicit Programming Problem. SIAM Journal on Controland Optimization 19 (5): 561–585.

Filippov, A. F. 1962. On Certain Questions in the Theory of Optimal Control. SIAM Journalon Control Ser. A. 1 (1): 76–84.

———. 1988. Differential Equations with Discontinuous Righthand Sides. Dordrecht, Nether-lands: Kluwer.

Fomenko, A. T. 1990. The Plateau Problem: Historical Survey. New York: Gordon and Breach.

Fraser-Andrews, G. 1996. Shooting Method for the Numerical Solution of Optimal ControlProblems with Bounded State Variables. Journal of Optimization Theory and Applications 89(2): 351–372.

Friedman, J. 1971. A Non-Cooperative Equilibrium for Supergames. Review of EconomicStudies 28 (1): 1–12.

Fudenberg, D., and E. Maskin. 1986. The Folk Theorem in Repeated Games withDiscounting or with Incomplete Information. Econometrica 54 (3): 533–554.

Fudenberg, D., and J. Tirole. 1991. Game Theory. Cambridge, Mass.: MIT Press.

Galilei, G. 1638. Discorsi e Dimonstrazioni Matematiche Intorno à Due Nuoue Scienze. Leiden,Netherlands: Elsevirii.

Gamkrelidze, R. V. 1959. Time-Optimal Processes with Bounded State Coordinates.Doklady Akademii Nauk SSSR 125: 475–478.

———. 1978. Principles of Optimal Control Theory. New York: Plenum Press.

———. 1999. Discovery of the Maximum Principle. Journal of Dynamical and Control Systems5 (4): 437–451.

———. 2008. The “Pontryagin Derivative” in Optimal Control. Doklady Mathematics 77(3): 329–331.

Geanakoplos, J. 1992. Common Knowledge. Journal of Economic Perspectives 6 (4): 53–82.

Gelfand, I. M., and S. V. Fomin. 1963. Calculus of Variations. Englewood Cliffs, N.J.: Prentice-Hall. Reprinted: Dover Publications, Mineola, N.Y., 2000.

Giaquinta, M., and S. Hildebrandt. 1996. Calculus of Variations. 2 vols. New York: Springer.

Gibbard, A. 1973. Manipulation of Voting Schemes: A General Result. Econometrica 41 (4):587–601.

Gibbons, R. 1992. Game Theory for Applied Economists. Princeton, N.J.: Princeton UniversityPress.

Page 353: Optimal Control Theory With Applications in Economics

340 References

Godunov, S. K. 1994. Ordinary Differential Equations with Constant Coefficient. Providence,R.I.: American Mathematical Society.

Goldstine, H. H. 1980. A History of the Calculus of Variations from the Seventeenth through theNineteenth Century. New York: Springer.

Granas, A., and J. Dugundji. 2003. Fixed Point Theory. New York: Springer.

Green, J., and J.-J. Laffont. 1979. On Coalition Incentive Compatibility. Review of EconomicStudies 46 (2): 243–254.

Gronwall, T. H. 1919. Note on the Derivatives with Respect to a Parameter of the Solutionsof a System of Differential Equations. Annals of Mathematics 20 (3): 292–296.

Guesnerie, R., and J.-J. Laffont. 1984. A Complete Solution to a Class of Principal-AgentProblems with an Application to the Control of a Self-Managed Firm. Journal of PublicEconomics 25 (3): 329–369.

Guillemin, V., and A. Pollack. 1974. Differential Topology. Englewood Cliffs, N.J.: Prentice-Hall.

Gül, F., H. Sonnenschein, and R. B. Wilson. 1986. Foundations of Dynamic Monopoly andthe Coase Conjecture. Journal of Economic Theory 39 (1): 155–190.

Hadamard, J. 1902. Sur les Problèmes aux Dérivées Partielles et Leur SignificationPhysique. Princeton University Bulletin, no. 23, 49–52.

Halkin, H. 1974. Necessary Conditions for Optimal Control Problems with InfiniteHorizons. Econometrica 42 (2): 267–272.

Hamilton, W. R. 1834. On a General Method in Dynamics. Philosophical Transactions of theRoyal Society of London 124: 247–308.

Harsanyi, J. C. 1967. Games with Incomplete Information Played by “Bayesian” Players.Part I: The Basic Model. Management Science 14 (3): 159–182.

Hartl, R. F., S. P. Sethi, and R. G. Vickson. 1995. A Survey of the Maximum Principles forOptimal Control Problems with State Constraints. SIAM Review 37 (2): 181–218.

Hartman, P. 1964. Ordinary Differential Equations. New York: Wiley.

Hildebrandt, S. 1989. The Calculus of Variations Today. Mathematical Intelligencer 11 (4):50–60.

Hildebrandt, S., and A. Tromba. 1985. Mathematics and Optimal Form. New York: Freeman.

Hotelling, H. 1931. The Economics of Exhaustible Resources. Journal of Political Economy39 (2): 137–175.

Hungerford, T. W. 1974. Algebra. New York: Springer.

Hurwicz, L., and S. Reiter. 2006. Designing Economic Mechanisms. New York: CambridgeUniversity Press.

Hurwitz, A. 1895. Über die Bedingungen, unter welchen eine Gleichung nur Wurzeln mitnegativen reellen Theilen besitzt. Mathematische Annalen 46 (2): 273–284.

Huygens, C. 1673. Horologium Oscillatorium. Paris: F. Muguet.

Page 354: Optimal Control Theory With Applications in Economics

References 341

Ioffe, A. D., and V. M. Tikhomirov. 1979. Theory of Extremal Problems. Amsterdam: NorthHolland.

Isidori, A. 1995. Nonlinear Control Systems. New York: Springer.

Jacobi, C.G.J. 1884. Vorlesungen über Dynamik. In Gesammelte Werke: Supplementband, ed.E. Lottner. Berlin: G. Reimer.

Jordan, C. 1909. Cours d’Analyse de l’École Polytechnique. 3 vols. Paris: Gauthier-Villars.

Jowett, B. 1881.The Republic of Plato. 3d ed. Oxford: Clarendon Press.

Kakeya, S. 1917. Some Problems on Maximum and Minimum Regarding Ovals. TôhokuScience Reports 6: 71–88.

Kakutani, S. 1941. A Generalization of Brouwer’s Fixed Point Theorem. Duke MathematicalJournal 8 (3): 457–459.

Kalman, R. E., and R. S. Bucy. 1961. New Results in Linear Filtering and Prediction Theory.Transactions of the ASME Ser. D: Journal of Basic Engineering 83 (3): 95–108.

Kamien, M. I., and N. L. Schwartz. 1991. Dynamic Optimization: The Calculus of Variationsand Optimal Control in Economics and Management. 2d ed. Amsterdam: Elsevier.

Kant, I. 1781. Kritik der Reinen Vernunft. Riga: J.F. Hartknoch.

Karp, L., and D. M. Newbery. 1993. Intertemporal Consistency Issues in DepletableResources. In Handbook of Natural Resource and Energy Economics, vol. 2, ed. A. V. Kneeseand J. L. Sweeney, 881–931. Amsterdam: Elsevier.

Keerthi, S. S., and E. G. Gilbert. 1988. Optimal Infinite-Horizon Feedback Laws fora General Class of Constrained Discrete-Time Systems: Stability and Moving-HorizonApproximation. Journal of Optimization Theory and Applications 57 (2): 265–293.

Khalil, H. K. 1992. Nonlinear Systems. New York: Macmillan.

Kihlstrom, R. E., and M. H. Riordan. 1984. Advertising as a Signal. Journal of PoliticalEconomy 92 (3): 427–450.

Kirillov, A. A., and A. D. Gvishiani. 1982. Theorems and Problems in Functional Analysis.New York: Springer.

Kolmogorov, A. N., and S. V. Fomin. 1957. Elements of the Theory of Functions and FunctionalAnalysis. 2 vols. Rochester, N.Y.: Graylock Press. Reprinted: Dover Publications, Mineola,N.Y., 1999.

Koopmans, T. C. 1965. On the Concept of Optimal Economic Growth. The EconometricApproach to Development Planning. Pontificiae Academiae Scientiarum Scripta Varia 28: 225–300. Reissued: North-Holland, Amsterdam, 1966.

Kreps, D. M. 1990. Game Theory and Economic Modelling. Oxford: Clarendon Press.

Kress, R. 1998. Numerical Analysis. New York: Springer.

Kurian, G. T., and B. A. Chernov. 2007. Datapedia of the United States. Lanham, Md.: BernanPress.

Kydland, F. E., and E. C. Prescott. 1977. Rules Rather Than Discretion: The Inconsistencyof Optimal Plans. Journal of Political Economy 85 (3): 473–491.

Page 355: Optimal Control Theory With Applications in Economics

342 References

Laffont, J.-J. 1989. The Economics of Uncertainty and Information. Cambridge, Mass.: MITPress.

Laffont, J.-J., E. Maskin, and J.-C. Rochet. 1987. Optimal Nonlinear Pricing with Two-Dimensional Characteristics. In Information, Incentives, and Economic Mechanisms: Essaysin Honor of Leonid Hurwicz, ed. T. Groves, R. Radner, and S. Steiger, 255–266. Minneapolis:University of Minnesota Press.

Lagrange, J.-L. 1760. Essai d’une Nouvelle Méthode pour Déterminer les Maxima et lesMinima des Formules Intégrales Indéfinies. Reprinted in Oeuvres, vol. 1, ed. J.-A. Serret,335–362. Hildesheim, Germany: G. Olms, 1973.

———. 1788/1811. Mécanique Analytique, Nouvelle Édition. Vol. 1. Paris: Mme Ve Courcier.Reprinted: Cambridge University Press, New York, 2009.

Lancaster, K. J. 1966. A New Approach to Consumer Theory. Journal of Political Economy74 (2): 132–157.

Lang, S. 1993. Real and Functional Analysis. 3d ed. New York: Springer.

LaSalle, J. P. 1968. Stability Theory for Ordinary Differential Equations. Journal ofDifferential Equations 4 (1): 57–65.

Lee, E. B., and L. Markus. 1967. Foundations of Optimal Control Theory. New York: Wiley.

Lee, J. M. 2003. Introduction to Smooth Manifolds. New York: Springer.

Leibniz, G. W. 1684. Nova Methodus pro Maximis et Minimis, Itemque Tangentibus,quaenec Fractas, nec Irrationales Quantitates Moratur, et Singulare pro Illis Calculi Genus.Acta Eruditorum. Leipzig: J. Grossium and J. F. Gletitschium.

Lorenz, E. N. 1963. Deterministic Nonperiodic Flow. Journal of the Atmospheric Sciences20 (2): 130–141.

Lotka, A. J. 1920. Undamped Oscillations Derived from the Law of Mass Action. Journalof the American Chemical Society 42 (8): 1595–1599.

Luenberger, D. G. 1969. Optimization by Vector Space Methods. New York: Wiley.

Lyapunov, A. M. 1892. The General Problem of the Stability of Motion. Kharkov, Ukraine:Kharkov Mathematical Society. French translation: Problème Général de la Stabilité duMouvement, trans. A. Liapounoff. Annales de la Faculté des Sciences de Toulouse, DeuxièmeSérie 9 (1907): 203–474. English translation: The General Problem of the Stability of Motion,trans. A. T. Fuller. London: Taylor and Francis, 1992.

Magaril-Il’yaev, G. G., and V. M. Tikhomirov. 2003. Convex Analysis: Theory and Applications.Providence, R.I.: American Mathematical Society.

Mailath, G. J., and L. Samuelson. 2006. Repeated Games and Reputations: Long-RunRelationships. Oxford: Oxford University Press.

Mangasarian, O. L. 1966. Sufficient Conditions for the Optimal Control of NonlinearSystems. SIAM Journal on Control 4 (1): 139–152.

Matthews, S. A., and J. H. Moore. 1987. Monopoly Provision of Quality and Warranties:An Exploration in the Theory of Multidimensional Screening. Econometrica 55 (2): 441–467.

Page 356: Optimal Control Theory With Applications in Economics

References 343

Maupertuis, P.-L. 1744. Accord de Différentes Loix de la Nature qui Avoient Jusqu’ici ParuIncompatibles. Histoire de l’Académie Royale des Sciences de Paris, 417–426.

Maxwell, J. C. 1868. On Governors. Proceedings of the Royal Society 16 (10): 270–283.

Mayne, D. Q., and H. Michalska. 1990. Receding Horizon Control of Nonlinear Systems.IEEE Transactions on Automatic Control 35 (7): 814–824.

Mayr, O. 1970. The Origins of Feedback Control. Cambridge, Mass.: MIT Press.

Megginson, R. E. 1998. An Introduction to Banach Space Theory. New York: Springer.

Milgrom, P. R., and I. Segal. 2002. Envelope Theorems for Arbitrary Choice Sets.Econometrica 70 (2): 583–601.

Milgrom, P. R., and R. J. Weber. 1985. Distributional Strategies for Games with IncompleteInformation. Mathematics of Operations Research 10: 619–632.

Milyutin, A. A., and N. P. Osmolovskii. 1998. Calculus of Variations and Optimal Control.Providence, R.I.: American Mathematical Society.

Mirrlees, J. A. 1971. An Exploration in the Theory of Optimal Income Taxation. Review ofEconomic Studies 38 (2): 175–208.

Moler, C. B., and C. F. Van Loan. 2003. Nineteen Dubious Ways to Compute the Exponentialof a Matrix: Twenty-Five Years Later. SIAM Review 45 (1): 3–49.

Mordukhovich, B. S. 2006. Variational Analysis and Generalized Differentiation. 2 vols. NewYork: Springer.

Murray, J. D. 2007. Mathematical Biology. 3d ed. 2 vols. New York: Springer.

Mussa, M., and S. Rosen. 1978. Monopoly and Product Quality. Journal of Economic Theory18 (2): 301–317.

Myerson, R. B. 1979. Incentive Compatibility and the Bargaining Problem. Econometrica47 (1): 61–74.

Nash, J. F. 1950. Equilibrium Points in n-Person Games. Proceedings of the National Academyof Sciences 36 (1): 48–49.

Nerlove, M. L., and K. J. Arrow. 1962. Optimal Advertising Policy under DynamicConditions. Economica 29 (144): 129–142.

Palfrey, T. R., and S. Srivastava. 1993. Bayesian Implementation. Chur, Switzerland:Harwood Academic Publishers.

Peano, G. 1890. Démonstration de l’Integrabilité des Équations Différentielles Ordinaires.Mathematische Annalen 37 (2): 182–238.

Pearl, J. 2000. Causality: Models, Reasoning, and Inference. Cambridge: CambridgeUniversity Press.

Perron, O. 1913. Zur Existenzfrage eines Maximums oder Minimums. Jahres-bericht derDeutschen Mathematiker-Vereinigung 22 (5/6): 140–144.

Petrovski[i], I. G. 1966. Ordinary Differential Equations. Englewood Cliffs, N.J.: Prentice-Hall. Reprinted: Dover Publications, Mineola, N.Y., 1973.

Page 357: Optimal Control Theory With Applications in Economics

344 References

Poincaré, H. 1892. Les Méthodes Nouvelles de la Mécanique Céleste. Vol. I. Paris: Gauthier-Villars.

———. 1928. Oeuvres de Henri Poincaré. Vol. 1. Paris: Gauthier-Villars.

Pontryagin, L. S. 1962. Ordinary Differential Equations. Reading, Mass.: Addison Wesley.

Pontryagin, L. S., V. G. Boltyanskii, R. V. Gamkrelidze, and E. F. Mishchenko. 1962. TheMathematical Theory of Optimal Processes. New York: Wiley Interscience.

Powell, W. B. 2007. Approximate Dynamic Programming. New York: Wiley Interscience.

Primbs, J. 2007. Portfolio Optimization Applications of Stochastic Receding HorizonControl. In Proceedings of the 26th American Control Conference, New York, 1811–1816.

Radner, R., and R. W. Rosenthal. 1982. Private Information and Pure-Strategy Equilibria.Mathematics of Operations Research 7: 401–409.

Radó, T. 1930. On Plateau’s Problem. Annals of Mathematics 31 (2): 457–469.

Ramsey, F. P. 1928. A Mathematical Theory of Saving. Economic Journal 38 (152): 543–559.

Roberts, S. M., and J. S. Shipman. 1972. Two-Point Boundary Value Problems: ShootingMethods. New York: Elsevier.

Robertson, R. M. 1949. Mathematical Economics before Cournot. Journal of PoliticalEconomy 57 (6): 523–536.

Rochet, J.-C. 1985. The Taxation Principle and Multitime Hamilton-Jacobi Equations.Journal of Mathematical Economics 14 (2): 113–128.

Rochet, J.-C., and P. Choné. 1998. Ironing, Sweeping and Multidimensional Screening.Econometrica 66 (4): 783–826.

Rochet, J.-C., and L. A. Stole. 2003. The Economics of Multidimensional Screening. InAdvances in Economics and Econometrics: Theory and Applications, vol. 1, ed. M. Dewatripont,L.-P. Hansen, and S. J. Turnovsky, 150–197. New York: Cambridge University Press.

Rockafellar, R. T., and R.J.B. Wets. 2004. Variational Analysis. New York: Springer.

Ross, I. M., and F. Fahroo. 2003. Legendre Pseudospectral Approximations of OptimalControl Problems. In New Trends in Nonlinear Dynamics and Control, and Their Applications,ed. W. Kang, M. Xiao, and C. Borges, 327–341. Lecture Notes in Control and InformationSciences. Vol. 295. New York: Springer.

Ross, W. D. 1951. Plato’s Theory of Ideas. Oxford: Clarendon Press.

Routh, E. J. 1877. A Treatise on the Stability of a Given State of Motion. London: Macmillan.

Rudin, W. 1976. Principles of Mathematical Analysis. 3d ed. New York: McGraw-Hill.

Russell, B. 1961. A History of Western Philosophy. 2d ed. New York: Simon and Schuster. Reprinted: Folio Society, London, 2004.

Rutquist, P. E., and M. M. Edvall. 2009. PROPT: MATLAB Optimal Control Software.Technical Report. Pullman, Wash.: TOMLAB Optimization.

Sachdev, P. L. 1997. A Compendium on Nonlinear Ordinary Differential Equations. New York: Wiley Interscience.

Samuelson, P. A. 1970. What Makes for a Beautiful Problem in Science? Journal of Political Economy 78 (6): 1372–1377.

Schechter, M. 2004. An Introduction to Nonlinear Analysis. Cambridge: Cambridge University Press.

Schwartz, A. L. 1996. Theory and Implementation of Numerical Methods Based on Runge-Kutta Integration for Solving Optimal Control Problems. Ph.D. diss., University of California, Berkeley.

Schwartz, A. L., E. Polak, and Y. Chen. 1997. RIOTS: A MATLAB Toolbox for Solving Optimal Control Problems. Technical Report. Department of Electrical Engineering and Computer Sciences, University of California, Berkeley.

Seierstad, A., and K. Sydsæter. 1987. Optimal Control Theory with Economic Applications.Amsterdam: North-Holland.

Sethi, S. P. 1977. Optimal Advertising for the Nerlove-Arrow Model under a Budget Constraint. Operational Research Quarterly 28 (3): 683–693.

Sethi, S. P., and G. L. Thompson. 1974. Quantitative Guidelines for a Communicable Disease: A Complete Synthesis. Biometrics 30 (4): 681–691.

———. 2000. Optimal Control Theory: Applications to Management Science and Economics. 2ded. Boston: Kluwer.

Sim, Y. C., S. B. Leng, and V. Subramaniam. 2000. A Combined Genetic Algorithms–Shooting Method Approach to Solving Optimal Control Problems. International Journal of Systems Science 31 (1): 83–89.

Smirnov, G. V. 2002. Introduction to the Theory of Differential Inclusions. Providence, R.I.: American Mathematical Society.

Sontag, E. D. 1998. Mathematical Control Theory: Deterministic Finite-Dimensional Systems. 2d ed. New York: Springer.

Spence, A. M. 1973. Job Market Signaling. Quarterly Journal of Economics 87 (3): 355–374.

Steiner, J. 1842. Sur le Maximum et le Minimum des Figures dans le Plan, sur la Sphère et dans l'Espace en Général. Journal für die Reine und Angewandte Mathematik 1842 (24): 93–162, 189–250.

Stephens, P. A., W. J. Sutherland, and R. P. Freckleton. 1999. What Is the Allee Effect? Oikos 87 (1): 185–190.

Stiglitz, J. E. 1975. The Theory of "Screening," Education, and the Distribution of Income. American Economic Review 65 (3): 283–300.

Stokey, N. L. 1981. Rational Expectations and Durable Goods Pricing. Bell Journal of Economics 12 (1): 112–128.

Strang, G. 2009. Introduction to Linear Algebra. 4th ed. Wellesley, Mass.: Wellesley-Cambridge Press.

Struwe, M. 1989. Plateau's Problem and the Calculus of Variations. Princeton, N.J.: Princeton University Press.

Sutton, R. S., and A. G. Barto. 1998. Reinforcement Learning: An Introduction. Cambridge, Mass.: MIT Press.

Taylor, A. E. 1965. General Theory of Functions and Integration. New York: Blaisdell Publishing. Reprinted: Dover Publications, Mineola, N.Y., 1985.

Thom, R. F. 1972. Stabilité Structurelle et Morphogénèse: Essai d'Une Théorie Générale des Modèles. Reading, Mass.: W.A. Benjamin.

Thomas, I. 1941. Greek Mathematical Works. 2 vols. Cambridge, Mass.: Harvard University Press.

Tikhomirov, V. M. 1986. Fundamental Principles of the Theory of Extremal Problems. New York: Wiley Interscience.

Tikhonov, A. N. 1963. Solution of Incorrectly Formulated Problems and the RegularizationMethod. Doklady Akademii Nauk SSSR 151 (4): 501–504. In Russian.

Verri, P. 1771. Meditazioni Sulla Economia Politica. Livorno: Stamperia dell’Enciclopedia.

Vidale, M. L., and H. B. Wolfe. 1957. An Operations Research Study of Sales Response to Advertising. Operations Research 5 (3): 370–381.

Vinter, R. B. 1988. New Results on the Relationship between Dynamic Programming and the Maximum Principle. Mathematics of Control, Signals, and Systems 1 (1): 97–105.

———. 2000. Optimal Control. Boston: Birkhäuser.

Volterra, V. 1926. Variazioni e Fluttuazioni del Numero d'Individui in Specie Animali Conviventi. Memorie della R. Accademia Nazionale dei Lincei. Ser. 6, vol. 2, 31–113. English translation: Variations and Fluctuations of a Number of Individuals in Animal Species Living Together. In Animal Ecology, by R. N. Chapman, 409–448. New York: McGraw-Hill, 1931.

von Neumann, J. 1928. Zur Theorie der Gesellschaftsspiele. Mathematische Annalen 100: 295–320.

von Neumann, J., and O. Morgenstern. 1944. Theory of Games and Economic Behavior.Princeton, N.J.: Princeton University Press.

Walter, W. 1998. Ordinary Differential Equations. New York: Springer.

Warga, J. 1972. Optimal Control of Differential and Functional Equations. New York: Academic Press.

Weber, T. A. 1997. Constrained Predictive Control for Corporate Policy. Technical Report LIDS-TH 2398. Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, Mass.

———. 2005a. Infinite-Horizon Optimal Advertising in a Market for Durable Goods.Optimal Control Applications and Methods 26 (6): 307–336.

———. 2005b. Screening with Externalities. Technical Report 2005-05-17. Department of Management Science and Engineering, Stanford University, Stanford, Calif.

———. 2006. An Infinite-Horizon Maximum Principle with Bounds on the Adjoint Variable. Journal of Economic Dynamics and Control 30 (2): 229–241.

Weierstrass, K. 1879/1927. Mathematische Werke. Vol. 7: Vorlesungen über Variationsrechnung. Leipzig: Akademische Verlagsgesellschaft.

Weitzman, M. L. 2003. Income, Wealth, and the Maximum Principle. Cambridge, Mass.: Harvard University Press.

Wiener, N. 1948. Cybernetics, or Control and Communication in the Animal and the Machine. New York: Wiley.

———. 1950. The Human Use of Human Beings: Cybernetics and Society. Boston: Houghton Mifflin.

Wiggins, S. 2003. Introduction to Applied Nonlinear Dynamical Systems and Chaos. New York: Springer.

Willems, J. C. 1996. 1696: The Birth of Optimal Control. In Proceedings of the 35th Conference on Decision and Control, Kobe, Japan.

Williams, S. R. 2008. Communication in Mechanism Design: A Differential Approach. New York: Cambridge University Press.

Wilson, R. B. 1971. Computing Equilibria of n-Person Games. SIAM Journal on Applied Mathematics 21 (1): 80–87.

———. 1993. Nonlinear Pricing. Oxford: Oxford University Press.

Zermelo, E. 1913. Über eine Anwendung der Mengenlehre auf die Theorie des Schachspiels. In Proceedings of the Fifth Congress of Mathematicians, Cambridge, 501–504.

Zorich, V. A. 2004. Mathematical Analysis. 2 vols. New York: Springer.

Index

{∃, ∀, . . .}, 231
a.a. (almost all), 103
Abreu, Dilip, 179
Absolute continuity, 243
Absolute risk aversion, 102
Accumulation point, 236
Acemoglu, Daron (1967–), 11
Acta Eruditorum, 7
Action set, 155
Adjoint equation, 107, 110
Adjoint variable, 82, 96
  current-value, 110
Admissible control, 90, 104
Adverse selection, 182, 207
Advertising, 25, 47, 76, 117, 141
a.e. (almost everywhere), 124
Afanas'ev, Alexander Petrovich (1945–), 141
Afrajmovich, Valentin Senderovich (1945–), 75
Aghion, Philippe (1956–), 11
Akerlof, George Arthur (1940–), 181, 182
Alaoglu, Leonidas (1914–1981), 126
Aleksandrov, Aleksandr Danilovich (1912–1999), 15
Alembert, Jean le Rond d' (1717–1783), 9
Algebraic group, 40
Allee, Warder Clyde (1885–1955), 78
Allee effect, 78, 79
Allegory of the Cave, 6
Allocation function, 216
Ampère, André-Marie (1775–1836), 12
Anderson, Brian David Outram, 140
Anti-derivative, 243
Anti-razor, 11
Aoki, Masahiko (1938–), 179
Apostol, Tom M. (1923–), 252
Approximate dynamic programming, 258
Aristotle (384 B.C.–322 B.C.), 9
Armstrong, Christopher Mark (1964–), 227
Armstrong, Mark Anthony, 241
Arnold, Vladimir Igorevich (1937–2010), 10, 13, 64, 75
Arrow, Kenneth Joseph (1921–), 7, 102
Arrow-Pratt coefficient, 102
Arutyunov, Aram V., 106, 120, 133, 141
Arzelà, Cesare (1847–1912), 241
Arzelà-Ascoli theorem, 241
Ascoli, Giulio (1843–1896), 241
Aseev, Sergey M., 115, 140
Asymptotic stability, 58
Aubin, Jean-Pierre, 251, 252
Auction
  first-price, 163
  second-price, 152
Augmented history, 175
Aumann, Robert John (1930–), 162
Autonomous system. See Ordinary differential equation
Axelrod, Robert (1943–), 177
Backward induction, 92, 151
Baker, Henry Frederick (1866–1956), 71
Balder, Erik J., 166, 203
Banach, Stefan (1892–1945), 31, 126, 237, 238
Banach-Alaoglu theorem, 126
Banach fixed-point theorem, 238
Banach space, 31
Banach space (complete normed vector space), 237
Barbashin, Evgenii Alekseevich (1918–1969), 58
Barrow-Green, June (1953–), 13
Barto, Andrew G., 258
Basar, Tamer (1949–), 203
Bass, Frank M. (1926–2006), 18, 76
Bass diffusion model, 18, 19, 76, 259
Battle of the sexes, 157, 171
Bayes, Thomas (1702–1761), 152
Bayesian perfection, 181
Bayes-Nash equilibrium (BNE), 152, 163, 184
  existence, 165
Behavioral strategy, 166, 184
Bellman, Richard Ernest (1920–1984), 14, 92, 118, 140, 245
Bellman equation, 118, 119
Bendixson, Ivar Otto (1861–1935), 13, 63
Berge, Claude (1926–2002), 251
Berge maximum theorem, 251
Bernoulli, Daniel (1700–1782), 24
Bernoulli, Jakob (1654–1705), 27
Bernoulli, Johann (1667–1748), 7, 8
Bertsekas, Dimitri P., 140, 151, 246, 252, 258
Besicovitch, Abram Samoilovitch (1891–1970), 7
Besicovitch set. See Kakeya set
Best-response correspondence, 157
Bidding function, 152, 164
Blåsjö, Viktor, 15
Bluman, George W. (1943–), 44
Boltyanskii, Vladimir Grigorevich (1925–), 14, 15, 106, 114, 140, 141
Bolza, Oskar (1857–1942), 105
Bolza problem, 105
Bolzano, Bernhard Placidus Johann Nepomuk (1781–1848), 236, 237
Bolzano-Weierstrass theorem, 236
Border, Kim C., 252
Boundary value problem (BVP), 254
Boundedness condition (condition B), 106
Boyd, Stephen P., 252
Brachistochrone, 7, 8
Branding, 277
Brinkhuis, Jan (1952–), 252
Bronshtein, Ilja N., 74
Buckingham, Edgar (1867–1940), 44
Bucy, Richard Snowden (1935–), 13
Bulow, Jeremy, 168
Bunching, 220, 224, 330
Byrne, Oliver (1810–1880), 6
Cajori, Florian (1859–1930), 7
Calculus of variations, 7–9, 11, 14, 100, 140
Campbell, Douglas M., 15
Cantor, Moritz Benedikt (1829–1920), 6
Carathéodory, Constantin (1873–1950), 140
Carathéodory's theorem, 140
Carlson, Dean A. (1955–), 140
Cass, David (1937–2008), 11
Cauchy, Augustin-Louis (1789–1857), 3, 20, 25
Cauchy formula, 3, 24
  generalized, 72
Cauchy sequence, 237
Cayley, Arthur (1821–1895), 85
Cayley-Hamilton theorem, 85
Center, 48
Centipede game, 172
Cesari, Lamberto (1910–1990), 137, 141
Cetaev, Nikolaj Gurjevic, 49
Characteristic equation, 233
Chen, YanQuan, 257
Chernov, Barbara Ann, 260
Cho, In-Koo, 188
Choné, Philippe, 227
Chukanov, S. V., 141
Clairaut, Alexis Claude de (1713–1765), 242
Clairaut's theorem, 242
Clarke, Francis H. (1948–), 15, 140
Closed-loop control, 12
Closed trajectory, 63
Closure (of a set), 236
Coase, Ronald Harry (1910–), 168
Coase conjecture, 168
Coase problem, 168
Coddington, Earl A. (1920–1991), 75
Co-domain (of a function), 241
Commitment, 169, 215, 217
Common knowledge, 155
Compactness condition (condition C), 106
Competitive exclusion, 77, 260
Complementary slackness condition, 249
Complete contingent plan. See Strategy
Complete normed vector space. See Banach space
Complete shutdown, 328
Congestion externality, 228
Conjugate (Young-Fenchel transform), 101
Consumption smoothing, 278
Contractability (observability plus verifiability), 209
Contraction mapping, 32
Contraction mapping principle. See Banach fixed-point theorem
Control (variable), 4, 83
  admissible, 90, 104
Control constraint, 89
Control constraint set. See Control set
Controllability, 4, 81, 84, 88
Control set, 84, 87
Control system, 11, 83
  linear, 84
Convergence (of a sequence), 236
Convergence of trajectory to set, 62
Convex hull, 127
Coordination game, 157
Co-state. See Adjoint variable
Cournot, Antoine Augustin (1801–1877), 197
Cournot oligopoly, 160, 203
  with cost uncertainty, 164
  repeated, 179
Cournot-Stackelberg duopoly, 197
Cumulative distribution function (cdf), 162
Current-value formulation, 110
Cybernetics, 12
Cycloid, 8
De-dimensionalize (an ODE), 44, 77
δ-calculus, 8
Dido (queen of Carthage) (ca. 800 B.C.), 6
Dido's problem. See Isoperimetric problem
Differentiable function, 242
Differential game, 188
Dikusar, Vasily Vasilievich (1937–), 141
Dirac, Paul Adrien Maurice (1902–1984), 75
Dirac distribution, 75
Discount factor, 110, 173, 174
Discount rate, 92
Dixit, Avinash Kamalakar (1944–), 177
Dmitruk, Andrei, 141
Dockner, Engelbert, 203
Domain, 20
  contractible, 41
Domain (of a function), 241
Dorfman, Robert (1916–2002), 141
Douglas, Jesse (1897–1965), 10
DuBois-Reymond, Paul David Gustav (1831–1889), 100
DuBois-Reymond equation, 100
Dubovitskii, Abraham Yakovlevich (1923–), 103, 106, 141
Dugundji, James (1920–1985), 252
Duhamel, Jean-Marie (1797–1872), 25
Duhamel principle, 25
Dunford, Nelson (1906–1986), 126, 252
Dutta, Prajit K., 179
Eberlein, William F. (1917–1986), 126
Eberlein-Šmulian theorem, 126
Edvall, Markus M., 257
Egorov, Dmitri Fyodorovich (1869–1931), 130
Eigenvalue, 233
Einstein, Albert (1879–1955), 11, 81
Ekeland, Ivar (1944–), 252
Endpoint constraint, 89
  componentwise, 242
  regularity, 104
Endpoint optimality, 108, 110
Entry game, 166, 167, 169, 170
Envelope condition, 97, 108, 110
Envelope theorem, 250
Equicontinuity, 242
Equilibrium, 3
  Bayes-Nash, 152, 163, 184
  Markovian Nash, 189
  Markov-perfect, 154
  mixed-strategy Nash, 158
  multiplicity, 179
  Nash, 1, 4, 149, 156, 174
  non-Markovian, 190
  odd number of Nash, 161
  perfect Bayesian, 181, 185
  pooling (see Signaling)
  refinements, 4, 151
  separating, 154
  subgame-perfect Nash, 169, 172, 191
  trigger strategy, 199
Equilibrium (point), 45
  asymptotically stable, 47
  classification, 48
  exponentially stable, 47
  region of attraction, 56
  stable, 19, 47
  unstable, 47
Equilibrium path, 151
  on vs. off, 153–155, 181, 185–187, 191, 202
Essential supremum, 125
Euclid (of Alexandria) (ca. 300 B.C.), 5, 6
Euler, Leonhard Paul (1707–1783), 8, 24, 42
Euler equation, 8, 10, 100
Euler-Lagrange equation. See Euler equation
Euler multiplier, 42
Euler's identity, 231
Event, 88
  controllability, 88
Existence
  Bayes-Nash equilibrium, 165
  fixed point: Banach, 238
  fixed point: Kakutani, 251
  global solution to IVP, 35
  local solution to IVP, 31
  mixed-strategy NE, 160
  Nash equilibrium (NE), 159
  remark, 6
  solution to IVP, 28
  solution to OCP (see Filippov existence theorem)
  subgame-perfect NE, 172
Expected value of perfect information (EVPI), 301
Exponential growth, 21
Exponential stability, 60
Extensive form, 166, 169, 170
Extremal principles, 9
Fahroo, Fariba, 257
Feedback control, 11
Feedback law, 11, 90, 94, 119, 193, 194
Feinstein, Charles D., 115
Fenchel, Werner (1905–1988), 101
Fermat, Pierre de (1601–1665), 9, 247
Fermat's lemma, 120, 247
Fermat's principle. See Principle, of least time
Filippov, Aleksei Fedorovich (1923–), 75, 135, 136, 138
Filippov existence theorem, 135, 136
Finite escape time, 35
First-best solution. See Mechanism, design problem
First-price auction, 163
Flow
  of autonomous system, 40
  of vector field, 40
Focus
  stable, 48
  unstable, 48
Folk theorem
  Nash reversion, 178
  subgame-perfect, 178
Fomenko, Anatoly Timofeevich (1945–), 10
Fomin, Sergei Vasilovich (1917–1975), 126, 252
Fraser-Andrews, G., 255
Freckleton, Robert P., 79
Friedman, James W., 178
Frisi, Paolo (1728–1784), 7
Fromovitz, S., 105, 249
Fudenberg, Drew, 178, 202
Function, 240
  coercive, 58
  continuous, 241
  essentially bounded, 241
  homogeneous (of degree k), 22
  Lipschitz, 29
  locally Lipschitz, 30
  measurable, 241
  separable, 21
  set-valued, 251
  upper semicontinuous, 251
Function family
  equicontinuous, 242
  uniformly bounded, 242
Fundamental matrix, 69
  properties, 69
Fundamental theorem of algebra, 233
Fundamental theorem of calculus, 243
Galerkin, Boris Grigoryevich (1871–1945), 257
Galerkin methods, 257
Galilei, Galileo (1564–1642), 8
Game, 149
  Bayesian, 162
  differential, 188
  dynamic, 149
    complete-information, 166
    hierarchical, 198
    imperfect-information, 171
    incomplete-information, 180
    perfect-information, 172
  extensive-form, 169, 170
  nodes (terminal/nonterminal), 169
  repeated, 173
  signaling, 181
  static, 149
    complete-information, 155
    incomplete-information, 161
Game theory, 4, 149
Gamkrelidze, Revaz Valerianovich (1927–), 14, 15, 101, 106, 114, 140
Geanakoplos, John (1955–), 155
Gelfand, Israel Moiseevich (1913–2009), 126
Giaquinta, Mariano (1947–), 141
Gibbard, Allan (1942–), 216
Gibbons, Robert S., 203
Gilbert, Elmer G., 14
Global asymptotic stability, 58, 60
Godunov, Sergei Konstantinovich (1929–), 75
Going concern, 83, 113
Golden rule, 115, 117, 142
Goldstine, Herman Heine (1913–2004), 8, 9, 15
Gompertz, Benjamin (1779–1865), 76
Gompertz growth, 76
Gradient method (steepest ascent), 255
Granas, Andrzej, 252
Green, George (1793–1841), 67
Green, Jerry R., 216
Gronwall, Thomas Hakon (1877–1932), 245
Gronwall-Bellman inequality, 245
  simplified, 245
Group, 40
Growth
  exponential, 21
  Gompertz, 76
  logistic, 21, 76
Guesnerie, Roger, 226
Guillemin, Victor (1937–), 241
Gül, Faruk R., 168
Gvishiani, Alexey Dzhermenovich (1948–), 130, 158, 241, 243
Hadamard, Jacques Salomon (1865–1963), 28
Halkin, Hubert, 114
Hamilton, William Rowan (1805–1865), 9, 85
Hamiltonian, 9, 96, 99, 107
  current-value, 111
  geometric interpretation, 100, 101
  maximized, 97
Hamiltonian system, 9
Hamilton-Jacobi-Bellman (HJB) equation, 4, 13–15, 17, 82, 90, 95, 140, 151
  with discounting and salvage value, 92, 93
  principle of optimality, 92
  uniqueness of solution, 91
Hamilton-Jacobi-Bellman inequality, 90
Hamilton-Jacobi equation, 10, 13, 14
Hamilton-Pontryagin function. See Hamiltonian
Harmonic oscillator, 87
Harsanyi (Harsányi), John Charles (János Károly) (1920–2000), 180
Hartl, Richard F., 106
Hartman, Philip (1915–), 75
Haurie, Alain (1940–), 140
Hazard rate, 18, 224
Helly, Eduard (1884–1943), 131
Heron of Alexandria (ca. 10 B.C.–A.D. 70), 9
Hesse, Ludwig Otto (1811–1874), 42, 242
Hessian, 242
Higgins, John C., 15
Hildebrandt, Stefan (1936–), 9, 15, 141
Hoëne-Wronski, Jósef Maria (1778–1853), 69
Homotopy, 41
Hotelling, Harold (1895–1973), 143
Hotelling rule, 143, 278
Howitt, Peter Wilkinson (1946–), 11
Hungerford, Thomas W., 85, 251
Hurwicz, Leonid (1917–2008), 227
Hurwitz, Adolf (1859–1919), 12, 52, 53
Hurwitz property, 53
Huygens, Christiaan (1629–1695), 8
Ill-posed problems, 28, 75
Il'yashenko, Yuliy S., 64, 75
Implementation horizon, 146
Implementation theorem, 218
Implicit differentiation, 244
Implicit function theorem, 244
Implicit programming problem, 115, 116
Inada, Ken-Ichi (1925–2002), 161
Inada conditions, 161
Individual-rationality constraint. See Participation constraint
Information
  hidden, 207
  private, 151
  rent, 208, 212
Information set, 169
Information structure, 174, 196
  causal (nonanticipatory), 196
  delayed-control, 197
  delayed-state, 197
  Markovian, 197
  regular, 196
  sampled-observation, 197
Initial-value problem (IVP), 20
  initial condition, 20
  maximal solution, 29
  nominal, 36
  parameterized, 36
  representation in integral form, 31
Institutional design, 179
Instrument, 209
Integral curve of ODE, 20
Integrating factor, 42
Invariance, 61
Invariance principle, 62
Invariant set, 56, 59
Inverse function theorem, 244
Ioffe, Aleksandr Davidovich (1938–), 140
Ironing, 224, 330
Isidori, Alberto, 88
Isoperimetric constraint, 7
Isoperimetric problem, 6
IVP. See Initial-value problem
Jacobi, Carl Gustav Jacob (1804–1851), 10, 61, 242
Jacobian, 242
Jain, Dipak C., 76
Job market signaling, 153
Jordan, Marie Ennemond Camille (1838–1922), 51, 241
Jordan canonical form, 51
Jordan curve theorem, 241
Jowett, Benjamin (1817–1893), 6
Kakeya, Soichi (1886–1947), 7
Kakeya set, 7
Kakutani, Shizuo (1911–2004), 251
Kakutani fixed-point theorem, 251
Kalman (Kálmán), Rudolf Emil (1930–), 13
Kalman filter, 13
Kamien, Morton I., 140
Kant, Immanuel (1724–1804), 11
Karp, Larry S., 203
Karush, William (1917–1997), 249
Karush-Kuhn-Tucker conditions, 249
Keerthi, S. Sathiya, 14
Khalil, Hassan K. (1950–), 75
Kihlstrom, Richard E., 203
Kirillov, Alexandre Aleksandrovich (1936–), 130, 158, 241, 243
Kolmogorov, Andrey Nikolaevich (1903–1987), 15, 252
Koopmans, Tjalling Charles (1910–1985), 11
Krasovskii, Nikolay Nikolayevich (1924–), 58
Kreps, David Marc, 179, 188
Kress, Rainer (1941–), 28
Krishnan, Trichy V., 76
Kryazhimskii, Arkady V., 115, 140
Ktesibios of Alexandria (ca. 285 B.C.–222 B.C.), 11
Kuhn, Harold William (1925–), 249
Kumei, Sukeyuki, 44
Kurian, George Thomas, 260
Kurz, Mordecai, 15
Kydland, Finn Erling (1943–), 203
Laffont, Jean-Jacques Marcel (1947–2004), 216, 226, 227
Lagrange, Joseph-Louis (1736–1813), 8–10, 24, 25, 101, 105, 248
Lagrangian, 249
  small, 107
Lagrange multiplier, 249
Lagrange problem, 105
Lagrange's theorem, 248
Lancaster, Kelvin John (1924–1999), 226
Landau, Edmund Georg Hermann (Yehezkel) (1877–1938), 39
Landau notation, 39
Lang, Serge (1927–2005), 243
Laplace, Pierre-Simon (1749–1827), 73
Laplace transform, 19, 73
  common transform pairs, 74
  inverse, 74
  properties, 74
LaSalle, Joseph Pierre (1916–1983), 62
LaSalle invariance principle, 62
Lavrentyev, Mikhail Alekseevich (1900–1980), 15
Law of parsimony, 11
Lebesgue integration, 241
Lebesgue point, 122
Ledyaev, Yuri S., 15, 140
Lee, E. Bruce, 256
Lee, John M. (1950–), 67
Left-sided limit, 60
Leibniz, Gottfried Wilhelm (1646–1716), 7–9, 11, 17
Leizarowitz, Arie, 140
Leng, S. B., 255
Leonardo da Vinci (1452–1519), 11
Levinson, Norman (1912–1975), 75
L'Hôpital, Guillaume de (1661–1704), 8, 258
L'Hôpital's rule, 258
Limit (of a sequence), 236
Limit cycle, 61, 63
Lindelöf, Ernst Leonard (1870–1946), 34
Linear dependence, 231
Linearization criterion, 54
  generalized, 61
  generic failure, 56
Linear-quadratic differential game, 194, 203, 302
Linear-quadratic regulator, 28, 93
  infinite-horizon, 119
Linear space, 234. See also Vector space
Linear system
  nilpotent, 73
Liouville, Joseph (1809–1882), 69
Liouville formula, 69
Lipschitz, Rudolf Otto Sigismund (1832–1903), 29
Lipschitz constant, 32
Lipschitz property, 29
Locally Lipschitz, 30
Logistic growth, 21
Lorenz, Edward Norton (1917–2008), 63
Lorenz oscillator, 63
Lotka, Alfred James (1880–1949), 13, 44
Lower contour set, 149
Luenberger, David G. (1937–), 101, 115, 238, 252
Lyapunov, Aleksandr Mikhailovich (1857–1918), 3, 12, 47
Lyapunov differential equation, 61
Lyapunov equation, 53
Lyapunov function, 3, 4, 19, 47, 49, 53, 93
Magaril-Il'yaev, Georgii Georgievich (1944–), 101
Mailath, Georg Joseph, 203
Mangasarian, Olvi L. (1934–), 105, 111, 112, 249
Mangasarian-Fromovitz conditions (constraint qualification), 105, 249
Mangasarian sufficiency theorem, 112, 113, 115
Market for lemons, 182
Markov, Andrey Andreyevich (1856–1922), 190
Markovian Nash equilibrium, 189
Markov-perfect equilibrium, 154
Markov property, 154, 190
Markus, Lawrence, 256
Martini, Horst, 141
Maskin, Eric Stark (1950–), 178, 227
Mathematical notation, 231
Matrix, 232
  determinant, 232
  eigenvalue, 233
  eigenvector, 233
  identity, 232
  Laplace expansion rule, 232
  nonsingular, 232
  positive/negative (semi)definite, 233
  rank, 232
  symmetric, 232
  transpose, 232
Matrix exponential, 73
Matthews, Steve A. (1955–), 227
Maupertuis, Pierre-Louis Moreau de (1698–1759), 9
Maupertuis's principles. See Principle of least action
Maximality condition, 96, 107, 110
Maximizing sequence, 126
Maxwell, James Clerk (1831–1879), 12
Maxwell equation, 66
Mayer, Christian Gustav Adolph (1839–1907), 105
Mayer problem, 105
Mayne, David Quinn (1930–), 14
Mayr, Otto, 11, 15
Mazur, Barry (1937–), 127
Mean-value theorem, 248
Measurable selector, 104
Mechanism
  allocation function, 216
  definition, 216
  design problem, 210
  direct (revelation), 216, 217
  first-best, 211
  instrument, 209
  message space, 215, 216
  second-best, 211
  type space, 209
Mechanism design, 2, 5, 181, 207
Megginson, Robert E., 126
Menu of contracts, 209
Message space, 215, 216
Michalska, Hannah, 14
Milgrom, Paul Robert (1948–), 203, 250
Milyutin, Alexey Alekseevich (1925–2001), 103, 106, 140, 141
Minmax payoff, 177
Mirrlees, James Alexander (1936–), 218, 226, 227
Mishchenko, Evgenii Frolovich (1922–), 14, 15, 106, 114, 140
Mixed strategy, 157
Moler, Cleve B., 73
Moore, John B., 140
Moore, John Hardman, 227
Mordukhovich, Boris Sholimovich, 141
Morgenstern, Oskar (1902–1977), 14, 202
Muehlig, Heiner, 74
Murray, James Dickson (1931–), 76
Musiol, Gerhard, 74
Mussa, Michael L. (1944–), 224, 227
Myerson, Roger Bruce (1951–), 216
Nalebuff, Barry (1958–), 177
Nash, John Forbes (1928–), 1, 149, 152, 156, 203
Nash equilibrium (NE), 1, 4, 149, 156
  closed-loop, 190
  existence, 159
  Markov-perfect, 191
  mixed-strategy, 158
  open-loop, 190
  subgame-perfect, 4, 169, 172, 191
  supergame, 174
Nash reversion folk theorem, 178
Nature (as player), 162, 180
Nerlove, Mark Leon (1933–), 7
Neumann, John von (1903–1957), 14, 202
Newbery, David M., 203
Newton, Isaac (1643–1727), 7, 8
Nilpotent system, 73
Node
  nonterminal, 169
  stable, 48
  terminal, 166, 169, 172
  unstable, 48
Noncredible threat, 168, 169
Nonlinear pricing, 224
  with congestion externality, 228
Nonsmooth analysis, 15, 140
Nontriviality condition, 108, 133
Norm
  equivalence, 236
  properties, 235
Nullcline, 46, 79
Numerical methods, 253
  direct, 256
  indirect, 253
  overview, 254
Objective functional, 89
Observability, 84. See Contractability
Observation/sample space, 174
Occam, William of (ca. 1288–1348), 11
Occam's razor, 11
OCP. See Optimal control problem
Odd number of Nash equilibria, 161
ODE. See Ordinary differential equation
Olsder, Geert Jan (1944–), 203
Olympiodorus the Younger (ca. 495–570), 9
One-parameter group action, 40
One-shot deviation principle, 172
  infinite-horizon, 173
One-sided limit, 60
Open-loop control, 12
Open set, 236
Optimal consumption, 97, 101, 147, 296
Optimal control problem (OCP), 4, 89, 103, 189, 201, 221
  infinite-horizon, 113
  simplified, 108
  time-optimal, 141
Optimal screening contract, 221
Ordinary differential equation (ODE), 2, 17, 20
  autonomous, 44, 47
  completely integrable, 42
  de-dimensionalization, 44, 77
  dependent variable, 17, 20
  exact, 41
  first integral, 42
  homogeneous, 22
  independent variable, 17, 20
  linear, 24
    homogeneous, 3, 24
    particular solution, 3, 24
  representation
    explicit, 20, 68
    implicit, 20, 41
  separable, 21
  solution methods (overview), 26
Osmolovskii, Nikolaj Pavlovich (1948–), 140
Output (variable), 83
Output controllability, 88
Output controllability matrix, 88
Output function, 84
p-norm, 235
Parameterized optimization problem, 249
Pareto optimality, 179
Pareto perfection, 179
Partial differential equation (PDE), 17
Partial fraction expansion, 259
Participation constraint, 209, 220
Pascal, Blaise (1623–1662), 1
Payoff function, 149, 155, 169. See also Utility function
  average, 174
  quasilinear, 222
Peano, Giuseppe (1858–1932), 28, 71
Peano-Baker formula, 71
Pearl, Judea (1936–), 11
Penetration pricing, 277
Perfect Bayesian equilibrium (PBE), 181, 185
Periodic orbit/trajectory. See Limit cycle
Perron, Oskar (1880–1975), 6
Petrovskii, Ivan Georgievich (1901–1973), 75
Phase diagram, 44
Picard, Charles Émile (1856–1941), 34
Picard-Lindelöf error estimate, 34
Planning horizon, 146
Plateau, Joseph Antoine Ferdinand (1801–1883), 10
Plateau's problem, 10
Plato (ca. 427–347 B.C.), 6
Player function, 169
Player set, 155
Poincaré, Jules Henri (1854–1912), 13, 41
Poincaré-Bendixson theorem, 63
Polak, Elijah, 257
Pollack, Alan, 241
Pontryagin, Lev Semenovich (1908–1988), 14, 15, 75, 82, 97, 106, 114, 140
Pontryagin maximum principle (PMP), 4, 15, 83, 97, 107, 221
  proof, 119
  simplified, 109
Positive limit set, 62
Potential (function), 40
  computation, 42
  of linear first-order ODE, 43
Powell, Warren B., 258
Prandtl, Ludwig (1875–1953), 63
Prandtl number, 63
Pratt, John W., 102
Predator-prey system, 44–46, 61, 77, 78
Predecessor function, 169
Preference relation, 149
  quasilinear, 218
Preference representation. See Utility function
Preimage, 241
Preorder, 149
Prescott, Edward Christian (1940–), 203
Primbs, James Alan, 14
Principle
  of least action, 9
  of least time, 9
  maximum (see Pontryagin maximum principle)
  of optimality, 92, 255
  of plenitude, 11
  of stationary action, 9
Prisoner's dilemma, 155, 156
  differential game, 199
  finitely repeated, 175
  infinitely repeated, 176
Product diffusion, 18
PROPT (MATLAB toolbox), 257
Pseudospectral methods, 257
Rademacher, Hans Adolph (1892–1969), 246
Rademacher theorem, 246
Radner, Roy (1927–), 203
Radó, Tibor (1895–1965), 10
Ramsey, Frank Plumpton (1903–1930), 11
Rapoport, Anatol Borisovich (1911–2007), 177
Rationality, 155
Rayleigh (Third Baron). See Strutt, John William
Rayleigh number, 63
Reachability, 84
Rectifiability theorem (for vector fields), 64
Recursive Integration Optimal Trajectory Solver (RIOTS), 257
Region of attraction, 56
  of an equilibrium, 56
Reiter, Stanley (1925–), 227
Renegotiation, 217
Revelation principle, 216, 217
Riccati, Jacopo Francesco (1676–1754), 27, 68, 94
Riccati differential equation, 94, 303, 304
Riccati equation, 27, 68, 119
Right-sided limit, 60
Riordan, Michael H., 203
Risk aversion, 102
  constant absolute (CARA), 102
  constant relative (CRRA), 102, 192
Roberts, Sanford M., 255
Robertson, Ross M., 7
Rochet, Jean-Charles, 227
Rockafellar, R. Tyrrell, 141
Rolle, Michel (1652–1719), 247
Rolle's theorem, 247
Root of a polynomial, 233
Rosen, Sherwin (1938–2001), 224, 227
Rosenthal, Robert W. (1945–2002), 203
Ross, I. Michael, 257
Ross, William David (1877–1971), 6
Routh, Edward John (1831–1907), 12
Rudin, Walter (1921–2010), 252
Russell, Bertrand Arthur William (1872–1970), 11
Rutquist, Per E., 257
Sachdev, P. L. (1943–2009), 75
Saddle, 48
Salvage/terminal value, 92
Sample/observation space, 174
Samuelson, Larry (1953–), 203
Samuelson, Paul Anthony (1915–2009), 11
Scalar product, 231
Schechter, Martin, 252
Schwartz, Adam Lowell, 257
Schwartz, Jacob Theodore (1930–2009), 126, 252
Schwartz, Nancy L., 140
Schwarz, Hermann Amandus (1864–1951), 242
Schwarz's theorem, 242
Screening, 215
Second-best solution. See Mechanism, design problem
Second-price auction, 152
Segal, Ilya R., 250
Seierstad, Atle, 140
Semendyayev, Konstantin A., 74
Sensitivity analysis, 39
Sensitivity matrix, 39, 121
Separatrix, 263
Set
  bounded, 236
  closed, 236
  compact, 236
  dense, 104
  open, 236
  open cover, 236
Sethi, Suresh P., 7, 106, 140
Set-valued mapping
  measurable, 104
Shakespeare, William (1564–1616), 207
Shil'nikov, Leonid Pavlovich (1934–), 75
Shipman, Jerome S., 255
Shooting algorithm, 254
Shutdown
  complete, 328
  no-shutdown condition, 326
  solution (of two-type screening problem), 212
Signal, 83
Signaling, 153, 181
  Bayes-Nash equilibrium, 184
  perfect Bayesian equilibrium of, 185
  pooling equilibrium, 183
  separating equilibrium, 183
Sim, Y. C., 255
Similarity transform, 51, 55
Sliding-mode solutions, 139
Smirnov, Georgi V., 75
Smith, Lones (1965–), 179
Smoothness condition (condition S), 106
Šmulian, Vitold L'vovich (1914–1944), 126
Sobolev, Sergei L'vovich (1908–1989), 243
Sobolev space, 243
Social welfare, 142, 145, 214
Socrates (469–399 B.C.), 6
Soltan, Petru S., 141
Solution to ODE, 20
Sonnenschein, Hugo Freund, 168
Sontag, Eduardo Daniel (1951–), 87, 88, 140
Sorger, Gerhard, 203
Sorting condition, 218
Spence, Andrew Michael (1943–), 153, 181, 203, 218
Spence-Mirrlees condition, 218
s.t. (subject to), 231
Stability analysis, 45
  linearization criterion, 54
Stackelberg, Heinrich (Freiherr von) (1905–1946), 197
Stackelberg Leader-Follower Game, 197
State constraints, 105
State control constraint
  regularity, 104
State equation, 89
State space, 39, 84
State space form, 83
State transition matrix, 69
Stationary point. See Equilibrium (point)
Steady state. See Equilibrium (point)
Steiner, Jakob (1796–1863), 6
Stephens, Philip A., 79
Stern, Ron J., 15, 140
Stiglitz, Joseph Eugene (1943–), 181
Stochastic discounting, 176
Stokey, Nancy Laura (1950–), 168
Stole, Lars A., 227
Strang, William Gilbert (1934–), 251
Strange attractor, 63
Strategy
  behavior, 184
  Markovian, 190
  mixed, 157, 159
  open-loop vs. closed-loop, 154, 190, 191
  trigger, 154
Strategy profile, 155
  augmented, 175
  grim-trigger, 176
  mixed, 174
Structural stability, 19, 75
Strutt, John William (1842–1919), 63
Struwe, Michael (1955–), 10
Subgame, 169, 190
  perfection, 4, 169, 172, 190–192
Subgame-perfect folk theorem, 178
Subramaniam, V., 255
Subspace, 235
Successive approximation, 34, 240
Successor function, 169
Supergame, 173. See also Game, repeated
  history, 174
  Nash equilibrium, 174
Superposition, 25
Sutherland, William J., 79
Sutton, Richard S., 258
Sydsaeter, Knut, 140
System, 83
  autonomous (see System, time-invariant)
  controllable, 84
  discrete-time, 83
  observable, 84
  reachable, 84
  state space representation, 83
  time-invariant, 18, 84
System function, 17, 18, 48, 83
System matrix, 51
Tautochrone, 8
Taxation principle, 220
Taylor, Angus Ellis (1911–1999), 131
Taylor, Brook (1685–1731), 61
Terminal/salvage value, 92
Theorem Π, 44
Thom, René Frédéric (1923–2002), 75
Thomas, Ivor (1905–1993), 6
Thompson, Gerald Luther (1923–2009), 140
Tikhomirov, Vladimir Mikhailovich (1934–), 101, 140, 246, 252
Tikhonov, Andrey Nikolayevich (1906–1993), 28
Time consistency, 190, 191
Time derivative, 17
Time inconsistency, 168
Time-optimal control problem, 141
Time t history, 175
  augmented, 175
Tirole, Jean Marcel (1953–), 202
Total derivative, 242
Transversal (of autonomous system), 63
Transversality condition, 96, 107, 110
Trigger strategy, 154, 199
Tromba, Anthony J., 15
Tucker, Albert William (1905–1995), 249
Turnpike, 115
Type, 208
Type space, 5, 162, 166, 182, 209, 220
Tzu, Sun (ca. 500 B.C.), 149
U-controllable states, 87
Ultimatum bargaining, 150
Uniform boundedness, 242
Uniform convergence (⇒), 242
Upper contour set, 149
Upper semicontinuity, 251
Utility function, 149
  quasilinear, 218
Value function, 4, 13–15, 82, 97, 119, 140
Vandenberghe, Lieven, 252
Van Loan, Charles F., 73
Van Long, Ngo, 203
Variation-of-constants method, 24
Vectogram, 136
Vector field, 39
Vector space, 234
  normed, 235
  complete (Banach space), 237
Verifiability. See Contractability
Verri, Pietro (1728–1797), 7
Vickson, Raymond G., 106
Vidale, Marcello L., 25, 118
Vidale-Wolfe advertising model, 47, 118
Vinter, Richard B., 15, 140
Virgil (Publius Virgilius Maro) (70–19 B.C.), 6
Virtual surplus, 223
Volterra, Vito (1860–1940), 13, 44
Walras, Léon (1834–1910), 10
Walter, Wolfgang (1927–), 75
Warder, Clyde Allee (1885–1955), 79
Warga, Jack (1922–), 140
Watt, James (1736–1819), 12
Weber, Robert J. (1947–), 203
Weber, Thomas Alois (1969–), 118, 140, 141, 227
Weierstrass, Karl Theodor Wilhelm (1815–1897), 6, 15, 141, 236, 237, 246
Weierstrass theorem, 246
Weitzman, Martin L. (1942–), 11
Welfare. See Social welfare
Well-posed problems, 28–39
Wets, Roger J.-B., 141
Wiener, Norbert (1894–1964), 12
Wiggins, Stephen Ray, 75
Willems, Jan C. (1939–), 8
Williams, Steven R. (1954–), 227
Wilson, Robert Butler (1937–), 161, 167, 168, 227
Wilson's oddness result, 161
Wolenski, Peter R., 15, 140
Wolfe, Harry B., 25, 118
Wronskian (determinant), 69
Young, William Henry (1863–1942), 101
Young-Fenchel transform (dual), 101
Zenodorus (ca. 200–140 B.C.), 6
Zermelo, Ernst Friedrich Ferdinand (1871–1953), 172
Zorich, Vladimir Antonovich (1937–), 42, 242, 244, 252