Signaling with Private Monitoring - web.mit.eduweb.mit.edu/gcistern/www/spm.pdf · Signaling with Private Monitoring Gonzalo Cisternas and Aaron Kolb August 10, 2019 Preliminary The

Signaling with Private Monitoring∗

Gonzalo Cisternas and Aaron Kolb

August 10, 2019

Preliminary

The most recent version of this paper can be found at

http://web.mit.edu/gcistern/www/spm.pdf

Abstract

We examine linear-quadratic signaling games between a long-run player that has a

normally distributed type and a myopic player who privately observes a noisy signal

of the long-run player’s actions. An imperfect signal of the myopic player’s behavior

is publicly observed, and thus there is two-sided signaling. Time is continuous over a

finite horizon, and the noise is Brownian. We construct linear-Markov equilibria using

the players’ beliefs up to the second order as states. In such equilibria, the long-run

player’s second-order belief is controlled, reflecting that past actions are used to fore-

cast the continuation game. Via this higher-order belief channel, the informational

content of the long-run player’s action is not only driven by the weight attached to her

type, but also by how aggressively she has signaled in the past. Applications to models

of leadership, reputation, and trading are examined.

Keywords: signaling; private monitoring; private beliefs; learning; Brownian motion.

JEL codes: C73, D82, D83.

∗Cisternas: MIT Sloan School of Management, 100 Main St., Cambridge, MA 02142, [email protected]: Indiana University Kelley School of Business, 1309 E. Tenth St., Bloomington, IN [email protected]. We thank Alessandro Bonatti, Robert Gibbons and Vish Viswanathan for usefulconversations.

1

http://web.mit.edu/gcistern/www/spm.pdf

1 Introduction

Signaling—that is, the strategic transmission of private information through actions—plays

a central role in many economic interactions. In labor markets, for instance, workers’ edu-

cational choices can credibly convey information about their hidden abilities (Spence, 1973).

In financial markets, signaling influences how quickly market prices incorporate the infor-

mation privately possessed by traders with market power (Kyle, 1985). In organizations, it

offers a productivity-enhancing rationale for leadership by example (Hermalin, 1998).

In all these settings, “receivers” must extract the informational content of the signals gen-

erated by those who hold superior information: employers must interpret a college degree

as a signal of ability; investors must interpret financial data to learn about others’ informa-

tion; followers must interpret leaders’ actions to decide how to follow, and so on. In most

studies of signaling thus far, however, the signals involved are either perfect and/or public,

or private but restricted to static settings, thereby making any such an analysis public: at

the moment they have to move, senders know their receivers’ beliefs. Crucially, while such

public analyses can be appropriate in some settings (e.g., public academic credentials can be

highly informative signals), they are less so in others.

Indeed, imperfect private signals are pervasive: they naturally appear when employers

subjectively assess workers’ performances (MacLeod, 2003; Levin, 2003); when brokers han-

dle orders of retail investors (Yang and Zhu, 2018); or when data brokers collect data from

consumers’ online behavior (Bonatti and Cisternas, 2019), for instance. The presence of id-

iosyncratic noise then renders the inferences made by receivers private, raising a fundamental

question: how do learning and signaling in repeated interactions play out when those who

hold payoff-relevant information do not know what others have seen?

In this paper, we make progress in this direction by examining a player’s signaling in-

centives in settings where her actions generate signals that are hidden to her. Specifically, a

long-run player (she) of a normally distributed type interacts with a myopic player (he) over

a finite horizon. The myopic player privately observes a noisy signal of the long-run player’s

actions, while the long-run player can learn about the myopic player’s private inferences

from an imperfect public signal of the myopic player’s behavior. The players’ preferences are

linear-quadratic and the noise is Brownian. Using continuous-time methods, we construct

linear Markov perfect equilibria (LME) using the players’ beliefs as states.

The games we study feature one-sided incomplete information and imperfect private

monitoring. Consider a leader of an organization interacting with a follower (or many of

them, acting in coordination). The organization’s payoff increases with both the proximity

of the leader’s action to a newly realized state of the world (adaptation) and the proximity of

2

the follower’s actions to the leader’s action (coordination). Moreover, the follower attempts

to match the leader’s action at all times. The environment is, however, complex, in the sense

that the leader cannot immediately convey the state of the world to the follower: the latter

learns it only gradually by subjectively evaluating the leader’s actions. In turn, the leader

receives feedback of the follower’s inferences through a public signal of the follower’s actions.

Due to the private monitoring, the follower’s belief is private, and hence both parties have

private information to signal to one other. In addition, the leader is forced to use her past

actions to estimate the follower’s belief. This forecast—the leader’s second-order belief—is

itself private, even along the path of play, as the leader conditions her actions on the state of

the world. How does the leader then manage the transition of the organization to the new

state of the world accounting for this higher-order uncertainty? What are the implications

for learning and payoffs, and hence for the value of better information channels which reduce

higher-order uncertainty?

Economic forces. We construct LME using beliefs up to the long-run player’s second

order as states. This second-order belief is controlled by the long-run player, reflecting that

she uses past play to forecast the continuation game. The well-known problem of the state

space growing due to the myopic player attempting to forecast such private belief is then

circumvented by a key representation lemma (Lemma 2) that expresses the (candidate, on

path) second-order belief as a convex combination of the long-run player’s type and the belief

about that type based on public information exclusively. This representation reflects how

the long-run player calibrates her belief using the public information when learning about

the myopic player’s belief. This “public” state is therefore part of the set of belief states,

and is affected by the myopic player.

Because different types take different actions in equilibrium, and actions are used to

forecast the myopic player’s belief, different types also perceive different continuation games

as measured by their second-order beliefs. This creates a novel history-inference effect,

whereby the sensitivity of the long-run player’s action to her type—which determines the

myopic player’s learning—is comprised not only of the direct weight her strategy places on

her type, but also by how aggressively she has signaled in the past via the myopic player’s

inference of the long-run player’s private history. This effect compounds over time as the

second-order belief reflects more the long-run player’s type, and its amplitude is decreasing in

the quality of the public signal: shutting down the public signal (no feedback case) maximizes

the potential reliance on the type, while making the public signal noiseless (perfect feedback)

eliminates this dependence. These extreme cases are exploited in the applications we study.

3

Applications. In Section 2, we illustrate the main economic insights of the paper by exam-

ining a game in which a leader must adapt an organization to a new economic environment

while controlling the coordination costs with a myopic follower who tries to match her action.

To accommodate to the follower, the leader’s action is less sensitive to her type (i.e.,

achieves less adaptation) than in the full-information benchmark; but successful accommo-

dation requires knowing the follower’s belief. Critically, because higher types take higher

actions due to their stronger adaptation motives, they also expect their followers to have

higher beliefs—the coordination motive leads higher types to take higher actions via the

history-inference channel. In the absence of feedback, therefore, standard decreasing adap-

tation incentives that fully determine signaling and learning when beliefs are public are then

offset by higher-order belief effects that make the leader’s signaling increasing over time.

This qualitatively different signaling behavior has important consequences on learning

and payoffs, and hence on the value of better information channels within the organization.

In the extreme cases of a myopic and a patient leader, we show that the follower’s overall

learning from the interaction is always higher when the leader receives no feedback than when

the public signal is perfect, a consequence of stronger overall adaptation in the former case.

Learning is, however, a measure of the organization’s struggle: learning occurs only when the

private signal is informative of the state of the world, and hence only when miscoordination

occurs along the way. The stronger signaling that arises in the no-feedback case is then more

decisive when the leader is impatient. Specifically, the history-inference effect substitutes for

the lack of adaptation of a myopic leader, thereby reducing the added value of a noiseless

public signal relative to the no-feedback case as the degree of impatience increases.

In Section 5 we explore two applications based on extensions of our model. In the first,

the type is a bias, and the long-run player wants to preserve a reputation for neutrality,

modeled via a terminal quadratic loss in the myopic player’s belief (e.g., a politician facing

reelection). Clearly, eliminating the public signal has a negative direct effect on the long-run

player’s payoff (increased uncertainty in a concave objective). Since higher types take higher

actions due to their larger biases, however, those types must offset higher beliefs to appear

unbiased; the history-inference effect is then negative, which weakens signaling and hence

the sensitivity of the myopic player’s belief, potentially leading to higher payoffs.

Finally, we exploit the presence of the public belief state in a trading model in which

an informed trader faces both a myopic trader that privately monitors her orders and a

competitive market maker who only observes the public total order flow. In this context,

we show that there is no linear Markov equilibrium for any degree of noise of the private

signal. Intuitively, the myopic player introduces momentum into the price process, as the

information he obtains now gets distributed to the market maker through all future order

4

flows. This causes prices to move against the insider and creates urgency, leading the insider

to trade away all information in the first instant.

Technical contribution. The setting we examine is asymmetric, in terms of the players’

preferences and their private information (a fixed state versus a changing one). In particular,

the players can signal at substantially different rates, which is in stark contrast to a small lit-

erature on symmetric multi-sided learning (see the literature review section). With different

rates of learning, however, the equilibrium analysis can become severely complicated.

Specifically, the belief states we construct depend on two functions of time: (1) the

myopic player’s posterior variance, which determines the sensitivity of the myopic player’s

belief to her private signal, and (2) the weight attached to the long-run player’s type in the

representation result, which captures the contribution of the history-inference effect to the

long-run player’s signaling. Standard dynamic-programming arguments reduce the problem

of existence of LME to a boundary value problem (BVP) that these two functions, along with

the weights in the long-run player’s linear strategy, must satisfy. The two learning ordinary

differential equations (ODEs) endow the BVP with exogenous initial conditions, while the

rest carry endogenous terminal conditions arising from myopic play at the end of the game.

Determining the existence of a solution to such a BVP is challenging because it involves

multiple ODEs in both directions. For this reason, we establish two sets of results. In a

private value environment, the myopic player’s best response does not directly depend on

his belief about the long-run player’s type, but only indirectly via his expectation of the

latter player’s action. In that context, we show that there is a one-to-one mapping between

the solutions to the learning ODEs (Lemma 5), a consequence of the ratio of the signaling

coefficients being constant. This, in turn, makes traditional shooting methods based on the

continuity of the solutions applicable. Via this method, we show the existence of LME in the

leadership model of Section 2 when the public signal is of intermediate quality for horizon

lengths that are decreasing in the prior variance about the state of the world (Theorem 1).

In common value settings, the multidimensionality issue seems unavoidable. Building on

the literature on BVPs with intertemporal linear constraints (Keller, 1968), however, we can

show the existence of LME via the use of fixed-point arguments applied to our BVP with

intratemporal nonlinear (terminal) constraints. Specifically, the multidimensional shooting

problem can be reformulated as one of the existence of a fixed point for a suitable function

derived from the BVP, which we then tackle for a variation of the leadership model when

the follower also cares about matching the state of the world (Theorem 2). Critically, the

method is general—it applies to the whole class of games under study, and opens a way for

examining other settings exhibiting learning and asymmetries.

5

Related Literature Static noisy signaling was introduced by Matthews and Mirman

(1983) in a limit pricing context, and further studied by (Carlsson and Dasgupta, 1997)

as a refinement tool. Recent dynamic analyses involving Gaussian noise and public beliefs

include Dilme (2019), Gryglewicz and Kolb (2019), Kolb (2019) and Heinsalu (2018).1

Multisided signaling has been examined by Foster and Viswanathan (1996) and Bonatti

et al. (2017) in symmetric settings with imperfect public monitoring and dispersed fixed

private information. In those settings, beliefs are private, but the presence of a commonly

observed public signal permits a representation of first-order beliefs that eliminates the need

for higher-order ones.2 Bonatti and Cisternas (2019) in turn examine two-sided signaling in

a setting where firms privately observe a summary statistic of a consumer’s past behavior to

price discriminate. Via the prices they set, however, firms perfectly reveal their information

to the consumer.

The literature on repeated games with private monitoring is extensive, and has largely

focused on non-Markovian incentives—Ely et al. (2005) and Horner and Lovo (2009) (the

latter allowing for incomplete information) study equilibria in which inferences of others’

private histories are not needed, and Mailath and Morris (2002) and Horner and Olszewski

(2006) study almost public information structures. Levin (2003) and Fuchs (2007) examine

one-sided private monitoring in repeated principal-agent interactions.

Regarding our applications, the stage game of our leadership model is a simplified version

of Dessein and Santos (2006).3 In turn, the value of public information has been studied

by Morris and Shin (2002), Angeletos and Pavan (2007), and Amador and Weill (2012)

in settings with infinitesimal players, thus rendering signaling and inferences of individual

private histories unnecessary. Regarding trading models, Yang and Zhu (2018) show, in a

richer two-period version of our model, that a linear equilibrium ceases to exist if a signal of

an informed player’s last trade is too precise and privately observed by another player.

To conclude, this paper contributes to a growing literature employing continuous-time

techniques to the analysis of dynamic incentives. Sannikov (2007) examines two-player games

of imperfect public monitoring; Faingold and Sannikov (2011) reputation effects with behav-

ioral types; Cisternas (2018) off-path private beliefs in games of ex ante symmetric uncer-

tainty; and Horner and Lambert (2019) information design in career concerns settings.

1This last paper also displays a normally distributed type, but it lacks strategic interdependence betweenactions. Thus, behavior is unchanged under some information structures involving private monitoring.

2Likewise in He and Wang (1995), where infinitely many agents privately see dynamic exogenous signals.3See Bolton and Dewatripont (2013) for such a static analysis with one round of pre-play communication.

More generally, these are instances of the linear-quadratic team theory of Marschak and Radner (1972).

6

2 Application: Leading Coordination and Adaptation

Economists have long understood that adaptation to changes in the external economic en-

vironment is a key problem for organizations (e.g., Simon, 1951), and that successful adap-

tation often requires substantial coordination of activities within them.4 Since at least

Radner (1962), however, it has been recognized that coordination is threatened by the pres-

ence of different decision-makers and sources of information. As Williamson (1996) further

points out, “failures of coordination can arise because autonomous parties read and re-

act to signals differently, even though their purpose is to achieve a timely and compatible

combined response.” Private signals—a consequence of either private sources or subjective

interpretations—therefore play an integral role in organizations’ ability to adapt to change.

The study of the adaptation-coordination trade-off has lead to important insights regard-

ing the returns to specialization (Dessein and Santos, 2006), centralization (Alonso et al.,

2008), and governance structures (Rantakari, 2008), but the great majority of these analyses

have been static. In particular, the central question of how information about the economic

environment is gradually transmitted and reflected in decision-making has been much less

explored.5 The difficulty in analyzing learning dynamics in organizations while accounting

for idiosyncratic shocks and/or private information is apparent: to appropriately coordinate

their actions, individuals must forecast what other members know.

In this application, we examine how a leader—a member of an organization with crucial

information about the economic environment and the opportunity to influence others—

manages the dynamics of adaptation and coordination when a follower privately monitors

her actions. For instance, top management wishes to adapt its strategy to a shift in the

market fundamentals, but it suffers from imperfect control: the information about such

changes trickles down the organization through various layers before reaching key productive

divisions. Or consider an expert who leads by example to transmit a technique or skill—an

intangible activity or knowledge—to an apprentice, and the latter subjectively evaluates the

expert’s actions. In both cases, the economic ‘fundamentals’ are learned only gradually by

the receiver, and the sender does not directly observe what the receiver has seen.

Specifically, consider the following game inspired by the team theory of Marschak and

4The literature on the topic is extensive. Refer, for instance, to Chapter 4.2 in Williamson (1996) andChapter 4 in Milgrom and Roberts (1992).

5Marschak (1955), in discussing team theory as a framework for examining organizations, makes the caseclear (p. 137): “A realistic theory of teams would be dynamic. It takes time to process and pass messagesalong a chain of team members; and messages must include not only information on external variablesbut also information on what has been done by [others...] Knowledge about [probabilities and payoffs] isacquired gradually, while the team already proceeds with decisions. These facts make the dynamic teamproblem similar to those in cybernetics and in sequential statistical analysis.”

7

Radner (1972). A team consisting of a leader (she) and a follower (he) operates in an

environment parametrized by a state of the world θ ∼ N (µ, γo). The team’s payoff is

ˆ T

0

e−rt{−(at − θ)2 − (at − at)}dt, (1)

where at denotes the leader’s action at time t, at the follower’s counterpart, r ≥ 0 is a

discount rate, and T < ∞. Thus, performance increases with the proximity of the leader’s

action to the state of the world (adaptation) and with the proximity of both players’ actions

(coordination). Such actions can take values over the whole real line.

We depart from Marschak’s and Radner’s approach to modeling teams by allowing diver-

gence in preferences. Specifically, we assume that the leader’s preferences coincide with the

team’s payoff, while the follower is myopic trying to minimize (at− at)2 at all times t ∈ [0, T ].

The leader knows the realized value of θ, while the follower only knows its distribution.

As time progresses, however, the follower privately observes an imperfect signal of the form

dYt = atdt+ σY dZYt ,

where ZY is a Brownian motion and σY > 0 a volatility parameter. In particular, immediate

adaptation at no coordination costs via a perfectly revealing action is not possible.

In turn, the leader can learn about what the follower has done (and, hence, about what

the follower has seen) from

dXt = atdt+ σXdZXt ,

where ZX is independent of ZY . This signal is public; for instance, an output measure

observed by both parties, or information prepared by the follower.

In this context, our goal is twofold. First, to understand how private monitoring affects

the players’ behavior. Second, to assess the value of better information channels for the team,

which is an issue of central importance for the performance of organizations. To achieve both

objectives in a unified way, we thus fix σY > 0 and focus on the cases of σX = 0 and +∞.

Specifically, to understand how the presence of a noisy private signal affects the players’

behavior, it is useful to examine the benchmark in which Y—and hence, the follower’s

belief—is public; an indirect approach for studying this case is by setting σX = 0, to the

extent that the follower’s action reveals his belief at all times. On the other hand, to assess

the value of better bottom-up information systems (as measured by lower values of σX),

it is natural to consider the baseline case in which X is absent, which is equivalent to

setting σX = ∞. Reductions in σX have the appealing interpretation of being the result

of interventions intended to improve the information that leaders receive from within the

8

organization (which can be important if leaders are busy with other activities). As it turns

out, the extreme cases of σX to be discussed deliver the sharpest economic intuitions.

2.1 Perfect Feedback (“Public”) Case: σX = 0

When the public signal has no noise, the observation of the follower’s action opens the

possibility of the leader perfectly inferring the follower’s belief. Thus, we aim to characterize

a linear Markov equilibrium (LME) of the form

at = β0t + β1tMt + β3tθ and at = Et[at] = β0t + (β1t + β3t)Mt (2)

where Mt := Et[θ], and βit, i = 0, 1, 3 are functions of time satisfying β1t+β3t 6= 0, t ∈ [0, T ].6

The deterministic feature of the candidate equilibrium coefficients is explained shortly; the

last condition permits identifying the follower’s belief from his observed action.7

From standard results in filtering theory, if the follower expects (at)t≥0 as in (2), then

dMt =β3tγtσ2Y

[dYt − {β0t + (β1t + β3t)Mt}︸︷︷︸Et[at]=

dt], where γt = −(γtβ3t

σ2Y

)2

. (3)

That is, the follower updates upwards whenever the observed increment, dYt, is larger than

the follower’s expectation of it, Et[dYt] = [β0t + (β1t + β3t)Mt]dt. Moreover, the intensity of

the reaction is larger the more aggressively the leader signals the state (i.e., a larger β3t),

and the less is known about the latter (i.e., a larger γt). Finally, learning is deterministic due

to the Gaussian structure, and is faster the stronger the intensity of the leader’s signaling.

The leader’s problem is to maximize (1) subject to (3), recognizing that she affects M

via dYt = atdt + σY dZYt . The next result establishes the existence of a LME, along with

some properties that any such equilibrium should satisfy:

Proposition 1 (LME—Public Case). For all r ≥ 0 and T > 0:

(i) Existence: there exists a LME. In any such equilibrium at = β3tθ + (1− β3t)Mt.

(ii) Signaling coefficient: β3t ∈ (1/2, 1) for t < T , β3T = 1/2, and β3 is strictly decreasing.

Recall that the full information benchmark is simply at = at = θ at all times. From this

perspective, the leader sacrifices adaptation (i.e., β3 < 1) to be able to coordinate with the

6We skip subindex 2 to be consistent with the general model presented in Section 3, where we completethe notion of linear Markov equilibrium with an additional state variable.

7More precisely, a LME as in (2) is perfect when Y is public, but only Nash when Y is private butσX = 0, as the continuity of the paths of M makes deviations by the myopic player observable in this lattercase. Due to the full-support noise, however, this distinction is vacuous in discrete time.

9

follower; in particular, the coefficient on M , β1 = 1− β3, must be positive, as higher values

of M then require higher actions by the leader.8

The leader’s incentives to sacrifice adaptation are weaker the farther the team is from the

end of the interaction. In fact, in equilibrium, dMt ∝ γtβ3t[θ−Mt] at all times, and so stronger

adaptation today brings—via signaling—more coordination tomorrow. This dynamic incen-

tive decays (deterministically) because there is less time to enjoy future coordination and

beliefs are less responsive as learning progresses. The terminal value simply reflects that

static equilibrium behavior (a, a) = (12θ + 1

2M, M) arises at the end of the interaction.

2.2 No Feedback Case: σX =∞

When the public signal is uninformative, the leader must perform a non-trivial inference of

the follower’s belief to correctly assess how her actions affect future payoffs (i.e., to correctly

assess the continuation game). This is, in turn, an exercise of inference of private histories.

Forecasting by input. In the public case the leader’s past actions were immaterial for

inferring the follower’s contemporaneous belief, as the latter was fully determined by the

realized history Y t—i.e., the leader forecasted by output. In fact, since (3) is linear, we have

Mt = A1(t) +

ˆ t

0

A2(t, s)dYs,

for some deterministic functions A1 and A2. The ability to observe Y t implies that the leader

always computes her forecast as above, with larger effort profiles only indicating that the

corresponding shocks ZY were lower, and vice-versa.

In the absence of feedback, the leader does not observe M . Thus, her second-order belief,

Mt := Et[Mt], is a payoff-relevant state. Further, as long as M is as above (for potentially

different functions A1 and A2), and using that Et[dYt] = atdt, the leader’s forecast reads

Mt = A1(t) +

ˆ t

0

A2(t, s)asds. (4)

Unlike the public case, therefore, the leader now forecasts by input : the more effort she has

exerted in, say, pushing the follower’s belief upwards towards θ from a low prior, the higher

she thinks the current value of M is.

8That β0t = 0 and β1t + β3t = 1 hold at all times in this dynamic setting can be understood fromthe leader’s incentives at Mt = θ: in this case, there is no coordination loss (at = at), and the objective

then becomes, locally, one of minimizing the adaptation cost; but since Et[dMt] = 0 (i.e., M is locallyunpredictable), there are no incentives to move away from at = θ. Thus, β0 + (β1 + β3)θ = θ for all types.

10

This contrast between the public and no-feedback cases is natural: in the absence of

any additional information, an expert must rely on how much emphasis she has given to a

particular idea or technique to assess how much the apprentice has assimilated the latter.

(By contrast, in the public case, the apprentice’s output signal suffices for perfectly inferring

his understanding of the topic.) This dependence of Mt on the past history of play in the

no-feedback case reflects the well-known idea that, in games of private monitoring, players

must rely on their past behavior to forecast others’ private histories; yet, it has important

effects on equilibrium outcomes.

Representation of second-order belief and history-inference effect. Observe that

M is hidden to the follower: off the path of play, because deviations go undetected; and in

equilibrium, because the leader’s action carries her type. Along the path of play of a linear

strategy, however, one would expect a linear relationship between θ and M , as the linearity

of (4) suggests. When this is the case, the follower’s (third-order) inference of M would then

reduce to a function of M , and the system of beliefs would “close.”

To this end, suppose that the follower expects that, in equilibrium, M satisfies

Mt =

(1− γt

γo

)θ +

γtγoµ (5)

when the leader follows a strategy

at = β0tµ+ β1tMt + β3tθ, (6)

for some deterministic coefficients βit, i = 0, 1, 3 (potentially different from those in the public

case). The representation (5) encodes two ideas. First, there is no second-order uncertainty

at time zero, i.e., M0 = µ = M0; this is obtained by setting γ0 = γo in the right-hand side

of (5). Second, if enough signaling has taken place, the leader would expect the follower to

have learned the state: γt ≈ 0 in the same expression leads to Mt ≈ θ.

How is the follower’s learning, γt, now determined? To simplify notation, let us use

χ := 1− γ

γo

to denote the weight on the type in (5). Inserting this into (6) yields at = [β0t + β1t(1 −χt)]µ + [β3t + β1tχt]θ, and so the new signaling coefficient is no longer given by the weight

that the equilibrium strategy directly attaches to the type, β3, but instead by

α ≡ β3 + β1χ.

11

We refer to β1χ as the history-inference effect on signaling. In fact, because the leader

uses her actions to forecast M , the follower needs to infer the leader’s private histories to

extract the correct informational content of the signal Y . However, since higher types take

higher actions due to their static adaptation and future coordination motives—forces that

fully determine behavior in the public case—those types also expect their followers to have

higher beliefs. Alternatively, given a history Y t, consider the impact that a marginal increase

in θ has on the leader’s action: in the public case, the overall effect is β3, as all types agree

on the value that M takes; this is not the case when there is no feedback, as different types

perceive different continuation games via M . We collect these ideas in the next result.

Lemma 1 (Belief Representation). Suppose that the follower expects at = [β0t+β1t(1−χt)]µ+

αtθ, where α = β3 + β1χ, χ = 1 − γ/γo, and γt := Et[(θt − Mt)2]. Then, γ = −

(γtαtσ2Y

)2

.

Moreover, if the leader follows (6), Mt = χtθ + (1− χt)µ holds at all times.

The representation of the second-order belief (5) holds only under the linear strategy (6).

More generally, the leader controls M as reflected by (4), and thus (θ,M, µ, t) effectively

summarizes all the payoff-relevant information for the leader’s decision-making.

Proposition 2 (LME—No Feedback Case). For all r ≥ 0 and T > 0:

(i) Existence: there exists a LME. In any such an equilibrium: β0 +β1 +β3 = 1; β3t > 1/2,

t ∈ [0, T ); β3T = 1/2; and β1 > 0 over [0, T ].

(ii) Signaling coefficient: α > 1/2; αT → 1 as T → ∞, and α′t ≥ 0, t ∈ [0, T ), with strict

inequality if and only if r > 0.

Thus, in the no-feedback case, the signaling coefficient α has a radically different behavior

relative to the public case counterpart: it is non-decreasing, and its right-end point becomes

asymptotically close to 1 as the length of the interaction increases. See Figure 1.

2 4 6 8 10t

0.55

0.60

0.65

0.70

0.75

0.80

β3 Publicβ3 NFα NF

β3 Public

β3 NF

α NF

0 2 4 6 8 10t

0.55

0.60

0.65

0.70

0.75

0.80

0.85

β3 Public

β3 NF

α NF

Figure 1: Left r = 0; Right: r = 1. Other parameter values: γo = 1, σY = 1.5, T = 10.

12

The reason behind the discrepancy lies in the history-inference effect compounding over

time. In fact, since the leader expects the follower to gradually learn the state as signaling

progresses, M attaches an increasingly higher weight χ to θ in (5). With a positive coordi-

nation motive (β1 > 0), this implies that higher types take higher actions over time via this

second-order belief channel. We conclude that private monitoring generates an interesting

phenomenon whereby standard monotonically decreasing signaling effects under public be-

liefs are more than offset by an increasingly strong informational content in the leader’s past

history of play (except for the r = 0 case, where both forces perfectly offset each other).

2.3 Learning, Coordination, and the Value of Public Information

The fact that the leader has to rely on her private information to coordinate with the follower,

and that this force reinforces the direct signaling effect coming from β3, opens the possibility

for more information to be transmitted in the no-feedback case. At least, αT > βPub3T = 1/2

for all T > 0, so more signaling indeed takes place by the end of the game.

To assess the validity of this conjecture, we take advantage of the model’s analytic so-

lutions in the patient (r = 0) and myopic (r = ∞) cases. Let γPub and γNF denote the

follower’s posterior variance in the public and no-feedback case, respectively.

Proposition 3 (Learning comparison). For every T > 0:

(i) Patient case: if r = 0, βPub30 > α0 and γPubT > γNFT ;

(ii) Large r case: for every δ ∈ (0, T ), γPubt > γNFt for t ∈ [T − δ, T ] if r large enough.

0 1 2 3 4 5r

0.30

0.35

0.40

0.45

γT NF

γT Public

Figure 2: Terminal values of γPub and γNF , Parameter values: γo = σY = 1, and T = 4.

Consequently, when the leader is either patient or very impatient, in the no-feedback case

the follower always has a more precise knowledge of the state of the world by the end of the

interaction. In this line, part (i) says that, if the leader is patient, this result is non-trivial due

to an inter-temporal substitution effect: the leader, anticipating that the history-inference

13

effect will eventually take place, decides to reduce α0 = βNF30 below the public counterpart,

βPub30 . Part (ii) then states that the fraction of time over which the follower has a more

accurate belief can converge to 1 as r grows large. Figure 2 shows, albeit numerically, that

learning is higher in the no-feedback case for intermediate values of r.

One may feel tempted to conjecture that the organization can be better off by isolating

the leader from any information about the follower, as this fosters the latter’s learning.

The caveat is that information transmission happens through actions: a more precise belief

that is the result of more aggressive signaling is necessarily the reflection of more transient

miscoordination, as learning occurs only when Y is informative about the state of the world.

Proposition 4 (Team’s ex ante payoffs—public vs. no-feedback).

(i) Patient case: if r = 0, the team’s ex ante payoff is larger in public case for all T > 0.

(ii) Large r case: there is T > 0 such that, for all T > T , the team’s ex ante flow payoffs

are larger in the no feedback case over [T , T ] for r sufficiently large.

In the patient case, the team is unequivocally better off in the public case. In particular,

one can show that ex ante coordination costs satisfy

0 < Eθˆ T

0

[βpub3t (θ− Mt)]2dt = −σ2

Y log

(γPubT

γo

)< −σ2

Y log

(γNFTγo

)= Eθ

ˆ T

0

[αt(θ− Mt)]2dt.

Thus, the extent of the follower’s learning is effectively a measure of the total coordination

costs incurred by the team, which are larger in the no-feedback case.9 Consequently, an

important takeaway of our analysis is that not accounting for the specific features of the in-

formation channels within firms, but instead just focusing an outcome measure such terminal

learning γT , can be very misleading in terms of assessing past (or even future) performance:

a better understanding of the economic environment can in fact be the reflection of a painful

struggle to coordinate actions. Alternatively, our analysis uncovers how affecting an infor-

mation channel that does not feed a follower can affect his learning via the strategic response

of other members in the organization.

We conclude our analysis by discussing the second part in the proposition, which allows

us to begin talking about the value of better public information and its dependence on

parameters of the model, such as the leader’s degree of patience.

Part (ii) states that, for sufficiently large horizons and discount rates, the organization’s

continuation payoffs at a time T (independent of r) are ranked in favor of the no-feedback

case. This is in fact the result of a stronger adaptation by the leader. To see why, observe

9Note that − log(γT /γo) is the entropy reduction in the follower’s belief over [0, T ].

14

first that a sufficiently long horizon is needed for the history-inference effect to gain strength.

In this line, the first row in Figure 3 plots total ex ante coordination and adaptation losses

when r = 0: the latter are lower in the no-feedback case for T beyond a threshold.

2 4 6 8 10T

0.5

1.0

1.5

2.0

Coord. Loss (T) NF Coord. Loss (T) Pub

2 4 6 8 10T

0.05

0.10

0.15

0.20

0.25

0.30

Adap. Loss (T) Pub Adap. Loss (T) NF

Coord Losst NF - Coord Losst Pub

r=0 r=0.5 r=1 r=+infty

5 10 15 20 25 30t

-0.04

-0.02

0.02

0.04

0.06

0.08Adapt Losst NF - Adapt Losst Pub

r=0 r=0.5 r=1 r=+infty

5 10 15 20 25 30t

-0.06

-0.04

-0.02

0.02

Figure 3: First row: total adaptation and coordination losses for T ∈ [0, 10] for r = 0. Second row:flow losses for T = 30, r ∈ {0, 0.5, 1,+∞}. Other parameters: γo = σY = 1.

The second row plots differences of ex ante coordination and adaptation flow losses be-

tween cases in a large horizon context. In particular, adaptation losses are eventually lower

in the no-feedback case (right panel), and this is more pronounced as r grows. In fact,

recall that a myopic leader attaches a weight of 1/2 to her type at all times in the public

case. By contrast, in the no-feedback case, the history-inference effect continues to operate

if the leader is myopic, allowing α to become arbitrarily close to 1 as T lengthens. Thus, an

impatient leader in the public case simply adapts too little after a threshold.

This analysis of discounting and time-horizon effects has two implications. First, from (ii),

initializing the game with second-order uncertainty can improve upon a public counterpart

with potentially higher uncertainty yet a perfect ability to coordinate; for instance, inten-

sive one-sided communication by a leader, and subsequent leadership by example without

feedback, can dominate two-sided communication followed by a perfect feedback channel.

Second, the additional value that a noiseless feedback channel brings to an organization

relative to the no-feedback case is expected to fall with discounting due to the leader being

weakly adapted to the environment when beliefs are public. See Figure 4:

15

1 2 3 4 5r

-1.4

-1.2

-1.0

-0.8

-0.6

-0.4

-0.2

Payoff NF

Payoff Public

0 1 2 3 4 5r

1.04

1.06

1.08

1.10

1.12

1.14

Payoff NFPayoff Pub

Figure 4: Ex ante total payoffs comparison. γo = σY = 1, and T = 4.

We have examined how a leader gradually adapts a team to a new economic environ-

ment while controlling the team’s coordination costs, uncovering three sets of results. First,

higher-order uncertainty arising from private monitoring radically affects the way in which a

leader’s actions transmit her private information: the signaling coefficient is non-decreasing,

which is in stark contrast with its public counterpart. Second, the history-inference effect

driving the previous result is sufficiently strong to generate more learning on behalf of the

follower—learning is however, a measure of the coordination costs incurred by the organiza-

tion. Third, the value of interventions aimed at improving the information flow to leaders

depends critically on both horizon effects and discounting: longer interactions combined with

leader myopia reduce the value of noiseless information structures.

Critically, this example is just a first attempt at understanding organizations as dynamic

enterprises, where decision makers can signal and learn information at the same time that

decisions are being made. From this standpoint, it is important to recognize that public

signals rarely are perfectly informative or pure noise. Away from those cases, we would

expect informed parties’ forecasts to lie somewhere in between the two extreme cases just

analyzed: i.e., to rely both on input and output measures. This is done in the next section,

where the key will be to derive a generalization of the representation Mt = χtθ + (1 − χ)µ

for 0 < σX ≤ ∞.

3 General Model

We consider two-player linear-quadratic-Gaussian games with one-sided private information

and one-sided private monitoring in continuous time. The baseline model considered is

introduced next, and extensions of it are presented in Section 5 via two further applications.

Players, Actions and Payoffs. A forward looking long-run player (she) and a myopic

counterpart (he) interact in a repeated game that is played continuously over a time interval

16

[0, T ], T <∞. At each t ∈ [0, T ], the long-run player chooses an action at, while the myopic

player chooses at, both taking values over the real line. Given a profile of realized actions,

(at, at)t∈[0,T ], the long-run player’s total payoff is

ˆ T

0

e−rtU(at, at, θ)dt. (7)

In this specification, r ≥ 0 is the long-run player’s discount rate, U : R3 → R is quadratic

capturing her flow (i.e., stage-game) utility function, and θ denotes the value of a normally

distributed random variable with mean µ and variance γo > 0 that parametrizes the economic

environment. In turn, the myopic player’s stage-game payoff at any time t ≥ 0 is given by

U(at, at, θ) (8)

if (at, at) was chosen at that time, where U : R3 → R is also quadratic.

In what follows, we assume the following properties on the quadratic functions U and U

(partial derivatives are denoted by subindices):

Assumption 1.

(i) Strict concavity: Uaa = Uaa = −1;

(ii) Non-trivial signaling: Uaθ(Uaθ + UaaUaθ) > 0;

(iii) Second-order inferences: Uaa 6= 0 and |Uaθ|+ |Uaa| 6= 0.

(iv) Myopic best-replies intersect: UaaUaa < 1;

We first require that the players’ objectives are concave in their respective choice vari-

ables; from this perspective (i) is simply a normalization. A second minimal requirement is

that the long-run player strategically care about θ, which is implied by (ii). Equipped with

this, part (iii) says that second-order inferences are relevant for play: the myopic player’s

first-order belief matters for his behavior—either directly because he cares about θ, or be-

cause he wants to predict the long-run player’s action—and in turn the long-run player wants

to predict the myopic player’s action, invoking a second-order belief.10

The remaining parts are technical conditions pertaining to the one-shot game that arises

at the end of the interaction via the static game with higher-order uncertainty that takes

place at T . Specifically, part (iv) ensures that a static Nash equilibrium always exists, and

part (ii) ensures that any such equilibrium involves non-trivial signaling. We elaborate more

on these conditions when we explain how to find equilibria of the type we are interested in.

10Of course, (iii) is not really a restriction to our analysis, but instead a choice.

17

Information. The long-run player observes θ before play begins, while the myopic player

only knows the distribution θ ∼ N (µ, γo) from which it is drawn (and this is common

knowledge). In addition, there are two signals X and Y that convey noisy information

about the players’ actions according to

dXt = atdt+ σXdZXt (9)

dYt = atdt+ σY dZYt , (10)

where ZX and ZY are independent Brownian motions, and σY and σX are strictly positive

volatility parameters. In this linear product-structure specification, the signal Y is only

observed by the myopic player, while the signal X is public.11

Let Et[·] denote the long-run player’s conditional expectation operator, which can con-

dition on the histories (θ, as, Xs : 0 ≤ s ≤ t), t > 0, and on her conjecture of the my-

opic player’s play. Likewise, Et[·] denotes the myopic player’s analog, which conditions on

(as, Xs, Ys : 0 ≤ s ≤ t) and on her belief about the long-run player’s strategy.

Strategies and Equilibrium Concept. To characterize equilibrium outcomes, we focus

on Nash equilibria. From a time-zero perspective, an admissible strategy for the long-run

player is any square-integrable real-valued process (at)t∈[0,T ] that is progressively measurable

with respect to the filtration generated by (θ,X). Similarly, an admissible strategy (at)t∈[0,T ]

for the myopic player satisfies identical integrability conditions, but the measurability re-

striction is with respect to (X, Y ).12

Definition 1 (Nash equilibrium.). An admissible pair (at, at)t≥0 is a Nash equilibrium if,

(i) given (at)t≥0, the process (at)t≥0 maximizes

E0

[ˆ T

0

e−rtU(at, at, θ)dt

]among all admissible processes, and

(ii) at solves maxa′∈R

E[U(at, a′, θ)] for all t ∈ [0, T ].

In the next section, we characterize Nash equilibria that are supported by linear Markov

strategies that are sub-game perfect, i.e., that sequentially rational on and off the path

11Thus, flow payoffs do not convey any additional information to the players (i.e., they are either realizedafter time T , or they can be written in terms of the actions and signals observed by each player).

12Square integrability is in the sense of E0[´ T0a2tdt] < +∞ for the long-run player. Such condition ensures

that a strong solution to 9 exists, and thus that the outcome of the game is well defined.

18

of play. Such equilibria generalize that presented in Section 2 for the no-feedback case to

settings in which 0 < σX ≤ ∞.

Remark 1 (Extensions). The baseline model can be generalized along two dimensions:

(i) Terminal payoffs: terminal payoffs of the form e−rTΨ(aT ), with Ψ quadratic, can be

added to (7). A reputation model with this property is studied in 5.1

(ii) Long-run player affecting the public signal X: the drift of (9) can be generalized to

at + νat, where ν ∈ [0, 1] is a scalar. An insider trading model involving ν = 1, as well

as Uaa = 0, (i.e., linear utility) is explored in 5.2

4 Equilibrium Analysis: Linear Markov Equilibria

To construct linear Markov perfect equilibria (henceforth, LME), we first postulate a minimal

set of belief states up to the second order to be used by the players in any equilibrium of this

kind. We then derive a representation of the long-run player’s second-order belief as a linear

function of a subset of such belief states, when the players use the candidate belief states in

a linear fashion. This result generalizes the representation (5) obtained in section 2.2, and

it circumvents the problem of the set of states growing without bound (Section 4.1).

In Section 4.2 we then turn to setting up the long-run player’s best-response problem,

and elaborate on how the problem of existence of LME reduces to finding solutions to a

boundary-value problem. In Section 4.3, we illustrate two proof techniques that depend on

whether the myopic player’s best response explicitly depends on his belief about the state of

the world or not (common vs. private values environments, respectively). Finally, we obtain

two existence results for LME, each for a variation of the coordination game from Section 2.

4.1 Belief States and Representation Lemma

With linear-quadratic payoffs and signals that are linear in the players’ actions, it is natural

to examine equilibria in which the long-run player conditions on her type θ linearly.

The logic is then analogous to that in Section 2.2. Specifically, since the myopic player

cares about the long-run player’s action (and/or her type) to determine his best response, he

will use Y to learn about θ. Because Y is privately observed, however, the myopic player’s

(first-order) belief about θ is private. The strategic interdependence of the players’ actions in

the long-run player’s payoff then forces the latter agent to forecast the myopic player’s belief,

which leads her second-order belief to become a relevant state. As we demonstrate shortly,

such second-order belief is also private due to its dependence on the long-run player’s type

19

via her past actions. Thus, the myopic player is forced to perform a non-trivial inference

about such hidden second-order belief, and so forth.

Along the path of play of any pure strategy, however, the outcome of the game should

depend only on the tuple (θ,X, Y ). Intuitively, given any rule that specifies behavior as a

function of past actions and information, the dependence on past play must disappear when

such a rule is followed, thus leading to realized outcomes that depend on the exogenous

elements of the model. In particular, the long-run player’s second-order belief should be

a function of (θ,X) exclusively, which is the only source of information available to her.

Moreover, in this Gaussian environment, one would expect the relationship between M and

(θ,X) to be linear if the rule that drives behavior is linear in some belief states.

Let Mt := Et[θ] denote the mean of the myopic player’s belief, and Mt := Et[Mt] denotes

the long-run player’s second-order counterpart. The previous discussion then suggests the

existence of a deterministic function χ and a process (Lt)t∈[0,T ] that depends on the paths of

the public signal X, such that M admits the representation

Mt = χtθ + (1− χt)Lt (11)

when the players follow linear Markov strategies

at = β0t + β1tMt + β2tLt + β3tθ (12)

at = δ0t + δ1tMt + δ2tLt, (13)

where the coefficients βit and δjt, i = 0, 1, 2, 3 and j = 0, 1, 2, are deterministic. (We

occasionally use ~β := (β0, β1, β2, β3) and ~δ := (δ0, δ1, δ2) for convenience.) The reason for

augmenting the strategies of 2.2 by the public state L is apparent: if true, the myopic player

uses (11) to forecast M , which means that L becomes a payoff-relevant state for both players.

Lemma 2 below characterizes the pair (χ, L) that validates (11)–(13). Before stating

the result, it is instructive to explain its derivation and introduce some notation. When

the myopic player conjectures that (11)–(12) hold for some “public” process L, he therefore

expects the long-run player’s realized actions to follow

at = β0t︸︷︷︸=:α0t

+ (β2t + β1t(1− χt))︸︷︷︸=:α2t

Lt + (β3t + β1tχt)︸︷︷︸=:α3t

θ. (14)

Because L is public, the myopic player can then filter θ from (X, Y ) when Y is driven by (14).

This learning problem is (conditionally) Gaussian, and hence the myopic player’s posterior

20

belief is fully characterized by a mean process (Mt)t≥0, and a deterministic variance

γt := Vart = Et[(θt − Mt)2],

where we have omitted the hat symbol in γt for notational convenience. As in Section 2, this

posterior variance will be determined by the signaling coefficient

α3t := β3t + β1tχt,

with β1χt encoding the history-inference effect: different types are expected to take different

actions not only because of their direct signaling incentives (captured by β3) but also because

their past actions have lead them to hold different beliefs today.

Critically, while the long-run player does not observe M , she recognizes that deviations

from (14) affect its evolution via Y—thus, her problem is one of stochastic control of an

unobserved state. Given the linear-quadratic payoffs, linear dynamics, and Gaussian noise,

this problem can be recast as one of controlling the long-run player’s estimate of M—namely,

Mt := Et[Mt]—after appropriately adjusting her flow payoffs via the use of conditional

expectations.13 Inserting the general linear Markov strategy (12) into the law of motion of

(Mt)t≥0, and solving for Mt as a function of {θ, (Xs)s<t} allows is to pin down (χ, L) given

the coefficients in the strategies.

Lemma 2 (Representation of second-order belief). Suppose that (X, Y ) is driven by (12)–

(13) and that the myopic player believes that (11) holds. Then, (11) holds at all times

(path-by-path of X), if and only if

γt = −γ2t (β3t + β1tχt)

2

σ2Y

, t > 0, γ0 = γo, (15)

χt =γt(β3t + β1tχt)

2(1− χt)σ2Y

− γtχ2t δ

21t

σ2X

, t > 0, χ0 = 0, (16)

dLt = (l0t + l1tLt)dt+BtdXt, t > 0, L0 = µ, (17)

where l0t and l1t, and Bt are given in (B.6)-(B.8). Moreover, Lt = E[Mt|FXt ] = E[θ|FXt ]

and γtχt = Vart = Et[(Mt − Mt)2].

The long-run player uses the public signal to learn about the myopic player’s belief. By

13This is the so-called separation principle, which allows one to filter first, and optimize afterwards usingbelief states, in these types of problems. We elaborate more on this topic in the proof of Lemma 4, wherewe derive the laws of motion of the Markov belief states.

21

the lemma, along the path of (12)–(13), we have that

Mt =Vart

Vartθ +

(1− Vart

Vart

)E[θ|FXt ].

Indeed, while learning about M from X, the only informational advantage that the long-run

player has relative to an outsider who observes X exclusively is that she knows her type.

Due to the Gaussian structure of the model, therefore, (i) Mt is a linear combination of θ

and E[Mt|FXt ], and (ii) the weights are deterministic. By the law of iterated expectations,

E[Mt|FXt ] = E[θ|FXt ], and the representation follows.

Let us now elaborate on the structure of the χ-ODE (16). Recall that the common prior

assumption implies that the long-run player knows that M = µ at time zero: Var0 = 0 then

implies M0 = µ in the previous expression, and so the χ-ODE starts at zero. As signaling

progresses, however, second-order uncertainty arises due to the long-run player losing track

of M (i.e., Vart > 0): this is captured in χ > 0 as soon as α3 > 0 in (16). In other words,

the long-run player expects M to gradually reflect her type θ, and so χt > 0.

Observe that if σX =∞ (the public signal is infinitely noisy) or δ1 ≡ 0 (the myopic player

does not signal back) the public signal is uninformative, so we would expect the long-run

player to forecast M solely by input: in fact, Lt = L0 = µ and χt = 1 − γt/γ0 hold in this

case, exactly as in Section 2.14 Otherwise, she also forecasts by “output,” as reflected in the

dependence of L on X. Conversely, as δ21/σ

2X grows, there is more downward pressure on

the growth of χ: as the signal-to-noise ratio in X improves, the long-run player relies less on

her past actions to forecast M , everything else being held constant. Thus, the no-feedback

maximizes the potential impact of second-order belief effects on behavior.

Our subsequent analysis takes the system (15)–(16) as an input. Thus, we require it to

have a unique solution over [0, T ] so as to ensure that the ODE-characterization is valid. To

this end, notice that the myopic player’s best reply can be written as δ1t := uθ+ua[β3t+β1tχt],

where uθ = Uaθ and ua = Uaa are real numbers.

Lemma 3. Suppose that β1 and β3 are continuous, β3· 6= 0 and δ1t = uθ + ua[β3t + β1tχt].

Then, there is a unique solution to (15)–(16). Such solution satisfies 0 < γt ≤ γo and

0 < χt < 1, t ∈ (0, T ].

The idea is that, under the conditions in the lemma, (γ, χ) is bounded, and hence a

solution to the system exists over [0, T ] (as solutions to ODE systems either exist or explode

over unrestricted domains). Since the system is locally Lipschitz continuous, uniqueness

14Setting δ1/σX ≡ 0 in (16) leads to the same ODE that χ satisfies in the no-feedback case. By uniqueness,such solution is χ = 1− γt/γo. See the proof of Lemma 1.

22

ensues; in particular, γt = Et[(θt − Et[θ])2] and χt = Vart/Vart = Et[(Mt − Mt)2]/γt.

The belief representation (11) relies on the long-run player following the linear strategy

(12); i.e., it does not hold off the path of play. In fact, as argued earlier, (Mt)t≥0 is controlled

by the long-run player, a phenomenon that is the consequence of the private monitoring

present in the model: past play is used for forecasting the myopic player’s private histories,

and so different actions yield different perceptions of the continuation game as measured by

M . Moreover, because such deviations are hidden, from the long-run player’s perspective, the

myopic player is always assuming that (11) holds—thus, the pair (γ, χ) affects the evolution

of M in the myopic player’s learning process, and hence the evolution of M . The next result

introduces the law of motion of M and L for an arbitrary strategy of the long-run player,

which will allow us to state her best-response problem.

Lemma 4. Suppose that the long-run player follows (a′t)t≥0 while the myopic player follows

(13) and believes (11)–(12). Then, from the long-run player’s perspective

dMt =γtα3t

σ2Y

(a′t − [α0t + α2tLt + α3tMt])dt+χtγtδ1t

σXdZt (18)

dLt =χtγtδ1t

σ2X(1− χt)

[δ1t(Mt − Lt)dt+ σXdZt] (19)

where (γ, χ) solves (15)–(16) and (Zt)t≥0 is a Brownian motion from her standpoint.

The dynamic (18) shows that long-run player’s choice of strategy a′ affects M . In par-

ticular, she will update her belief upward when a′t > Et[α0t + α2tLt + α3tMt], i.e., when she

exceeds her own expectation of the myopic player’s belief about her behavior. The intensity

of such a reaction is given by γtα3t/σ2Y : more uncertainty (higher γ) and stronger signaling

(larger α3) makes the long-run player’s belief more sensitive to her own actions. Further, M

evolves deterministically when δ1/σX ≡ 0.15

The drift of (19) demonstrates that the long-run player affects L only indirectly via

changes in M , due to her action not entering the public signal directly. Further, the drift

captures that the belief of an outsider who only observes X always moves in the direction of

M on average, reflecting that such an outsider learns the long-run player’s type. From this

perspective, by leading to Lt = µ at all times, the no-feedback case (σX =∞) misses a mild

signal-jamming effect—the ability to influence a public belief, albeit only indirectly.

Finally, the full-support monitoring and linear-quadratic structure, along with the (equi-

librium) representation (11), make it clear that (t, θ, Lt,Mt) and (t, Lt, Mt) summarize all

15It is worth noting that (Mt)t≥0 corresponds to a player’s non-trivial belief that is controlled by the sameplayer. Unless there are experimentation effects, players’ own beliefs are usually affected by other players.

23

the payoff-relevant information our players.16 In this line, the time variable captures both

time-horizon effects and the learning effects via γ and χ.

4.2 Dynamic Programming and the Boundary-Value Problem

The long-run player’s best-response problem. Given a conjecture ~β by the myopic

player, the coefficients ~δ will be such that

at := δ0t + δ1tMt + δ2tLt = arg maxa′

Et[U(α0t + α2tLt + α3tθ, a′, θ)]. (20)

Because U is quadratic, the long-run player’s best-response problem is, up to a constant,

max(at)t∈[0,T ]

E0

[ˆ T

0

e−rtU(at, δ0t + δ1tMt + δ2tLt, θ)dt

]s.t. (18) and (19)

and where ~δ satisfies (20). Observe that we have replaced M by M in the flow by means of

Et[M2t ] = M2

t + χtγt, and then using that χtγt is deterministic.

We can now define the notion of a linear Markov perfect equilibrium (LME).

Definition 2 (Linear Markov Perfect Equilibrium). A Nash equilibrium (at, at)t≥0 is a Lin-

ear Markov Equilibrium (LME) if there are deterministic coefficients (~β, ~δ) such that at

satisfies (20) and at = α0 + α2tLt + α3tθ, where: (i) (Lt)t≥0 evolves as in (17), (iii) ~α

satisfies (14), and (iii) β0 + β1M + β2L+ β3θ is an optimal policy for the long-run player.

The natural approach for establishing the existence of LME is via dynamic programming.

Specifically, we postulate a quadratic value function

V (θ,m, `, t) = v0t + v1tθ + v2tm+ v3t`+ v4tθ2 + v5tm

2 + v6t`2 + v7tθm+ v8tθ`+ v9tm`,

where vi·, i = 0, ..., 9 depend on time only. In turn, the HJB equation is

rV = supa′

{U(a′, at, θ) + Vt + µM(a′)Vm + µLV` +

σ2M

2Vmm + σMσLVm` +

σ2L

2V``

},

where µM(a′) and µL (respectively, σM and σL) are the drifts (respectively, volatilities) in

the laws of motion for M and L given in Lemma 4, and where at, is determined by ~β and

16We have focused on the long-run player exclusively. While deviations by the myopic player do affectL, the same assumptions (i.e., linear-quadratic structure and undetectable deviations) make his flow payofffully determined by the current value of (t, L, M) after all private histories.

24

χ via (20). A LME with coefficients ~β for the long-run player is obtained when the linear

Markov strategy (12) is an optimal policy for the previous HJB equation.

The boundary-value problem. We briefly explain how to obtain a system of ordinary

differential equations (ODEs) for ~β. Letting a(θ,m, `, t) denote the maximizer of the right-

hand side in the HJB equation, the first-order condition (FOC) reads

Ua(a(θ,m, `, t), δ0t + δ1tm+ δ2t`, θ) +γtα3t

σ2Y

[v2t + 2v5tm+ v7tθ + v9t`]︸︷︷︸Vm(θ,m,`,t)

= 0. (21)

where γtα3t/σ2Y in the second term captures the sensitivity of M to the long-run player’s

action at time t. Solving for a(θ,m, `, t) in the previous FOC, the equilibrium condition

becomes a(θ,m, `, t) = β0t + β1tm+ β2t`+ β3tθ.

Because the latter condition is a linear equation, we can solve for (v2, v5, v7, v9) as a

function of the coefficients ~β. Inserting these into the HJB equation along with a(θ,m, `, t) =

β0t + β1tm+ β2t`+ β3tθ in turn allows us to obtain a system of ODEs that the ~β coefficients

must satisfy. The resulting system is coupled with the ODEs that v6 and v8 satisfy (and that

are obtained from the HJB equation): since M feeds into L, the envelope condition with

respect to M is not enough to determine equations for the candidate equilibrium coefficients.

Finally, since the pair (γ, χ) affects the law of motion of (M,L), it also affects the evolution

of (~β, v6, v8), and so the ODEs (15)–(16) must be considered.

The boundary conditions for the system of ODEs that (β0, β1, β2, β3, v6, v8, γ, χ) satisfies

are as follows. First, there are the exogenous initial conditions that γ and χ satisfy, i.e.,

γ0 = γo > 0 and χ0 = 0. Second, there are terminal conditions v6T = v8T = 0 due to

the absence of a lump-sum terminal payoff in the long-run player’s problem. Third, more

interestingly, there are endogenous terminal conditions that are determined by the static

Nash equilibrium that arises from myopic play at time T . In fact, letting

Uaθ = uθ, Uaa = ua and Ua(0, 0, 0) = u0,

and analogously for the myopic player via the substitution (·)↔ (·), we obtain

β0T =u0 + uau0

1− uaua, β1T =

ua[uθua + uθ]

1− uauaχT, β2T =

u2aua[uθua + uθ](1− χT )

(1− uaua)(1− uauaχT ), β3T = uθ (22)

which are well-defined thanks to (iv) in Assumption 1 and χ ∈ (0, 1).17

We conclude that b := (β0, β1, β2, β3, v6, v8, γ, χ)′ satisfies a boundary-value problem

17From here, δ0T = u0 + uaβ0T , δ1T = uθ + ua[β3T + β1TχT ] and δ2T = ua[β2T + β1T (1− χT )].

25

(BVP) of the form

bt = f(bt), s.t. D0b0 + DTbT = (B(χT )′, γo, 0)′ (23)

where (i) f : R6 × R+ × [0, 1)→ R8, (ii) D0 and DT are the diagonal matrices

D0 = diag(0, 0, 0, 0, 0, 0, 1, 1) and DT = diag(1, 1, 1, 1, 1, 1, 0, 0)

and

(iii) χ 7→ B(χ) :=

(u0 + uau0

1− uaua,ua[uθua + uθ]

1− uauaχ,u2aua[uθua + uθ](1− χ)

(1− uaua)(1− uauaχ), uθ, 0, 0

)∈ R6. (24)

The general expression for f(·) given any pair (U, U) satisfying Assumption 1 is tedious and

long, and can be found in spm.nb on our websites. In the next section, we provide examples

that exhibit all the relevant properties that any such f(·) can satisfy.

The question of finding LME is then reduced to finding solutions to the BVP (23) (subject

to the rest of the coefficients of the value function being well defined). We turn to this issue

in the next section.

4.3 Existence of Linear Markov Equilibria: Interior Case

In this section, we present two existence results for LME in the case σX ∈ (0,∞): one for the

application introduced in Section 2, and the second for a variation of it in which the follower

(i.e., the myopic player) cares about both matching the leader’s action and matching the her

type. We accomplish this via proving the existence of a solution to the BVP that arises in

each setting, for the case in which the leader is patient (i.e., r = 0)—the applicability of the

methods is, however, more general (both in terms of the flow payoffs and time preferences).

The problem of finding a solution to any instance of the BVP (23) is complex because

there are multiple ODEs in either direction: (β0, β1, β2, β3, v6, v8) are traced backward from

their (endogenous) terminal values, while (γ, χ) are traced forward using their initial (exoge-

nous) ones—see Figure 5. In practice, this implies that the traditional “shooting methods”

can become severely complicated. Specifically, when constructing, say, a modified backward

initial value problem (IVP) in which (γ, χ) has a parametrized initial condition at T , the

requirement becomes that the chosen parameters induce terminal values at 0 that exactly

match (γo, 0). With more than one variable, however, this method essentially requires having

an accurate knowledge of the relationship between γ and χ at T for all possible coefficients~β: it is only then that we can find a way to trace the parametrized values over a region of

26

initial (time-T ) values in a way that it is possible to ensure that the target is hit.

γo

0

χ

T

β (χ ,γ )γt

t

i,T T T

v (χ ,γ )i,T T T

StaticNash

.

..

B(χ ,γ )T T

Figure 5: In the BVP, (γ, χ) has initial conditions, while (~β, v6, v8) has terminal ones.

The reason behind this dimensionality problem is the asymmetry in the environment:

the rate at which the long-run player signals her private information, α3 := β3 + χβ1, can

be substantially different than rate at which the myopic player signals his private belief,

δ1. This, in turn, potentially introduces a non-trivial history dependence between γ and

χ, reflected in the coupled system of ODEs they satisfy. Two natural questions then arise:

first, under which conditions such history dependence can be simplified; and second, how to

tackle the issue of existence of LME when this simplification is not possible.

Private values: one-dimensional shooting. We say that an environment is one of

private values if the myopic player’s flow utility satisfies

uθ := Uaθ = 0,

i.e., the myopic player’s best-reply does not directly depend on his belief about θ, but only

indirectly via the long-run player’s action. Otherwise, we say that the environment is one of

common values (despite the long-run player always knowing θ).

In a private-value setting, the myopic player’s coefficient on M is δ1 = uaα3. In this case,

there is a one-to-one mapping between γ and χ:

Lemma 5. Set σX ∈ (0,∞). Suppose that β1 and β3 are continuous and that δ1 = uaα3. If

ua 6= 0, there are positive constants c1, c2 and d independent of γo such that

χt =c1c2(1− [γt/γ

o]d)

c1 + c2[γt/γo]d.

Moreover, (i) 0 ≤ χt < c2 < 1 for all t ∈ [0, T ] and (ii) c2 → 0 as σX → 0 and c2 → 1 as

σX →∞. If instead ua = 0, χt = 1− γt/γo and c2 = 1.

27

It is easy to see that the right-hand side of the expression for χ in the previous lemma is

strictly decreasing in γt. Consequently, when the ratio of the signaling coefficients is constant,

the dimensionality of the (backward) shooting problem is reduced to the single variable. The

lemma also states that, as long as σX <∞, χ is always strictly below 1, reflecting that the

scope for the history-inference effect is diminished relative to the no-feedback case. Further,

the characterization of χ obtained in the latter case (5) is recovered when ua = 0, as the

public signal is then uninformative.

Thanks to the previous lemma, the standard shooting method based on the continuity of

the solutions is applicable. We state below the BVP for the leading-by-example application

of Section 2 when σX ∈ (0,∞) in its undiscounted version: recall that in that setting, the

follower wants to match the leader’s action, and so

at = Et[at]⇒ δ1t = α3t ⇔ ua = 1.

(Since scaling U and U each by a constant does not alter incentives, the ODEs below are

obtained under U(a, a, θ) = −(θ − a)2 − (a − a)2 and U(a, a, θ) = −(a − a)2 as opposed to

[−(θ − a)2 − (a− a)2]/4 and U(a, a, θ) = −(a− a)2/2, which would yield (i) in Assumption

1). We omit the β0–ODE due to being uncoupled from the rest and linear in itself:

v6t = β22t + 2β1tβ2t(1− χt)− β2

1t(1− χt)2 +2v6tα

23tγtχt

σ2X(1− χt)

v8t = −2β2t − 2(1− 2α3t)β1t(1− χt)− 4β21tχt(1− χt) +

v8tα23tγtχt

σ2X(1− χt)

β1t =α3tγt

2σ2Xσ

2Y (1− χt)

{2σ2

X(α3t − β1t)β1t(1− χt)− α23tβ1tγtχtv8t

−2σ2Y α3tχt(β2t − β1t[1− χt − 2β2tχt])

}β2t =

α3tγt2σ2

Xσ2Y (1− χt)

{2σ2

Xβ21t(1− χt)2 + 2σ2

Y α3tβ2tχ2t (1− 2β2t)− α2

3tγtχt(2v6t + β2tv8t)}

β3t =α3tγt

2σ2Xσ

2Y (1− χt)

{−2σ2

Xβ1t(1− χt)β3t + 2σ2Y α3tβ2tχ

2t (1− 2β3t)− α2

3tβ3tγtχtv8t

}γt = −γ

2t α

23t

σ2Y

with boundary conditions v6T = v8T = 0, β1T = 12(2−χT )

, β2T = 1−χT2(2−χT )

, β3T = 12

and

γ0 = γo, and where α3 := β3 +β1χ and χt is as in the previous lemma. We have the following:

Theorem 1. Let σX ∈ (0,∞) and r = 0. Then, there exists a strictly positive function

T (γo) ∈ O(1/γo) such that, for all T < T (γo), there exists a LME based on the solution to

the previous BVP. In that equilibrium, β0t = 0, β1t + β2t + β3t = 1 and α3t > 0, t ∈ [0, T ].

28

The key step behind the proof is to show that (β1, β2, β3, v6, v8, γ) can be bounded uni-

formly over [0, T (γo)), some T (γo) > 0, when γt ∈ [0, γo] at all times. For a given T < T (γo),

therefore, this implies that tracing the (parametrized) initial condition of γ in the (backward)

IVP from 0 upwards as schematically in Figure 6 will lead to at least one γ-path landing

exactly at γo (while the rest of the ODEs still admitting solutions), due to the continuity of

the solutions with respect to the initial conditions.18

γo

0T

γt

v (χ( ), )i,T

ParametrizedStatic Nash

γFγF

β (χ( ), )i,T

γFγF

χ( )γF

Fγ

γF

Figure 6: The one-dimensional shooting method.

As expected, the signaling coefficient in the interior cases lies “in between” those found

in the extreme cases of Section 2. Graphically for the r = 0 case:19

0 2 4 6 8 10t

0.55

0.60

0.65

0.70

0.75

0.80

0.85

β3 Public

α NF

α Int. σX = 0.1

0 2 4 6 8 10t

0.55

0.60

0.65

0.70

0.75

0.80

0.85

β3 Public

α NF

α Int. σX = 0.75

Common-value settings: fixed-point methods. When α and δ cease to be propor-

tional, χ can depend on both current and past values of γ at all points in time (as it is the

case in most coupled-ODE systems). The multi-dimensionality problem reappears.

18See Bonatti et al. (2017) for an application of this method to a symmetric oligopoly model featuringdispersed fixed private information, imperfect public monitoring, and multiple long-run players.

19In the discounted case, one can instead work with the ‘forward-looking’ component of (β1, β2, β3, v6, v8),which is defined as the latter dynamic coefficients net of their myopic counterpart (given the learning inducedby the dynamic strategy). Such forward-looking system eliminates a component linear in r present in thesystem that (β1, β2, β3, v6, v8) satisfies, and that is absent in the undiscounted version.

29

0 2 4 6 8 10t

0.55

0.60

0.65

0.70

0.75

0.80

0.85

β3 Public α

NF

α Int. σX = 2

0 2 4 6 8 10t

0.55

0.60

0.65

0.70

0.75

0.80

0.85

β3 Public

α NF

α Int. σX = 10

Figure 7: As σX ranges from 0 to +∞ the signaling coefficient starts close to the public benchmark,and gradually becomes closer the the no-feedback case counterpart.

Observe that finding a solution to any given instance of the BVP (23) is, mathematically,

a fixed-point problem. Specifically, notice that the static Nash equilibrium at time T depends

on the value that χ takes at that point. The latter value, however, depends on how much

signaling has taken place along the way, i.e., on values of the coefficients ~β at times prior

to T . Those values, in turn, depend on the value of the equilibrium coefficients at T by

backward induction, and we are back to the same point where we started.

Our approach therefore applies a fixed-point argument adapted from the literature on

BVPs with intertemporal linear constraints (Keller, 1968) to our problem with intratemporal

nonlinear constraints. Because the method is novel and has the generality required to become

useful in other asymmetric settings, we briefly elaborate on how it works.20

Let t 7→ bt(s, γo, 0) denote the solution to the forward IVP version of (23) when the

initial condition is (s, γo, 0), s ∈ R6, provided a solution exists. From Lemma 3, the last two

components of b, γ and χ, always admit solutions as long as the others do; moreover, there

are no constraints on their terminal values. Thus, for the fixed-point argument, we can focus

on the first six components in b := (β0, β1, β2, β3, v6, v8, γ, χ) by defining the gap function

g(s) = B(χT (s, γo, 0))−DT

ˆ T

0

f(bt(s, γo, 0))dt.

This function measures the distance between the total growth of (β0, β1, β2, β3, v6, v8) (last

term in the display), and its target value, B(χT (s, γo, 0)). By (24), B(χ) is nonlinear: the

static Nash equilibrium imposes nonlinear relationships across variables at time T .21

20Our approach is inspired by Theorem 1.2.7 in (Keller, 1968), the proof of which is not provided.21The function g takes only the first six components of b because there are no “shooting” constraints on

γ and χ. Yet, one is not really dispensing with (γ, χ), as this pair does affect (β0, β1, β2, β3, v6, v8).

30

Critically, using that, by definition, b0(s, γo, 0) = s, it follows that

g(s) = s ⇔ B(χT (s, γo, 0)) = s+ DT

ˆ T

0

f(bt(s, γo, 0))dt = DTbT (s, γo, 0),

where the last equality follows from the definition of the ODE-system that DTb satisfies.

Thus, the shooting problem (i.e., find s s.t. B(χT (s, γo, 0)) = DTbT (s, γo, 0)) can be trans-

formed to one of finding a fixed point of the function g.22

The bulk of the method then consists of finding a time T (γo) and a compact set S of

values for s such that (i) for all s ∈ S, a unique solution (bt(s, γo, 0))t∈T (γo) exists for the IVP

with initial condition (s, γo, 0), and (ii) g is continuous map from S to itself. The natural

choice for S is a ball centered around s0 := B(0), the terminal condition of the trivial game

with T = 0. With this in hand, part (i) can be accomplished by bounding the solutions

uniformly as in the one-dimensional shooting method, but now over [0, T (γo)]× S. In turn,

the continuity requirement of (ii) is guaranteed if the system of ODEs has enough regularity,

while the self-map condition can be ensured due to the system scaling with γo and T .23

We can now establish our main existence result for a variation of the leading-by example

application in which the follower’s best-reply is given by

at = uθEt[θ] + Et[at]⇒ δ1t = uθ + α3t, where uθ > 0.

(The positivity constraint ensures that (ii) in Assumption 1 is satisfied.24) The associated

BVP is given by (B.32)-(B.38) in the Appendix.

Theorem 2. Set σX ∈ (0,∞), u > 0 and r = 0 in the leadership model. Then, there is a

strictly positive function T (γo) ∈ O(1/γo) such that if T < T (γo), there exists a LME based

on the BVP (B.32)-(B.38). In such an equilibrium, α3 > 0.

22A BVP with intertemporal linear constraints differs from ours in that D0b0 + DTbT = (B(χT )′, γo, 0)′

becomes Ab0 +BbT = α, where A and B are not necessarily diagonal matrices and α is a constant vector.Thus, unlike in our analysis, one may not be able to dispense with a subset of the system (even if theassociated ODEs can be shown to exist independently from the rest): when A and B are not diagonal, theanalog of g(·) in that case may carry constraints on all coordinates. A complication that arises in our setting,however, is that our version of α is a nonlinear function of a subset of components of bT . This requiresestimating B(χT (s, γo, 0)) for all values of s over which g(·) must be shown to be a self-map.

23In general, it is more useful to work with a change of variables that eliminates 1−χt from denominatorin the system, and which reflects play when the state variable L is replaced by (1 − χ)L. Having shownexistence of the associated BVP in this case, we can then recover a solution to our original BVP by reversingthe change of variables and applying Lemma 3 (which ensures that 1 − χt > 0 for all t ∈ [0, T ], and hencethat the right-hand side of our system of interest is well-defined). This approach avoids the unnecessarytask of finding a uniform upper bound for χ that is strictly less than 1, and that would be required at themoment of bounding the system uniformly. In all cases, γt ∈ [0, γo] due the IVP under consideration beingin its forward version (Lemma 3).

24Since Uaθ = uθ > 0, Uaa = ua = 1 and Uaθ = 1/2 > 0, it follows that Uaθ(Uaθ + UaaUaθ) > 0.

31

We conclude with three observations that distill from this theorem. First, the self-map

condition, while not affecting the order of T (γo) relative to a traditional one-dimensional

shooting case, is not vacuous either. In fact, since s0 = B(0) is the center of S, we have that

g(s)− s0 = B(χT (s, γo, 0))−B(0)−DT

ˆ T

0

f(bt(s, γo, 0))dt.

Thus, bounding B(χT (s, γo, 0)) − B(0) imposes an additional constraint relative to those

that ensure that the system is uniformly bounded (and which guarantee that the last term

in the previous expression is bounded too). In other words, the self-map condition reduces

the constant of proportionality in T (γo) ∈ O(1/γo).

Second, the set of times for which a LME is guaranteed to exist increases without bound

as γo ↘ 0: this is because the rate of growth of the system of ODEs scale with this

parameter, and so its solution converges to the full-information (v6, v8, β1, β2, β3, χ, γ) =

(0, 0, 0, 0, 0, 1, 0, 0), which is defined for all T > 0.25

Finally, the bound T (γo) is obtained under minimal knowledge of the system: it imposes

crude bounds that only use the degree of the polynomial vector f(b), and that do not exploit

any relationship between the coefficients. Thus, the proof technique is both general and

improvable, provided more is known about the system in specific settings.

5 Extensions

As noted in Remark 1, our model can be generalized to accommodate a quadratic terminal

payoff or to allow the long-run player to affect the public signal. To demonstrate, we first

explore a political setting a politician’s payoff depends on her terminal reputation, and a

then trading model a la Kyle (1985) exhibiting private monitoring of an insider’s trades.

5.1 Reputation for Neutrality

We consider an application in which the long-run player is an expert or politician with

career concerns. The politician has a hidden ideological bias θ and takes repeated actions

25Inspection of the ~β-ODEs in the previous BVP indicates that v6 and v8 are always appear multipliedby γ. Thus, we can instead look at the system with vi = γvi, i = 6, 8, and these ODEs scale with γ. Sincethe system is uniformly bounded, γ never vanishes, and we can recover vi, i = 6, 8.

32

— for example, adopting positions on critical issues26 or making campaign promises.27 She

receives utility from taking actions that conform to her bias but also from attaining a neutral

reputation at the end of the horizon; hence, she must trade off her ideological desires with

her career concerns.

We model this specification with

−ˆ T

0

e−rt(at − θ)2dt− e−rTψa2T

as the payoff for the long-run player, where ψ > 0 is common knowledge and governs the

intensity of career concerns, and a flow payoff of U(at, at, θ) = −(at − θ)2 for the myopic

player. Since the myopic player optimally chooses at = Mt at each t ∈ [0, T ], the long-run

player’s termination payoff is effectively −e−rTψM2T . The myopic player can be interpreted as

a decision-maker (or in reduced form, an electorate) whose actions are direct communication,

journalism, or opinion polls which convey his belief about the long-run player.

As in the leading by example application, we study the role of public feedback in deter-

mining learning and payoffs for the long-run player in equilibrium. Note that the direct effect

on payoffs of removing public feedback is negative: due to the concavity of the termination

payoff, greater uncertainty about the myopic player’s belief hurts the long-run player. How-

ever, an indirect effect runs the opposite direction. All else equal, the long-run player prefers

higher actions when her type is higher, and hence her equilibrium strategy attaches positive

weight to her type. But the concavity of the termination payoff implies that the greater

the perceived value of M , the greater the incentive the long-run player has to manipulate it

downward. Higher types therefore must offset higher beliefs from their perspectives, leading

to a negative history-inference effect, which dampens the signaling coefficient α. With re-

duced signaling, the belief is less volatile from an ex ante perspective, which improves payoffs

due to the concavity of the objective function.28 Indeed, provided the objective is not too

concave, the indirect effect dominates, and the politician is better off:

26Mayhew (1974) in a classic political science text outlines three kinds of activities congresspeople engagein for electoral reasons: advertising, credit claiming, and (as in the current model) position taking. Hedescribes the dynamic nature of position taking:

. . . it might be rational for members in electoral danger to resort to innovation. The form ofinnovation available is entrepreneurial position taking, its logic being that for a member facingdefeat with his old array of positions, it makes good sense to gamble on some new ones.

27Campaign promises may be costly either due to a politician’s honesty (Callander and Wilkie, 2007) orbecause the electorate might not reelect politicians who renege on promises (Aragones et al., 2007).

28It is easy to show that the ex ante expectation of M2T is γo−γT , so that greater learning by the myopic

player results in larger terminal losses for the long-run player.

33

Proposition 5. Suppose that ψ < σ2Y /γ

o and r = 0. Then, for all T > 0: (i) there are

unique LME in the public and no feedback cases, and (ii) learning is lower and ex ante payoffs

higher in the no feedback case.

Proposition 5 highlights one mechanism through which an expert might benefit from

committing to not following polls or journalism that publicly convey her reputation for bias.

The present environment is one of common values. Hence, one can establish the existence

of a LME in the interior version of this problem with analogous methods to those in Section

4.3. The only difference is that our baseline model had terminal conditions that were a func-

tion of χ exclusively, whereas the presence of a terminal payoff delivers a terminal condition

for β1 that also depends on γ according to

β1T = − ψγTσ2Y + ψγTχT

,

reflecting the fact that the incentive to manipulate the myopic player’s belief in the final

moment is decreasing in the precision of that belief. To the extent that B depending on γ

and χ is of class C1, our fixed point method goes through.29,30

5.2 Insider Trading

An asset with fixed fundamental value θ, is traded in continuous time until date T when its

fundamental value is revealed, ending the game. The long-run player, or insider, privately

observes θ prior to the start of the game. The myopic player has a technology which allows

him to obtain private, noisy signals of the insider’s trades, as in Yang and Zhu (2018). Both

players and a flow of noise traders submit continuous orders to a third party, the market

maker, who executes those trades at a price Lt, which is public information.

We depart from the baseline model along three dimensions. First, the myopic player’s

flow payoff depends on L according to ξ(θ−L)a− a2

2, where ξ ≥ 0, the interpretation being

that L is the action of the market maker.31 Second, the long-run player’s flow payoff is

29The only adjustment needed is in proving the self-map condition for g, where Bγ appears.30We can also study the case ψ < 0, where the long-run player wants to appear as extreme as possible at

the end of the horizon. In that case, the history-inference effect becomes positive; higher types forecast higherbeliefs by the myopic player and have greater incentive to further manipulate those beliefs. The history-inference effect thus amplifies the signaling coefficient, which benefits the long-run player by increasinglearning and hence the terminal reward. This effect reinforces the positive direct effect of greater uncertaintyin the presence of a convex terminal payoff, so the long-run player again prefers the environment with nofeedback. This result is valid for mildly convex rewards, as β1T is not well-defined if ψ is too negative.

31The quadratic loss term strengthens our non-existence result, as it limits the myopic player’s abilityto exploit the private information he acquires. The parameter ξ can then be interpreted as the size of themyopic player or (inverse of) his transaction costs.

34

simply (θ − Lt)at, i.e., it is linear in at. Finally, the public signal now includes the long-run

player’s action: dXt = (at + at)dt+ σXdZXt . Hence, the myopic player learns from both the

private monitoring channel and the public price.

Following the literature, we seek an equilibrium in which the informed trader reveals her

private information gradually over time through a linear strategy of the form (12). Hence,

we require that the coefficients of the insider’s strategy be C1 functions over strict compact

subsets of [0, T ).32 We can then apply Lemmas 2 and 3 to such sets.33

Clearly, when ξ = 0, the model reduces to the classic model of Kyle (1985) (see also Back

(1992)), and hence a LME with trading strategy of the form β3(θ − L) always exists. This

is not the case when ξ > 0.

Proposition 6. Fix ξ > 0. Then for all σY > 0, there does not exist a linear Markov

equilibrium of the insider trading game.

The intuition for this result is as follows. As the myopic player privately observes a

signal of the insider’s trades, he acquires private information about θ over time. The myopic

player’s own trades then carry further information to the market maker, beyond that which

the market maker learns from the insider alone. This introduces momentum into the law of

motion for the price from the insider’s perspective, measured by a term ξ(m− l) in the price

drift; the insider’s trading at any time not only causes an immediate price impact but also

sets forth continued future price impacts as the myopic player’s trades continue to inform the

market maker. These repeated price impacts via the myopic player make future trades less

attractive to the insider, thereby putting the insider in a “race against herself” and inducing

her to trade away all information in the first instant.

This result is intimately related to a non-existence result in Yang and Zhu (2018). In a

two-period model, they show that a linear equilibrium ceases to exist if the private signal

of a back-runner—a trader who only participates in the last round after receiving noisy

information of the informed player’s first-period trade—is sufficiently precise, situation in

which a mixed strategy equilibrium emerges. More generally, the existence problem relates

to how, with pure strategies, an informed player’s rush to trade depends on the number of

trading opportunities in certain settings. In this line, Foster and Viswanathan (1994) show,

in an asymmetric environment where one long-run trader’s information nests another’s, that

the better informed trader quickly trades the commonly known piece of information to exploit

32By not imposing this requirement over [0, T ], we maintain the possibility of full revelation of the insider’sinformation through an explosion of trades near the end of the game, as is standard in insider trading models.In addition, this requirement ensures that the total order can be “inverted” from the price, and hence it iswithout loss to make X public to all players.

33Specifically, the proof of Lemma 2 provides the learning ODEs for the case ν > 0, and it is easy to seethat the steps of Lemma 3 (with uθ = ξ, ua = 0) go through for this case.

35

her superior information only later on. While there are important differences between our

settings (the belief of the lesser informed player is, in their model, always known to the

first, and their common information exogenous) there is a common theme: once common

information is created (either exogenously or endogenously), there is a pressure to trade

quickly on it. Such pressure is increasing in the number of trading opportunities.34

6 Conclusion

We have examined the implications of a minimal—yet natural—departure from an extensive

literature on signaling games: namely, that the signal observed by a receiver is both noisy

and private. We showed that, unlike in settings where such a signal is public, the sender’s

history of play affects the informativeness of her actions at all points in time, and we explored

the learning and payoff implications of such history-inference effect in applications. In the

process, we have introduced an approach for establishing the existence of LME in dynamic

games of asymmetric learning. Let us now discuss three assumptions of the model: its

asymmetry, the presence of a myopic player, and the linear-quadratic-Gaussian structure.

The asymmetry of the environment studied indeed provides us with enough tractability,

in the sense that it allows us to “close” the set of states at the second order. If instead the

long-run player had a stochastic type, or access to an imperfect private signal, even higher-

order beliefs would become payoff-relevant states. While some economic environments may

feature some of these assumptions, a natural question that arises is whether we believe

economic behavior in such settings is effectively driven by such higher-order inferences.

Second, the presence of a myopic player is not a major technical limitation. In fact, most

of the results are derived for, or can be generalized to, continuous coefficients ~δ. With a

long-run “receiver” such coefficients solve ODEs capturing optimality and correct beliefs but

(i) no additional states are needed, and (ii) the fixed-point argument is applicable (to an

enlarged boundary value problem).

Finally, the linear-quadratic-Gaussian class examined is clearly a limited class. Yet, its

advantage lies in that it is a powerful framework for uncovering economic effects that are

likely to be key in other, more nonlinear, environments. From that perspective, the way in

which the inference of others’ private histories interacts with payoffs in shaping signaling,

along with the time-effects that learning has on incentives, seem to exhaust the set of effects

that we would expect to be of first order in other settings.

34In symmetric settings, Holden and Subrahmanyam (1992) show that intense trading occurs in early pe-riods between two identically informed traders, and Back et al. (2000) obtain the corresponding nonexistenceresult directly in continuous time.

36

Appendix A: Proofs for Section 2

Proofs for Section 2.1

Since player 2 attempts to match player 1’s action, we have

at = Et[β0t + β1tMt + β3tθ]

= β0t︸︷︷︸δ0t

+ (β1t + β3t)︸︷︷︸δ1t

Mt.

The HJB equation for player 1 is

rV (θ,m, t) = supa

{−(a− θ)2 − (a− at)2 + Λtµt(a)VM(θ,m, t) +

Λ2tσ

2Y

2VMM(θ,m, t) + Vt(θ,m, t)

},

(A.1)

where

Λt :=β3tγtσ2Y

µt(a) := a− β0t − (β1t + β3t)m.

To obtain the maximizer of the RHS of (A.1), we impose the first order condition

0 = −2(a− θ)− 2(a− β0t − (β1t + β3t)m) +β3tγt[v2t + 2mv4t + θv5t]

σ2Y

=⇒ 0 = −2(β0t + β1tm+ β3tθ − θ)− 2β3t(θ −m) +β3tγt[v2t + 2mv4t + θv5t]

σ2Y

, (A.2)

where in the second line we have used that the maximizer must be a∗ := β0t + β1tm+ β3tθ.

Since A.2 must hold for all (θ,m, t) ∈ R2× [0, T ], the coefficients on θ, m and t must vanish,

and we obtain

v2t :=2σ2

Y β0t

β3tγt(A.3)

v4t :=σ2Y (β1t − β3t)

β3tγt(A.4)

v5t :2σ2

Y (2β3t − 1)

β3tγt. (A.5)

37

Since viT = 0 for all i ∈ {0, 1, . . . , 5}, (A.3)-(A.5) imply the terminal values of the coefficients

β0T := 0 (A.6)

β1T :=1

2(A.7)

β3T :=1

2, (A.8)

which are also the myopic equilibrium coefficients, that is, the coefficients for the game in

which both players act myopically at each instant (while still updating their beliefs).

Substituting a∗ into (A.1) yields

0 = −r[v0t + v1tθ + v2tm+ v3tθ2 + v4tm

2 + v5tθm]

+−[β0t +mβ1t + θ(β3t − 1)]2 − (m− θ)2β23t −

(m− θ)[v2t + 2mv4t + θv5t]β23tγt

σ2Y

+v4tβ

23tγ

2t

σ2Y

+ v0t + v1tθ + v2tm+ v3tθ2 + v4tm

2 + v5tθm,

(A.9)

which again must hold for all (θ,m, t) ∈ R2× [0, T ]. Using (A.3)-(A.5) and their derivatives,

we eliminate (v2t, v4t, v5t, v2t, v4t, v5t) from (A.9) to obtain a new equation. As the constant

term and the coefficients on θ,m, θ2,m2 and θm in this new equation must vanish, we obtain

along with (A.13) a system of ODEs for (v0, v1, v3, β0, β1, β3, γ). This system contains a

subsystem for (β0, β1, β3, γ):

β0t = 2rβ0tβ3t (A.10)

β1t = β3t

[r(2β1t − 1) +

β1tβ3tγtσ2Y

](A.11)

β3t = β3t

[r(2β3t − 1)− β1tβ3tγt

σ2Y

](A.12)

γt = −(β3tγt)2

σ2Y

. (A.13)

Hence, a linear Markov equilibrium must solve the following boundary value problem:

(A.10)-(A.13) with boundary conditions γ0 = γo and (A.6)-(A.8). We show that a solution

to this boundary value problem always exists.

It is useful to transform this boundary value problem into an initial value problem by

reversing the direction of time and making a guess for the initial value of γ. Hence, we define

the backward system

38

β0t = −2rβ0tβ3t (A.14)

β1t = β3t

[r(1− 2β1t)−

β1tβ3tγtσ2Y

](A.15)

β3t = β3t

[r(1− 2β3t) +

β1tβ3tγtσ2Y

](A.16)

γt =β2

3tγ2t

σ2Y

, (A.17)

with initial conditions β00 = 0, β10 = β30 = 12

and γ0 = γF ≥ 0.

Define Bpubt := β1t + β3t.

Lemma A.1. In any solution to the backward system over an interval [0, T ], if γF > 0, then

(i) Bpubt = 1 for all t ∈ [0, T ], (ii) β3t ∈ (1/2, 1) and β1t ∈ (0, 1/2) for all t ∈ (0, T ], (iii)

β3 is monotonically increasing while β1 is monotonically decreasing, and (iv) γ is strictly

increasing. If γF = 0, then β1t = β3t = 12

and γt = 0 for all t ∈ [0, T ]. For any γF , β0 = 0.

Proof of Lemma A.1. We first claim that if a solution exists over some interval [0, T ], then

β3t > 0 for all t ∈ [0, T ]. To see this let fβ3(t, β3t) denote the RHS of (A.16). Letting xt := 0

for all t ∈ [0, T ], we have β30 = 1/2 > x0 and β3t − fβ3(t, β3t) = 0 = xt − fβ3(t, xt). By the

comparison theorem in Teschl (2012, Theorem 1.3), the claim holds.

Next, we define Bpubt := β1t + β3t and show that Bpub = 1. Adding (A.15) and (A.16)

yields

Bpubt = 2rβ3t(1−Bpub

t ),

which, given the initial condition and any β3t, has solution of the formBpubt = 1−Ce−

´ t0 2rβ3sds.

Since Bpub0 = 1 = 1− C, we have C = 0 and Bpub = 1.

Hence we can rewrite the β3t ODE as

β3t = β3t

[r(1− 2β3t) +

β3t(1− β3t)γtσ2Y

]. (A.18)

We now show that β3 < 1. Let fβ3(t, β3t) now denote the RHS of (A.18), and define

xt := 1 for all t ∈ [0, T ]. Then x0 = 1 > β30 = 12, and β3t−fβ3(t, β3t) = 0 ≤ r = xt−fβ3(t, xt),

so by the comparison theorem, the claim holds. Since β1t = 1− β3t, we have β1 > 0.

Consider the case γF = 0. Since β3 > 0, γt = 0 is the unique solution to (A.17). Letting

zt := β3t− β1t, we have zt = −2rβ3tzt. As this is a linear ODE with initial condition z0 = 0,

39

the unique solution is zt = 0 for all t ∈ [0, T ]. Since β1t + β3t = 1, we have β1t = β3t = 1/2

for all t ∈ [0, T ], proving the claim in the proposition statement.

Next consider the case γF > 0. Since β3 > 0, (A.17) implies γ is strictly increasing, and

hence γt > 0 for all t ∈ [0, T ]. Now whenever β3t = 12, we have β3t = 1

2

[0 + γt

4σ2Y

]> 0, and

thus β3t > 1/2 for all t ∈ (0, T ]. Since β1t = 1− β3t, we have β1t < 1/2 for all such t.

We now turn to (iii). Since β1t + β3t = 0 for all t ∈ [0, T ], it suffices to show that β3t > 0

for all t ∈ [0, T ]; in turn, it suffices to show that Ht := r(1 − 2β3t) + β3t(1−β3t)γtσ2Y

> 0 for all

t ∈ [0, T ], where β3t = β3tHt. Now H0 = γ04σ2Y> 0, and with algebra it can be shown that if

Ht = 0, Ht =(1−β3t)β3

3tγ2t

σ4Y

> 0. It follows that Ht > 0 for all t ∈ [0, T ], as desired.

Finally, note that in all cases, we have β3 > 0, so the unique solution to (A.14) consistent

with the initial condition β00 = 0 is β0 = 0.

Proof of Proposition 1. Existence of LME reduces to existence of a solution to the boundary

value problem defined by (A.10)- (A.13) with associated boundary conditions. Using the

uniform bounds in Lemma A.1, we can obtain existence using an analogous backward shoot-

ing argument to the one used in Theorem 1. It is then straightforward to verify that the

remaining value function coefficients are well-defined and that the HJB equation is satisfied.

The remaining claims have been established in Lemma A.1.

We conclude with a derivation of the closed form solution for the undiscounted case, to

which we refer in Propositions 3 and 4.

Lemma A.2. For r = 0, the leading by example game has a unique LME for the public case,

and (β0, β1, β3, γ) can be expressed in closed form.

Proof. We derive the following closed form solution:

γt =γT2

+1

2γT− T−t

σ2Y

(A.19)

β3t =1

2− γT (T−t)2σ2Y

, where (A.20)

γT =γoT + 2σ2

Y −√

(γoT )2 + 4σ4Y

T(A.21)

and where β0 ≡ 0 and β1 ≡ 1− β3.

Observe that β3tγt + β3tγt =β23tγ

2t

σ2Y. Hence, define Πt := β3tγt, which has ODE Πt =

Π2t

σ2Y

40

with initial condition Π0 = β30γF = γF/2; its solution is

Πt =1

2γFpub− t

σ2Y

.

Substitute Π into (A.17) to obtain

γt =1

σ2Y

12

γFpub− t

σ2Y

2

=⇒ γt = Cγ +1

2γFpub− t

σ2Y

.

As γ0 = γFpub, we have Cγ = γFpub/2 and thus

γt =γFpub

2+

12

γFpub− t

σ2Y

(A.22)

Moreover,

γT = γo =γFpub

2+

12

γFpub− T

σ2Y

,

which is equivalent to the quadratic

=⇒ T

2

(γFpub

)2 −(γoT + 2σ2

Y

)γFpub + 2σ2

Y γo = 0.

The quadratic on the LHS is convex and evaluates to 2σ2Y γ

o > 0 at γFpub = 0 and evaluates

to − (γo)2 T/2 < 0 at γFpub = γo, so there is a unique solution in (0, γo) which in the forward

system is γT as in the proposition statement. Substituting this into (A.22) and returning to

the forward system by replacing t with T − t yields γt in the forward system. It is easy to

verify that γt > 0 for all t.

Lastly, we have (in the forward system) β3t = Πt/γt = 1

2−γFpub

(T−t)

2σ2Y

and β1t = 1− β3t.


Proof of Lemma 1. Let β0t + β1tMt + β3tθ denote the long-run player’s strategy. Thus,

Et[at] = α0t + α3tMt, where α0 = β0 + β1(1 − χ)m0 and α3 = β3 + χβ1. Then, dMt =

41

α3tγ1tσ2Y

[dYt − (α0t + α1tMt)dt], so letting R(t, s) = exp(−´ ts

α23uγ1uσ2Y

du)

Mt = m0R(t, 0) +

ˆ t

0

R(t, s)α3sγ1s

σ2Y

[(a2t − α0t)dt+ σY dZYt ]

⇒Mt = m0R(t, 0) +

ˆ t

0

R(t, s)α3sγ1s

σ2Y

(a2t − α0t)dt

˙γ1t = −γ21tα

23t

σ2Y

. (A.23)

On the path of play, however, a2t = β0t + β1tMt + β3tθ, so we obtain

Mt = m0R(t, 0) + θ

ˆ t

0

R(t, s)α3sγ1s

σ2Y

[−(1− χs)β1sm0 + β1sM2s + β3sθ]ds

where we used that β0s − α0s = −(1− χs)β1sm0. In particular,

dMt =

(−M2t

[α3sγsσ2Y

(α3s + β1s)

]+α3sγsσ2Y

[−β1s(1− χs)m0 + β3sθ]

)dt,

and so, letting R(t, s) = exp(−´ tsα3uγ1uσ2Y

(α3u − β1u)du),

Mt = m0

(R(t, 0)−

ˆ t

0

R(t, s)α3sγsσ2Y

β1s(1− χs)ds)

+ θ

ˆ t

0

R(t, s)α3sγsσ2Y

β3sds.

From here

χt =

ˆ t

0

R(t, s)α3sγsσ2Y

β3sds and 1− χt = R(t, 0)−ˆ t

0

R(t, s)α3sγsσ2Y

β1s(1− χs)ds.

Critically, observe that the second constraint is a direct consequence of the first. Specifically,

adding and subtracting β3s and noticing that α3s − β1s = β3s − β1s(1− χs)we can write

−ˆ t

0

R(t, s)α3sγsσ2Y

β1s(1− χs)ds =

ˆ t

0

R(t, s)α3sγsσ2Y

[α3s − β1s]ds− χt

=

ˆ t

0

dR(t, s)

dsds− χt

= 1− R(t, 0)− χt (A.24)

where in the last equality we used that R(t, t) = 1.

With this in hand, the relevant constraint is the first. In differential form, and using that

42

α3t = β3 + β1χ,

χt = −α3tγtσ2Y

[α3t − β1t]χt +α3tβ3tγtσ2Y

=(β3t + β1tχt)

2γtσ2Y

(1− χt) =α2

3tγ1t

σ2Y

(1− χt).

Using the exact same arguments in the proof of Lemma 3, we conclude that the re-

sulting (χ, γ) system admits a unique solution with the same properties as in that lemma.

Furthermore, it is easy to check that 1− γt/γo satisfies the χ-ODE, concluding the proof.

We now turn to characterizing equilibrium for the no feedback case. From Lemma 1,

given a conjecture by the follower about (β0, β1, β3), the variance of the follower’s belief

evolves deterministically as γt = −α2tγ

2t

σ2Y

and χ ≡ 1− γ/γo.The follower matches the expectation of the leader’s action by playing

at = E2t [β0t + β1tMt + β3tθ]

= Et[β0t + β1t (m0[1− χt] + χtθ) + β3tθ]

= β0t + β1tm0(1− χt)︸︷︷︸δ0t

+ (β1tχt + β3t)︸︷︷︸δ1t

Mt.

Define µ1t := αtγt/σ2Y , µ0t = −µ1t[β0t + β1tm0(1 − χt)] and µ2t = −αtµ1t, where αt =

β1tχt + β3t.

The HJB equation is

rV (θ,m, t) = supa{ −

(a2 − 2a[δ0t + δ1tm] + δ2

0t + 2δ0tδ1tm+ δ21t[γtχt +m2]

)︸︷︷︸=Et[(a−at)2]

−(a− θ)2 + (µ0t + aµ1t +mµ2t)Vm(θ,m, t) + Vt(θ,m, t)},

(A.25)

where we have used that Et[Mt

]= Mt = m and Et

[M2

t

]=(Et[Mt

])2

+ Var[Mt

]=

m2 + γtχt.

Imposing the first order condition on the RHS of (A.25) and evaluating at a = β0t +

β1tm+ β3tθ, yields

0 = −4[β0t + β1tm+ β3tθ] + 2θ +αtγt[v2t + 2mv4t + θv5t]

σ2Y

+ 2[β0t + m0β1t(1− χt) +mαt].

(A.26)

43

Matching coefficients, we obtain

v2t =2σ2

Y [β0t − m0β1t(1− χt)]αtγt

(A.27)

v4t =−σ2

Y [β3t − β1t(2− χt)]αtγt

(A.28)

v5t =2σ2

Y [2β3t − 1]

αtγt, (A.29)

provided that αt, γt > 0, which we will confirm below. Setting viT = 0 for i = 2, 4, 5 yields

the terminal conditions stated below. As in the public benchmark, these conditions also

yield the equilibrium coefficients for the case where both players behave myopically.

We now evaluate (A.25) at a = β0t+β1tm+β3tθ, using (A.27)-(A.29) and their derivatives

to eliminate (v2t, v4t, v5t, v2t, v4t, v5t) and in turn using (A.33) and (A.34) to eliminate γt and

χt. The resulting equation gives a system of ODEs for (v0, v1, v3, β0, β1, β3, γ), which contains

the following subsystem for (β0, β1, β3, γ):

β0t =αt

2σ2Y

{rσ2

Y β0t(2− χt)− rσ2Y (1− χt) + 2γtβ

21t(1− χt)

}(A.30)

β1t =αt

2σ2Y

{−rσ2

Y + 2β1t[β3tγt + rσ2Y (2− χt)]− 2β2

1tγt(1− χt)}

(A.31)

β3t =αt

2σ2Y

{−rσ2

Y (2− χt)− 2β3t[β1tγt − rσ2Y (2− χt)]

}(A.32)

γt = −α2tγ

2t

σ2Y

(A.33)

χt =γtα

2t (1− χt)σ2Y

(A.34)

with boundary conditions γ0 = γo, β0T = 1−χT2(2−χT )

, β1T = 12(2−χT )

, β3T = 12, γ0 = γo and

χ0 = 0. The ODE for α is

αt = −rαt[1− αt(2− χt)], (A.35)

and its terminal value is αT = 12−χT

.

Let (βm0t , βm1t , β

m3t , α

mt ) =

(1−χt

2(2−χt) ,1

2(2−χt) ,12, 1

2−χt

)denote the myopic coefficients, that is

the myopic equilibrium given the variance γ induced by the dynamic candidate equilibrium

strategy.

Proof of Proposition 2. We work with the backward system, using an initial guess γ0 = γF

to formulate an initial value problem parameterized by γF .

44

The backward system for the coefficients is

β0t =αt

2σ2Y

{−rσ2

Y β0t(2− χt) + rσ2Y (1− χt)− 2γtβ

21t(1− χt)

}(A.36)

β1t =αt

2σ2Y

{rσ2

Y − 2β1t[β3tγt + rσ2Y (2− χt)] + 2β2

1tγt(1− χt)}

(A.37)

β3t =αt

2σ2Y

{rσ2

Y (2− χt) + 2β3t[β1tγt − rσ2Y (2− χt)]

}(A.38)

αt = rαt[1− αt(2− χt)] (A.39)

γt =α2tγ

2t

σ2Y

(A.40)

with boundary conditions β00 = 1−χ0

2(2−χ0), β10 = 1

2(2−χ0), β30 = 1

2and α0 = 1

2−χ0> 0.

By Teschl’s comparison theorem, α > 0 in any solution to the IVP. It follows that for

γF = 0, the IVP has a unique solution (β0, β1, β3, γ,χ) = (0, 1/2, 1/2, 0, 1). By continuity,

suppose now that γF > 0 is sufficiently small that there exists a solution to the IVP. By the

same argument as in the proof of Lemma A.1, γ is increasing.

Given such (γt)t∈[0,T ], we can write the right-hand side of the α-ODE as a function of the

form fα(t, α) that is of class C1. Using that χt = 1 − γt/γo, observe that, in the backward

system,

d

dt

(1

2− χt

)− fα(t, 1/(2− χt)) = − γt

γo(1 + γt/γo)2< 0 = αt − fα(t, αt)

where the first inequality follows from γt being increasing in the backward system. Moreover,

α0 = 12−χ0

. The comparison theorem allows us to conclude that αt ≥ 1/(2−χt), and in turn

αt ≤ 0 (and hence αt ≥ 0 in the forward system), for all t ∈ [0, T ] with both inequalities

strict for t ∈ (0, T ] (t ∈ [0, T ) in the forward system) if and only if r > 0. It follows that for

all t ∈ (0, T ], αt < α0 = 12−χ0

< 1.

Now by simple addition of the ODEs, we obtain that B∞t := β0 + β1 + β3 satisfies

B∞t =αt

2σ2Y

{2rσ2

Y (2− χt)[1−B∞t ]}, with B∞0 = 1.

It is easy to see that the unique solution is B∞t = 1 (one can integrate the previous ODE

between 0 and t small, and argue by contradiction using the continuity of B∞).

Next, we establish uniform bounds on β1 and β3 (and hence β0). Observe that β30 =

βm30 = 1/2 and β10 = βm10 > 0; we first argue that β3t >12

and β1t > 0 for all t ∈ (0, T ].

By way of contradiction, suppose that τ ∈ (0, T ] is the first strictly positive time either of

these properties fails. If β1τ = 0, then β1τ = rβ3τ2

> 0, a contradiction; hence β1τ > 0. But

45

if β3τ = 1/2, then β3τ = β1τγτ (1+2β1τχτ )

4σ2Y

> 0, also a contradiction. We conclude that β3t >12

and β1t > 0 for all t ∈ (0, T ].

By definition, αt = β1tχt + β3t and thus β3t ≤ αt < 1. Finally, note that β10 = βm10 < 1

and the RHS of the β1 ODE can be written as fβ1(t, β1) of class C1, and with algebra it can

be shown that

βm1t − fβ1(t, βm1t) =γt

4σ2Y (2− χt)4

(β3t[2− χt]− [1− χt]) (2β3t[2− χt] + χt)

> 0 = β1t − fβ1(t, β1t),

so by the comparison theorem, we have β1t < βm1t < 1 for all t ∈ (0, T ].

By an analogous one-dimensional shooting argument as in the public case, we obtain

existence of a solution to the BVP and hence existence of an LME.

The final claim to prove is that as T → ∞, αT → 1, for which we use the forward

system. Recalling that α > 1/2, we have γT → 0 as T → ∞, and thus χT → 1 and

αT = 1/(2− χT )→ 1.

We refer to the following result in Propositions 3 and 4.

Lemma A.3. For r = 0, the leading by example game has a unique LME for the no feedback

case, expressed in closed form as follows:

β1t =γo[(γo + γT )2σ2

Y − (T − t)(γo)2γT ]

(γo + γT )[2σ2Y (γo + γT )2 − (T − t)(γo)2γT ]

(A.41)

β3t =σ2Y (γo + γT )2

2σ2Y (γo + γT )2 − (T − t)(γo)2γT

(A.42)

αt =γo

γo + γT(A.43)

γt =γTσ

2Y (γo + γT )2

σ2Y (γo + γT )2 − (T − t) (γo)2 γT

, (A.44)

for all t ∈ [0, T ], where χt = 1 − γt/γo and γT ∈ (0, γo) is the unique solution in (0, γo) to

the cubic γTT (γo)3 + (γT − γo) (γT + γo)2 σ2Y = 0, and β0 ≡ 1− β1 − β3.

Proof. We work with the backward system. First note that by setting r = 0 in (A.39), α

must be constant and equal to its initial value α0 = 12−χ0

. Next, recall that by Lemma 1,

χt = 1 − γtγo

, so χ0 = 1 − γFNFγo

and thus αt = α = γo

γFNF+γofor all t ∈ [0, T ]. Next, note that

the ODE γt =α2γ2tσ2Y

given an initial value γFNF has solution γt =γFNF σ

2Y

σ2Y −γ

FNF

(γo

γFNF

+γo

)2

t

; switching

back to the forward system by replacing t with T − t yields the expression in the original

46

statement. Now the terminal condition γT = γo is equivalent to the following cubic equation

for γFNF :

q(γFNF ) := γFNFT (γo)3 +(γFNF − γo

) (γFNF + γo

)2σ2Y = 0. (A.45)

Note q(γFNF

)> 0 for γFNF ≥ γo and q

(γFNF

)≤ 0 for γFNF ≤ 0, so all real roots must lie in

(0, γo). Now any root to the cubic must satisfy

T (γo)3

γo − γFNF= σ2

Y

(γFNF + γo)2

γFNF. (A.46)

The LHS of (A.46) is strictly increasing for γFNF ∈ (0, γo) while the RHS is strictly decreasing

in this interval, so q has a unique real root. Returning to the β1 ODE, using α = β1χ+ β3,

we have β1 = αγtβ1tσ2Y

(α−β1t). This ODE can be solved by integration after moving β1(α−β1)

to the LHS, and with algebra, one obtains (in the forward system) the expression in the

proposition statement. One then obtains β3t from these known quantities using β3t = α −β1tχt.


Here we prove Propositions 3 and 4. Since these results require some preliminary lemmas,

we organize them into separate sections.

Proof of Proposition 3

We treat the patient and myopic cases one at a time. The following lemma compares signaling

and learning between the public and no feedback cases for r = 0.

Lemma A.4. For r = 0 and all values of T, γo and σY , more information is revealed in the

no feedback case than in the public benchmark case: γFpub > γFNF . In the public benchmark,

there is more aggressive signaling early in the game and less aggressive signaling later in the

game, relative to the no feedback case; i.e., there exists T ∗ ∈ (0, T ) such that βpub3t > αNF if

and only if t < T ∗.

Proof. For the first claim, recall that γFNF is the unique positive root of the cubic equation

q(γF ) = 0 defined in (A.45), where for γ > 0, q(γ) > 0 iff γ > γFNF . Hence, to prove the

47

claim, it suffices to show that q(γFpub) > 0. By direct calculation, we have

q(γFpub) = (γo)3

(Tγo + 2σ2

Y −√

(Tγo)2 + 4σ4Y

)+σ2Y

T 3

(2σ2

Y −√

(Tγo)2 + 4σ4Y

)(2Tγo + 2σ2

Y −√

(Tγo)2 + 4σ4Y

)2

= (γo)4Tq2(S), where

q2(S) := 1 + 2S −√

1 + 4S2 + S(

2S −√

1 + 4S2)(

2 + 2S −√

1 + 4S2)2

and

S :=σ2Y

Tγo.

We now show that q2(S) > 0 for all S > 0 (observe that q2(0) = 0). Let R(S) = 1 + 2S −√1 + 4S2; it is straightforward to verify that R(0) = 0 and that for all S ≥ 0, R′(S) > 0 and

R(S) < 1. Moreover, the inverse of R is the function S : [0, 1) → [0,∞) characterized by

S(R) := R(2−R)4(1−R)

. Hence, by change of variables, q2(S) > 0 for all S > 0 iff q3(R) > 0, where

q3(R) := R− S(R)(1−R)(R + 1)2.

Now for R ∈ [0, 1),

q3(R) > 0

⇐⇒ S(R) =R(2−R)

4(1−R)<

R

(1−R)(R + 1)2

⇐⇒ q4(R) := (2−R)(R + 1)2 < 4.

It is straightforward to verify that over the interval [0, 1], q4(R) attains its maximum value

of 4 at R = 1, and tracing our steps backwards this implies that q(γFpub) > 0, so γFpub > γFNF ,

proving the first claim.

For the second claim, using the forward system, since βpub3T = 12< αNF and βpubt is mono-

tonically decreasing, it suffices to show that βpub30 > αNF . Using the associated expressions

from Lemmas A.2 and A.3, this is equivalent to

1

2− γFpubT

2σ2Y

>γo

γo + γFNF

⇐⇒ γ := γo

(1−

γFpubT

2σ2Y

)< γFNF .

48

It suffices to show that q(γ) := T γ(γo)3 + (γ − γo)(γ + γo)2σ2Y < 0. Recalling that

γpubF =γoT + 2σ2

Y −√

(γoT )2 + 4σ4Y

T

one can show that

q(γ) =(γo)4T [−γoT +

√(γoT )2 + 4σ4

Y ]

2σ2Y

[1−

2σ2Y − (γoT −

√(γoT )2 + 4σ4

Y )

2σ2Y

]

= −T (γo)4

2σ4Y

[(Tγo)2 + 2σ4

Y − Tγo√

(Tγo)2 + 4σ4Y

].

The expression in square brackets can be written as x+y2−√xy > 0 where x = (Tγo)2 > 0

and y = (Tγo)2 + 4σ4Y > 0, and thus q(γ) < 0, concluding the proof.

Continuing toward the proof of Proposition 3, we now handle the myopic case. We begin

by deriving the solution for the public and no feedback cases for a myopic leader.

Lemma A.5. Suppose the leader is myopic. In the LME for the public case, β3 = 1/2 and

γpubt =4σ2Y γ

o

4σ2Y +γot

. In the LME for the no feedback case, αt = γo

γ0+γNFt, where γNFt is defined

implicitly as the unique solution in (0, γo] of the equation 2 ln(γNFt /γo)−γo/γNFt +γNFt /γo =

−γotσ2Y

.

Proof. We first consider the public benchmark case, where in the myopic solution, β3t = 1/2,

and thus (in the forward system)

γpubt = −β23t(γ

pubt )2

σ2Y

= −(γpubt )2

4σ2Y

=⇒ γpubt =4σ2

Y γo

4σ2Y + γot

,

and thus upubt =γpubt

2=

2σ2Y γ

o

4σ2Y +γot

.

In the myopic solution to the no feedback case, αt = 12−χt = 1

1+γNFt /γo, where γNFt solves

the ODE

γNFt = −α2t

(γNFt

)2

σ2Y

= − 1

σ2Y

(γoγNFtγo + γNFt

)2

(A.47)

=⇒ 2γNFtγNFt

+ γoγNFt

(γNFt )2 +

γNFtγo

= − γo

σ2Y

. (A.48)

By integrating both sides of (A.48) and using that γNF0 = γo to pin down the constant of

49

integration, we obtain (γNFt )t∈[0,T ] to (A.47) solves

2 ln(γNFt /γo)− γo/γNFt + γNFt /γo = −γot

σ2Y

. (A.49)

To verify that γNFt ∈ (0, γ0] is well-defined as such, define f : (0, 1]→ R by

f(y) := 2 ln(y)− 1/y + y,

and note that f(y) is strictly increasing as f ′(y) = (1 + 1/y)2 > 0, and moreover, f(1) =

0 ≥ −γotσ2Y

while limy→0 f(y) = −∞ < −γotσ2Y

. It follows that for all t ∈ [0, T ], γNFt ∈ (0, γo] is

uniquely determined by (A.47).

Lemma A.6. For the myopic case, γPubt > γNFt for all t ∈ (0, T ].

Proof. Observe that γNF0 = γPub0 = γo. Solving the ODEs for γPub and γNF by integration,

and using that αt ≥ β3t = 1/2 with strict inequality for all t > 0 then yields the result.

The last step toward proving Proposition 3 is to establish uniform convergence of solu-

tions to the myopic solutions as r → ∞. For arbitrary T > 0 and r ≥ 0, let BVPPub(r)

denote the boundary value problem for (β1, β3, γ) defined by (A.31)-(A.13) and the associated

boundary conditions, parameterized by r, and likewise let BVPNF (r) denote the boundary

value problem for (α, γ, χ) defined by (A.33)-(A.35) and the associated boundary conditions.

Ξpub := {(β1, β3, γ) ∈ C1([0, T ])3 such that (β1, β3, γ) solves BVPPub(r) for some r ≥ 0}

ΞNF := {(α, γ, χ) ∈ C1([0, T ])3 such that (α, γ, χ) solves BVPNF (r) for some r ≥ 0}.

Lemma A.7. The families {(β1, β3, γ) : (β1, β3, γ) ∈ Ξpub} and {(α, γ, χ) : (α, γ, χ) ∈ ΞNF}are uniformly bounded, and hence Ξpub and ΞNF are equicontinuous.

Proof. We begin with the public case. Recall that Ξpub is uniformly bounded, and in par-

ticular, we have (β1t, β3t, γt) ∈ [0, 1/2] × [1/2, 1] × [0, γo] for all (β1, β3, γ) ∈ Ξpub and all

t ∈ [0, T ]. It follows that |γt| =β23tγ

2t

σ2Y≤ (γo)2

σ2Y

. We now establish a uniform bound on β3. If

we define βm3 := 1/2 and βf3 = β3 − βm3 , we have from the (backward system) β3 ODE

βf3t = β3t = β3t[−2rβf3t + β3t(1− β3t)γt/σ2Y ],

which is linear in βf3 . Solving this ODE and multiplying through by r, we obtain

rβf3t =

ˆ t

0

re−r´ ts 2β3uduβ2

3s(1− β3s)γs/σ2Y ds.

50

Now |β23s(1− β3s)γs/σ

2Y | ≤ gpub := γo

σ2Y

. Moreover, 2β3u ≥ 1/2, and thus

|rβf3t| ≤ gpubˆ t

0

re−r´ ts 2β3ududs ≤ gpub

ˆ t

0

re−r(t−s)ds = gpub(1− e−rt) < gpub.

It follows that

|β3t| = |βf3t| = |β3t[−2rβf3t + β3t(1− β3t)γt/σ2Y ]|

≤ | − 2β3t| · |rβf3t|+ |β23t(1− β3t)γt/σ

2Y |

≤ 1 · gpub + gpub,

which is the desired uniform bound as gpub is independent of r. Now since β1 + β3 ≡ 1, |β1t|is also uniformly bounded above by 2gpub. Hence we have established uniform bounds on the

derivatives (β1, β3, γ) for (β1, β3, γ) ∈ Ξpub, and thus Ξpub is equicontinuous.

Next, we turn to the no feedback case, where we recall the uniform bounds αt ∈ [1/(2−χt), 1] ⊂ [1/2, 1], γt ∈ [0, γo] and χt ∈ [0, 1]. Immediately, we have |γt| = |α

2tγ

2t

σ2Y| ≤ (γo)2

σ2Y

, and

since χ ≡ 1− γ/γo, |χt| = | − γt/γo| ≤ γo

σ2Y

=: gNF . We now uniformly bound αt.

Set αmt := 1/(2− χt) and αft := αt − αmt , and note that αmt = χt/(2− χt)2. We

αft := αt − αmt = −rαt(2− χt)αft − χt/(2− χt)2,

which is linear in αf . As in the public case, solving this ODE and multiplying through by r

yields

rαft =

ˆ t

0

re−r´ ts αu(2−χu)du[−χs/(2− χs)2]ds

=⇒ |rαft | ≤ˆ t

0

re−r´ ts αu(2−χu)du|χs/(2− χs)2|ds.

Now |χs/(2− χs)2| ≤ |χs| ≤ gNF as noted above, so

|rαft | ≤ gNFˆ t

0

re−r´ ts αu(2−χu)duds

≤ gNFˆ t

0

r−r(t−s)ds = gNF (1− e−rt) < gNF ,

51

where we have used that αu ≥ 1/(2− χu) =⇒´ tsαu(2− χu)du ≥ (t− s). We now have

|αt| = | − rαt(2− χt)αft | = |rαft | · |αt| · |2− χt|

≤ gNF · 1 · 2.

We have shown that (α, γ, χ) are uniformly bounded for (α, γ, χ) ∈ ΞNF , and thus ΞNF is

equicontinuous.

Let (βpub,∞1 , βpub,∞3 , γpub,∞) and (αNF,∞, γNF,∞, χNF,∞) denote the myopic equilibrium co-

efficients for the public and no feedback cases, respectively.

Proposition A.1. Fix any T > 0 and let {rn}∞n=1 be a sequence with limn→∞ rn = ∞.

Let {(βpub,n1 , βpub,n3 , γpub,n)}∞n=1 and {(αNF,n, γNF,n, χNF,n)}∞n=1 be sequences of solutions to

BVPPub(rn) and BVPNF (rn), respectively. Then uniformly, (βpub,n1 , βpub,n3 , γpub,n)→ (βpub,∞1 , βpub,∞3 )

and (αNF,n, γNF,n, χNF,n)→ (αNF,∞, γNF,∞, χNF,∞).

Proof. First, note that rn → ∞ implies that rn = 0 for at most finitely many n, so there

exists N such that rn > 0 for all n ≥ N ; it suffices to consider only such n. In the public

benchmark, recall that βpub,∞1 = βpub,∞3 = 1/2. We have β3 ≥ 1/2 > 0, so the β3 ODE can

be rearranged to obtain

βpub,n3t = 1/2 +1

rn

[(βpub,n3t )2(1− βpub,n3t )γpub,nt

2βpub,n3t σ2Y

− βpub,n3t

2βpub,n3t

].

Since βpub,n3t ≥ 1/2, the expression in brackets is uniformly bounded in absolute value by some

constant Kpub (independent of rn and t). Hence βpub,n3 converges pointwise to 1/2 = βpub,∞3 .

Since βpub,n1 ≡ 1 − βpub,n3 , βpub,n1 converges pointwise to 1/2 = βpub,∞1 . As [0, T ] is compact

and the sequence {(βpub,n1 , βpub,n3 )}∞n=1 is equicontinuous by Lemma A.7, we apply Lemma 39

in Royden (1988, p. 168) and obtain uniform convergence for βpub1 and βpub3 . Finally, we

prove uniform convergence for γpub, by proving pointwise convergence and invoking the same

result to obtain uniform convergence. Note that for i ∈ N ∪ {∞}

γpub,it =γoσ2

Y

σ2Y + γo

´ Tt

(βpub,i3s )2ds.

Since for n ∈ N, the functions (βpub,n3s )2 are bounded uniformly by a constant and converge

pointwise to (βpub,∞3 )2, the dominated convergence theorem (Royden, 1988, p. 267, Theorem

52

16) implies´ Tt

(βpub,n3s )2ds→´ Tt

(βpub,∞3s )2ds and thus for all t ∈ [0, T ] pointwise,

γpub,nt → γoσ2Y

σ2Y + γo

´ Tt

(βpub,∞3s )2ds= γpub,∞t .

By Lemma A.7, the sequence of γpub,n is equicontinuous, and thus Royden (1988, p. 168,

Lemma 39) gives uniform convergence to γpub,∞.

Next, consider the no feedback case. By Royden (1988, p. 169, Theorem 40), there exists

a subsequence of {(αNF,n, γNF,n, χNF,n)}∞n=1 indexed by {k(n)}∞n=1 which converges uniformly

on [0, T ] to some limit (α∗, γ∗, χ∗). We argue that (α∗, γ∗, χ∗) = (αNF,∞, γNF,∞, χNF,∞), and

thus the original sequence converges uniformly to

(αNF,∞, γNF,∞, χNF,∞).

Using a similar dominated convergence argument to before, we have that pointwise

γNF,k(n)t → γoσ2

Y

σ2Y +γo

´ Tt (α∗)2ds

, so γ∗t =γoσ2

Y

σ2Y +γo

´ Tt (α∗)2ds

and χNF,k(n)t → 1−γ∗t /γo. Since χNF,k(n) →

χ∗ uniformly, we have χ∗ = 1− γ∗/γo.Let αm,nt := 1/(2 − χNF,nt ). Since αNF,nt > 0 for all n, the α ODE can be rearranged to

obtain

αNF,nt − αm,nt =1

rn

[− αNF,nt

αNF,nt (2− χNF,nt )

].

Since 2 − χNF,nt > 1 and αNF,nt > 1/2, the expression in brackets is uniformly bounded in

absolute value by some constant KNF (independent of rn and t), and thus |αNF,nt −αm,nt | → 0

pointwise. It follows that pointwise, and by familiar arguments uniformly, αm,k(n) → α∗. But

αm,k(n) = 1/(2− χNF,n)→ 1/(2− χ∗), so α∗ = 1/(2− χ∗) = 1/(1 + γ∗/γo).

By differentiating the equation γ∗t =γoσ2

Y

σ2Y +γo

´ Tt (α∗)2ds

, we obtain γ∗t =(α∗t γ

∗t )2

σ2Y

=(γ∗t )2

σ2Y [1+γ∗t /γ

o]2,

subject to initial condition γ∗T = γo. This equation has a unique solution which is γNF,∞

as obtained in the proof of Lemma A.5, from which α∗ = αNF,∞ and χ∗ = χNF,∞. This

establishes that all convergent subsequences have the same limit, and hence the original

sequences converges to that limit.

Proof of Proposition 3. Part (i) is a restatement of results from Lemma A.4. For part (ii),

choose any δ ∈ (0, T ). Define γ := mint∈[T−δ,T ](γPub,∞t −γNF,∞t ), where we use γx,r to denote

the solution for the case x ∈ {Pub,NF} and r is the discount rate. By Lemma A.6, γ > 0.

By Proposition A.1, γPub,r − γNF,r converges uniformly to γPub,∞ − γNF,∞ as r → ∞, and

thus for any ε ∈ (0, γ), there exists r > 0 such that for all r > r and all t ∈ [T − δ, T ], we

have γPub,rt − γNF,rt > γPub,∞t − γNF,∞t − ε ≥ γ − ε > 0, as desired.

53

Proof of Proposition 4

We begin by calculating expected flow losses in the public and no feedback cases.

Lemma A.8. Then the expected flow payoffs to player 1 in the public benchmark and no

feedback case have magnitudes

upubt = γpubt [(1− β3t)2 + β2

3t]

uNFt = (1− α3t)2γo + α2

3tγNFt .

Proof. Let Eθ denote ex ante expectations over realizations of θ ∼ N(m0, γo). In the public

benchmark, using β1t = 1− β3t, we have

upubt = Eθ(E0

[(θ − at)2 + (at − at)2

])= Eθ

(E0

[(θ − β1tMt − β3tθ)

2 + (Mt − β1tMt − β3tθ)2])

= Eθ(E0

[(θ −Mt)

2([1− β3t]

2 + β23t

)])= E0

[Et{(θ −Mt)

2}([1− β3t]

2 + β23t

)]= γpubt [(1− β3t)

2 + β23t],

where in the second to last step we have used that the long-run player and myopic player have

the same information prior to θ being realized, along with the law of iterated expectations.

In the no feedback case, we have

uNFt = Eθ(E0

[(θ − at)2 + (at − at)2

])= Eθ

(E0

[(θ − (1− αt)m0 − αtθ)2 + ([1− αt]m0 + αtθ − [1− αt]m0 − αtMt)

2])

= Eθ(E0

[(1− αt)2(θ − m0)2 + α2

t (Mt − θ)2])

= γo(1− αt)2 + α2tEθ

(E0

{Et[(Mt − θ)2

]}),

where we have again used the law of iterated expectations. In the second term, we expand

Et[(Mt − θ)2

]= Et

[M2

t − 2Mtθ + θ2]

= Et[M2

t

]− 2θMt + θ2

= γNF2t +M2t − 2θMt + θ2

= γNFt χt + (Mt − θ)2

= γNFt χt + (1− χt)2(θ − m0)2.

54

This term is unchanged under the expectation E0. Under the outer expectation Eθ, it reduces

to γNFt χt + (1− χt)2γo, and using χt = 1− γNFt /γo, the result follows.

Let V Pub and V NF denote Player 1’s ex ante expected payoff, i.e., at time 0 taking

expectations over θ ∼ N(m0, γo).

Lemma A.9. Assume r = 0. For all T, γo, σ2Y > 0, V Pub > V NF .

Proof. For i ∈ {pub,NF}, define T i :=TγFiσ2Y

and define ρ := γFNF/γo. We claim that the

quantities under comparison are

V pub = −σ2Y

{T pub/2− ln

[16− 8T pub

(4− T pub)2

]}.

V NF = −σ2Y {ρ(1− ρ)− ln ρ} .

First consider the public benchmark, where from Lemma A.8,

V pub = E0

(ˆ T

0

[−(θ − at)2 − (at − at)2

]dt

)= −ˆ T

0

upubt dt

= −ˆ T

0

[γt([1− β3t]

2 + β23t

)]dt

Using the closed-form expressions for γt and β3t, the integrand simplifies to

γt([1− β3t]

2 + β23t

)=γFpub

2

[1 +

2tγFpubσ2Y

(2σ2Y − tγFpub)(4σ2

Y − tγFpub)

]

=γFpub

2

[1 +

2tpub

(2− tpub)(4− tpub)

],

where tpub := tγFpub/σ2Y . Using that the function g : x 7→ x

(2−x)(4−x)has antiderivative

ln(

(4−x)2

2−x

)and integrating the second term w.r.t. tpub over

[0, T pub

]and rescaling by σ2

Y /γFpub,

we obtain

V pub = −TγFpub

2− σ2

Y

(ln

[(4− T pub)2

2− T pub

]− ln 8

)

= −σ2Y

{T pub/2− ln

[16− 8T pub

(4− T pub)2

]}.

55

Next, consider the no feedback case, where by Lemma A.8,

V NF = E0

(ˆ T

0

[−(θ − at)2 − (at − at)2

]dt

)= −ˆ T

0

uNFt dt

= −ˆ T

0

[(1− α)2γo + α2

([1− χt]2γo + γtχt

)]dt. (A.50)

By plugging in the analytic solutions for α, χt and γt in terms of γFNF , the integrand in

(A.50) is

γoγFNF

(γo + γFNF )2

t(γoγFNF

)2 −(γo + γFNF

)3σ2Y

t (γo)2 γFNF (γo + γFNF )2σ2Y

. (A.51)

From the equation defining γFNF , we have(γo + γFNF

)2σ2Y = TγFNF (γo)3 /

(γo − γFNF

), and

thus (A.51) reduces to

γoγFNF

(γo + γFNF )2

tγFNF(γo − γFNF

)− Tγo

(γo + γFNF

)t (γo − γFNF )− Tγo

(A.52)

Integrating (A.52) yields

V NF = −ˆ T

0

[γoγFNF

(γo + γFNF )2

tγFNF(γo − γFNF

)− Tγo

(γo + γFNF

)t (γo − γFNF )− Tγo

]dt

= − TγoγFNF

(γo + γFNF )2

γFNF +(γo)2 ln γo

γFNF

γo − γFNF

= −

σ2Y

(γo − γFNF

)(γo)2

γFNF +(γo)2 ln γo

γFNF

γo − γFNF

= −σ2

Y {ρ(1− ρ)− ln ρ} .

Next, we prove that V NF < V pub. Substituting in the expressions for γpubF , and simplify-

56

ing, we obtain

V NF − V pub = σ2Y

{ρ(ρ− 1) + ln ρ+ 1 + T /2−

√4 + T 2

2

− ln

[8

(−T +

√4 + (T )2

)]+ 2 ln

[2− T +

√4 + (T )2

]}=σ2Y

2ρf(ρ),

where again T := Tγo

σ2Y

and ρ := γNF

γo, where in the second line we have used that the cubic

equation defining γNF rearranges to T = (1−ρ)(1+ρ)2

ρ, and where

f(x) := A1(x) + 2x ln

(A2(x)2

A3(x)

)(A.53)

A1(x) := x3 − 3x2 + 3x+ 1− z(x)

A2(x) := x3 + x2 + x− 1 + z(x)

A3(x) := 8[x3 + x2 − x− 1 + z(x)]

z(x) :=√

4x2 + (1− x)2(1 + x)4.

We now show that f(x) < 0 for all x ∈ (0, 1), so that in particular f(ρ) < 0, from which

the desired result follows. To that end, we first show that A2(x) > 0 and A3(x) > 0 for all

x > 0. By inspection, for all x > 0, we have A2(x) > A3(x)/8, and

A3(x)/8 = (x− 1)(x+ 1)2 +√

4x2 + (1− x)2(1 + x)4

> (x+ 1)2[x− 1 +

√(1− x)2

]≥ 0.

Next, we apply the inequality ln(y) ≤ 2(y

12 − 1

)for y > 0 using y = A2(x)2

A3(x)in (A.53) to

obtain

f(x) ≤ A1(x) + 4x

(A2(x)√A3(x)

− 1

). (A.54)

For x > 0, the RHS of (A.54) is negative if and only if

A2(x)√A3(x)

< −A1

4x+ 1. (A.55)

57

For x ∈ (0, 1), the RHS of (A.55) is strictly positive:

−A1

4x+ 1 =

1

4x

[√4x2 + (1− x)2(1 + x)4 − x3 + 3x2 + x− 1

]>

1

4x

[(1− x)(1 + x)2 − x3 + 3x2 + x− 1

]=

1

2[1 + x(1− x)] > 0.

Hence, for x ∈ (0, 1), (A.55) is equivalent to

0 > A2(x)2 − A3(x)

(−A1

4x+ 1

)2

=2

x2

[(1− x)2A4(x) + A5(x)z(x)

](A.56)

where

A4(x) = x6 − 4x4 − x3 + 4x2 + 3x+ 1

A5(x) = x5 − 3x4 + x3 + 2x2 − 1.

Now by Descartes’ rule of signs, A5 has 3 sign changes and at most 3 positive real roots,

counting multiplicity. It is easy to verify that there is a double root at x = 1, that A5(2) =

−1 < 0, and that limx→+∞A5(x) = +∞, so there is a positive root at some x > 2. This

implies there are no roots in (0, 1). Since A5(0) = −1 < 0, it follows that A5(x) < 0 for all

x ∈ (0, 1). Thus, without signingA4(x), it suffices to show that (1−x)2|A4(x)| < −A5(x)z(x),

or equivalently (since A5(x) < 0)

0 > (1− x)4A4(x)2 − z(x)2A5(x)2 = 4(1− x)4x5(x4 − 2x2 − 3x− 1). (A.57)

Now for x ∈ (0, 1), we have in (A.57) x4 − 2x2 − 3x − 1 < −2x2 − 3x < 0, and the outside

factor is clearly positive. Hence we have shown that f(x) < 0 for x ∈ (0, 1), concluding the

proof.

Lemma A.10. Suppose the leader is myopic. There exists T > 0 such that for all T > T

the team’s ex ante flow payoff is strictly larger in the no feedback case over [T , T ].

Proof. Plugging the solutions from Lemma A.5 into the expressions for the expected flow

58

losses in Lemma A.8, we have

upubt = γpubt [(1− β3t)2 + β2

3t] =γpubt

2=

2σ2Y γ

o

4σ2Y + γot

uNFt = (1− α3t)2γo + α2

3tγNFt =

γoγNFtγo + γNFt

.

Let t := tγo

σ2Y

. Flow losses are worse in the public benchmark case at time t if and only if

upubt > uNFt

⇐⇒ 2σ2Y γ

o

4σ2Y + tγo

>γoγNFtγo + γNFt

⇐⇒ γNFt <2σ2

Y γo

2σ2Y + γot

=2γo

2 + t.

Since f(γ/γo) is strictly increasing in γ, where f(γNF/γo) is the LHS of (A.49), it suffices

to show that there exists T such that f(γ/γo)|γ= 2γo

2+t

> −γotσ2Y

= −t if t > T . By direct

calculation,

f(γ/γo)|γ= 2γo

2+t

+ t =f2(t)

2(t+ 2), where (A.58)

f2(t) := t2 + 4(t+ 2) ln

[2

t+ 2

].

Now the denominator of the RHS of (A.58) is strictly positive, and it is easy to verify that

in the numerator, f2(0) = 0 and f ′2(0) = −4. Moreover, f ′2(t) = 2t − 4 + 4 ln[

2t+2

]with

limt→+∞ f′2(t) = +∞ as the linear term dominates the log term, and f ′′2 (t) = 2 − 4

T+2≥ 0

with strict inequality for all t > 0. It follows that there is a unique positive threshold t∗

defined by f2(t∗) = 0 such that the RHS of (A.58) is strictly positive, and upubt > uNFt , if

and only if t > t∗. Letting T = t∗σ2Y /γ

o yields the result.

Proof of Proposition 4. Part (i) of the proposition is simply a restatement of Lemma A.9.

For part (ii), use superscript r for the discount rate as in the proof of Proposition 3. Let T be

as in Lemma A.10 and define u := mint∈[T ,T ](uPub,∞t −uNF,∞t ), which is strictly positive. Now

the mappings γ 7→ γ/2 and γ 7→ γoγγo+γ

underlying the expressions for expected flow losses in

Lemma A.8 are continuous for γ ∈ [0, γo], a compact set, and hence uniformly continuous;

together with Proposition A.1, this implies that uPub,r and uNF,r converge uniformly to uPub,∞

and uNF,∞, respectively. Hence for any ε ∈ (0, u), there exists r such that for all r > r and

all t ∈ [T , T ], uPub,rt − uNF,rt > uPub,∞t − uNF,∞t − ε ≥ u− ε > 0.

59

Appendix B: Proofs for Section 4

In the proofs for this section, we denote the prior by m0 := µ.

Proof of Lemma 2. We establish a more general version of the lemma for a drift of the

form at + νat, ν ∈ [0, 1], in X. We use “p1” and “p2” to refer to the long-run player and

the myopic player, respectively. Without fear of confusion, we also use γ1t for the posterior

variance of p2 (γt in the main body), as this variance appears in the first filtering step of

the proof. Likewise, p1’s posterior variance will be denoted by γ2, as it is obtained from a

second filtering step.

Inserting (11) in (12), we can write at = α0t + α2tLt + α3tθ, with

α0t = β0t, α2t = β2t + β1t(1− χt), and α3t = β3t + β1tχt.

that p2 conjectures drives the public signal X.35 With this in hand, p2’s filtering problem

can be obtained using the Kalman filter (Chapters 11 and 12 in Liptser and Shiryaev, 1977).

Specifically, define

dX2t := dXt − [at + ν(α0t + α2tLt)]dt = να3tθdt+ σXdZ

Xt

dY 2t := dYt − [α0t + α2tLt]dt = α3tθdt+ σY dZ

Yt

which are in p2’s information set. Then, by Theorems 12.6 and 12.7 in Liptser and Shiryaev

(1977), θ|F2t ∼ N (Mt, γ1t), where,

dMt =να3tγ1t

σ2X

[dX2t − να3tMtdt] +

α3tγ1t

σ2Y

[dY 2t − α3tMtdt]

˙γ1t = −γ21tα

23tΣ,

and Σ :=(ν2

σ2X

+ 1σ2Y

). (These expressions hold for any (admissible) strategies of the players,

as deviations go undetected.)

P1 can affect Mt via her choice (at)t∈[0,T ]. It is easy to see that, from her perspective,

dMt =να3tγ1t

σ2X

[(νat − ν{α0t + α2tLt + α3tMt})dt+ σXdZXt ]

+α3tγ1t

σ2Y

[(at − {α0t + α2tLt + α3tMt})dt+ σY dZYt ]

35Since α3t plays a key role in the economic analysis and appears frequently throughout the paper, wesometimes abbreviate it to αt.

60

under any admissible strategy (at)t∈[0,T ]. Rearranging terms we can write

dMt = (µ0t + µ1tat + µ2tMt)dt+BXt dZ

Xt +BY

t dZYt , (B.1)

where

µ1t = α3tγ1tΣ, µ0t = −µ1t[α0t + α2tLt], µ2t = −α3tµ1t, BXt =

να3tγ1t

σX, BY

t =α3tγ1t

σY. (B.2)

This dynamic is linear in M . Also, since Lt depends only on the paths of X, µ0t is in p1’s

information set. Similarly with (at)t∈[0,T ], which is measurable with respect to (θ,X).

On the other hand, because p1 always thinks that p2 is on path, the public signal follows

dXt = (νat + δ0t + δ1tMtdt+ δ2tLt)dt+ σXdZXt ,

from her perspective. This dynamic is also affine in the unobserved state M , with an

intercept that is again measurable with respect to (θ,X). The hidden state M along with

the observation state (θ,X) is thus conditionally Gaussian. In particular, applying the

filtering equations in Theorem 12.7 in Liptser and Shiryaev (1977) yields that Mt := Et[Mt]

and γ2t := Et[(Mt − Mt)2] satisfy

dMt = (µ0t + µ1tat + µ2tMt)dt︸︷︷︸=Et[(µ0t+µ1tat+µ2tMt)dt]

+σXB

Xt + γ2tδ1t

σ2X

[dXt − (νat + δ0t + δ1tMt + δ2tLt)dt] (B.3)

γ2t = 2µ2tγ2t + (BXt )2 + (BY

t )2 −(σXB

Xt + γ2tδ1t

σX

)2

, (B.4)

and where dZt := [dXt−(νat+δ0t+δ1tMt+δ2tLt)dt]/σX is a Brownian motion with respect to

the long-run player’s standpoint. (For notational simplicity, we have omitted the dependence

of M and Z on the strategy (at)t∈[0,T ] that is being followed.)

Inserting at = β0t + β1tMt + β2tLt + β3tθ into the right-hand side of dMt, and solving for

dMt as a function of the increment dXt, we obtain

dMt = [µ0t + µ1tMt + µ2tLt + µ3tθ]dt+ BtdXt

61

where

µ0t = −α3tγ1tα0tΣ︸︷︷︸constant in µ0t

+α3tγ1tβ0tΣ︸︷︷︸µ1tβ0t

+να3tγ1t + γ2tδ1t

σ2X︸︷︷︸

=[σXBXt +γ2tδ1t]/σ2

X

[−νβ0t − δ0t]

µ1t = α3tγ1tβ1tΣ︸︷︷︸µ1tβ1t

+−α23tγ1tΣ︸︷︷︸µ2t


σ2X

[−νβ1t − δ1t]

µ2t = −α3tγ1tα2tΣ︸︷︷︸Lt term in µ0t

+α3tγ1tβ2tΣ︸︷︷︸µ1tβ2t


σ2X

[−νβ2t − δ2t]

µ3t = α3tγ1tβ3tΣ︸︷︷︸µ1tβ3t


σ2X

[−νβ3t] =

[α3tγ1t

σ2Y

− νγ2tδ1t

σ2X

]β3t

Bt =να3tγ1t + γ2tδ1t

σ2X

.

Let R(t, s) = exp(´ tsµ1udu), and suppose and denote the prior distribution of θ by

N (m0, γ10); in particular, M0 = m0. Path-by-path of X, therefore,

Mt = R(t, 0)m0 + θ

ˆ t

0

R(t, s)µ3sds+

ˆ t

0

R(t, s)[µ0s + µ2sLs]ds+

ˆ t

0

R(t, s)BsdXs,

which, after recalling that we started from (11), yields a system of two equations:

χt =

ˆ t

0

R(t, s)µ3sds,

Lt =R(t, 0)m0 +

´ t0R(t, s)[µ0s + µ2sLs]ds+

´ t0R(t, s)BsdXs

1− χt.

The validity of the construction boils down to finding a solution to the previously stated

equation for χ that takes values in [0, 1). In fact, when this is the case,

(1− χt)dLt − Ltdχt = [µ0t + (µ1t(1− χt) + µ2t)Lt]dt+ BtdXt

=⇒ dLt =Lt[µ1t + µ2t + µ3t]dt+ µ0tdt+ BtdXt

1− χt.

Thus, letting R2(t, s) := exp(´ t

sµ1u+µ2u+µ3u

1−χu du)

, we obtain that

Lt = R2(t, 0)m0 +

ˆ t

0

R2(t, s)µ0s

1− χsds+

ˆ t

0

R2(t, s)Bs

1− χsdXs, (B.5)

62

i.e., L is a (linear) function of the paths of X as conjectured. Moreover, in the particular

case of ν = 0, it is easy to verify that dLt = (`0t + `1tLt)dt+BtdXt, where

l0t = − γtχtδ0tδ1t

σ2X(1− χt)

(B.6)

l1t = −γtχtδ1t(δ1t + δ2t)

σ2X

(B.7)

Bt =γtχtδ1t

σ2X(1− χt)

. (B.8)

We will ultimately find a solution to the equation for χ that is of class C1 and that takes

values in [0, 1). In particular, if χ is differentiable,

χt = µ1tχt + µ3t

= α3tγ1tΣ [χtβ1t − α3tχt]︸︷︷︸=α3t(1−χt)−β3t

+χtνα3tγ1t + γ2tδ1t

σ2X

[−νβ1t − δ1t]

+α3tγ1tβ3tΣ +να3tγ1t + γ2tδ1t

σ2X

[−νβ3t]. (B.9)

Using that α3t = β3t + β1tχt, we obtain the following system of ODEs

γ1t = −γ21t(β3t + β1tχt)

2Σ

γ2t = −2γ2tγ1t(β3t + β1tχt)2Σ + γ2

1t(β3t + β1tχt)2Σ−

(νγ1t(β3t + β1tχt) + γ2tδ1t

σX

)2

χt = γ1t(β3t + β1tχt)2Σ(1− χt)− (ν[β3t + β1tχt] + δ1tχt)

(νγ1t(β3t + β1tχt) + γ2tδ1t

σ2X

).

In the proof of the next lemma we establish that χ = γ2/γ1. After replacing ν = 0 and

γ2 = χγ in the third ODE, and using γ for γ1, the first and third equations of the previous

system correspond to (15)–(16) as desired. The representation Lt = E[θ|FXt ] is proved in

Lemma B.1 at the end of this subsection. �

Proof of Lemma 3. Consider the system (γ1, γ2, χ) from the proof of the previous lemma

when ν = 0 (in particular, Σ becomes 1/σ2Y ). Also, let δ1t := uθ+uaα3t.

36 The local existence

of a solution follows from continuity of the associated operator. Suppose that the maximal

interval of existence is [0, T ), with T ≤ T .

Since the system is locally Lipschitz continuous in (γ1, γ2, χ) uniformly in t ∈ [0, T ] for

36All the results in this proof extend to a generic continuous function δ1 over [0, T ] in which the explicit

dependence on ~β and χ is not recognized, which happens when the myopic player becomes forward looking.

63

given continuous coefficients, it solution is unique over the same interval (Picard-Lindelof).

In particular, observe that (γ1t, γ2t, χ) = (γo, 0, 0) solves the system as long as β3 = 0.

Without loss of generality then, assume β30 6= 0.

Observe that γ1 is (weakly) decreasing over [0, T ), so γ1t ≤ γo. Suppose there is a time

at which γ1 is strictly negative. Let s < t be the first time γ1 crosses zero, and notice that

for t > s close to s,

0 > γ1t =

ˆ t

s

γ1udu = −ˆ t

s

γ21u[β3u + β1uχu]

2Σds ≥ 0,

which is a contradiction. Thus, γ1t ∈ [0, γo] for all t ∈ [0, T ). Moreover, if γt > 0, straight-

forward integration shows that

γ1t =γo

1 +´ t

0[β3s + β1sχs]2Σds

.

Since ~β is continuous over [0, T ], if γ ever vanishes in [0, T ) we must have that χ diverges

at such point; by definition of T , however, that point must be T . Thus, γ1t > 0 in [0, T )

(regardless of whether χ diverges at T or not).

We now show that 0 < γ2t < γ1t for t > 0. In fact, since γ20 = 0, γ10 > 0 and β30 > 0, we

have γ2ε > 0 for ε small. Consider now [ε, t] with t ∈ (ε, T ). Then,

fγ2(t, x) := −2xγ1t(β3t + β1tχ1t)

2

σ2Y

+γ2

1t(β3t + β1tχ1t)2

σ2Y

−(xδ1t

σX

)2

,

is locally Lipschitz continuous with respect to x uniformly in t ∈ [ε, t]. Since 0− fγ2(t, 0) ≤0 = γ2t−fγ2(t, γ2t) and 0 < γ2ε, we obtain that γ2t > 0 for all t ∈ [0, t] by means of standard

comparison theorems (e.g., Theorem 1.3 in Teschl), and hence over (0, T ) as well.

Now, let zt := γ2t − γ1t, t < T . Using the ODEs for γ1 and γ2 we deduce that

zt < −2γ1t(β3t + β1tχ1t)zt

σ2Y

, z0 = γ20 − γ10 = −γo < 0.

It is then easy to conclude that (Gronwall’s inequality),

zt < z0 exp

(−ˆ t

0

2γ1s(β3s + β1sχ1s)

σ2Y

ds

)< 0, t < T ,

as γ1t(β3t + β1tχ1t) is continuous over [0, t], t < T . Thus, γ2t < γ1t for all t ∈ [0.T ).

With this in hand, γ2t/γ1t ∈ (0, 1) for all t ∈ (0, T ), and γ20/γ10 = 0. Moreover, it is easy

64

to verify that the previous ratio solves the χ−ODE. By uniqueness, χ = γ2/γ1. Replacing

γ2 = χγ1 and ν = 0 in the χ−ODE above yields (16), i.e.,

χt = γ1t(β3t + β1tχt)2 (1− χt)

σ2Y

− γ1t(δ1tχt)

2

σ2X

, t ∈ [0, T ).

By the previous analysis, (γ1, γ2, χ) is bounded over [0, T ). If T < T , the solution can

be extended strictly beyond T thanks to the continuity of the associated operator (Peano’s

theorem), contradicting the definition of T . Thus, the only option is that T = T , in which

case the system admits a continuous extension to T .37 By continuity, such an extension is

unique, the desired properties (χ = γ2/γ1 stated in Lemma 2; χ solves (16) and χ ∈ (0, 1);

and γo ∈ (0, γo]) hold up to T by the exact same arguments previously applied over strict

compact subsets of [0, T ) now over [0, T ].38 �

Proof of Lemma 4. The long-run player’s problem is to choose an admissible a := (at)t∈[0,T ]

that maximizes

U(a) := E0

[ˆ T

0

e−rtU(at, δ0t + δ1tMt + δ2tLt, θ)dt

]where (Mt)t≥0 is given by (B.1) and (Lt)t≥0 by (B.5). Using that the flow is quadratic, we

obtain that

U(a) = E0

[ˆ T

0

e−rtU(at, δ0t + δ1tMat + δ2tLt, θ)dt

]+Uaa2

E0

[ˆ T

0

e−rtEt[(Mat − Ma

t )2]dt

]with Ma

t := Et[Mat ], where we have made explicit the dependence of both processes on the

strategy followed. By the proof of Lemma 2, (Mat )t∈[0,T ] evolves as in (B.3), i.e.,

dMat = (µ0t + µ1tat + µ2tM

at )dt+

σXBXt + γ2tδ1t

σXdZa

t

where dZat := [dXt−(νat+δ0t+δ1tM

at +δ2tLt)dt]/σX is a Brownian motion from the long-run

player’s standpoint, (µ0, µ1, µ2, BXt ) are given by (B.2), and where γ2t evolves as in (B.4).

Moreover, from the same filtering equations (B.3)–(B.4) we know that Et[(Mat − Ma

t )2] is

independent of the strategy followed, and that it coincides with γ2t, t ∈ [0, T ]. Thus, the

37For a generic system zt = f(t, zt), if z is bounded over [0, T ) and f continuous, there exists K s.t.|xt − xs| < M |t− s|; but this implies that (xs)s↗T is Cauchy, and hence the limit exists.

38An alternative way of seeing that χ < 1 is that χ ≤ γ1t(β3t +β1tχt)2(1−χt)/σ2

Y , and so χt ≤ 1− γt/γoby standard comparison theorems, as the latter function satisfies zt = γ1t(β3t + β1tzt)

2(1− zt)/σ2Y , z0 = 0.

65

long-run player’s problem reduces to

max(at)t≥0 admissible

E0

[ˆ T

0

e−rtU(at, δ0t + δ1tMat + δ2tLt, θ)dt

]where (Ma

t )t∈[0,T ] is as above, and (Lt)t≥0 is linear in the paths of X according to (B.5). In

differential form, the latter process can be written as

dLt =1

1− χ1

{Lt[µ1t + µ2t + µ3t] + µ0t + Bt[νat + δ0t + δ1tM

at + δ2tLt]

}dt+

σXBt

1− χtdZa

t .

where we used that dXt = (νat + δ0t + δ1tMat + δ2tLt)dt+ σXdZ

at from the long-run player’s

standpoint. (Refer to the proof of Lemma 2 for the expressions for (µ0t, µ1t, µ2t, µ3t, BXt ).)

So far, we have fixed an admissible strategy (at)t∈[0,T ] (in the sense of Section 3) for

the long-run player, and then obtained processes Ma and Za that potentially depend on

that choice. The above problem thus differs from traditional control problems with perfectly

observed states in that the Brownian motion is, in principle, affected by the choice of strategy.

With linear dynamics, however, the separation principle (e.g., Liptser and Shiryaev, 1977,

Chapter 16), applies. In fact, the solution to the long-run player’s problem can be found

by first fixing a Brownian motion, say, Zt := Z0t (i.e., Za

t when a ≡ 0), and then solving

the optimization problem that replaces Za by Z in the laws of motion of Ma and L. The

method works to the extent that Za ≡ Z for all (at)t≥0: it is easy to conclude from (B.1) and

(B.3) that the process Mat −Ma

t is independent of the strategy followed, and hence so is Zat ,

given that σXdZat = dXt − (νat + δ0t + δ1tM

at + δ2tLt)dt = δ1t(M

at −Ma

t )dt+ σXdZXt under

the true data-generating process, thanks to the linearity of the dynamics. In this procedure,

therefore, one filters as a first step, and then optimizes afterwards using the posterior mean

as a controlled state.39

Returning to the ν = 0 case, we can then insert Zt in the dynamic of Mat . Omitting the

dependence of the resulting process on a (as any control problem does), it is easy to see that

dMt =γtα3t

σ2Y

(at − [α0t + α2tLt + α3tMt])dt+χtγtδ1t

σXdZt.

As for the expression for L (display (19)), this one follows from (17) using that dXt =

39Relative to Chapter 16 in Liptser and Shiryaev (1977), our problem is more general in that it allows for alinear component in the flow, and the public signal can be controlled (when ν 6= 0). The first generalizationis clearly innocuous. As for the second, the key behind the separation principle is that the innovationsdXt−Et[dXt] are independent of the strategy followed, which also happens when ν 6= 0. Given any admissiblestrategy (at)t≥0, therefore, the fact that the filtrations of Z, Za and Xa satisfy FZt = FZa

t ⊆ FXa

t , t ≥ 0,means the optimal control found by using Z is weakly better than any such (at)t≥0. See p.183 in section16.1.4 in Liptser and Shiryaev (1977) for more details in a context of a quadratic regulator problem.

66

(δ0t + δ2tLt + δ1tMt)dt+ σXdZt from the long-run player’s perspective. In fact, it is easy to

see from (B.6)–(B.8) that

l0t +Btδ0t + (l1t +Btδ1t)Lt +Btδ1tMt =γtχtδ1t

σ2X(1− χt)

(Mt − Lt).

This concludes the proof. �

Lemma B.1. The process L is the belief about θ held by an outsider who observes only X.

Moreover,

(θ

M1t

)|FXt ∼ N (M out

t , γoutt ) where M outt =

(Lt

Lt

)and γoutt =

(γ1t

1−χtγ1tχt1−χt

γ1tχt1−χt

γ1tχt1−χt

).

Proof. The outsider jointly filters the state vt = (θt,M1t)′. For the evolution of the state

and the signal, we adopt notation from Liptser and Shiryaev (1977) (Section 12.3). From

the outsider’s perspective, both players (and in particular player 2) are on the equilibrium

path, and thus the outsider believes that vt evolves as

dvt = a1(t,Xout)vtdt+ b1(t,X)dW1(t) + b2(t,X)dW2(t),

where a1(t,Xout) :=

(0 0

zt −zt

)for zt := (να3t)2γ1t

σ2X

+α23tγ1t, b1(t,X) :=

(0 0

0 α3tγ1tσY

), b2(t,X) :=(

0να3tγ1tσX

), W1(t) :=

(W11(t)

ZYt

)and W2(t) := ZX

t , where W11(t) is a standard Brownian mo-

tion and W11(t), ZYt and ZX

t are mutually independent. The signal is

dXoutt := dXt − [δ0t + δ2tLt + ν(α0t + α2tLt)]dt

= A1(t,X)vt +B1(t,X)W1(t) +B2(t,X)W2(t),

where A1(t,X) :=(να3t δ1t

), B1(t,X) :=

(0 0

)and B2(t,X) = σX .

Hence, denoting M outt =

(M out

t,1

M outt,2

)and γoutt =

(γoutt,11 γoutt,12

γoutt,21 γoutt,22

)and imposing γoutt,21 = γoutt,12,

we have from the standard filtering equations of Liptser and Shiryaev (1977) (Theorem 12.7)

67

that

(θ

M1t

)|FXt ∼ N (M out

t , γoutt ), where M out and γout are the unique solutions to

dM outt = a1(t,X)M out

t +1

σ2X

[(0

να3tγ1t

)+ γoutt

(να3t

δ1t

)]× · · · (B.10)

· · · ×{dXout

t − (να3tMoutt,1 + δ1tM

outt,2 )dt

}(B.11)

γoutt = a1(t,X)γoutt + γoutt a∗1 +1

σ2X

[(0

να3tγ1t

)+ γoutt

(να3t

δ1t

)][(0

να3tγ1t

)+ γoutt

(να3t

δ1t

)]∗(B.12)

with initial conditions M out0 =

(m0

m0

)and γout0 =

(γo 0

0 0

).

Recall that

γ1t = −α23t

σ2Y

γ21t

χt =γ1tα

23t(1− χt)σ2Y

− γ1tδ21tχ

2t

σ2X

with initial conditions γ10 = γo and χ0 = 0. It is then straightforward to verify that

γoutt =

(γ1t

1−χtγ1tχt1−χt

γ1tχt1−χt

γ1tχt1−χt

)satisfies the differential equation for γout above along with given initial

condition. Moreover, γoutt = γ1t1−χt

(1 χt

χt χt

)is positive semidefinite as its leading principal

minors are positive multiples of 1 and χ− χ2 > 0.

Next, substitute given the solution γoutt into (B.11) and subtract the equation for the

second component from its first to obtain the following SDE for M := M out1 −M out

2

dMt = −ΣMtα23tγ1t

with initial condition M0 = 0. Now if Mt > 0, then dMt < 0, giving us a contradiction;

likewise for the case Mt < 0. It follows that Mt = 0, and thus M outt,1 = M out

t,2 , for all t ≥ 0.

Substituting this back into (B.11), we have

68

dM outt,1 =

γ1t(να3t + δ1tχt)

σ2X(1− χt)

(dXoutt − (να3t + δ1t)M

outt,1 dt)

=γ1t(να3t + δ1tχt)

σ2X(1− χt)

[dXt − (να0t + δ0t +M out1,t (να3t + δ1t) + Lt(να2t + δ2t))dt] (B.13)

On the other hand, we have

dLt =Lt[µ1t + µ2t + µ3t]dt+ µ0tdt+ BtdXt

1− χt

=γ1t(να3t + δ1tχt)

σ2X(1− χt)

[dXt − (δ0t + να0t + Lt[ν(α2t + α3t) + δ1t + δ2t])dt] (B.14)

Defining Lt := M outt,1 − Lt and subtracting (B.14) from (B.13), we obtain

dLt = −γ1t(να3t + δ1tχt)

σ2X(1− χt)

Lt(να3t + δ1t)

with initial condition L0 = m0 − m0 = 0. We conclude that Lt = 0, and thus Lt = M outt,1 =

M outt,2 , for all t ≥ 0.

Proof of Lemma 5. Suppose δ1 = uaα3, ua > 0. The χ-ODE for ν ∈ [0, 1] boils down to

χ1t = γ1tα23t

([ν2

σ2X

+1

σ2Y

](1− χ1t)−

(ν + uaχt)2

σ2X

):= −γ1tα

23tQ(χt).

The goal is to find a function f : [0, χ) → [0, γo], some χ ∈ (0, 1), such that f(χt) = γt for

all t ≥ 0. When this is the case, and such f is differentiable, f ′(χt)χt = γt. Thus, if α3t > 0,

f ′(χt)

f(χt)=

1

σ2YQ(χt)

.

Thus, we aim to solve the ODE

f ′(χ)

f(χ)=

1

σ2YQ(χ)

, χ ∈ (0, χ), and f(0) = γo,

over some domain [0, χ), with the property that f(χ) > 0 if χ > 0.

To this end, let

c2 =

√b2 + 4(ua)2/[σXσY ]2 − b

2(ua/σX)2and − c1 =

−√b2 + 4(ua)2/[σXσY ]2 − b

2(ua/σX)2,

69

where b := [ν2/σ2X + 1/σ2

Y ] + 2νua/σ2X , be the roots of the quadratic

Q(χ) =

(uaσX

)2

χ2 + χ

([ν2

σ2X

+1

σ2Y

]+

2νuaσ2X

)− 1

σ2Y

.

Clearly, −c1 < 0 < c2. Also, it is easy to verify that c2 < 1.40 Thus,

1

σ2YQ(χ)

= − σ2X

(σ2Y ua)

2(c1 + c2)

[1

χ+ c1

− 1

χ− c2

]is well defined (and negative) over [0, c2) with 1/(χ+ c1) > 0 and −1/(χ− c2) > 0 over the

same domain. We can then set χ = c2 and solve

ˆ χ

0

f ′(s)

f(s)ds = − σ2

X

(σY ua)2(c1 + c2)log

(χ+ c1

c2 − χc2

c1

)⇒ f(χ) = f(0)

(c1

c2

)1/d(c2 − χχ+ c1

)1/d

where 1/d = σ2X/[(σY ua)

2(c1 + c2)] > 0. We then impose f(0) = γo, thus obtaining a strictly

positive and decreasing function that has the initial condition we look for. Moreover, letting

γ = f(χ), its inverse is decreasing and given by

χ(γ) = f−1(γ) = c1c21− (γ/γo)d

c1 + c2(γ/γo)d.

When γ1 = γo, we have that χ = 0, whereas when γ = 0, it follows that χ = c2 as desired.

We verify that this candidate satisfies the χ-ODE; we do this for the ν = 0 case only. To

this end, it is easy to verify that

d(χ(γt))

dt=

α23tγt

σ2Y [c1 + c2(γ/γo)d]2

c1c2d[c1 + c2]

(γtγo

)d.

By construction, moreover,

c1c2 = c1 − c2 =σ2X

σ2Y u

2a

which follows from equating the first- and zero-order coefficients in Q(χ) = u2aχ

2/σ2X+χ/σ2

Y−1/σ2

Y = u2a(χ− c2)(χ+ c1)/σ2

X . Thus, dc1c2 = c1 + c2.

On the other hand,

[uaχ(γ)]2

σ2X

=u2a

σ2X

[c1c2

1− (γ/γo)d

c1 + c2(γ/γo]d

]2

=c2

1(1− c2)

σ2Y

[1− (γ/γo)d

c1 + c2(γ/γo)d

]2

40This follows from squaring both sides of√b2 + 4(ua)2/[σXσY ]2 < b + 2(ua)2/σ2

X using that b +2(ua)2/σ2

X > 0 and b =[ν2/σ2

X + 1/σ2Y

]+ 2νua/σ

2X .

70

where we used that c21c

22/σ

2X = c2

1(1−c2)/σ2Y follows from u2

ac22/σ

2X = (1−c2)/σ2

Y by definition

of c2. Thus, the right-hand side of the χ-ODE evaluated at our candidate χ(γ) satisfies

γ1α23

(1− χσ2Y

− (uaχ)2

σ2X

) ∣∣∣∣∣χ=χ(γ)

=α2

3γ1

σ2Y

(1− χ− c2

1(1− c2)

[1− (γ/γo)d

c1 + c2(γ/γo)d

]2).

Thus, using that c1c2d = c1 + c2 in our expression for d(χ(γt))/dt, it suffices to show that

[c1 + c2]2(γtγo

)d= (1− χ)[c1 + c2(γ/γo)d]2 − c2

1(1− c2)[1− (γ/γo)d]2.

Using that χ[c1 + c2(γ/γo)d] = 1 − (γ/γo), it is easy to conclude that this equality reduces

to three equations

0 = c21 − c2

1c2 − c21 + c2

1c2

(c1 + c2)2 = 2c1c2 − c1c2(c2 − c1) + 2c21(1− c2)

0 = c22 + c1c

22 − c2

1(1− c2).

capturing the conditions on the constant, (γ/γo)d and (γ/γo)2d, respectively. The first

condition is trivially satisfied. As for the third, by definition of c1 and c2 we have that

c22/(1− c2) = σ2

X/(uaσY )2 = c21/(1 + c1). Thus, c1

1(1− c2) = c22(1 + c1), and the result follows.

For the second, use that c1c2(c2 − c1) = −(c1 − c2)2 and that c11(1 − c2) = c2

2(1 + c1) to

conclude

2c1c2−c1c2(c2−c1)+2c21(1−c2) = c2

1 +c22 +2c2

2(1+c1) = c21 +c2

2 +2c22 +2c2 c2c1︸︷︷︸

=c1−c2

= (c1 +c2)2.

Thus, χ(γ) as postulated satisfies the χ-ODE. We then conclude by uniqueness of any such

solution.

Finally, when ua = 0, we have that δ1 ≡ 0, and the χ-ODE reduces to χ = α23tγt(1 −

χt)/σ2Y , χ0 = 0. It is then easy to verify that χ(γ) = 1− γt/γo satisfy the ODE, and hence

we conclude using the same uniqueness argument.

Proof of Theorem 1

We begin by proving that a solution to the BVP exists, and any solution has the stated

properties; from there we will establish the rest of the solution. Recall that αt = α3t =

β1tχt + β3t.

71

We being by reversing time and posing the associated IVP parameterized by an initial

guess γ0 = γF ∈ [0, γo]:

v6t = −β22t − 2β1tβ2t(1− χt) + β2

1t(1− χt)2 − 2v6tα2tγtχt

σ2X(1− χt)

(B.15)

v8t = 2β2t + 2(1− 2αt)β1t(1− χt) + 4β21tχt(1− χt)−

v8tα2tγtχt

σ2X(1− χt)

(B.16)

β1t =αtγt

2σ2Xσ

2Y (1− χt)

{−2σ2

X(αt − β1t)β1t(1− χt)

+2σ2Y αtχt(β2t[1 + 2β1tχt]− β1t[1− χt]) + α2

tβ1tγtχtv8t

} (B.17)

β2t =αtγt

2σ2Xσ

2Y (1− χt)

{−2σ2

Xβ21t(1− χt)2 − 2σ2

Y αtβ2tχ2t (1− 2β2t)

+α2tγtχt(2v6t + β2tv8t)

} (B.18)

β3t =αtγt

2σ2Xσ

2Y (1− χt)

{2σ2

Xβ1t(1− χt)β3t − 2σ2Y αtβ2tχ

2t (1− 2β3t)

+α2tβ3tγtχtv8t

} (B.19)

αt =α3tγtχt

2σ2Xσ

2Y (1− χt)

{4σ2

Y β2tχt + αtγtv8t

}(B.20)

γt =γ2t α

2t

σ2Y

(B.21)

with initial conditions v60 = v80 = 0, β10 = 12(2−χ0)

, β20 = 1−χ0

2(2−χ0), β30 = 1

2and γ0 = γF , where

χt = χ(γt) using the function χ : R+ → (−∞, c2) as defined in Lemma 5. Hereafter, we use

the notation χ := c2. Recall also that c2 < 1, and note by inspection that χ(γ) ↘ −c1 < 0

as γ → +∞, and since χ is decreasing, it has range (−c1, x). The right hand sides of the

equations above are of class C1 over the domain {(v6, v8, β1, β2, β3, γ) ∈ R5×R+}, and hence,

subject to existence, they have unique solutions over [0, T ] given the initial conditions. To

solve the BVP, our goal is to show that there exists γF ∈ (0, γo) such that a (unique) solution

to the IVP above exists and it satisfies γT (γF ) = γo.

Note that if γF = 0, then the IVP has the following (unique) solution: for all t ∈ [0, T ],

χt = χ, β1t = 12(2−χ)

, β2t = 1−χ2(2−χ)

and β3t = 12

(so that αt = 12−χ) and γt = 0, with v6 and v8

obtained from integration of their ODEs. (Clearly, this does not correspond to a solution to

the BVP, since γT (0) = 0 < γo.)

We now consider γF > 0. Given the C1 property mentioned above, and the existence of

a solution to the IVP for γF = 0, there exists ε > 0 such that a (unique) solution to the IVP

exists over [0, T ] for each γF ∈ (0, ε) (see Theorem on page 397 in Hirsch et al. (2004)).

Observe that for γF > 0, we can change variables using vi := γvi for i = 6, 8 and create

a new IVP in (v6, v8, β1, β3, β3, γ). We label this System 1, and it consists of (B.17)-(B.21)

72

(replacing all instances of γvi with vi, i = 6, 8) together with

˙v6t = γt

{−β2

2t − 2β1tβ2t(1− χt) + β21t(1− χt)2 + v6tα

2t

[1

σ2Y

− 2χtσ2X(1− χt)

]}(B.22)

˙v8t = γt

{2β2t + 2(1− 2αt)β1t(1− χt) + 4β2

1tχt(1− χt) + v8tα2t

[1

σ2Y

− χtσ2X(1− χt)

]}(B.23)

subject to v60 = v80 = 0 and the remaining initial conditions above. We will argue later that

when this system has a solution over [0, T ], v6 = v6/γ and v8 = v8/γ are well-defined, and

hence a solution to the original IVP can be recovered.

System 1 has the property that in its solution, v6 and β2 can be expressed directly as

functions of the other variables. In anticipation of this property (which we will soon verify),

it is convenient to work with a reduced IVP in (v8, β1, β3, γ), which we call System 2, as

follows:

˙v8t = γt{

2[1− β1t − β3t] + 2(1− 2αt)β1t(1− χt) + 4β21tχt(1− χt)

+v8tα2t

[1/σ2

Y − χt/(σ2X [1− χt])

]} (B.24)

β1t =αtγt

2σ2Xσ

2Y (1− χt)

{−2σ2

X(αt − β1t)β1t(1− χt)

+2σ2Y αtχt([1− β1t − β3t][1 + 2β1tχt]− β1t[1− χt]) + α2

tβ1tχtv8t

} (B.25)

β3t =αtγt

2σ2Xσ

2Y (1− χt)

{2σ2

Xβ1t(1− χt)β3t − 2σ2Y αt[1− β1t − β3t]χ

2t (1− 2β3t)

+α2tβ3tχtv8t

} (B.26)

and (B.21), subject to the initial conditions v80 = 0, β10 = 12(2−χ(γF ))

, β30 = 12

and γ0 = γF .

Observe that in this system, we have substituted 1 − β1 − β3 for all instances of β2 in the

original ODEs. We will show that there exists a γF ∈ (0, γo) such that a (unique) solution

to System 2 exists and satisfies γT (γF ) = γo, and then we will show that given this solution,

we can construct the solutions to (B.18) (which will be 1 − β1 − β3) and (B.22) directly,

solving System 1 (and hence the BVP).

Based on System 2, define

γ := sup{γF > 0 | a solution to System 2 with γ0 = γF exists over [0, T ]},

with respect to set inclusion. Since the RHS of the equations that comprise System 2 are

of class C1, the solution is unique when it exists, and there is continuous dependence of the

solution on the initial conditions; in particular, the terminal value γT is continuous in γF

73

(see Theorem on page 397 in Hirsch et al. (2004)). Hence if there exists γF ∈ (0, γ) such

that γT (γF ) ≥ γo, by the intermediate value theorem there exists a γF ∈ (0, γ) such that

γT (γF ) = γo, allowing us to construct a solution to System 1. We rule out the alternative case

by contradiction, and then we return to the case just described to complete the construction.

Suppose by way of contradiction that for all γF ∈ (0, γ), γT (γF ) < γo. In particular,

because γt is nondecreasing in the backward system for any initial condition, we have that

γt ∈ (0, γo) and by Lemma 5, χt ∈ (0, χ) for all t ∈ [0, T ] when γF ∈ (0, γ).

To reach a contradiction, it suffices to show that the solution to System 2 can be bounded

uniformly over γF ∈ (0, γ), as this would imply that the solution can be extended strictly to

the right of γ in this case, violating the definition of γ.

To establish uniform bounds, we decompose β1 and β3 as sums of forward-looking and

myopic components and show that both of these components are uniformly bounded; for

v8, we establish the bound without any decomposition. Specifically, define βm1t := 12(2−χt) ,

βm3t = 12

and βfit := βit − βmit for all t ∈ [0, T ], i = 1, 3. Observe that for γF ∈ (0, γ), βm1t is

uniformly bounded by [1/4, 1/2] ⊂ [0, 1], as χt = χ(γt) ∈ [0, 1] for all t ∈ [0, T ], and trivially

βm1t ∈ [0, 1]. Hence, we define one final system, System 3, in (v8, βf1 , β

f3 , γ) to be (B.21),

(B.24), and

βf1t = − χt2(2− χt)2

+αtγt

2σ2Xσ

2Y (1− χt)

{−2σ2

X(αt − [βf1t + βm1t ])[βf1t + βm1t ](1− χt)

+2σ2Y αtχt

([1− (βf1t + βm1t)− (βf3t + βm3t)][1 + 2(βf1t + βm1t)χt]

)−2σ2

Y αtχt(βf1t + βm1t)(1− χt) + α2

t (βf1t + βm1t)χtv8t

},

(B.27)

=: hβf1 (βf1t, β

m1t , β

f3t, β

m3t , v8t, γt)

βf3t =αtγt

2σ2Xσ

2Y (1− χt)

{2σ2

X(βf1t + βm1t)(1− χt)(βf3t + βm3t) + α2

t (βf3t + βm3t)χtv8t

+4σ2Y αtβ

f3tχ

2t [1− (βf1t + βm1t)− (βf3t + βm3t)]

} (B.28)

=: hβf3 (βf1t, β

m1t , β

f3t, β

m3t , v8t, γt)

subject to initial conditions v80 = 0, βf10 = 0, βf30 = 0 and γ0 = γF ∈ (0, γ), where αt =

[βf3t+βm3t ]+[βf1t+β

m1t ]χt. Define hv8(βf1t, β

m1t , β

f3t, β

m3t , v8t, γt) as the RHS of (B.24) with βfit+β

mit

substituted for βit, i = 1, 3.

Given that βm1 and βm2 are uniformly bounded, it suffices to show that the solutions

(v8, βf1 , β

f3 ) are uniformly bounded by some [−K,K]3. (Recall that γ is already bounded by

[0, γo].)

Define α = (K + 1)χ + (K + 1), where we suppress dependence on K. Next, for x ∈

74

{v8, βf1 , β

f3 } define hx : R2

++ → R++ as follows:

hv8(K, γo) := γo {2[1 + 2(K + 1)] + 2(1 + 2α)(K + 1)

+4(K + 1)2χ+Kα2[1/σ2

Y + χ/(σ2X [1− χ])

]} (B.29)

hβf1 (K, γo) :=

α2γo [1/σ2Y + χ/(σ2

X [1− χ])]

2(2− χ)2

+αγo

2σ2Xσ

2Y (1− χ)

{2σ2

X [α +K + 1](K + 1)

+2σ2Y αχ([1 + 2(K + 1)][1 + 2(K + 1)χ] +K + 1) + α2K(K + 1)χ

}(B.30)

hβf3 (K, γo) :=

αγo

2σ2Xσ

2Y (1− χ)

{2σ2

X(K + 1)2 + α2χK(K + 1)

+4σ2Y αχ

2K[1 + 2(K + 1)]},

(B.31)

Define

T (γo) := maxK′>0

minx∈{v8,βf1 ,β

f3 }

K ′

hx(K ′, γo),

and let K denote the arg max.41 We now show that given T < T (γo), (v8, βf1 , β

f3 ) are uni-

formly bounded by [−K,K]3. Suppose otherwise, and define τ = inf{t > 0 : (v8t, βf1t, β

f3t) /∈

[−K,K]3}; by supposition and continuity of the solutions, τ ∈ (0, T ) and |xτ | = K, some

x ∈ {v8, βf1 , β

f3 }. Now by construction of the hx(K, γo), for all t ∈ [0, τ ] and for each

x ∈ {v8, βf1 , β

f3 } we have

|xt| = |hx(βf1t, βm1t , βf3t, β

m3t , v8t, γt)| < hx(K, γo)

and thus by the triangle inequality,

|xτ | < 0 + τ · hx(K, γo) < T (γo)hx(K, γo) ≤ K,

a contradiction. We conclude that the solutions (v8, βf1 , β

f3 , γ) to System 3 are uniformly

bounded by [−K,K]3×[0, γo], and hence the solutions (v8, β1, β3, γ) to System 2 are uniformly

bounded by [−K,K]× [−(K + 1), K + 1]2 × [0, γo]. This gives us the desired contradiction

on the definition of γ from before, so we conclude that there exists γF ∈ (0, γ) such that the

solution to System 2 satisfies γT (γF ) = γo. (Note that any such γF lies in (0, γo), as γ is

nondecreasing in the backward system.)

Now consider any such γF and solution (v8, β1, β3, γ) to System 2 with γT (γF ) = γo.

We claim using the comparison theorem that α > 0 over [0, T ]. First, we have α0 > 0, as

41Note that T (γo),K <∞ as the hx grow faster than linearly in K.

75

α0 = 12−χ(γF )

, and by Lemma 5, 0 ≤ χ(γF ) < χ < 1. Now the RHS of (B.20) is locally

Lipschitz continuous in α, uniformly in t, as the remaining coefficients appearing in that

ODE are bounded (being continuous functions of time over the compact set [0, T ]). By

standard application of the comparison theorem in Teschl (2012, Theorem 1.3), we have

αt > 0 for all t ∈ [0, T ].

Hence, we can define

vcand6t :=σ2Y [−1 + 2β1t(1− χt) + αt]

αt− v8t

2

βcand2t := 1− β1t − β3t.

Observe that in System 2, (B.24)-(B.26) was obtained by replacing β2t with βcand2t in

(B.23), (B.17) and (B.19), so (v8, β1, βcand2 , β3, γ) solves (B.23), (B.17), (B.19) and (B.21).

It is tedious but straightforward to verify that vcand6t and βcand2t solve (B.22) and (B.18),

respectively. Now the RHS of (B.22) and (B.18), given the solutions (v8, β1, β3, γ), are locally

Lipschitz continuous in their respective variables, uniformly in t, and thus (vcand6 , βcand2 ) are

the unique solutions to (B.22) and (B.18) given the other variables.

We have γt ≥ γF > 0 for all t ∈ [0, T ]. Thus, we can recover v6 = v6/γ and v8 = v8/γ as

the solutions to (B.15) and (B.16). Hence, we have established the existence of γF ∈ (0, γo)

and solution to the associated IVP posed at the beginning of the proof such that γT (γF ) = γo.

By reversing the direction of time, this is a solution to the BVP in the theorem statement.

For the remainder of the proof, we refer to the forward system. Now in the full system

of equations for the equilibrium, learning and value function coefficients, we have

v2t =2σ2

Y β0t

γtαt

v5t = −σ2Y [β3t − β1t(2− χt)]

γtαt

v7t =2σ2

Y (2β3t − 1)

γtαt

v9t =2σ2

Y [β2t − β1t(1− χt)]γtαt

76

and a system of ODEs for v0, v1, v3, v4, β0:

v0t := β20t +

αtγtχtσ2X(1− χt)2

{σ2Y [−2β2tχt(1− χt) + αtχt(1− χt)2]

+αtσ2X(1− χt)2 − αtγtχtv6t

}v1t := −2β0t

v3t :=α2tγtχtv3t

σ2X(1− χt)

+ 2β0t[β1t(1− χt) + β2t]

v4t := 1− 2β23t

β0t := −α2tγtχt[αtγt(v3t + β0tv8t) + 4σ2

Y β0tβ2tχt]

2σ2Xσ

2Y (1− χt)

,

all of which have terminal values 0. Since the system of ODEs above is linear, it has a unique

solution (given the solution to the BVP), and v2, v5, v7 and v9 are well-defined by the above

formulas since α, γ > 0. Furthermore, by inspection, (β0, v3) = (0, 0) solve their respective

ODEs. We conclude that there exists a LME.

Proof of Theorem 2

Let

β2 = β2/(1− χ); v6 = v6γ/(1− χ)2; v8 = v8γ/(1− χ).

77

The boundary value problem is

˙v6t = γt

{−β2

1t + 2β1tβ2t + β22t + v6t

(α2t

σ2Y

+2(uθ + αt)

2χtσ2X

)}(B.32)

˙v8t = γt

{(−2 + 4αt)β1t − 2β2t +

v8t(uθ + αt)2χt

σ2X

− 4β21tχt

}(B.33)

β1t =γt

4σ2Xσ

2Y (1 + uθχt)

×{

2σ2Xαt

(u2θ − 2β2

1t + αt(uθ + 2β1t))

+v8tαt(uθ + αt)2(uθ − 2β1t)χt

+4β1tχt[u2θσ

2Y +

(2uθσ

2X + σ2

Y

)α2t + uθαt

(uθσ

2X + 2σ2

Y − σ2Xβ1t

)]−4σ2

Y (uθ + αt)2β2tχt + 4σ2

Y (uθ + αt)2β1t(uθ − 2β2t)χ

2t

}(B.34)

˙β2t =γt

4σ2Xσ

2Y (1 + uθχt)

×{

2σ2Xαt

[u2θ + 2β2

1t + αt(uθ + 2β2t)]

+αtχt(uθ + αt)2[−4v6t + v8t(uθ − 2β2t)

]+4αtχtuθσ

2X

[β2

1t + (uθ + 2αt)β2t

]−4(uθ + αt)

2[uθv6tαt + σ2

Y β2t(−uθ + 2β2t)]χ2t

}(B.35)

β3t =γt

4σ2Xσ

2Y (1 + uθχt)

×{−4σ2

Xα2tβ1t

+2αtχt(uθ + αt)[−uθσ2

X + 2uθσ2Xαt − v8tαt(uθ + αt)

]−2αtχt[2uθσ

2Xαtβ1t − 2σ2

Xβ21t]

−χ2t [v8tαt(uθ + αt)

2(uθ − 2β1t) + 4uθσ2Xαt(uθ + αt − β1t)β1t]

+4σ2Y χ

2t (uθ + αt)

2(−1 + 2αt)β2t + 8σ2Y (uθ + αt)

2β1tβ2tχ3t

}(B.36)

γt = −α2tγ

2t

σ2Y

(B.37)

χt = γt{α2t (1− χt)/σ2

Y − (uθ + αt)2χ2

t/σ2X

}. (B.38)

with boundary conditions (γ0, χ0, v6T , v8T , β1T , β2T , β3T ) = (γo, 0, 0, 0, 1+2uθ2(2−χT )

, 1+2uθ2(2−χT )

, 1/2).

The core of the proof is to establish the existence of a solution (v6, v8, β1, β2, β3, γ, χ) to

the boundary value problem for all T < T (γo); from there, it is straightforward to verify

that the remaining coefficients are well-defined and that the HJB equation is satisfied. We

complete these steps at the end of the proof.42

It is useful to introduce z = (v6, v8, β1, β2, β3, γ, χ) and write the system of ODEs (B.32)-

(B.38) as zt = F (zt). We write z = (z1, z2, . . . , z5) and F (z) = (F1(z), F2(z), . . . , F5(z)).

42In particular, we have ignored β0 for now since it does not appear in any of the ODEs displayed here;afterward, it is easily shown to be identically zero.

78

Define B : R2+ → R5 by B(γ, χ) =

(0, 0, 1+2uθ

2(2−χ), (1+2uθ)

2(2−χ), 1/2

), formed by writing the termi-

nal value of z as a function of (γ, χ). Define s0 ∈ R5 by s0 = B(γo, 0) = (0, 0, 1+2uθ4

, 1+2uθ4

, 1/2).

For x ∈ Rn, let ||x||∞ denote the sup norm, sup1≤i≤n |xi|. For any ρ > 0, let Sρ(s0) denote

the ρ-ball around s0

Sρ(s0) := {s ∈ R5| ||s− s0||∞ ≤ ρ}.

For all s ∈ Sρ(s0), let IVP-s denote the the initial value problem defined by (B.32)-(B.38)

and initial conditions (v60, v80, β10, β20, β30, γ0, χ0) = (s, γo, 0). Whenever a solution to IVP-s

exists, it is unique as F is of class C1; denote it by z(s), where z(s) = (z(s), γ(s), χ(s)) =

(v6(s), v8(s), β1(s), β2(s), β3(s), γ(s), χ(s)). Note that such a solution solves the BVP if and

only if

zT (s) = B(γT (s), χT (s)), (B.39)

as the initial values γ0(s) = γo and χ0(s) = 0 are satisfied by construction. Note also that

zT (s) = s +´ T

0F (zt(s))ds; hence (B.39) is satisfied if and only if s is a fixed point of the

function g : Sρ(s0)→ R5 defined by

g(s) := B(γT (s), χT (s))−ˆ T

0

F (zt(s))dt. (B.40)

Note, moreover, that for any solution, we have by Lemma 3 that χt ∈ [0, χ) where we

define χ as 1 for the purpose of this proof.

Before establishing conditions sufficient for g(s) to be a continuous self-map on Sρ for a

given ρ > 0, we establish the following result, which gives existence, uniqueness and uniform

bounds of solutions to IVP-s for all s ∈ Sρ. Specifically, for arbitrary K > 0, we ensure that

the solution zt(s) varies at most K from its starting point s for all t ∈ [0, T ], and thus by

the triangle inequality, this solution varies most ρ + K from s0. These bounds will be used

further when we turn to the self-map property.

Lemma B.2. Fix γo > 0, ρ > 0 and K > 0. Then there exists a threshold T SBC(γo; ρ,K) >

0 such that if T < T SBC(γo; ρ,K), then for all s ∈ Sρ(s0) a unique solution to IVP-s exists

over [0, T ]. Moreover, for all t ∈ [0, T ], zt(s) ∈ Sρ+K(s0). We call this property the System

Bound Condition (SBC).

Proof. Recall that F is of class C1, and hence given s ∈ Sρ(s0), the solution z(s) is unique

whenever it exists. Toward the SBC, note that it suffices to ensure that for all ||z(s)−s||∞ <

K, since then by the triangle inequality, ||z(s)− s0||∞ ≤ ||z(s)− s||∞ + ||s− s0||∞ < ρ+K.

79

In what follows, we construct bounds on F by writing F (z(s)) = F (z(s)− s0 + s0) and using

the conjectured bounds ||z(s)−s0||∞ < ρ+K, γ ∈ (0, γo], χ ∈ [0, χ) for the solution, when it

exists. Using these bounds on F , we then identify a threshold time T SBC(γo; ρ,K) such that

at all times t < T SBC(γo; ρ,K) the solution to IVP-s (exists and) satisfies the conjectured

bounds.

Note that the desired component-wise inequalities |zit(s)−si0| < ρ+K, i ∈ {1, 2, . . . , 5},imply the further bounds

|v6t|, |v8t| < ρ+K

|β1t| < β1(ρ,K) :=1 + 2uθ

4+ ρ+K

|β2t| < β2(ρ,K) :=1 + 2uθ

4+ ρ+K

|β3t| < β3(ρ,K) := 1/2 + ρ+K

|αt| < α(ρ,K) := β1(ρ,K)χ+ β3(ρ,K).

Hereafter, we suppress the dependence of βi, i ∈ {1, 2, 3}, and α on (ρ,K).

Define functions hi : R3++ → R++ as follows:43

h1(γo; ρ,K) := γo{

(β1 + β2)2 + v6

(α2/σ2

Y + 2(uθ + α)2χ/σ2X

)}h2(γo; ρ,K) := γo

{(2 + 4α)β1 + 2β2 + v8(uθ + α)2χ/σ2

X + 4β21 χ}

h3(γo; ρ,K) :=γo

4σ2Xσ

2Y

×{

2σ2X α(u2θ + 2β2

1 + α(uθ + 2β1))

v8α(uθ + α)2(uθ + 2β1)χ

+4β1χ[u2θσ

2Y +

(2uθσ

2X + σ2

Y

)α2 + uθα

(uθσ

2X + 2σ2

Y + σ2X β1

)]+4σ2

Y (uθ + α)2[β2χ+ β1(uθ + 2β2)χ2

]}h4(γo; ρ,K) :=

γo

4σ2Xσ

2Y

×{

2σ2X α[u2θ + 2β2

1 + α(uθ + 2β2)]

αχ(uθ + α)2[4v6 + v8(uθ + 2β2)

]+ 4αχuθσ

2X

[β2

1 + (uθ + 2α)β2

]+4(uθ + α)2χ2

[uθv6α + σ2

Y β2(uθ + 2β2)]}

h5(γo; ρ,K) :=γo

4σ2Xσ

2Y

×{

4σ2X α

2β1 + 2αχ(uθ + α)[uθσ

2X + 2uθσ

2X α + v8α(uθ + α)

]+2αχ[2uθσ

2X αβ1 + 2σ2

X β21 ]

χ2[v8α(uθ + α)2(uθ + 2β1) + 4uθσ

2X αβ1(uθ + α + β1)

]+4σ2

Y χ2(uθ + α)2(1 + 2α)β2 + 8σ2

Y (uθ + α)2β1β2χ3}.

43We use R++ to denote (0,+∞).

80

Now for arbitrary (ρ,K) ∈ R2++, define

T SBC(γo; ρ,K) := mini∈{1,2,...,5}

K

hi(γo; ρ,K). (B.41)

We claim that by construction, for any t < T SBC(γo; ρ,K), if a solution exists at time

t, then ||zt(s) − s||∞ < K, γt ∈ (0, γo] and χt ∈ [0, χ). To see this, suppose by way of

contradiction that there is some s ∈ Sρ and some t < T SBC(γo; ρ,K) at which a solution to

IVP-s exists but either |zit(s)− si| ≥ K for some i ∈ {1, 2, . . . , 5}, γt /∈ (0, γo] or χt /∈ [0, χ);

let τ be the infimum of such times. Now by Lemma 3, it cannot be that γt /∈ (0, γo] or

χt /∈ [0, χ) while zt(s) exists, so (by continuity of z(s) w.r.t. time) it must be that for some

i ∈ {1, 2, . . . , 5}, |ziτ (s) − si| ≥ K, and the bounds γt ∈ (0, γo] and χt ∈ [0, χ) are satisfied

for all t ∈ [0, τ ].

By construction of the hi(γo; ρ,K), for all t ∈ [0, τ ] we have |Fi(zt(s))| ≤ hi(γ

o; ρ,K) and

thus

|ziτ (s)− si| = |ˆ τ

0

Fi(zt(s))dt|

≤ˆ τ

0

|Fi(zt(s))|dt

≤ τ · hi(γo; ρ,K)

< T SBCi (γo; ρ,K)hi(γo; ρ,K)

≤ K,

where the second to last line uses that τ < T SBC(γo; ρ,K) and the last line uses the definition

of T SBCi (γo; ρ,K); but via the strict inequality, this contradicts the definition of τ , proving

the claim. By the triangle inequality, it follows that zt(s) ∈ Sρ+K(s0) if a solution exists at

time t < T SBC(γo; ρ,K). Together, these bounds imply that the solution cannot explode

prior to time T SBC(γo; ρ,K). In other words, a unique solution must exist over [0, T ] for any

T < T SBC(γo; ρ,K) and it satisfies the SBC.

In order to invoke a fixed point theorem, the key remaining step is to establish, through

the following lemma, that g is a well-defined, continuous self-map on Sρ when T is below a

threshold T (γo; ρ,K). The expression for the latter is provided in the proof of the lemma.

Lemma B.3. Fix γo > 0, ρ > 0 and K > 0. There exists T (γo; ρ,K) ≤ T SBC(γo; ρ,K)

such that for all T < T (γo; ρ,K), g is a well-defined, continuous self-map on Sρ.

Proof. First, the inequality T (γo; ρ,K) ≤ T SBC(γo; ρ,K), which holds by construction (as

carried out below), ensures that a unique solution to IVP-s exists for all s ∈ Sρ. Next, we

81

argue that g is continuous. Note that g(s) can be written as B(γT (s), χT (s)) − [zT (s) − s].Since F is of class C1 on the domain Sρ+K × (0, γo]× [0, χ), zt(s) (which includes γ and χ)

is locally Lipschitz continuous in s, uniformly in t ∈ [0, T ],44 and B is continuous, and thus

continuity of g follows readily.

To complete the proof, we show that if T < T (γo; ρ,K), g satisfies the condition

||g(s)− s0||∞ ≤ ρ for all s ∈ Sρ,

which we refer to as the Self-Map Condition (SMC).

Note that g(s)− s0 = ∆(s)−´ T

0F (zt(s))dt, where

∆(s) := B(γT (s), χT (s))−B(γo, 0)

=

(0, 0,

1 + 2uθ2

[1

2− χT (s)− 1

2

],1 + 2uθ

2

[1

2− χT (s)− 1

2

], 0

).

The hi(ρ,K) constructed in the proof of the previous lemma will provide us a bound for

the components of´ T

0F (zt(s))dt, but we must also bound ∆(s), and in particular, ∆3(s)

and ∆4(s). Note that ∆3(s) = ∆4(s).

Recalling that χ ∈ [0, 1), the ODE for χ implies that

χt ≤ γt{α2t (1− χt)/σ2

Y

}≤ γoα2/σ2

Y ,

which depends on (ρ,K) through α. Hence by the fundamental theorem of calculus, we have

χt =´ t

0χsds ≤ (γoα2/σ2

Y ) t.

Hence, using χT (s) ≤ 1 to bound (2 − χT (s)) in the denominators from below by 1, we

have the following bound for ∆3(s) = ∆4(s):

|∆3(s)| =∣∣∣∣1 + 2uθ

2

[1

2− χT (s)− 1

2

]∣∣∣∣ =1 + 2uθ

2

∣∣∣∣ χT (s)

2(2− χT (s))

∣∣∣∣ ≤ 1 + 2uθ4

(γoα2/σ2

Y

)T

For arbitrary (ρ,K) ∈ R2++, define ∆i(γ

o; ρ,K) = 1+2uθ4

(γoα2/σ2Y ) for i ∈ {3, 4} and define

∆i(γo; ρ,K) = 0 for i ∈ {1, 2, 5}. Note that for all i ∈ {1, 2, 3, 4, 5}, ∆i(ρ,K) is proportional

to γo, and by construction, |∆i(s)| ≤ T ∆i(γo; ρ,K).

Now for arbitrary (ρ,K) ∈ R2++, define

T (γo; ρ,K) := min

{T SBC(γo; ρ,K), min

i∈{1,2,...,5}

ρ

∆i(γo; ρ,K) + hi(γo; ρ,K)

}. (B.42)

44See Theorem on page 397 in Hirsch et al. (2004).

82

To establish the SMC, it suffices to establish for each i ∈ {1, 2, . . . , 5} that |gi(s)−si0| ≤ ρ

for all s ∈ Sρ(s0).

We calculate

|gi(s)− si0| = |Bi(γT (s), χT (s))−Bi(γo, 0)︸︷︷︸

=∆i(s)

−ˆ T

0

Fi(zt(s))dt|

≤ |∆i(s)|+ˆ T

0

|Fi(zt(s))|dt

≤ T ∆i(γo; ρ,K) + Thi(γ

o; ρ,K)

< ρ,

where (i) in the second to last line we have used the definition of ∆i(γo; ρ,K) and that

|Fi(zt(s))| ≤ hi(γo; ρ,K); and (ii) in the last line we have used that T < T (γo; ρ,K) ≤

ρ∆i(γo;ρ,K)+hi(γo;ρ,K)

by construction. Hence, for all i ∈ {1, 2, . . . , 5} we have |gi(s)− si0| ≤ ρ,

completing the proof.

To complete the solution to the boundary value problem (B.32)-(B.38), note that by

Lemma B.3, g is a well-defined, continuous self-map on the compact set Sρ. By Brouwer’s

Theorem, there exists s∗ such that s∗ = g(s∗), and hence the solution to IVP-s∗ is a solution

to the BVP. To see that T (γo) ∈ O(1/γo), note simply that γo appears as an outside factor

in the denominators of the expressions defining T SBC(γo; ρ,K) and T (γo; ρ,K). Moreover,

since ρ,K have been chosen arbitrarily, we can then optimize T (γo; ρ,K) over choices of

(ρ,K) ∈ R2++.

We argue that α is finite and that γ and α are strictly positive. Finiteness comes directly

from the definition α = β1χ+ β3 and the finiteness of the underlying variables. This implies

that γt > 0 for all t ∈ [0, T ]. The ODE for α is

αt =αt(uθ + αt)γtχt

2σ2Xσ

2Y (1 + uθχt)

{2uθσ

2Xαt − v8tαt(uθ + αt)− 4σ2

Y (uθ + αt)β2tχt

}. (B.43)

By continuity of the solution to the BVP, the RHS of the equation above is locally Lipschitz

continuous in α, uniformly in t. Moreover, αT = β1T + β3TχT = 1+uθχT2−χT

> 0. By a standard

application of the comparison theorem to the backward version of the previous ODE, it must

be that αt > 0 for all t ∈ [0, T ].

Using the solution to the BVP and the facts above, we solve for the rest of the equilibrium

83

coefficients. First, we have directly

v2t =2σ2

Y β0t

γtαt

v5t =σ2Y [β1t(2− χt)− β3t − uθ]

γtαt

v7t = −2σ2Y (1− 2β3t)

γtαt

v9t =2σ2

Y [β2t − β1t(1− χt)]γtαt

.

The last three are clearly well-defined due to α, γ > 0.

The remaining ODEs for β0, v0, v1, v3 and v4 are

β0t = − (uθ + αt)γtχt2σ2

Xσ2Y (1− χt)(1 + uθχt)

{4uθσ

2Y β0tβ2t(1− χt)χt

+α2t [v8tβ0t(1− χt) + v3tγt(1 + uθχt)]

+αt

[uθv3tγt(1 + uθχt) + β0t(1− χt)

(−2uθσ

2X + uθv8t + 4σ2

Y β2tχt

)]}, β0T = 0,

v0t = β20t + (uθ + αt)

2γtχt

+(uθ + αt)

2γtχ2t

σ2X

[−v6t + σ2

Y (uθ + αt − 2β2t)/αt

], v0T = 0,

v1t = −2β0t, v1T = 0,

v3t = 2β0t(β1t + β2t)(1− χt) +v3t(uθ + αt)

2γtχtσ2X(1− χt)

, v3T = 0, and

v4t = 1− 2β23t, v4T = 0.

Observe the system for (β0, v1, v3) is uncoupled from (v0, v4). By inspection, the former has

solution (β0, v1, v3) = (0, 0, 0), and uniqueness follows from the the associated operator being

locally Lipschitz continuous in (β0, v1, v3) uniformly in t ∈ [0, T ]. It follows that v2 = 0, and

the solutions for (v0, v4) can be obtained directly by integration, given their terminal values.

We conclude that a linear Markov equilibrium exists.

Appendix C: Proofs for Section 5


We first analyze the public case, then the no feedback case, and then we analyze the learning

and payoff comparisons. Proposition 5 is then an immediate consequence of Lemmas C.5

84

and C.6.

Public Case

System of ODEs We look for an equilibrium of the form at = β0t + β1tMt + β3tθ, where

Mt = Mt is publicly known.

The (backward) system of ODEs is

β0t = −rβ0tβ3t

β1t = −β1tβ3t

(r +

β3tγtσ2Y

)β3t = −β3t

[−r + β3t

(r − β1tγt

σ2Y

)]γt =

β23tγ

2t

σ2Y

,

with initial conditions

β00 = 0, β10 = −ψγtσ2Y

≤ 0, β30 = 1 and γ0 = γF ∈ (0, γo).

Note: for the value function written as

V (θ,Mt, t) = v0t + v1tθ + v2tMt + v3tθ2 + v4tM

2t + v5tθMt,

we have the (backward) system

v0t = −rv0t −(v2

2t − 4σ2Y v4t)γ

2t

(−2σ2Y + v5tγt)2

v1t = −rv1t −2v2tγt

−2σ2Y + v5tγt

v2t = −rv2t −v2t(4σ

2Y γt + 4v4tγ

2t )

(−2σ2Y + v5tγt)2

v3t = −1− rv3t +4σ4

Y

(−2σ2Y + v5tγt)2

v4t = −rv4t −4v4tγt(2σ

2Y + v4tγt)

(−2σ2Y + v5tγt)2

v5t = −rv5t −4γt[σ

2Y (−2v4t + v5t) + v4tv5tγt]

(−2σ2Y + v5tγt)2

,

with initial conditions v00 = v10 = v20 = v30 = v50 = 0 and v40 = −ψ.

85

In terms of the β coefficients (for which existence of a solution is shown below), we have

v2t =2σ2

Y β0t

β3tγt

v4t =σ2Y β1t

β3tγt

v5t = −2σ2Y (1− β3t)

β3tγt.

Since β3t > 0 and thus v5tγt = −2σ2Y (1−β3t)β3t

< 2σ2Y , the denominator in each ODE is

bounded away from zero. Given v2, v4 and v5, the ODEs for v0, v1 and v3 are linear and

uncoupled and thus have solutions.

Existence of Linear Markov Equilibrium: r = 0 case When r = 0, the backward

system simplifies to

β0t = 0

β1t = −β1tβ23tγt

σ2Y

β3t =β1tβ

23tγt

σ2Y

γt =β2

3tγ2t

σ2Y

,

with initial conditions

β00 = 0, β10 = −ψγtσ2Y

≤ 0, β30 = 1 and γ0 = γF ∈ (0, γo).

Define ψ := ψγo/σ2Y and T := Tγo/σ2

Y .

Lemma C.1. Suppose r = 0. For all T > 0 and all ψ > 0, there exists a linear Markov

equilibrium. The corresponding γT ∈ (0, γo) satisfies gpub(γT/γo) = 0, where

gpub(ρ) := −T ψρ2(1− ρ) + ρ(1 + T )− 1 = 0.

In addition, β3 ∈ (0, 1] is increasing and β1 < 0 is decreasing, and β1 + β3 is constant.

Proof. Since gpub(0) = −1 < 0 < gpub(1) = T , there exists γF ∈ (0, γo) as in the statement of

the proposition. We now show that for any such γF , there exists a solution to the backward

86

IVP with γ0 = γF , and it satisfies γT = γo. The proof is constructive, and the solution is

unique conditional on γF .

Note that β1t + β3t = 0, so β1 + β3 is constant, and

β1t + β3t = β10 + β30 = 1 + β10

=⇒ β1t = 1 + β10 − β3t.

Hence, a uniformly bounded solution for β3 exists if and only if the same holds for β1.

Next, define Π := β1γ and observe that Π ≡ 0, so

β1tγt = β10γF

=⇒ β1t = β10γF/γt (C.1)

=⇒ β3t = 1 + β10(1− γF/γt), (C.2)

where γt ≥ γF > 0 for all t over the interval of existence, since γ is nondecreasing. Now

|β1t| ≤ |β10|, so both β1 and β3 are uniformly bounded; we now show that γ is uniformly

bounded above.

Using (C.2), the ODE for γ is

γt = [1 + β10(1− γF/γt)]2γ2t /σ

2Y

= [(1 + β10)γt − β10γF ]2/σ2

Y .

Integrating and using the initial condition for β10 and γ0 = γF yields

γt =γF [σ4

Y + tψ(γF )2]

σ4Y − tγF (−γFψ + σ2

Y ),

wherever this exists. The denominator is strictly positive if and only if

0 < σ4Y

[1− tγo

σ2Y

(−ρ2ψ + ρ)

]=: σ4

Y h(t).

Now h(t) is linear and thus bounded between h(0) = 1 > 0 and h(T ) = 1− T (−ρ2ψ + ρ) =ρ2T1−ρ > 0, where we have used the identity gpub(ρ) = 0 to eliminate ρ2ψ. We conclude that

the denominator is strictly positive for all t.

87

Moreover, at time t = T , we have

γT =γF [σ4

Y + Tψ(γF )2]

h(T )

=ργoσ4

Y [1 + T ψρ2]

σ4YTρ2

1−ρ

= γo[1 + T ψρ2](1− ρ)

T ρ

= γo,

where the last equality follows from gpub(ρ) = 0.

Returning to (C.1), we obtain that β1 is negative and increasing (in the backward system),

and applying the comparison theorem, it cannot change sign. From (C.2), we obtain that

β3 is less than 1, is decreasing in the backward system, and cannot change sign.

Define functions ψ, ψ : [8,∞) → [27/8,∞) by ψ(T ) = − 1T

+ 52

+ T8

+ (T−8)3/2

8√T

and

ψ(T ) = − 1T

+ 52

+ T8− (T−8)3/2

8√T

.

Lemma C.2. There is a unique linear Markov equilibrium if T ≤ 8 or if T > 8 and

ψ /∈ [ψ(T ), ψ(T )]. For ψ ≤ 27/8, there is a unique linear Markov equilibrium for all T (and

thus for all T ). For the remaining parameter settings, there is multiplicity of equilibria.

Proof. We he have (gpub)′(ρ) = −T ψρ(2− 3ρ) + 1 + T . This quadratic has roots if and only

if ψ ≥ 3(1+T )

T; since (gpub)′(0) = 1 + T > 0, ψ ≥ 3(1+T )

Timplies that gpub is strictly increasing

in [0, 1] and there is a unique solution to gpub(ρ) = 0. For the rest of the proof, suppose that

ψ > 3(1+T )

T. Denote by ρ and ρ the roots of (gpub)′(ρ) = 0, with 0 < ρ < ρ < 1, and note

that these are continuously differentiable in T and ψ in the assumed domain:

ρ :=1−

√1− 3(1 + T )/(T ψ)

3

ρ :=1 +

√1− 3(1 + T )/(T ψ)

3.

Since gpub is increasing on [0, ρ)∪ [ρ, 1] and decreasing on [ρ, ρ), so a necessary and sufficient

condition for uniqueness is that either 0 < gpub(ρ) or gpub(ρ) < 0.

88

By the envelope theorem,

d

dψgpub(ρ) =

∂

∂ψgpub(ρ)|ρ=ρ < 0 and

d

dψgpub(ρ) =

∂

∂ψgpub(ρ)|ρ=ρ < 0.

By construction we have gpub(ρ) > gpub(ρ). Moreover, we have the limits

limψ↓ 3(1+T )

T

gpub(ρ) = limψ↓ 3(1+T )

T

gpub(1/3)

=T − 8

9and

limψ↑∞

gpub(ρ) = −1.

For T ≤ 8, the first limit is nonpositive, and hence for all ψ in the domain, gpub(ρ), gpub(ρ) <

0, giving uniqueness. For T > 8, the first limit is positive, and thus there exist thresholds

ψ, ψ with 3(1+T )

T< ψ < ψ <∞ defined implicitly by

gpub(ρ(ψ)); ψ) = 0 (C.3)

gpub(ρ(ψ)); ψ) = 0, (C.4)

such that (i) if ψ < ψ, gpub(ρ) > 0, giving uniqueness and (ii) if ψ < ψ, gpub(ρ) > 0, giving

uniqueness. We now identify ψ and ψ. Note that if either (C.3) or (C.4) holds, there exists

a double root at some ρ2 ∈ (0, 1) and gpub must be of the form

gpub(ρ) ≡ K(ρ− ρ1)(ρ− ρ2)2

⇐⇒ −T ψρ2(1− ρ) + ρ(1 + T )− 1 ≡ K(ρ− ρ1)(ρ− ρ2)2.

Matching coefficients w.r.t. powers of ρ yields the system

T ψ = K (C.5)

−T ψ = −K(ρ1 + 2ρ2) (C.6)

1 + T = K(ρ22 + 2ρ1ρ2) (C.7)

−1 = −Kρ1ρ22 (C.8)

in variables (K, ψ, ρ1, ρ2). Together (C.5) and (C.6) imply ρ1 = 1 − 2ρ2, with which (C.8)

becomes 1 = K(1 − 2ρ2)ρ22. Dividing (C.7) by this yields 1 + T = 2−3ρ2

ρ2(1−2ρ2). Solving this

89

yields ρ2 =4+T±√

(T−8)T

4(1+T ). Plugging these values of ρ2 back into (C.7), solving for K and

then using (C.5) to solve for ψ yields

ψ, ψ ∈

−16(1 + T )3

T

(4 + T −

√T (T − 8)

)(4− 5T + 3

√T (T − 8)

) ,

− 32(1 + T )3

T

(−2 + T +

√T (T − 8)

)(4 + T +

√T (T − 8)

) .

Multiplying each fraction by the conjugate of each factor in the denominator and simplifying

yields the expressions given before the statement of the proposition.

No Feedback Case

System of ODEs We look for an equilibrium of the form at = β0m0 + β1Mt + β3θ, where

Mt = E1t [Mt].

The backward system is

β0t = −αt[rσ2Y β0t + m0β

21tγt(1− χt)]

σ2Y

β1t = −αtβ1t[rσ2Y + β3tγt − β1tγt(1− χt)]

σ2Y

β3t =αt[rσ

2Y (1− β3t) + β1tβ3tγt]

σ2Y

αt = r(1− αt)αt

γt =α2tγ

2t

σ2Y

,

with initial conditions β00 = 0, β10 = − ψγ0σ2Y +ψγ0χ(γ0)

, β30 = 1, α0 =σ2Y

σ2Y +ψγ0χ(γ0)

and γ0 = γF ,

where χ(γ) := 1− γ/γo.Writing the value function as

V (t, θ,Mt) = v0t + v1tθ + v2tMt + v3tθ2 + v4tM

2t + v5tθMt,

90

we have the (backward) system

v0t = −rv0t −v2tγ

2t [v2t + 4m0v4t(1− χt)]

[−2σ2Y + γt(v5t + 2v4tχt)]2

v1t = −rv1t −2γt[2m0v4tv5tγt(1− χt) + v2t(−2σ2

Y + γt[v5t + 2v4tχt])]

[−2σ2Y + γt(v5t + 2v4tχt)]2

v2t = −rv2t −4γt[σ

2Y v2t + 2m0v

24tγt(1− χt)

[−2σ2Y + γt(v5t + 2v4tχt)]2

v3t = −rv3t −v5tγt[−4σ2

Y + γt(v5t + 4v4tχt)

[−2σ2Y + γt(v5t + 2v4tχt)]2

v4t = −rv4t +4v4tγt[−2σ2

Y + v4tγt]

[−2σ2Y + γt(v5t + 2v4tχt)]2

v5t = −rv5t −4v4tγt

−2σ2Y + γt(v5t + 2v4tχt)

+4v5tγt[−σ2

Y + v4tγt]

[−2σ2Y + γt(v5t + 2v4tχt)]2

with initial conditions v10 = v20 = v30 = v50 = 0, v00 = −ψγ0χ(γ0) and v40 = −ψ. Given a

solution to the β system (with existence shown in the next subsection),

v2t =2σ2

Y β0t

αtγt

v4t =σ2Y β1t

αtγt

v5t = −2σ2Y (1− β3t)

αtγt.

As α is positive and constant (as shown in the next subsection), the coefficients v2, v4

and v5 are well-defined in terms of the solution to the β system. Now in the ODEs for v0,

v1 and v3, we have in the denominators −2σ2Y + γt(v5t + 2v4tχt) =

2σ2Y

αtbounded away from

0. Given v2, v4 and v5, these ODEs are linear and uncoupled and thus have solutions.

91

Existence of Linear Markov Equilibrium: r = 0 case When r = 0, the associated

backward system is:

β1 = −β1γ(β3 + β1χ− β1)(β3 + β1χ)

σ2Y

β3 =β3β1γ(β3 + β1χ)

σ2y

β0 = −β21γ(1− χ)(β3 + β1χ)

σ2Y

γt =γ2(β3 + β1χ)2

σ2Y

where χ = χ(γ) = 1− γ/γo, and with initial conditions

β30 = 1, β00 = 0, γ0 = γF ∈ (0, γo), and β10 = − ψγF

σ2Y + ψγFχ(γF )

< 0.

Lemma C.3. For all T > 0 and ψ > 0, there exists a linear Markov equilibrium. The

corresponding γT ∈ (0, γo) satisfies gNF (γT/γo) = 0 where

gNF (ρ) := T ρ− (1− ρ)[1 + ψρ(1− ρ)]2, ρ ∈ [0, 1],

with T := Tγo/σ2Y and ψ := ψγo/σ2

Y .

Proof. Observe first that since gNF (0) < 0 and gNF (1) > 0, there exists γF ∈ (0, γo) as in

the statement of the Proposition. Now fix any such solution as the initial condition for the

posterior variance in the backward IVP. We will show that this problem admits a solution

over [0, T ] with the property that γT = γo.

Consider the backward IVP indexed by γF over its maximal interval of existence. Notice

first that β0 + β1 + β3 ≡ 0, and so

β0t + β1t + β3t = β00 + β10 + β30 = 1− ψγF


.

Thus, as long as β1 and β3 exist, β0 will too, and since β0 does not appear in any of the

other ODEs, we can ignore it from the analysis.

Similarly, α := β3 + β1χ satisfies α ≡ 0, and so

αt := β3t + β1tχt = α := 1− ψγFχ(γF )


=σ2Y


∈ (0, 1).

92

Consider now the subsystem

β1 = −β1γ(β3 + β1χ− β1)(β3 + β1χ)

σ2Y

= −β1αγ(α− β1)

σ2Y

β3 =β3β1γ(β3 + β1χ)

σ2Y

=β3β1γα

σ2Y

,

(C.9)

and observe that since β10 < 0 and β30 = 1 > 0, the same inequalities hold in a neighborhood

of zero.

We claim that β3 and β1 do not change signs. First, both cannot vanish at the same

time, as this would violate that αt = α > 0. Now suppose β3 is the first to do it, say at time

t; then for all s ∈ [0, t], β10 < 0 and by the comparison theorem, β3 > 0 for all s ∈ [0, t], a

contradiction. Likewise, a contradiction obtains if β1 vanishes first. We therefore conclude

that β1 is increasing while β3 is decreasing, and that they lie in [β10, 0] and [0, 1] as long as

they exist.

The existence of a solution of the IVP over [0, T ] then reduces to the existence of a solution

to the γ-ODE when this one is driven by α. As long as this one exists, straightforward

integration shows that

γt =γFσ2

Y

σ2Y − γF α2t

.

Since γF > 0, the right-hand side is well defined over [0, T ] if σ2Y − γF α2T > 0. Using that

α = σ2Y /[σ

2Y + ψγF (1− γF/γo)] = 1/[1 + ψρ(1− ρ)], where ρ = γF/γo and ψ = ψγo/σ2

Y , we

get

σ2Y − γF α2T > 0⇔ 1− ρα2T > 0⇔ [1 + ψρ(1− ρ)]2 − ρT > 0.

By definition of γF , however, gNF (ρ) = 0, which implies

⇔ [1 + ψρ(1− ρ)]2 − ρT = ρ[1 + ψρ(1− ρ)]2 > 0.

Moreover,

γT =γFσ2

Y

σ2Y − γF α2T

=γF

1− ρα2T=γF [1 + ψρ(1− ρ)]2

ρ[1 + ψρ(1− ρ)]2= γo,

concluding the proof.

Lemma C.4. Suppose that ψ ∈ (0, 2]. Then gNF has a unique root in [0, 1], and thus if

ψ ∈ (0, 2], there is a unique LME.

Proof. To show uniqueness, we prove that, under the previous range of values of ψ, the

93

derivative of gNF is positive at any point that satisfies gNF (ρ) = 0; thus, gNF can only cross

zero one, and hence, it does it from below.

It is easy to verify that

(gNF )′(ρ) = T + [1 + ψρ(1− ρ)]2 − 2ψ(1− ρ)(1− 2ρ)[1 + ψρ(1− ρ)].

At a crossing point, however, T ρ+ ρ[1 + ψρ(1− ρ)]2 = [1 + ψρ(1− ρ)]2, and so

(gNF )′(ρ) =1

ρ[1 + ψρ(1− ρ)]

{1 + ψρ(1− ρ)− 2ψρ(1− ρ)(1− 2ρ)

}.

Since ψ > −4 and ρ ∈ (0, 1), [1 + ψρ(1− ρ)] > 0. Thus, it suffices to show that

Q(ρ) := 1− ψρ+ 5ψρ2 − 4ψρ3 > 0, for ρ ∈ (0, 1).

Now Q(ρ) = 1 − ψρ(1 − ρ)(1 − 4ρ), so when ψ > 0, the previous cubic is positive (and at

least 1) over [1/2, 1]. Also, over [0, 1/2] we have

Q(ρ) = 1− ψρ+ 5ψρ2[1− 4ρ/5] > 1− ψρ > 1− ψ/2.

Since ψ ≤ 2, the result follows.

Learning and Payoff Comparisons

Lemma C.5. If ψ ∈ (0, 2], then there is more learning in the public case for all T > 0.

Proof. Let ρx = γxT/γo ∈ (0, 1), where γxT is the terminal value of γ in the BVP of case

x ∈ {public, no feedback}. When ψ ∈ (0, 2], these values are the unique solutions to the

equations

0 = gNF (ρ) := ρT − (1− ρ)[1 + ψρ(1− ρ)]2

= ρ(1 + T )− 1− ψρ(1− ρ)2[2 + ψρ(1− ρ)]

0 = gpub(ρ) := ρ(1 + T )− 1− ψT ρ2(1− ρ), (C.10)

respectively. In particular, observe that ρx > 1/(1 + T ), x ∈ {public, no feedback}. Our

goal is to show ρpub < ρNF .

94

Now, using that ρpub(1 + T )− 1 = ψT (ρpub)2(1− ρpub), we get that

gNF (ρpub) =ψ(1− ρpub)

T︸︷︷︸>0

{T 2(ρpub)2 − (1− ρpub)[2ρpubT + ρpub(1 + T )− 1]

}.

Thus, letting

Q(ρ) := T 2ρ2 − (1− ρ)[2ρT + ρ(1 + T )− 1] = ρ2(T 2 + 3T + 1)− ρ(3T + 2) + 1,

it suffices to show that Q(ρpub) < 0, as gNF (ρ) < 0 if and only if ρ < ρNF .

Observe that the roots of Q are given by

ρ− :=(3−

√5)T + 2

2(T 2 + 3T + 1), and ρ+ :=

(3 +√

5)T + 2

2(T 2 + 3T + 1),

and that ρ− <1

1+T< ρ+. Consequently, it suffices to show that gpub(ρ+) > 0: this ensures

that ρpub < ρ+, and since ρpub > 11+T

> ρ−, this implies that Q(ρpub) < 0.

Straightforward algebraic manipulation yields that

gpub(ρ+) > 0

⇔ 4(1 + T )[(3 +√

5)T + 2][T 2 + 3T + 1]2 − 8[T 2 + 3T + 1]3

−ψT [(3 +√

5)T + 2]2[2T 2 + (3−√

5)T ] > 0

Since the constraint is tightest when ψ = 2, we aim to show that, for all T > 0,

4(1 + T )[(3 +√

5)T + 2][T 2 + 3T + 1]2 − 8[T 2 + 3T + 1]3

−2T [(3 +√

5)T + 2]2[2T 2 + (3−√

5)T ] > 0

⇔ 4[(1 +√

5)T + (√

5− 1)][T 2 + 3T + 1]2 − 2[(3 +√

5)T + 2]2[2T 2 + (3−√

5)T ] > 0.

95

The polynomial on the left-hand side can be then written as T5∑i=0

aiTi where

a5 = 4(1 +√

5) > 0

a4 = 4(−9 +√

5) < 0

a3 = 4(−13 + 11√

5) > 0

a2 = 68(√

5− 1) > 0

a1 = 4(−11 + 9√

5) > 0

a0 = 4(√

5− 1) > 0

(C.11)

It is trivial to check that a3 + a4T + a5T2 > 0 for all T > 0. Since the rest of the coefficients

are strictly positive, the proof is concluded.

Let V x denote the ex ante payoff to player 1 in the case x ∈ {public,no feedback}.It follows that

V pub = E0

[−ˆ T

0

(at − θ)2dt− ψM2T

]= −ˆ T

0

E0

[(β1tMt + [β3t − 1]θ)2

]dt− ψ(m2

0 + γo − γT )

= −ˆ T

0

[(β3t − 1)2γo + β2

1t(γo − γt) + 2β1t(β3t − 1)(γo − γt)

]dt− ψσ2

Y (1− ρpub)

Using the solutions for the coefficients and γt in terms of γF and carrying out the simplifi-

cations, we obtain

V pub = V pub(ρpub),

where

V pub(ρ) := σ2Y

{−ψ(1− ρ) + T ψρ2[−ψ(1− ρ) + 1] + ln

(1− ρT ρ

)}.

In the no feedback case, note that

E0[M2t ] = E0[(χtθ + (1− χt)m0)2]

= E0[χ2t θ

2]

= χ2tγ

o.

96

Hence,

E0[M2t ] = E0[(Mt −Mt)

2]︸︷︷︸=γ2t

+E0[M2t ]

= χtγt + χ2tγ

o.

Using at = β0t + β1tMt + β3tθ = αtθ, we now calculate

V NF = E0

[−ˆ T

0

(at − θ)2dt− ψ(χTγT + χ2Tγ

o)

]= E0

[−ˆ T

0

θ2(1− a)2dt− ψχT (γT + χTγo)

]= −(1− α)2γoT − ψχT (γT + χTγ

o).

Expressing χT = 1− γT/γo, γT and α in terms of γF = γT , we have

V NF = V NF (ρNF ),

where

V NF (ρ) := σ2Y

{−ψ2ρ(1− ρ)3 − ψ(1− ρ)

}.

Lemma C.6. For ψ ∈ (0, 1], the long-run player is better off in the no feedback case than

in the public case for all T > 0.

Proof. We show that (i) V pub(ρpub) < V NF (ρpub) and (ii) V NF (ρ) is increasing for ρ ≥ ρpub,

so that V pub(ρpub) < V NF (ρNF ).

Toward establishing (i), define

V (ρ) := V pub(ρ)− V NF (ρ)

= σ2Y

{T ψρ2[−ψ(1− ρ) + 1] + ln

(1− ρT ρ

)+ ψ2ρ(1− ρ)3

},

so that our first goal is to show V (ρpub) < 0. Using the inequality ln(x) < x − 1 for x > 0,

97

we have

V (ρ) < σ2Y

{T ψρ2[−ψ(1− ρ) + 1] +

[1− ρT ρ

− 1

]+ ψ2ρ(1− ρ)3

}.

=σ2Y

T ρV2(ρ),

where V2(ρ) := T 2ψρ3[1− ψ(1−ρ)] + 1−ρ(1 + T ) + T ψ2ρ2(1−ρ)3, and so it suffices to show

V2(ρpub) < 0. Now the equation gpub(ρpub) = 0 is equivalent to ψ = −1−(1+T )ρ

T ρ2(1−ρ)|ρ=ρpub ; using

this to eliminate ψ and simplifying, we obtain

V2(ρpub) = − [ρ(1 + T )− 1]3

T ρ2|ρ=ρpub ,

which is strictly negative as ρpub > 11+T

, establishing claim (i).

Toward claim (ii), differentiate

d

dρV NF (ρ) = σ2

Y

{−ψ2[−3ρ(1− ρ)2 + (1− ρ)3] + ψ

}= σ2

Y ψ︸︷︷︸>0

{−ψ(1− ρ)2(1− 4ρ) + 1

}.

The expression in braces is positive iff h(ρ) := (1− ρ)2(1− 4ρ) < 1ψ

. Now for ρ ∈ [0, 1], h(ρ)

attains its maximum value of 1 at ρ = 0. Hence, the expression is positive for all ρ ∈ (0, 1) if

ψ ≤ 1. We conclude that V NF (ρ) is increasing for all ρ ∈ (0, 1), and hence for all ρ ≥ ρpub,

if ψ ≤ 1.

Combining parts (i) and (ii) yields V pub(ρpub) < V NF (ρpub) < V NF (ρNF ) as desired.


Proof of Proposition 6. The proof is by contradiction; suppose that a linear Markov equi-

librium exists in which the long-run player plays at = β0t + β1tM2t + β2tLt + β3tθ. Define

α0t := β0t, α2t := β2t + β1t(1− χt) and α3t = β3t + β1tχt. We derive a collection of necessary

conditions for equilibrium and show that there is no real value of α30 for which they can be

satisfied.

From the long-run player’s perspective, given an action profile (at)t≥0, the state variables

98

Mt and Lt evolve according to

dLt = µL(at)dt+ σLdZXt

dMt = µM(at)dt+ σMdZXt ,

where

µL(a) =Bt

1− χt[a+ ξ(m− l)− α0t − (α2t + α3t)l]

µM(a) = Σα3tγt [a− α0t − α2tl − α3tm]

σL =BtσX1− χt

σM = BtσX

Bt =γt(α3t + ξχt)

σ2X

Σ =1

σ2X

+1

σ2Y

,

and where the learning ODEs are

γt = −γ2t α

23tΣ (C.12)

χt = γtα23tΣ(1− χt)− (α3t + ξχt)Bt. (C.13)

In any conjectured equilibrium, given (t, θ,Mt, Lt) = (t, θ,m, l), the long-run player’s

expected continuation payoff from following the equilibrium strategy is Et[´ T

t(θ − Ls)asds

],

which is of the form v0t + v1tθ + v2tm+ v3tl + v4tθ2 + v5tm

2 + v6tl2 + v7tθm+ v8tθl + v9tml.

By optimality, the long-run player’s value function must satisfy the HJB equation

0 = supa

{(θ − L)a+ Vt + µL(a)VL + µM(a)VM +

1

2σ2MVMM +

1

2σ2LVLL + σLσMVLM

},

and since this is linear in a, the following indifference condition must hold:

0 = (θ − l) +Bt

1− χtVL + α3tγtΣVM . (C.14)

Since (C.14) must hold for all values of θ,m and l, we match coefficients to obtain the

99

following four equations:

constant : 0 = γt

[Σv2tα3t +

v3t(α3t + ξχt)

σ2X(1− χt)

](C.15)

θ : 0 = 1 + Σv7tα3tγt +v8tγt(α3t + ξχt)

σ2X(1− χt)

(C.16)

m : 0 = γt

[2Σv5tα3t +

v9t(α3t + ξχt)

σ2X(1− χt)

](C.17)

l : 0 = −1 + Σv9tα3tγt +2v6tγt(α3t + ξχt)

σ2X(1− χt)

. (C.18)

Note that γ0 = γo > 0, and since χ0 = 0, (C.16) (or (C.18)) implies α30 6= 0. Hence, for

sufficiently small t, we can solve (C.17)-(C.18) to obtain

v5t = − σ2Y v9t(α3t + ξχt)

2(σ2X + σ2

Y )α3t(1− χt)(C.19)

v6t =[σ2Xσ

2Y − (σ2

X + σ2Y )v9tα3tγt](1− χt)

2σ2Y γt(α3t + ξχt)

. (C.20)

Differentiate v5t, use (C.13) to replace χt and evaluate at time t = 0 to obtain

v50 = −v90α30(ξ + α30)γ0 + σ2Y v90

2(σ2X + σ2

Y ). (C.21)

On the other hand, the indifference condition reduces the HJB equation to

0 = Vt + µL(0)VL + µM(0)VM +1

2σ2MVMM +

1

2σ2LVLL + σLσMVLM . (C.22)

Now (C.22) must hold for all θ,m and l, and by matching coefficients, we obtain a set of

10 equations. Evaluating the coefficients on m2 and ml at t = 0, and using (C.19), (C.20)

and (C.21) to eliminate v5, v6 and v5, we obtain

m2 : 0 =(σ2

X + 2σ2Y )v90α30(ξ + α30)γ0 − σ2

Xσ2Y v90

σ2X(σ2

X + σ2Y )

=⇒ v90 =(σ2

X + 2σ2Y )v90α30(ξ + α30)γ0

σ2Xσ

2Y

(C.23)

ml : 0 = ξ − (σ2X + 2σ2

Y )v90α30(ξ + α30)γ0

σ2Xσ

2Y

+ v90

=⇒ v90 = −ξ +(σ2

X + 2σ2Y )v90α30(ξ + α30)γ0

σ2Xσ

2Y

(C.24)

100

Clearly, (C.23) and (C.24) cannot hold simultaneously, which gives us the desired contradic-

tion. (Note that if v90 6= 0 and α30 > 0, equality (of the form +∞ = +∞) between the right

hand sides of (C.23) and (C.24) can be achieved if α30 = +∞, supporting the interpretation

that the long-run player would trade away all information in the first instant.)

References

Alonso, R., W. Dessein, and N. Matouschek (2008): “When does coordination re-

quire centralization?” American Economic Review, 98, 145–179.

Amador, M. and P.-O. Weill (2012): “Learning from private and public observations

of others’ actions,” Journal of Economic Theory, 147, 910–940.

Angeletos, G.-M. and A. Pavan (2007): “Efficient use of information and social value

of information,” Econometrica, 75, 1103–1142.

Aragones, E., A. Postlewaite, and T. Palfrey (2007): “Political reputations and

campaign promises,” Journal of the European Economic Association, 5, 846–884.

Back, K. (1992): “Insider trading in continuous time,” The Review of Financial Studies,

5, 387–409.

Back, K., C. H. Cao, and G. A. Willard (2000): “Imperfect competition among

informed traders,” The Journal of Finance, 55, 2117–2155.

Bolton, P. and M. Dewatripont (2013): “Authority in Organizations,” Handbook of

organizational Economics, 342–372.

Bonatti, A. and G. Cisternas (2019): “Consumer Scores and Price Discrimination,”

Review of Economic Studies.

Bonatti, A., G. Cisternas, and J. Toikka (2017): “Dynamic oligopoly with incom-

plete information,” The Review of Economic Studies, 84, 503–546.

Callander, S. and S. Wilkie (2007): “Lies, damned lies, and political campaigns,”

Games and Economic Behavior, 60, 262–286.

Carlsson, H. and S. Dasgupta (1997): “Noise-proof equilibria in two-action signaling

games,” Journal of Economic Theory, 77, 432–460.

101

Cisternas, G. (2018): “Two-sided learning and the ratchet principle,” The Review of

Economic Studies, 85, 307–351.

Dessein, W. and T. Santos (2006): “Adaptive Organizations,” Journal of Political Econ-

omy, 114, 956–995.

Dilme, F. (2019): “Dynamic Quality Signaling with Hidden Actions,” Games and Economic

Behavior, 113, 116–136.

Ely, J. C., J. Horner, and W. Olszewski (2005): “Belief-free equilibria in repeated

games,” Econometrica, 73, 377–415.

Faingold, E. and Y. Sannikov (2011): “Reputation in continuous-time games,” Econo-

metrica, 79, 773–876.

Foster, F. D. and S. Viswanathan (1994): “Strategic trading with asymmetrically in-

formed traders and long-lived information,” Journal of Financial and Quantitative Anal-

ysis, 29, 499–518.

——— (1996): “Strategic trading when agents forecast the forecasts of others,” The Journal

of Finance, 51, 1437–1478.

Fuchs, W. (2007): “Contracting with repeated moral hazard and private evaluations,”

American Economic Review, 97, 1432–1448.

Gryglewicz, S. and A. Kolb (2019): “Strategic Pricing in Volatile Markets,” Tech. rep.

He, H. and J. Wang (1995): “Differential information and dynamic behavior of stock

trading volume,” The Review of Financial Studies, 8, 919–972.

Heinsalu, S. (2018): “Dynamic Noisy Signaling,” American Economic Journal: Microeco-

nomics, 10, 225–249.

Hermalin, B. E. (1998): “Toward an economic theory of leadership: Leading by example,”

American Economic Review, 1188–1206.

Hirsch, M. W., S. Smale, and R. L. Devaney (2004): Differential equations, dynamical

systems, and an introduction to chaos, Academic press.

Holden, C. W. and A. Subrahmanyam (1992): “Long-lived private information and

imperfect competition,” The Journal of Finance, 47, 247–270.

102

Horner, J. and N. S. Lambert (2019): “Motivational Ratings,” Review of Economic

Studies.

Horner, J. and S. Lovo (2009): “Belief-free equilibria in games with incomplete infor-

mation,” Econometrica, 77, 453–487.

Horner, J. and W. Olszewski (2006): “The folk theorem for games with private almost-

perfect monitoring,” Econometrica, 74, 1499–1544.

Keller, H. B. (1968): Numerical Methods for Two-Point Boundary-Value Problems, Blais-

dell Publishing Co.

Kolb, A. (2019): “Strategic Real Options,” Journal of Economic Theory, forthcoming.

Kyle, A. S. (1985): “Continuous auctions and insider trading,” Econometrica: Journal of

the Econometric Society, 1315–1335.

Levin, J. (2003): “Relational incentive contracts,” American Economic Review, 93, 835–

857.

Liptser, r. S. and A. Shiryaev (1977): Statistics of Random Processes 1, 2, Springer-

Verlag, New York.

MacLeod, W. B. (2003): “Optimal contracting with subjective evaluation,” American

Economic Review, 93, 216–240.

Mailath, G. J. and S. Morris (2002): “Repeated games with almost-public monitoring,”

Journal of Economic Theory, 102, 189–228.

Marschak, J. (1955): “Elements for a theory of teams,” Management Science, 1, 127–137.

Marschak, J. and R. Radner (1972): Economic Theory of Teams, New Haven, CT:

Yale University Press.

Matthews, S. A. and L. J. Mirman (1983): “Equilibrium limit pricing: The effects of

private information and stochastic demand,” Econometrica, 981–996.

Mayhew, D. R. (1974): Congress: The electoral connection, vol. 26, Yale University Press.

Milgrom, P. R. and J. D. Roberts (1992): Economics, organization and management,

Prentice-Hall.

103

Morris, S. and H. S. Shin (2002): “Social value of public information,” American Eco-

nomic Review, 92, 1521–1534.

Radner, R. (1962): “Team decision problems,” The Annals of Mathematical Statistics, 33,

857–881.

Rantakari, H. (2008): “Governing Adaptation,” The Review of Economic Studies, 75,

1257–1285.

Royden, H. L. (1988): Real analysis, Macmillan.

Sannikov, Y. (2007): “Games with imperfectly observable actions in continuous time,”

Econometrica, 75, 1285–1329.

Simon, H. A. (1951): “A formal theory of the employment relationship,” Econometrica.

Spence, M. (1973): “Job Market Signaling,” The Quarterly Journal of Economics, 87,

355–374.

Teschl, G. (2012): Ordinary differential equations and dynamical systems, vol. 140, Amer-

ican Mathematical Society.

Williamson, O. E. (1996): The mechanisms of governance, Oxford University Press.

Yang, L. and H. Zhu (2018): “Back-running: Seeking and hiding fundamental information

in order flows,” Rotman School of Management Working Paper.

104

Signaling with Private Monitoring - web.mit.eduweb.mit.edu/gcistern/www/spm.pdf · Signaling with Private Monitoring Gonzalo Cisternas and Aaron Kolb August 10, 2019 Preliminary The

Documents