
Misspecified Politics and the Recurrence of Populism

Gilat Levy, Ronny Razin & Alwyn Young*

London School of Economics, March 2020

We develop a model of political competition between types that differ in their specification of the data generating process for a common outcome. We show that misspecified beliefs converge on a simpler view of the world which suffers from omitted variable bias. Periods in which those with a correctly specified and endogenously more complex model govern increase the specification error of the simpler world view, leading the latter to underrate the effectiveness of complex policies and overestimate the positive impact of a few extreme policy actions. Periods in which endogenously simple types implement their narrow world view result in subpar outcomes and a weakening of their omitted variable bias. Policy cycles arise, where each type's tenure in power sows the seeds of its eventual electoral defeat.

"Populism is Simple, Democracy is Complex." (R. Dahrendorf 2007)

________________ *We thank participants in conferences and seminars at the University of Pennsylvania, University of Chicago, Edinburgh University, LSE and Bocconi for helpful comments. This project received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 681579.


I. Introduction

Individuals differ not merely in their economic interests and preferences, but also in their fundamental understanding of the data generating process that underlies observed outcomes. Consequently, because they consider the same historical data through the prism of different models, fully rational and otherwise similar actors can have persistent differences of opinion, as witnessed by the endurance of academic debates in areas as diverse as macroeconomics and physics. In politics, such differences in model specification translate into differences in realized policy decisions when different groups are in power. The consequent interplay between beliefs and policy can generate systematic correlations between observed data that sustain differing beliefs and biases.

This paper considers political competition between types that share the same interests and preferences over common outcomes but differ in their specification of the causes of these outcomes. Because the number of potential regressors is infinite and the number of observations finite, all actors must start with some restriction on the set of policies they consider relevant, i.e. that may have non-zero effects on the common outcome. With a minimum of policy variation, actors learn over time which policies in their initial set are actually irrelevant, and asymptotically beliefs coalesce around the relevant policies with non-zero effects. Consequently, if the beliefs of one type are misspecified, i.e. exclude relevant determinants of policy outcomes, over time they become "simple" relative to the "complex" views of those with an initially correctly specified model. For example, while complex types may consider crime, income stagnation, inequality, and housing costs as interrelated but largely separate problems, best treated with a range of policies, simple types come to view them as stemming from a single cause, e.g. immigration.


Our principal finding is that in electoral competition complex types with correctly specified beliefs are unable to permanently defeat and remove from power those with misspecified simple beliefs. Periods in which complex types govern and implement their broad policy agenda increase the omitted variable bias of the simple, as they attribute the successful outcomes of the full range of complex policies to moderate actions taken on a few dimensions. This increases the simple's assessment of the likely effectiveness of a more decisive narrow policy and mobilizes them in support of political candidates who will implement it. However, periods when the simple govern produce systematically inferior results, as extreme actions are revealed to be less effective than anticipated. This reduces the intensity of both their desired policy and political activism, thereby allowing complex types to regain power. Thus, we find that the economy suffers from endless political cycles characterized by a rising and falling intensity of beliefs, alternating periods of policy moderation and extremism, and systematically better and worse outcomes.

Our model may shed some light on the recurrence of political populism. The amorphous concept of "populism" has perhaps as many definitions as authors.1 A frequent theme, however, is that the policies of populist politicians are extreme, misguided and harmful to the very groups that support them (e.g., Dornbusch and Edwards 1991). Our framework provides an explanation for the recurrence of large policy deviations with subpar outcomes that are supported by rational voters. Periods of complex establishment rule increase the omitted variable bias of misspecified simple beliefs, inducing a fully rational discounting of past failures and ensuring never-ending political cycles, as the actions taken by each type when in power assure their eventual replacement by their opposition.

1For reviews see Gidron and Bonikowski (2013) and Mudde and Rovira Kaltwasser (2017).

Our paper builds on a literature of political-economy models of sub-optimal populist policies. Acemoglu et al (2013) model left-wing populist policies that are both harmful to elites and not in the interests of the majority poor as arising from the need for politicians to signal that they are not influenced by rich right-wing interests. Di Tella and Rotemberg (2016) analyze populism in a behavioural model in which voters are betrayal averse and may prefer incompetent leaders so as to minimize the chance of suffering from betrayal. Guiso et al (2017) define a populist party as one that champions short-term redistributive policies while discounting claims regarding long-term costs as representing elite interests. Bernhardt et al (2019) show how office-seeking demagogues who cater to voters' short-term desires compete successfully with far-sighted representatives who guard the long-run interests of voters. Morelli et al (2020) show how, in a world with information costs, incompetent politicians who simplistically commit to fixed policies can be successful. Our framework expands this literature by linking the pursuit of sub-optimal policy to the bias created by a misspecified interpretation of the outcomes of periods of optimal rule.

Interest in learning with misspecified models dates back at least to Arrow & Green (1973), with examples including Bray (1982), Nyarko (1991), Esponda (2008) and, most recently, Esponda and Pouzo (2016) and Molavi (2019). Several recent papers feature interactions between competing belief structures that share features of our framework. Mailath and Samuelson (2019), with market analysts' evaluations of stocks as an example, also argue that agents impose different a priori simplifications on the data generating process, and examine the convergence of beliefs with information exchange based upon a given set of data and recursively announced beliefs. Eliaz and Spiegler (2019) present a static model of political competition based upon competing narratives that draw voters' attention to different causal variables and mechanisms, with victory going to the narrative that can present the most positive anticipatory outcomes while remaining consistent with steady state observations. Montiel Olea et al (2017), with auctions as a motivation, consider competition between agents that use simple or complex models to explain a given set of exogenous data, and find that simpler agents have greater confidence in their estimates in smaller data sets and less confidence asymptotically. In our framework, the endogenous data produced by actors with different specifications generate persistent biases and differences in beliefs that asymptotically keep both types politically competitive. More technically, convergence in misspecified models is not guaranteed, and is especially problematic with multidimensional state spaces (Heidhues, Kőszegi & Strack 2018, Bohren & Hauser 2019, Esponda, Pouzo & Yamamoto 2019, and Frick, Iijima & Ishii 2019). Our paper provides an example of how convergence can be proven in a model with multiple agents, a multidimensional state space and continuous actions.

The paper proceeds as follows. Section II begins with suggestive evidence on simplicity, showing that the "populist" candidates of the 2016 US presidential election cycle, Donald Trump and Bernie Sanders, both presented their arguments using fewer distinct words than their competitors. Section III presents our basic framework, wherein voters differ in their beliefs regarding the possible determinants of common outcomes, with the probability of voting for one candidate over another an increasing function of the difference in the expected payoffs of their proposed policies. Section IV examines the static equilibrium within a single time period, showing that voters with more extreme parameter estimates perceive a greater gain from implementing their optimal policies and are mobilized in greater numbers to vote for their preferred type. Section V uses a simplified two-policy example to illustrate the full dynamic equilibrium of competition between simple and complex types, with fully rational updating subject to the core specification of beliefs. Policy cycles arise, with simple types recurrently assuming power and implementing narrow, extreme and systematically less effective policies. Section VI generalizes the framework, considering political competition between correctly and incorrectly specified models, with the latter characterized only by the fact that they exclude some relevant determinants of outcomes. Competition between "simple" and "complex" types emerges as an equilibrium outcome, rather than as a primitive of the model, as the actions of those with the incorrectly specified model converge onto a subset of the actions of types with a correctly specified world view. Finally, Section VII shows that in our framework incumbents are rewarded and punished for random outcomes, as the relative belief of each type in the efficacy of their policies moves with random shocks. Good luck during periods of complex rule lengthens the populism cycle, weakening the relative beliefs of the simple and keeping the complex in power longer, while random bad shocks hasten the transition to a period of populist misrule. Section IX concludes, while an online appendix provides the more technically difficult proofs of convergence to the equilibria discussed in the paper.


II. Populism as a Simplified World View: Some Suggestive Evidence

In this section we present suggestive evidence, based on the transcripts of the Democratic and Republican primary debates of the 2016 US Presidential election cycle, that populism involves a simpler world view. Our focus is on the ratio of unique words to total words used by a candidate in a given debate. For most individuals this ratio falls with the total number of words spoken in a given setting. In a single spoken sentence each word is usually unique and the ratio is close to one; but in prolonged discourse words tend to be repeated and the ratio falls. Our argument is that individuals who believe multiple outcomes have a common cause will, in response to the questions posed in a given debate, return to the same words more frequently, producing a lower ratio of unique to total words for a given amount of speech than their competitors. We recognize that this metric lacks the sophistication and nuance of a thematic textual analysis, but its advantage is that it is completely objective, transparent and cannot be coloured by the preconceptions and biases of the researcher examining the data.

Our data come from the transcripts of The American Presidency Project.2 Transcription introduces a number of errors and ambiguities, as "Trump's" might be erroneously written as "Trumps" or simply "Trump". We remove punctuation from the transcript, as well as all letters after apostrophes, and change words that differ from other words only by a final s to the root word (e.g. "Americans" to "American"). We also remove 665 stop words that are typically filtered out by computer text searches (e.g. "ah", "almost", "alone"), all single-character words (e.g. "I", "a") and numerics, which often appear in multiple forms (e.g. 10K, 10,000 and 10000). Our unit of analysis is the total words spoken by a candidate in a debate, and we have 141 candidate x debate observations in 29 debates. The debate participants and the number of debates in which they appear are, Republican: Bush (9), Carson (10), Christie (8), Cruz (12), Fiorina (8), Gilmore (2), Huckabee (7), Jindal (4), Kasich (12), Pataki (4), Paul (6), Perry (2), Rubio (12), Santorum (7), Trump (11), and Walker (2); and Democratic: Chafee (1), Clinton (10), O'Malley (5), Sanders (10), Webb (1). Our focus is on Trump and Sanders, who are generally considered the populist candidates of the 2016 US Presidential election.

2https://www.presidency.ucsb.edu/documents/app-categories/elections-and-transitions/debates.
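The cleaning steps above can be sketched in Python as follows. This is a minimal sketch, not the authors' code: `STOP_WORDS` is a hypothetical eight-word stand-in for the 665-word stop list, the plural rule collapses a word ending in "s" only when its root appears in the same transcript (an assumption about the scope of the rule), and the function names are illustrative.

```python
import math
import re

# Hypothetical stand-in for the 665-word stop list used in the paper.
STOP_WORDS = {"ah", "almost", "alone", "the", "and", "of", "to", "we"}


def clean_tokens(text, stop_words=STOP_WORDS):
    """Approximate the cleaning steps described above for one transcript."""
    tokens = []
    for raw in text.lower().split():
        word = re.split(r"['\u2019]", raw)[0]  # drop letters after an apostrophe
        word = re.sub(r"[^a-z0-9]", "", word)   # remove remaining punctuation
        if len(word) < 2 or any(ch.isdigit() for ch in word):
            continue                            # single characters and numerics
        if word in stop_words:
            continue
        tokens.append(word)
    vocab = set(tokens)
    # Collapse a word ending in "s" onto its root when the root also occurs.
    return [w[:-1] if w.endswith("s") and w[:-1] in vocab else w for w in tokens]


def log_unique_ratio(tokens, min_chars=2):
    """ln(unique words / total words) among words of at least min_chars."""
    kept = [w for w in tokens if len(w) >= min_chars]
    return math.log(len(set(kept)) / len(kept))


tokens = clean_tokens("Americans want jobs. The American worker's job is gone, 10,000 jobs!")
print(tokens)
print(log_unique_ratio(tokens), log_unique_ratio(tokens, min_chars=6))
```

Restricting `min_chars` to 4, 6 or 8 mirrors the column split used in Table I.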

Table I below reports the regression results. In panel (a) we regress ln(unique words/total words) on ln(total words), dummies for Trump and Sanders, and a constant, restricting the sample in turn to words of at least 2, 4, 6 or 8 characters, with the last category accounting for .16 of total words spoken. For a given number of words of 2 or more characters spoken, Trump and Sanders use 15% and 7% fewer unique words, respectively, than their political competitors. When one focuses on more complex words of 8 characters or more, both use 7% fewer unique words than other debate participants. Panel (b) introduces a Democratic debate dummy and panel (c) debate dummies, although the latter are not statistically significant. These slightly reduce the effects for Trump and increase them for Sanders. As shown in Figure I, this is largely a consequence of comparison against Democratic candidate Clinton, who used a large number of unique words. This can be seen as a unique characteristic of Clinton or as reflecting the fact that Clinton was the ultimate complex/establishment candidate, depending upon one's priors. However, as shown in panel (a), the Sanders results hold without the dummies, when identification comes from comparison with all observations, most of which are Republican. We see these results as suggestive of the simple vs complex competition described in the model below.
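A minimal sketch of the panel (a) specification, assuming NumPy and a hypothetical function name `fit_table_i_panel_a`. It returns point estimates only (the standard errors in Table I are not reproduced), and the check uses synthetic data generated from the panel (a) point estimates, not the debate transcripts.

```python
import numpy as np


def fit_table_i_panel_a(ln_ratio, ln_words, trump, sanders):
    """OLS of ln(unique/total words) on ln(total words), Trump and Sanders
    dummies and a constant; returns (ln_words, Trump, Sanders, constant)."""
    X = np.column_stack([ln_words, trump, sanders, np.ones_like(ln_words)])
    coef, *_ = np.linalg.lstsq(X, ln_ratio, rcond=None)
    return coef


# Synthetic check: build noiseless data from known coefficients and recover them.
rng = np.random.default_rng(0)
n = 141
ln_words = rng.uniform(5.0, 8.0, n)
trump = np.zeros(n)
trump[:11] = 1.0
sanders = np.zeros(n)
sanders[11:21] = 1.0
ln_ratio = 0.857 - 0.214 * ln_words - 0.153 * trump - 0.073 * sanders
coef = fit_table_i_panel_a(ln_ratio, ln_words, trump, sanders)
print(np.round(coef, 3))
```

Because the dependent variable is in logs, a dummy coefficient c corresponds to roughly 100c percent, or exactly 100(exp(c) - 1) percent, fewer unique words.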


Table I: Ln(unique words/total words) as a Function of Ln(words) (N = 141 candidate x debate observations)

                          ≥ 2 characters   ≥ 4 characters   ≥ 6 characters   ≥ 8 characters
share of all words             1.00             0.77             0.42             0.16

(a) Sanders and Trump dummies
ln(words)                  -.214 (.013)     -.210 (.012)     -.182 (.014)     -.151 (.018)
Trump                      -.153 (.021)     -.155 (.021)     -.130 (.025)     -.074 (.031)
Sanders                    -.073 (.025)     -.079 (.025)     -.094 (.029)     -.069 (.037)
constant                    .857 (.085)      .825 (.083)      .598 (.089)      .360 (.095)
R2                          .79              .79              .68              .47

(b) Democratic debate dummy added
ln(words)                  -.236 (.015)     -.233 (.015)     -.207 (.017)     -.179 (.021)
Trump                      -.141 (.021)     -.142 (.021)     -.119 (.025)     -.065 (.031)
Sanders                    -.099 (.026)     -.107 (.026)     -.124 (.031)     -.103 (.039)
Democratic dummy            .054 (.021)      .056 (.021)      .062 (.024)      .071 (.031)
constant                    .943 (.090)      .911 (.087)      .680 (.093)      .427 (.098)
R2                          .79              .80              .69              .48

(c) Debate dummies
ln(words)                  -.276 (.023)     -.271 (.023)     -.261 (.026)     -.226 (.032)
Trump                      -.143 (.021)     -.144 (.021)     -.127 (.025)     -.084 (.031)
Sanders                    -.091 (.026)     -.098 (.026)     -.114 (.031)     -.096 (.039)
Debate dummies             yes (p = .084)   yes (p = .071)   yes (p = .141)   yes (p = .145)
R2                          .84              .85              .76              .60

Notes: p = p-value on the test of the significance of the debate dummies; share of all words = share of words with 2 or more characters.

[Figure: two scatter panels, "Words with 2 or more characters" and "Words with 6 or more characters"; x-axis: ln total words; y-axis: ln(unique words/total words); markers: Others, Trump, Sanders, Clinton]

Figure I: Ln Unique Words/Total Words as a Function of Total Words


III. A Modelling Framework

We model a polity in which citizens receive utility from a set of common policy-influenced outcomes, but have fundamental disagreements over the causal determinants of some of those outcomes. Specifically, we consider the case where there is a common outcome y whose realization at time t is governed by the data generating process

$$ y_t = (\mathbf{x}_t + \mathbf{n}_t)'\boldsymbol{\beta} + \varepsilon_t, \tag{III.1} $$

where $\mathbf{x}_t$ and $\mathbf{n}_t$ are k x 1 vectors of desired policy actions and policy noise, $\boldsymbol{\beta}$ the vector of policy effects parameters, some of which may be zero, and $\varepsilon_t$ a mean-zero iid normally distributed random shock.3 Although y is described as a single outcome, one can equally think of it as a preference-weighted average of multiple outcomes that are influenced by $\mathbf{x}$.4 The components of the noise $\mathbf{n}$ are iid with zero mean and diagonal covariance matrix $\sigma_n^2\mathbf{I}_k$, and are independent of both desired policy $\mathbf{x}$ and the shock to outcomes $\varepsilon$. These can be thought of as minuscule bureaucratic errors, or as shorthand for the experimentation that might arise with forward-looking politicians; their main function is to eliminate fragile multiple equilibria, as discussed further below.

Outside of y, we assume there are other common outcomes over which there is no fundamental disagreement about causal mechanisms and the role of policy, and the utility citizens derive from all common outcomes is given by

$$ U_t = y_t + V(R_t), \tag{III.2} $$

where $R_t = \mathbf{x}_t'\mathbf{x}_t$ represents the resources used in implementing policy $\mathbf{x}$ for y and is expressed in a fashion that allows us not to worry about the signs of the elements of $\boldsymbol{\beta}$ or $\mathbf{x}$, while V represents the utility derived from policy outcomes over which there is no disagreement regarding causal mechanisms. V is a reduced form, representing the utility that can be achieved in other policy areas given the allocation of resources to y, and the assumptions $\partial V/\partial R < 0$ and $\partial^2 V/\partial R^2 < 0$ are natural. To derive analytical results below we work with a second-order approximation of V as a quadratic function of $R_t$, and in the appendix we assume, quite reasonably, that there is a finite upper bound $\bar{R}$ on the resources available for policy, as this simplifies many aspects of the proofs.

3We follow standard matrix algebra notation throughout, using regular typeface to denote scalars, and lower and upper case letters in bold typeface to denote column vectors and matrices, respectively.

4Thus, if utility is a weighted average of i components, each with $y_{it} = \mathbf{x}_t'\boldsymbol{\beta}_i + \varepsilon_{it}$, then the outcome, parameters and error term in (III.1) are simply the weighted averages of those components.

Citizens in our polity are divided into two "types" based upon their prior beliefs about the unknown policy effects parameters $\boldsymbol{\beta}$. Faced with a multiplicity of possible regressors, each type excludes some policies on a priori grounds as irrelevant, i.e. as having zero policy effects. While one type has a correctly specified model, in that the policies they think are relevant include all non-zero elements of $\boldsymbol{\beta}$, the beliefs of the other type are misspecified, in that the policies they think are relevant exclude some of the non-zero elements of $\boldsymbol{\beta}$. We use the subscript i to distinguish between the full k x 1 vectors of desired policies, noise and parameters ($\mathbf{x}$, $\mathbf{n}$ and $\boldsymbol{\beta}$) and the $k_i$ sub-elements of these that type i believes are relevant ($\mathbf{x}_i$, $\mathbf{n}_i$ and $\boldsymbol{\beta}_i$). Similarly, while $\mathbf{H}_t = \mathbf{X}_t + \mathbf{N}_t$ denotes the full t x k history of desired actions and noise, $\mathbf{H}_{it} = \mathbf{X}_{it} + \mathbf{N}_{it}$ is made up of the $k_i$ columns of that history deemed relevant by type i. The union of the sets of $k_i$ policies deemed relevant by each type equals the full set of k systematically implemented policies.

Prior beliefs of each type across the policies they believe are relevant are normally distributed with mean $\boldsymbol{\beta}_{i0}$ and joint covariance matrix $\sigma_{i0}^2\mathbf{V}_{i0}^{-1}$, while the prior probability density function on $\sigma_{i0}$ is inverted gamma. Following the observation of the t x 1 history of outcomes $\mathbf{y}_t$, such beliefs give rise to mean posterior beliefs5

$$ \boldsymbol{\beta}_{it} = (\mathbf{V}_{i0} + \mathbf{H}_{it}'\mathbf{H}_{it})^{-1}(\mathbf{V}_{i0}\boldsymbol{\beta}_{i0} + \mathbf{H}_{it}'\mathbf{y}_t). \tag{III.3} $$

5This is a standard OLS Bayesian result (Zellner 1971). Our model is somewhat different from the standard framework in that the regressors are determined by past realizations of the error term. However, since the current disturbance $\varepsilon_t$ is independent of the current regressors $\mathbf{h}_{it}$, provided some initial prior exists, the period-by-period recursive application of the updating formula (III.3) aggregates across t periods to the result given above.

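However, since one can easily define a finite "pre-history" of policy $\mathbf{H}_{i0}$ and outcomes $\mathbf{y}_0$ such that $\mathbf{V}_{i0} = \mathbf{H}_{i0}'\mathbf{H}_{i0}$ and $\boldsymbol{\beta}_{i0} = (\mathbf{H}_{i0}'\mathbf{H}_{i0})^{-1}\mathbf{H}_{i0}'\mathbf{y}_0$, and our results will be asymptotic, we simplify our algebra by including these pre-histories in $\mathbf{H}_{it}$ and $\mathbf{y}_t$ and simply writing beliefs as

$$ \boldsymbol{\beta}_{it} = (\mathbf{H}_{it}'\mathbf{H}_{it})^{-1}\mathbf{H}_{it}'\mathbf{y}_t. \tag{III.4} $$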
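The equivalence between (III.3) and (III.4) can be checked numerically. The sketch below assumes NumPy, uses hypothetical function names and random data, builds a pre-history with $\mathbf{V}_{i0} = \mathbf{H}_{i0}'\mathbf{H}_{i0}$ and $\boldsymbol{\beta}_{i0}$ equal to OLS on that pre-history, and confirms that the posterior mean (III.3), computed either in one step or by the period-by-period recursion mentioned in footnote 5, equals OLS on the stacked data.

```python
import numpy as np


def posterior_mean(V0, beta0, H, y):
    """Mean posterior beliefs (III.3): (V0 + H'H)^{-1} (V0 beta0 + H'y)."""
    return np.linalg.solve(V0 + H.T @ H, V0 @ beta0 + H.T @ y)


def ols(H, y):
    """Mean beliefs (III.4): (H'H)^{-1} H'y."""
    return np.linalg.solve(H.T @ H, H.T @ y)


def recursive_update(V0, beta0, H, y):
    """Apply (III.3) one period at a time, updating precision and mean."""
    V, beta = V0.copy(), beta0.copy()
    for h_t, y_t in zip(H, y):
        V_next = V + np.outer(h_t, h_t)
        beta = np.linalg.solve(V_next, V @ beta + h_t * y_t)
        V = V_next
    return beta


rng = np.random.default_rng(1)
H0, y0 = rng.normal(size=(5, 3)), rng.normal(size=5)   # hypothetical pre-history
H, y = rng.normal(size=(50, 3)), rng.normal(size=50)   # observed history
V0, beta0 = H0.T @ H0, ols(H0, y0)

batch = posterior_mean(V0, beta0, H, y)
stacked = ols(np.vstack([H0, H]), np.concatenate([y0, y]))
print(np.allclose(batch, stacked), np.allclose(recursive_update(V0, beta0, H, y), batch))
```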

Without noise, the model described above intrinsically allows for multiple equilibria. As beliefs converge, the variation in actions declines and the regressors become collinear and "asymptotically uncooperative" (Schmidt 1976). This loss of information through collinearity undermines proofs of convergence to a point. While asymptotic beliefs typically satisfy some conditions, such as that mean expected effects equal the mean impact of policy, as in $\mathbf{x}_i'\boldsymbol{\beta}_i = \mathbf{x}'\boldsymbol{\beta}$, there is still a continuum of equilibria that meet these minimal requirements, and it is not possible to prove that the economy does not forever move along that continuum, as no point is favoured over another. The smallest amount of random noise, however, easily eliminates this indeterminacy.6 Not wishing to carry the reader through a host of technical and yet fragile results, we resort to the simple assumption that implemented policy is composed of desired policy $\mathbf{x}$ and the vector of mean-zero random noise $\mathbf{n}$.
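To illustrate why noise matters, the sketch below (NumPy; the path of desired policy is a hypothetical example, not derived from the model) lets desired policy converge along a fixed direction and reports the smallest eigenvalue of $\mathbf{H}'\mathbf{H}/t$. Without noise that eigenvalue shrinks toward zero as t grows, i.e. one eigenvalue of $\mathbf{H}'\mathbf{H}$ is o(t); with noise of variance $\sigma_n^2$ it stays near $\sigma_n^2$.

```python
import numpy as np


def min_eig_per_period(H):
    """Smallest eigenvalue of H'H / t for a t x k history H."""
    return np.linalg.eigvalsh(H.T @ H / H.shape[0])[0]


def simulate_history(t, sigma_n, rng):
    """Hypothetical history: desired policy converges to the direction (1, 2),
    with a transient deviation along (1, -1) that dies out as 1/period,
    plus iid implementation noise with standard deviation sigma_n."""
    periods = np.arange(1, t + 1)
    X = np.array([1.0, 2.0]) + (1.0 / periods)[:, None] * np.array([1.0, -1.0])
    return X + sigma_n * rng.standard_normal(X.shape)


rng = np.random.default_rng(2)
for t in (1_000, 20_000):
    print(t,
          min_eig_per_period(simulate_history(t, 0.0, rng)),   # no noise
          min_eig_per_period(simulate_history(t, 0.1, rng)))   # sigma_n = 0.1
```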

Political competition in our polity takes the form of citizen candidates, one for each type, who if successfully elected implement policies which are myopically optimal given their belief type. Voting is costly, but citizens vote because they believe that with some probability p their vote will be pivotal. Consequently, voters of type i will be motivated to vote for the citizen candidate of their type if the expected gain from type i policies relative to those of belief type j exceeds the cost of voting, i.e.:

$$ pI_i > c, \quad \text{where } I_i = E_i\big[U(\mathbf{x}_t^i) - U(\mathbf{x}_t^j)\big], \tag{III.5} $$

where $E_i$ denotes the expectation based upon the beliefs of i and $\mathbf{x}_t^i$ the preferred policies of type i. $I_i$ is the intensity of the voting preferences of type i and does not necessarily equal $-I_j$, as beliefs differ across the two groups. With a distribution across citizens of costs c divided by the pivotality probability p, the vote share garnered by the candidate of each type will be an increasing function of the intensity of that type. We assume this distribution is the same for both types, and that both types are equally numerous. Consequently, the election is won by the candidate representing the type with the greatest voting preference intensity. The results below can be generalized to allow for unequal group sizes by noting that this simply implies the smaller group will require a certain margin of voting preference intensity to motivate its base enough to win an election.

6Technically speaking, the problem is that absent noise $\mathbf{H}_i'\mathbf{H}_i/t$ is not guaranteed to be asymptotically positive definite, a baseline assumption used in typical proofs of convergence. While the covariance matrix is positive definite for all t and its eigenvalues are all weakly increasing through time, when the regressors become collinear some of the eigenvalues become o(t), which is why beliefs end up on a line. With noise, all of the eigenvalues are assured to be Ω(t) (bounded away from 0 when divided by t).
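A minimal sketch of the turnout rule in (III.5). It assumes, purely for illustration, that c/p is exponentially distributed with a common mean across types, so that the share of type i who vote is P(c/p < I_i); the function names and the tie-breaking rule in favour of the complex are hypothetical. Turnout rises with intensity, and a smaller group needs a margin of intensity to win.

```python
import math


def turnout(intensity, mean_cost_ratio=1.0):
    """Share of a type that votes when c/p ~ Exponential(mean_cost_ratio):
    P(c/p < I) = 1 - exp(-I / mean). Hypothetical distribution; a type with
    non-positive intensity does not turn out for its own candidate."""
    return 1.0 - math.exp(-max(intensity, 0.0) / mean_cost_ratio)


def winner(I_s, I_c, size_s=1.0, size_c=1.0):
    """Candidate with more votes; ties go to the complex (arbitrary rule)."""
    votes_s = size_s * turnout(I_s)
    votes_c = size_c * turnout(I_c)
    return "simple" if votes_s > votes_c else "complex"


print(winner(2.0, 1.0))                # equal sizes: higher intensity wins
print(winner(1.2, 1.0, size_s=0.5))    # smaller group needs a larger margin
print(winner(5.0, 1.0, size_s=0.8))
```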

IV. Optimal Policies in a Single Period

In this section we examine ideal policies and the determinants of intensity within a single period. To simplify notation, we drop subscripted references to time. Our principal result is:

(R1) The voting intensity of type i, i.e. their expected gain from implementing their ideal policies instead of those of the opposing type j, is increasing in $\boldsymbol{\beta}_i'\boldsymbol{\beta}_i$. Consequently, the party with the highest $\boldsymbol{\beta}_i'\boldsymbol{\beta}_i$ wins the election and implements their ideal policies.

To build intuition we first consider the optimal choice of $\mathbf{x}$ given the resources R allocated to y, then determine the optimal allocation of resources between areas of policy agreement and disagreement, and finally show that voting intensity is increasing in $\boldsymbol{\beta}_i'\boldsymbol{\beta}_i$.


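Given a resource allocation R to outcome y, the optimal policy for any belief type is determined by:

$$ \max_{\mathbf{x}}\; E[y] + \lambda[R - \mathbf{x}'\mathbf{x}] = \mathbf{x}'\boldsymbol{\beta} + \lambda[R - \mathbf{x}'\mathbf{x}], \tag{IV.1} $$

where, as noted before, the expectation is taken across the prior and we use $\boldsymbol{\beta}$ to denote the mean values of beliefs. Substituting the first order condition

$$ \mathbf{x} = \frac{\boldsymbol{\beta}}{2\lambda} \tag{IV.2} $$

into the resource constraint

$$ \mathbf{x}'\mathbf{x} = \frac{\boldsymbol{\beta}'\boldsymbol{\beta}}{(2\lambda)^2} = R \;\rightarrow\; \lambda = \frac{1}{2}\sqrt{\frac{\boldsymbol{\beta}'\boldsymbol{\beta}}{R}}, \text{ or (for later use) } \sqrt{\boldsymbol{\beta}'\boldsymbol{\beta}} = 2\lambda\sqrt{R}, \tag{IV.3} $$

which allows us to solve for the optimal $\mathbf{x}$ as a function of R and $\boldsymbol{\beta}$,

$$ \mathbf{x}(\boldsymbol{\beta}, R) = \boldsymbol{\beta}\sqrt{\frac{R}{\boldsymbol{\beta}'\boldsymbol{\beta}}}, \tag{IV.4} $$

while the expected outcome given such policies $\mathbf{x}$ and beliefs $\boldsymbol{\beta}$ is given by:

$$ y[\mathbf{x}(\boldsymbol{\beta}, R), \boldsymbol{\beta}] = \mathbf{x}'\boldsymbol{\beta} = \sqrt{\boldsymbol{\beta}'\boldsymbol{\beta}R}. \tag{IV.5} $$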

For a given level of resource use R, types which have more extreme parameter estimates, as measured by $\boldsymbol{\beta}'\boldsymbol{\beta}$, believe they know how to pursue more effective policies, as measured by $\mathbf{x}'\boldsymbol{\beta}$ in (IV.5), and consequently feel more constrained by the resource limitation R, as measured by $\lambda$ in (IV.3).
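The closed forms (IV.2) to (IV.5) can be checked numerically. A minimal NumPy sketch with hypothetical function names and illustrative values $\boldsymbol{\beta} = (3, 4)'$ and R = 4 (not taken from the paper), for which $\mathbf{x} = (1.2, 1.6)'$, $\lambda = 1.25$ and expected y = 10:

```python
import numpy as np


def optimal_policy(beta, R):
    """(IV.4): x(beta, R) = beta * sqrt(R / beta'beta)."""
    return beta * np.sqrt(R / (beta @ beta))


def shadow_price(beta, R):
    """(IV.3): lambda = 0.5 * sqrt(beta'beta / R)."""
    return 0.5 * np.sqrt((beta @ beta) / R)


def expected_outcome(beta, R):
    """(IV.5): x'beta at the optimum, which equals sqrt(beta'beta * R)."""
    return optimal_policy(beta, R) @ beta


beta, R = np.array([3.0, 4.0]), 4.0
x = optimal_policy(beta, R)
print(x, x @ x, shadow_price(beta, R), expected_outcome(beta, R))
```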

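The gain in expected utility from pursuing an optimal policy $\mathbf{x}$ versus an alternative policy $\mathbf{x} + \boldsymbol{\delta}$ that satisfies the same resource constraint is given by:

$$ y[\mathbf{x}, \boldsymbol{\beta}] - y[\mathbf{x} + \boldsymbol{\delta}, \boldsymbol{\beta}] = \mathbf{x}'\boldsymbol{\beta} - (\mathbf{x} + \boldsymbol{\delta})'\boldsymbol{\beta} = -\boldsymbol{\delta}'\boldsymbol{\beta}. \tag{IV.6} $$

Substituting using (IV.4) and the fact that $-\boldsymbol{\delta}'\mathbf{x} = \tfrac{1}{2}\boldsymbol{\delta}'\boldsymbol{\delta}$, as both $\mathbf{x}'\mathbf{x}$ and $(\mathbf{x}+\boldsymbol{\delta})'(\mathbf{x}+\boldsymbol{\delta})$ equal R (so that $2\boldsymbol{\delta}'\mathbf{x} + \boldsymbol{\delta}'\boldsymbol{\delta} = 0$), we see that:

$$ y[\mathbf{x}, \boldsymbol{\beta}] - y[\mathbf{x} + \boldsymbol{\delta}, \boldsymbol{\beta}] = -\boldsymbol{\delta}'\mathbf{x}\sqrt{\frac{\boldsymbol{\beta}'\boldsymbol{\beta}}{R}} = \sqrt{\frac{\boldsymbol{\beta}'\boldsymbol{\beta}}{R}}\,\frac{\boldsymbol{\delta}'\boldsymbol{\delta}}{2}. \tag{IV.7} $$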
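A quick numerical check of (IV.7), assuming NumPy and a hypothetical function name: for an alternative policy on the same resource sphere $\mathbf{x}'\mathbf{x} = R$, the direct loss $-\boldsymbol{\delta}'\boldsymbol{\beta}$ from (IV.6) equals the closed form. The values $\boldsymbol{\beta} = (3, 4)'$ and R = 4 are illustrative.

```python
import numpy as np


def deviation_loss(beta, R, x_alt):
    """Return the direct loss (IV.6) and the closed form (IV.7) for an
    alternative policy x_alt with x_alt'x_alt = R."""
    x = beta * np.sqrt(R / (beta @ beta))          # optimal policy, (IV.4)
    delta = x_alt - x
    direct = -delta @ beta                          # (IV.6)
    closed_form = np.sqrt((beta @ beta) / R) * (delta @ delta) / 2.0  # (IV.7)
    return direct, closed_form


beta, R = np.array([3.0, 4.0]), 4.0
for angle in (0.3, 1.0, 2.5):
    x_alt = np.sqrt(R) * np.array([np.cos(angle), np.sin(angle)])
    print(angle, *deviation_loss(beta, R, x_alt))
```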


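Individuals with more extreme parameter estimates feel the resource constraint more keenly and hence lose more from a sub-optimal movement $\boldsymbol{\delta}$ away from their constrained choice. Their gain from moving from an expenditure level $R_1$ to a higher level $R_2$ (while pursuing optimal policies in both instances) is also greater, as

$$ y[\mathbf{x}(\boldsymbol{\beta}, R_2), \boldsymbol{\beta}] - y[\mathbf{x}(\boldsymbol{\beta}, R_1), \boldsymbol{\beta}] = \sqrt{\boldsymbol{\beta}'\boldsymbol{\beta}}\,\big(\sqrt{R_2} - \sqrt{R_1}\big). \tag{IV.8} $$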

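Having chosen $\mathbf{x}$ optimally given R and $\boldsymbol{\beta}$, the next step in the citizen's decision involves the optimal allocation of resources between the areas of policy agreement and disagreement:

$$ \max_{R}\; y[\mathbf{x}(\boldsymbol{\beta}, R), \boldsymbol{\beta}] + V(R), \tag{IV.9} $$

the first order condition for which is

$$ V'(R) = -\lambda(R) = -\frac{1}{2}\sqrt{\frac{\boldsymbol{\beta}'\boldsymbol{\beta}}{R}}. \tag{IV.10} $$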

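An increase in $\boldsymbol{\beta}'\boldsymbol{\beta}$ raises the marginal utility of expenditure on y, raising the equilibrium allocation of resources R to that sector. However, R rises less than proportionately, so that the optimal ratio $\boldsymbol{\beta}'\boldsymbol{\beta}/R$ also rises with $\boldsymbol{\beta}'\boldsymbol{\beta}$: the rising marginal utility of the diminished resources allocated to the area of policy agreement requires an equal rise in the marginal utility of expenditure in the area of policy disagreement, $\tfrac{1}{2}\sqrt{\boldsymbol{\beta}'\boldsymbol{\beta}/R}$.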
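To see (IV.10) at work, the sketch below assumes a hypothetical quadratic $V(R) = -aR - \tfrac{b}{2}R^2$ with a = 0.5 and b = 0.25 (so that V' < 0 and V'' < 0) and solves the first order condition by bisection. The coefficients and function names are illustrative only; the printout shows both R and $\boldsymbol{\beta}'\boldsymbol{\beta}/R$ rising with $\boldsymbol{\beta}'\boldsymbol{\beta}$.

```python
import math

A_COEF, B_COEF = 0.5, 0.25   # hypothetical: V(R) = -A_COEF*R - (B_COEF/2)*R**2


def V_prime(R):
    return -A_COEF - B_COEF * R


def optimal_R(bb, lo=1e-12, hi=1e6, iters=200):
    """Bisection on the FOC (IV.10): 0.5*sqrt(bb/R) + V'(R) = 0.
    The left side is strictly decreasing in R, so the root is unique."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if 0.5 * math.sqrt(bb / mid) + V_prime(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for bb in (1.0, 4.0, 9.0, 16.0):
    R = optimal_R(bb)
    print(f"beta'beta={bb:5.1f}  R={R:.4f}  beta'beta/R={bb / R:.4f}")
```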

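We now determine the relative intensity of two types, i and j, when faced with a choice between candidates proposing their respectively optimal policies. For the sake of concrete exposition, we assume that $\boldsymbol{\beta}_i'\boldsymbol{\beta}_i > \boldsymbol{\beta}_j'\boldsymbol{\beta}_j$ and consequently $R_i > R_j$ and $\boldsymbol{\beta}_i'\boldsymbol{\beta}_i/R_i > \boldsymbol{\beta}_j'\boldsymbol{\beta}_j/R_j$. The voting intensity of type i is given by:

$$ \begin{aligned} I_i &= y[\mathbf{x}(\boldsymbol{\beta}_i, R_i), \boldsymbol{\beta}_i] + V(R_i) - y[\mathbf{x}(\boldsymbol{\beta}_j, R_j), \boldsymbol{\beta}_i] - V(R_j) \\ &= \underbrace{y[\mathbf{x}(\boldsymbol{\beta}_i, R_j), \boldsymbol{\beta}_i] - y[\mathbf{x}(\boldsymbol{\beta}_j, R_j), \boldsymbol{\beta}_i]}_{A_i > 0} + \underbrace{y[\mathbf{x}(\boldsymbol{\beta}_i, R_i), \boldsymbol{\beta}_i] - y[\mathbf{x}(\boldsymbol{\beta}_i, R_j), \boldsymbol{\beta}_i]}_{B_i > 0} + \underbrace{V(R_i) - V(R_j)}_{C_i < 0}. \end{aligned} \tag{IV.11} $$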


The positive first term, $A_i$, is the gain in expected y at expenditure level $R_j$ enjoyed by moving from the sub-optimal policies of type j to the optimal policies of type i; the positive second term, $B_i$, is the gain in expected y enjoyed by moving from the optimal expenditure $R_j$ of type j to the greater desired expenditure $R_i$ of type i; and the negative third term, $C_i$, is the reduction in utility in areas of policy agreement brought about by the increased allocation of resources to y by type i. In a similar vein, the voting preference intensity of type j is given by:

$$ \begin{aligned} I_j &= y[\mathbf{x}(\boldsymbol{\beta}_j, R_j), \boldsymbol{\beta}_j] + V(R_j) - y[\mathbf{x}(\boldsymbol{\beta}_i, R_i), \boldsymbol{\beta}_j] - V(R_i) \\ &= \underbrace{y[\mathbf{x}(\boldsymbol{\beta}_j, R_i), \boldsymbol{\beta}_j] - y[\mathbf{x}(\boldsymbol{\beta}_i, R_i), \boldsymbol{\beta}_j]}_{A_j > 0} + \underbrace{y[\mathbf{x}(\boldsymbol{\beta}_j, R_j), \boldsymbol{\beta}_j] - y[\mathbf{x}(\boldsymbol{\beta}_j, R_i), \boldsymbol{\beta}_j]}_{B_j < 0} + \underbrace{V(R_j) - V(R_i)}_{C_j > 0}, \end{aligned} \tag{IV.12} $$

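with each term having an interpretation similar to the case for i, except that while $A_j$ is similarly positive, $B_j$ is negative and $C_j$ positive, as $R_i > R_j$.

Let $\boldsymbol{\delta}(R) = \mathbf{x}(\boldsymbol{\beta}_i, R) - \mathbf{x}(\boldsymbol{\beta}_j, R)$ denote the difference in the optimal $\mathbf{x}$ policies of the two types at expenditure level R, so that, by (IV.4), $\boldsymbol{\delta}(R)'\boldsymbol{\delta}(R)/R$ does not depend on R. Using (IV.7) above, we see that:

$$ A_i = \sqrt{\frac{\boldsymbol{\beta}_i'\boldsymbol{\beta}_i}{R_j}}\,\frac{\boldsymbol{\delta}(R_j)'\boldsymbol{\delta}(R_j)}{2}, \quad A_j = \sqrt{\frac{\boldsymbol{\beta}_j'\boldsymbol{\beta}_j}{R_i}}\,\frac{\boldsymbol{\delta}(R_i)'\boldsymbol{\delta}(R_i)}{2} \;\rightarrow\; A_i > A_j \quad \left(\text{as } \frac{\boldsymbol{\beta}_i'\boldsymbol{\beta}_i}{R_i} > \frac{\boldsymbol{\beta}_j'\boldsymbol{\beta}_j}{R_j}\right). \tag{IV.13} $$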

As members of type i believe they can implement more effective policies and are more strictly constrained in their allocation of resources to y, the gains they expect in moving from sub-optimal to optimal policies at the low expenditure level of type j exceed the gains type j envisions from a similar movement at the generous expenditure level of type i. Turning to the comparison of $B_i + C_i$ and $B_j + C_j$, these terms represent the net gain to each type from moving to their optimal resource allocation in the area of policy disagreement, given an optimal choice of $\mathbf{x}$ at each expenditure level. Substituting using (IV.8) and (IV.10):


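$$ \begin{aligned} B_i &= \sqrt{\boldsymbol{\beta}_i'\boldsymbol{\beta}_i}\,\big(\sqrt{R_i} - \sqrt{R_j}\big) = -2V'(R_i)\sqrt{R_i}\,\big(\sqrt{R_i} - \sqrt{R_j}\big) > 0, \\ B_j &= \sqrt{\boldsymbol{\beta}_j'\boldsymbol{\beta}_j}\,\big(\sqrt{R_j} - \sqrt{R_i}\big) = -2V'(R_j)\sqrt{R_j}\,\big(\sqrt{R_j} - \sqrt{R_i}\big) < 0, \text{ and} \\ C_i &= -C_j = V(R_i) - V(R_j) = \big(V'(R_i) + V'(R_j)\big)(R_i - R_j)/2, \end{aligned} \tag{IV.14} $$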

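where the last line follows from the second order (quadratic) approximation of V as a function of R. Consequently, $B_i + C_i > B_j + C_j$ as:

$$ B_i + C_i - B_j - C_j = \big(V'(R_j) - V'(R_i)\big)\big(R_i + R_j - 2\sqrt{R_iR_j}\big) > 0, \tag{IV.15} $$

where both factors are positive: $V'(R_j) > V'(R_i)$ because V is concave and $R_i > R_j$, and $R_i + R_j - 2\sqrt{R_iR_j} = (\sqrt{R_i} - \sqrt{R_j})^2$.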
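The decomposition (IV.11) to (IV.15) can be checked numerically under a hypothetical quadratic $V(R) = -aR - \tfrac{b}{2}R^2$ with a = 0.5 and b = 0.25; the belief vectors below are arbitrary examples and the function names are illustrative. The sketch computes A, B and C directly from their definitions and confirms the signs, $A_i > A_j$, and the closed form in (IV.15).

```python
import math
import numpy as np

A_COEF, B_COEF = 0.5, 0.25   # hypothetical: V(R) = -A_COEF*R - (B_COEF/2)*R**2


def V(R):
    return -A_COEF * R - 0.5 * B_COEF * R**2


def V_prime(R):
    return -A_COEF - B_COEF * R


def optimal_R(bb, lo=1e-12, hi=1e6, iters=200):
    """Bisection on (IV.10): 0.5*sqrt(bb/R) + V'(R) = 0."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if 0.5 * math.sqrt(bb / mid) + V_prime(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def y(beta_policy, R, beta_eval):
    """Expected y under beliefs beta_eval of the policy x(beta_policy, R), (IV.4)."""
    x = beta_policy * math.sqrt(R / (beta_policy @ beta_policy))
    return x @ beta_eval


def terms(own, other):
    """A, B, C of (IV.11) (or (IV.12)) for the type with beliefs `own`."""
    R_own, R_other = optimal_R(own @ own), optimal_R(other @ other)
    A = y(own, R_other, own) - y(other, R_other, own)
    B = y(own, R_own, own) - y(own, R_other, own)
    C = V(R_own) - V(R_other)
    return A, B, C, R_own, R_other


beta_i = np.array([3.0, 0.0])   # more extreme beliefs: beta'beta = 9
beta_j = np.array([1.0, 1.5])   # beta'beta = 3.25
Ai, Bi, Ci, Ri, Rj = terms(beta_i, beta_j)
Aj, Bj, Cj, _, _ = terms(beta_j, beta_i)
print("I_i =", Ai + Bi + Ci, " I_j =", Aj + Bj + Cj)
```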

To summarize, types with more extreme beliefs, as measured by $\boldsymbol{\beta}'\boldsymbol{\beta}$, believe they can pursue more effective policies, feel more constrained by the limitations of resources, and experience a greater gain in the flow of utility in the area of policy disagreement y by moving a given distance $\boldsymbol{\delta}$ toward their optimal policy on $\mathbf{x}$ and a greater net utility gain from transferring a given quantity of resources from areas of policy agreement into the y sector. Consequently, whichever type has more extreme beliefs views the election as more consequential and is mobilized in greater numbers to vote for their candidate. With equal demographic shares, they win the election and their citizen candidate implements their preferred policies.

V. Asymptotic Equilibrium with Two Potential Policies

In this section we solve for the asymptotic dynamic equilibrium in a simple example: a world in which there are two potential policies, 1 and 2, both of which are relevant, i.e. $\beta_1 \neq 0$ and $\beta_2 \neq 0$. Those with misspecified beliefs, the "simple", only believe in the efficacy of policy 1, while those with correctly specified beliefs, the "complex", recognize that both policies might have effects. This example illustrates the mechanisms of the model and the convergence to the steady state in an intuitive fashion. The following section derives results in greater generality. Our central result is:

(R2) For sufficiently small noise, the asymptotic equilibrium involves policy cycles as, with the exception of equilibrium paths of probability measure zero, asymptotically autarchic rule by a single type is not possible. The larger the ratio $|\beta_2/\beta_1|$, the smaller is the asymptotic share of time the simple are in power, but the more biased are their asymptotic beliefs and the more ineffective their policies.

So as not to interrupt the flow of the argument, all proofs of probability limits stated below are given in the on-line appendix. We use $h_k = x_k + n_k$ to denote the $t \times 1$ history of desired policy and noise for policy $k$. We will find it useful to separate out histories that only include the periods when each type is in power, with, for example, $h_{ki}$ denoting the $t_i$ rows of $h_k$ associated with the periods when type $i$ is in power, with $t_s + t_c = t$.

The complex have a correctly specified model of the world and, as long as the regressors asymptotically have any independent variation at all, will converge on the true parameters, with

$$\hat\beta_c \overset{p}{=} \beta, \tag{V.1}$$

where we use the notation $\overset{p}{=}$ to signify "converges in probability". A negligible amount of random noise is enough to ensure this result. In the probability limit, the complex then implement steady state policies

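$$x_{1c} \overset{p}{=} \beta_1\sqrt{\frac{R_{\beta'\beta}}{\beta'\beta}}, \qquad x_{2c} \overset{p}{=} \beta_2\sqrt{\frac{R_{\beta'\beta}}{\beta'\beta}}, \tag{V.2}$$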

where $R_{\beta'\beta}$ is the optimal allocation of resources to $y$ given intensity $\beta'\beta$. If simple beliefs converge on the true value for policy 1, they will have lower voting intensity, as $\beta_1^2 < \beta_1^2 + \beta_2^2$. However, there exists a level of bias

$$\tau^* = \sqrt{\frac{\beta_1^2 + \beta_2^2}{\beta_1^2}} \tag{V.3}$$

such that $(\tau^*\beta_1)^2 = \beta_1^2 + \beta_2^2$ and the simple and the complex share the same voting intensity. We shall show that simple beliefs converge on this level of bias in an equilibrium with policy cycles where both types alternate in power.
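As a quick numerical check of (V.3), the sketch below uses hypothetical values $\beta_1 = 1$, $\beta_2 = 0.5$ (not taken from the paper) and measures intensity, as in the text, by the squared mean belief. Unbiased simple beliefs leave the simple less intense than the complex, while the bias $\tau^*$ restores equality.

```python
import math

def tau_star(beta1, beta2):
    """Level of bias from (V.3): tau* = sqrt((beta1^2 + beta2^2) / beta1^2)."""
    return math.sqrt((beta1**2 + beta2**2) / beta1**2)

beta1, beta2 = 1.0, 0.5                        # hypothetical parameter values
complex_intensity = beta1**2 + beta2**2        # complex beliefs converge on beta, (V.1)
unbiased_simple = beta1**2                     # simple belief equal to the true beta1
biased_simple = (tau_star(beta1, beta2) * beta1) ** 2   # simple belief tau* * beta1

print(tau_star(beta1, beta2))                  # about 1.118
print(unbiased_simple < complex_intensity)     # True
print(biased_simple, complex_intensity)        # equal up to rounding
```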


The simple's mean beliefs concerning the effects of policy 1 are given by the coefficient estimate in the misspecified regression that only includes policy 1 as a regressor

$$\hat\beta_s = \frac{h_1'y}{h_1'h_1} = \frac{h_1'(h_1\beta_1 + h_2\beta_2 + \varepsilon)}{h_1'h_1} = \beta_1 + \beta_2\frac{h_1'h_2}{h_1'h_1} + \frac{h_1'\varepsilon}{h_1'h_1}. \tag{V.4}$$

As the mean zero shocks $\varepsilon$ are independent of policy, it is easy to see that

$$\frac{h_1'\varepsilon}{h_1'h_1} = \frac{h_1'\varepsilon/t}{h_1'h_1/t} \overset{p}{=} 0, \tag{V.5}$$

as the numerator averages to zero while the denominator averages to a number strictly greater than 0. Let $\theta_s = t_s/t$ denote the share of time up to time $t$ the simple have been in power. If the limit (not probability limit) of $\theta_s$ is 1, i.e. asymptotically the simple are always in power, then

$$\frac{h_1'h_2}{h_1'h_1} = \frac{(x_1 + n_1)'(x_2 + n_2)/t}{h_1'h_1/t} \overset{p}{=} 0 \quad \left(\text{if } \lim_{t\to\infty}\theta_s = 1\right), \tag{V.6}$$

as the simple do not implement any policy 2, while the noise shocks for policies 1 and 2 are independent and their product averages to zero. Conversely, if asymptotically the complex are always in power, the cross product of policies 1 and 2 is non-zero and hence

$$\frac{h_1'h_2}{h_1'h_1} = \frac{(x_1 + n_1)'(x_2 + n_2)/t}{(x_1 + n_1)'(x_1 + n_1)/t} \overset{p}{=} \frac{\beta_1\beta_2\,(R_{\beta'\beta}/\beta'\beta)}{\beta_1^2\,(R_{\beta'\beta}/\beta'\beta) + \sigma_n^2} \quad \left(\text{if } \lim_{t\to\infty}\theta_s = 0\right). \tag{V.7}$$

The results given above allow us to establish two important contradictions. Combining (V.4)-(V.7), we see that the probability limit of simple beliefs in the two extreme cases can be expressed as

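$$\hat\beta_s \overset{p}{=} \tau_{\theta_s}\beta_1, \quad\text{with}\quad \tau_1 = 1 \ \left(\text{if } \lim_{t\to\infty}\theta_s = 1\right) \quad\text{and}\quad \tau_0 = 1 + \frac{\beta_2^2}{\beta_1^2 + \sigma_n^2\,\beta'\beta/R_{\beta'\beta}} = \frac{\tau^{*2} + \tau^{*2}\sigma_n^2/R_{\beta'\beta}}{1 + \tau^{*2}\sigma_n^2/R_{\beta'\beta}} \ \left(\text{if } \lim_{t\to\infty}\theta_s = 0\right), \tag{V.8}$$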

where $\tau_{\theta_s}$ denotes the degree of bias and $\sigma_n^2/R_{\beta'\beta}$ is the ratio of the information revealed by noise to that revealed by policy. If asymptotically the share of time the simple are in power goes to 1, their beliefs converge on the true parameter value, as their estimating equation is asymptotically no longer misspecified. But in this case, with a probability approaching one, their intensity must be strictly less than that of the complex, thereby ensuring that, outside of equilibrium paths whose probability is measure zero, asymptotically they cannot always be in power.⁷ Conversely, if asymptotically the share of time the complex are in power goes to 1, then the beliefs of the simple suffer an omitted variable bias that loads all of the positive effects of complex actions on policy 2 onto the simple's coefficient estimate for the effects of policy 1. For small $\sigma_n^2/R_{\beta'\beta} < 1/\tau^*$, bias $\tau_0$ is necessarily greater than the level $\tau^*$ needed for equal intensity, since $\tau_0 - \tau^* = \tau^*(\tau^* - 1)(1 - \tau^*\sigma_n^2/R_{\beta'\beta})/(1 + \tau^{*2}\sigma_n^2/R_{\beta'\beta})$ and $\tau^* > 1$. So, in this case, again outside of equilibrium paths whose likelihood is measure zero, the complex cannot asymptotically always be in power, establishing another contradiction. It follows that for small enough noise, outside of paths of probability measure zero, we can rule out the possibility that the limit of $t_s/t$ is 0 or 1.
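The omitted variable bias in (V.8) can be checked by simulation. The sketch below assumes permanent complex rule, Gaussian policy noise and a unit-variance Gaussian outcome shock (illustrative assumptions, with hypothetical parameters), runs the simple's one-regressor regression (V.4), and compares the estimated bias factor with the closed form for $\tau_0$.

```python
import math
import random

def tau0_closed_form(beta1, beta2, R, sigma2):
    """Bias factor tau_0 from (V.8) under permanent complex rule."""
    ts2 = (beta1**2 + beta2**2) / beta1**2        # tau*^2
    a = sigma2 / R                                 # sigma_n^2 / R_{beta'beta}
    return (ts2 + ts2 * a) / (1 + ts2 * a)

def simulate_tau0(beta1, beta2, R, sigma2, periods=100_000, seed=0):
    """Regress y on policy 1 alone, as the simple do in (V.4), under complex rule.

    Returns the estimated bias factor beta_hat_s / beta1.
    """
    rng = random.Random(seed)
    bb = beta1**2 + beta2**2
    x1 = beta1 * math.sqrt(R / bb)                 # complex steady-state policies, (V.2)
    x2 = beta2 * math.sqrt(R / bb)
    sd_noise = math.sqrt(sigma2)
    sum_hy = sum_hh = 0.0
    for _ in range(periods):
        h1 = x1 + rng.gauss(0.0, sd_noise)         # realized policy = desired + noise
        h2 = x2 + rng.gauss(0.0, sd_noise)
        y = beta1 * h1 + beta2 * h2 + rng.gauss(0.0, 1.0)
        sum_hy += h1 * y
        sum_hh += h1 * h1
    return (sum_hy / sum_hh) / beta1

tau0 = tau0_closed_form(1.0, 1.0, 1.0, 0.1)       # hypothetical parameters
tau0_hat = simulate_tau0(1.0, 1.0, 1.0, 0.1)
print(tau0, tau0_hat)
```

With $\sigma_n^2/R_{\beta'\beta} = 0.1 < 1/\tau^*$, the simulated factor should sit near $\tau_0 \approx 1.83$, above $\tau^* = \sqrt{2}$, which is the configuration that rules out permanent complex rule.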

⁷ For the limit of $\theta_s$ to equal 1, the share of any fixed time interval during which simple intensity is greater than or equal to that of the complex must asymptotically go to 1. However, since the plim of simple intensity is less than that of the complex, the probability that simple intensity is greater than that of the complex in any period must go to zero. Consequently, the probability measure of paths with the characteristic that "the share of any fixed time interval during which simple intensity is greater than or equal to complex intensity goes to 1" must be zero. These equilibrium paths, and other such paths mentioned below, are those along which an increasingly unlikely sequence of shocks $\varepsilon$ and $n$ keeps beliefs from converging on the level implied by the parameters and (in the case of omitted variables) regressors.

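We now divide the history of the regressors into that observed during periods when each type is in power and re-express simple beliefs (V.4) as:

$$\hat\beta_s = \frac{h_{1s}'h_{1s}}{h_1'h_1}\left[\frac{h_{1s}'(h_{1s}\beta_1 + h_{2s}\beta_2)}{h_{1s}'h_{1s}}\right] + \left(1 - \frac{h_{1s}'h_{1s}}{h_1'h_1}\right)\left[\frac{h_{1c}'(h_{1c}\beta_1 + h_{2c}\beta_2)}{h_{1c}'h_{1c}}\right] + \frac{h_1'\varepsilon}{h_1'h_1}. \tag{V.9}$$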

The probability limit of the last term is, as noted before, zero. Since the limit of $t_s/t$ is neither 0 nor 1, asymptotically each type must be in power an infinite number of times, so we can state

$$\frac{h_{1s}'(h_{1s}\beta_1 + h_{2s}\beta_2)/t_s}{h_{1s}'h_{1s}/t_s} \overset{p}{=} \tau_1\beta_1 = \beta_1 \;\;\&\;\; \frac{h_{1c}'(h_{1c}\beta_1 + h_{2c}\beta_2)/t_c}{h_{1c}'h_{1c}/t_c} \overset{p}{=} \tau_0\beta_1, \tag{V.10}$$

that is, in the limit the terms in brackets [] in (V.9) are the beliefs that would arise if asymptotically the simple were always or never in power. It follows that

$$\hat\beta_s - \beta_1\left(\eta + (1-\eta)\tau_0\right) \overset{p}{=} 0, \quad\text{where}\quad \eta = \frac{h_{1s}'h_{1s}}{h_1'h_1}. \tag{V.11}$$

That is, the probability that $\hat\beta_s$ deviates by more than a negligible $\epsilon$ from $\beta_1(\eta + (1-\eta)\tau_0)$ goes to zero in the limit. $\eta$ is the fraction of the squared history of policy 1 that occurred under the simple's watch or, equivalently, the fraction of the information regarding the effectiveness of that policy revealed when only policy 1 is actively pursued.

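There exists a scalar $\eta^*$ such that

$$\eta^* + (1-\eta^*)\tau_0 = \tau^* \;\rightarrow\; \eta^* = \frac{\tau_0 - \tau^*}{\tau_0 - 1} = 1 - \frac{1 + \tau^{*2}\sigma_n^2/R_{\beta'\beta}}{1 + \tau^*} < 1. \tag{V.12}$$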

Asymptotically, with a probability approaching 1, if along a path $\eta > \eta^*$ simple intensity will be less than that of the complex, while if $\eta < \eta^*$ simple intensity will be greater than that of the complex. As can be seen in (V.11), $\eta$ is monotonically decreasing when the complex are in power, since the denominator increases period by period while the numerator remains constant, and is monotonically increasing when the simple are in power, as the numerator and denominator are increased by the same amount in each period. Moreover, as proven in the on-line appendix, the changes in $\eta$ from one period to another, $\Delta\eta$, get smaller and smaller as the numerator and denominator grow larger:

$$\Delta\eta \overset{p}{=} 0. \tag{V.13}$$

Figure II: Asymptotic Phase Diagram. [Phase line for $\eta$ on $[0, 1]$ with $\eta^*$ marked. Below $\eta^*$: "bias is high, the simple are in power, $\eta$ rises and bias falls". Above $\eta^*$: "bias is low, the complex are in power, $\eta$ falls and bias rises".]
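The steady-state quantities in (V.3), (V.8) and (V.12) can be computed in closed form. This sketch, with hypothetical parameter values, confirms that at $\eta^*$ the mixture $\eta + (1-\eta)\tau_0$ from (V.11) reproduces the equal-intensity bias $\tau^*$.

```python
import math

def steady_state(beta1, beta2, R, sigma2):
    """tau* from (V.3), tau_0 from (V.8) and eta* from (V.12)."""
    tau_star = math.sqrt((beta1**2 + beta2**2) / beta1**2)
    a = sigma2 / R                                   # sigma_n^2 / R_{beta'beta}
    tau_0 = (tau_star**2 + tau_star**2 * a) / (1 + tau_star**2 * a)
    eta_star = 1 - (1 + tau_star**2 * a) / (1 + tau_star)
    return tau_star, tau_0, eta_star

tau_star, tau_0, eta_star = steady_state(1.0, 1.0, 1.0, 0.01)   # hypothetical values
mixture = eta_star * 1.0 + (1 - eta_star) * tau_0                # (V.11) at eta = eta*, tau_1 = 1
print(tau_star, tau_0, eta_star, mixture)
```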

The preceding results imply that $\eta$ converges to $\eta^*$, and the bias of simple beliefs converges to the bias $\tau^*$ consistent with their having the same voting intensity as the complex. The convergence to this steady state is illustrated in Figure II, which is the asymptotic phase diagram associated with (V.11) above. Asymptotically, when $\eta$, the fraction of the information concerning policy 1 revealed when the simple are in power, is less than $\eta^*$, in all but a probability measure zero of equilibria the bias of the simple is greater than $\tau^*$, the simple are in power, and $\eta$ grows and their bias falls as they find that their policies are less effective than they thought. When $\eta$ is greater than $\eta^*$, the bias of the simple is less than $\tau^*$, the complex are in power, $\eta$ falls, and the bias and voting intensity of the simple grow, as they load the successful effects of policy 2 onto policy 1. Over time the movements in $\eta$ get smaller and smaller, but $\eta$ cannot converge to any point other than $\eta^*$, for were it to do so, the limit of the share of time the simple are in power would be 0 or 1, which we have already established is not possible.
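To make the phase diagram concrete, the following sketch iterates a noise-averaged version of the cycle: each period the more intense type governs, squared policy histories grow by their expected per-period amounts, and the outcome shock is set to zero. These simplifications, the fixed resource level $R$ and the initial history are assumptions of the illustration, not the paper's stochastic model. The long-run share of simple rule should approach the closed form derived below in (V.16), and $\eta$ should settle at $\eta^*$ from (V.12).

```python
import math

def theta_s_closed_form(beta1, beta2, R, sigma2):
    """Asymptotic share of time the simple govern, (V.16)."""
    tau_star = math.sqrt((beta1**2 + beta2**2) / beta1**2)
    return (1 - tau_star * sigma2 / R) / (1 + tau_star)

def eta_star_closed_form(beta1, beta2, R, sigma2):
    """Threshold information share, (V.12)."""
    tau_star = math.sqrt((beta1**2 + beta2**2) / beta1**2)
    return 1 - (1 + tau_star**2 * sigma2 / R) / (1 + tau_star)

def simulate_cycles(beta1, beta2, R, sigma2, periods=200_000):
    """Noise-averaged sketch of the political cycle in the two-policy example.

    Returns (share of periods with simple rule, final eta).
    """
    bb = beta1**2 + beta2**2                   # complex intensity beta'beta
    x1c = beta1 * math.sqrt(R / bb)            # complex policies, (V.2)
    x2c = beta2 * math.sqrt(R / bb)
    s_simple, s_complex, cross = 1.0, 1.0, 0.0 # h_1s'h_1s, h_1c'h_1c, h_1'h_2 (hypothetical start)
    simple_periods = 0
    for _ in range(periods):
        belief_s = beta1 + beta2 * cross / (s_simple + s_complex)  # (V.4) without the shock term
        if belief_s**2 > bb:                   # simple more intense, so they win
            simple_periods += 1
            s_simple += R + sigma2             # x_1s^2 = R, plus noise variance
        else:
            s_complex += x1c**2 + sigma2
            cross += x1c * x2c                 # policy 2 is only pursued by the complex
    eta = s_simple / (s_simple + s_complex)
    return simple_periods / periods, eta

share, eta = simulate_cycles(1.0, 1.0, 1.0, 0.01)   # hypothetical parameters
print(share, theta_s_closed_form(1.0, 1.0, 1.0, 0.01))
print(eta, eta_star_closed_form(1.0, 1.0, 1.0, 0.01))
```

Accumulating expected rather than realized squares removes sampling noise, so the run isolates the deterministic feedback between $\eta$, bias and who governs.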

One endogenous variable of interest is the share of time the simple are asymptotically in power. Manipulating the definition of $\eta$ in (V.11), we have:

$$\eta^* = \frac{\theta_s\,h_{1s}'h_{1s}/t_s}{\theta_s\,h_{1s}'h_{1s}/t_s + (1-\theta_s)\,h_{1c}'h_{1c}/t_c} \;\rightarrow\; \theta_s = \frac{\eta^*\,h_{1c}'h_{1c}/t_c}{\eta^*\,h_{1c}'h_{1c}/t_c + (1-\eta^*)\,h_{1s}'h_{1s}/t_s}. \tag{V.14}$$

As simple beliefs converge on $\tau^*\beta_1$, their policies converge on

$$x_{1s} \overset{p}{=} \tau^*\beta_1\sqrt{\frac{R_{\beta'\beta}}{\beta'\beta}}, \tag{V.15}$$

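while asymptotically complex policies are given by (V.2), so we see that

$$\theta_s \overset{p}{=} \frac{\eta^*\left(\beta_1^2 R_{\beta'\beta}/\beta'\beta + \sigma_n^2\right)}{\eta^*\left(\beta_1^2 R_{\beta'\beta}/\beta'\beta + \sigma_n^2\right) + (1-\eta^*)\left(\tau^{*2}\beta_1^2 R_{\beta'\beta}/\beta'\beta + \sigma_n^2\right)} = \frac{1 - \tau^*\sigma_n^2/R_{\beta'\beta}}{1 + \tau^*}, \tag{V.16}$$

where we substitute for $\eta^*$ using (V.12) and note that the odds ratio $\theta_c/\theta_s \rightarrow \tau^*$ as $\sigma_n^2 \rightarrow 0$.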

(V.16) shows that the asymptotic fraction of the time the simple are in power is a decreasing function of $\tau^*$ and $\sigma_n^2/R_{\beta'\beta}$. $\tau^*$ is increasing in $|\beta_2/\beta_1|$, the relative efficacy of policy 2. The more effective are the policies that the simple believe irrelevant, the higher is the asymptotic bias needed for the simple to compete electorally with the complex. To sustain this bias, the simple spend less time in power, loading up more bias from the beneficial actions of complex policy in area 2. When there is more background noise relative to the implementation of policy, the simple also spend less time in power, as the information given by random variation systematically drives their beliefs towards the truth, requiring a longer period of complex rule to arrive at the $\tau^*$ level of bias.
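The comparative statics of (V.16) can be tabulated directly. The sketch below evaluates the share for hypothetical values of $|\beta_2/\beta_1|$ and $\sigma_n^2/R_{\beta'\beta}$, using $\tau^* = \sqrt{1 + (\beta_2/\beta_1)^2}$, which is (V.3) rewritten.

```python
import math

def theta_s(ratio, noise):
    """(V.16) with ratio = |beta2/beta1| and noise = sigma_n^2 / R_{beta'beta}."""
    tau_star = math.sqrt(1 + ratio**2)         # (V.3) rewritten
    return (1 - tau_star * noise) / (1 + tau_star)

ratios = [0.25, 0.5, 1.0, 2.0, 4.0]            # hypothetical values of |beta2/beta1|
noises = [0.0, 0.01, 0.05, 0.1]                # hypothetical values of sigma_n^2 / R
by_ratio = [theta_s(r, 0.01) for r in ratios]
by_noise = [theta_s(1.0, s) for s in noises]
for r, share in zip(ratios, by_ratio):
    print(f"|beta2/beta1| = {r}: simple share of time in power = {share:.3f}")
```

Both lists decrease monotonically, and as $|\beta_2/\beta_1| \rightarrow 0$ with vanishing noise the share tends to 1/2.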

Asymptotically, when the complex are in power the average efficacy of their policies is

$$\bar y_c = \beta_1 x_{1c} + \beta_2 x_{2c} = (\beta_1^2 + \beta_2^2)\sqrt{\frac{R_{\beta'\beta}}{\beta'\beta}} = \sqrt{R_{\beta'\beta}\,\beta'\beta}, \tag{V.17}$$

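while periods when the simple are in power yield the inferior average outcomes

$$\bar y_s = \beta_1 x_{1s} = \tau^*\beta_1^2\sqrt{\frac{R_{\beta'\beta}}{\beta'\beta}} = \frac{1}{\tau^*}\sqrt{R_{\beta'\beta}\,\beta'\beta}. \tag{V.18}$$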

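However, the outcomes simple voters expect under complex and simple rule are

$$E(y_c\,|\,\hat\beta_s) = \tau^*\beta_1 x_{1c} = \frac{1}{\tau^*}\sqrt{R_{\beta'\beta}\,\beta'\beta} < \bar y_c, \qquad E(y_s\,|\,\hat\beta_s) = \tau^*\beta_1 x_{1s} = \tau^{*2}\beta_1^2\sqrt{\frac{R_{\beta'\beta}}{\beta'\beta}} = \sqrt{R_{\beta'\beta}\,\beta'\beta} > \bar y_s. \tag{V.19}$$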

Simple voters are systematically disappointed by the outcomes of the extreme policies implemented when their populist politicians are in power. This leads to a gradual diminution of beliefs and consequent moderation of policy, until those with more complex views once again take power. However, the surprising success of policy under the complex gradually convinces simple voters of the value of implementing more extreme and focused policies, increasing their probability of voting in favour of populist politicians who advocate narrow and extreme solutions to complex problems.
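The gap between realized and expected outcomes in (V.17)-(V.19) can be reproduced with a few lines of arithmetic. The sketch below evaluates the steady-state policies (V.2) and (V.15) for hypothetical parameters; the dictionary keys are illustrative names.

```python
import math

def steady_state_outcomes(beta1, beta2, R):
    """Realized outcomes (V.17)-(V.18) and simple expectations (V.19)."""
    bb = beta1**2 + beta2**2
    tau_star = math.sqrt(bb / beta1**2)
    scale = math.sqrt(R / bb)                   # sqrt(R_{beta'beta} / beta'beta)
    x1c, x2c = beta1 * scale, beta2 * scale     # complex policies, (V.2)
    x1s = tau_star * beta1 * scale              # simple policy, (V.15)
    belief_s = tau_star * beta1                 # steady-state simple belief
    return {
        "ybar_c": beta1 * x1c + beta2 * x2c,    # realized under complex rule
        "ybar_s": beta1 * x1s,                  # realized under simple rule
        "Es_y_c": belief_s * x1c,               # simple expectation of complex rule
        "Es_y_s": belief_s * x1s,               # simple expectation of their own rule
        "benchmark": math.sqrt(R * bb),         # sqrt(R_{beta'beta} beta'beta)
        "tau_star": tau_star,
    }

out = steady_state_outcomes(1.0, 1.0, 1.0)     # hypothetical parameters
for name, value in out.items():
    print(f"{name}: {value:.4f}")
```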

VI. Political Competition between Correctly & Incorrectly Specified Models

In this section we consider the equilibrium of generalized political competition between correctly and incorrectly specified models within the framework described above. Specifically, we consider an environment in which there are $k$ potential policies, some of which are relevant and have non-zero effects, and some of which are irrelevant and have zero effects. While the beliefs of "complex" types are correctly specified, in that they include all relevant policies, "simple" types erroneously exclude a subset of these. The prior beliefs of both types may include some irrelevant policies that have zero effects, and we impose no a priori restriction on the relative number of policies, $k_s$ and $k_c$, each type believes may be relevant, other than that their union covers the set of $k$ policies that are systematically implemented. The monikers "complex" and "simple" derive from the fact that the endogenous asymptotic equilibrium looks much like that of the simple 1 versus 2 policy example given above. Our results can be summarized as:

(R3a) The beliefs of both types regarding policies that are actually irrelevant converge on 0. Consequently, the non-zero beliefs of those with the misspecified model become "simple" relative to the "complex" views of those with the correctly specified model. While the beliefs of the complex converge on true parameter values, the beliefs of the simple converge on a multiple of the true parameter values. Asymptotically the simple implement a narrowed, exaggerated and less effective version of complex policies.

(R3b) All other results mirror the 1 vs 2 policy example given earlier. For sufficiently small noise, the asymptotic equilibrium involves policy cycles, as, with the exception of equilibrium paths of probability measure zero, asymptotically autarchic rule by a single type is not possible. The larger the effects of policies the simple mistakenly exclude relative to those they include, the smaller is the asymptotic share of time the simple are in power, but the more biased are their asymptotic beliefs and the more ineffective, compared to the complex, their policies when in power.

To simplify the presentation and focus on intuition, we assume a steady state exists and then derive the restrictions on the equilibrium this imposes. Proof that the polity actually converges to that steady state, while a contribution to the literature on models with misspecified beliefs in its own right, is technically involved and hence relegated to the on-line appendix. To review our notation, we use $H = X + N$ to denote the $t \times k$ history of desired policy and iid noise that may affect the common outcome $y$ through the parameters $\beta$. Each type believes that only a subset $k_i$ of these policies are relevant and hence uses only the associated $t \times k_i$ columns $H_i$ of $H$ in the regression model which determines their mean beliefs $\hat\beta_i$. The true parameters associated with these policies are denoted by $\beta_i$. When necessary to differentiate which type is in power, we add a second subscript. Thus, steady state policies deemed potentially relevant by type $i$ when $j$ is in power are indicated by $x_{ij}$. We use $x_{\bullet j}$ to denote the full vector of $k$ desired policy values when type $j$ is in power, including 0s in policies $j$ believes are irrelevant. $\theta_i = t_i/t$ is the share of time type $i$ has been in power up to time $t$, $I_m$ and $0_{mn}$ are the square identity matrix and rectangular matrix of zeros of the subscripted dimensions, and $\overset{p}{=}$ denotes the probability limit. We assume the share of time each type is in power $\theta_i$, beliefs $\hat\beta_i$, and policies $x_{\bullet i}$ all have well defined probability limits; proof of convergence to these limits, and of all other plims noted below, is given in the on-line appendix.

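We begin by establishing some restrictions on the mean beliefs of type $i$

$$\begin{aligned}
&\hat\beta_i = (H_i'H_i)^{-1}H_i'y = (H_i'H_i)^{-1}H_i'(H\beta + \varepsilon) \overset{p}{=} (H_i'H_i)^{-1}H_i'H\beta \;\rightarrow\; \frac{H_i'H_i}{t}\hat\beta_i \overset{p}{=} \frac{H_i'H}{t}\beta\\
&\rightarrow \frac{X_i'X_i + X_i'N_i + N_i'X_i + N_i'N_i}{t}\hat\beta_i \overset{p}{=} \frac{X_i'X + X_i'N + N_i'X + N_i'N}{t}\beta\\
&\rightarrow \left(\frac{X_i'X_i}{t} + \sigma_n^2 I_{k_i}\right)\hat\beta_i \overset{p}{=} \left(\frac{X_i'X}{t} + \sigma_n^2\left[I_{k_i}, 0_{k_i,k-k_i}\right]\right)\beta,
\end{aligned} \tag{VI.1}$$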

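where the probability limit in the first line follows from the fact that the average of the product of the components of policy $H_i$ and the random shock to $y$, $\varepsilon$, is zero, while the third line follows from the fact that the noise elements of policy $N$ are iid and independent of the desired elements of policy $X$. Assuming convergence to steady state policies and shares of time in power, we can restate (VI.1) as

$$\begin{aligned}
&\left(\theta_i x_{ii}x_{ii}' + \theta_j x_{ij}x_{ij}' + \sigma_n^2 I_{k_i}\right)\hat\beta_i \overset{p}{=} \left(\theta_i x_{ii}x_{\bullet i}' + \theta_j x_{ij}x_{\bullet j}' + \sigma_n^2\left[I_{k_i}, 0_{k_i,k-k_i}\right]\right)\beta\\
&\rightarrow \theta_i x_{ii}\underbrace{[x_{ii}'\hat\beta_i - x_{ii}'\beta_i]}_{PEPE_{ii}} + \theta_j x_{ij}\underbrace{[x_{ij}'\hat\beta_i - x_{jj}'\beta_j]}_{PEPE_{ij}} \overset{p}{=} \sigma_n^2(\beta_i - \hat\beta_i),
\end{aligned} \tag{VI.2}$$

$$\theta_s x_{ss}\underbrace{[x_{ss}'\hat\beta_s - x_{ss}'\beta_s]}_{PEPE_{ss}} + \theta_c x_{sc}\underbrace{[x_{sc}'\hat\beta_s - x_{cc}'\beta_c]}_{PEPE_{sc}} \overset{p}{=} \sigma_n^2(\beta_s - \hat\beta_s), \tag{VI.2s}$$

$$\theta_c x_{cc}\underbrace{[x_{cc}'\hat\beta_c - x_{cc}'\beta_c]}_{PEPE_{cc}} + \theta_s x_{cs}\underbrace{[x_{cs}'\hat\beta_c - x_{ss}'\beta_s]}_{PEPE_{cs}} \overset{p}{=} \sigma_n^2(\beta_c - \hat\beta_c), \tag{VI.2c}$$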

where we use the notation $PEPE_{ij}$ to denote the "policy effects prediction error" of type $i$ when type $j$ is in power, and the fact that $x_{\bullet i}'\beta$ equals $x_{ii}'\beta_i$ and $x_{ss}'\beta_s = x_{cs}'\beta_c$. In the case of the last, while $x_{ss}$ and $x_{cs}$ are $k_s \times 1$ and $k_c \times 1$ vectors of policy actions deemed relevant by each type when the simple are in power, the equality follows from the fact that the policies the simple act on that are not included in $k_c$ have zero effects, while the elements included in $\beta_c$ that are non-zero but not included in $k_s$ have 0 simple policy actions.

We begin by focusing on the beliefs of the complex, alternately pre-multiplying (VI.2c) by $x_{cs}'$ and $x_{cc}'$

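$$\begin{aligned}
&(\theta_s x_{cs}'x_{cs} + \sigma_n^2)[x_{cs}'\hat\beta_c - x_{ss}'\beta_s] + \theta_c(x_{cs}'x_{cc})[x_{cc}'\hat\beta_c - x_{cc}'\beta_c] \overset{p}{=} 0,\\
&\theta_s(x_{cc}'x_{cs})[x_{cs}'\hat\beta_c - x_{ss}'\beta_s] + (\theta_c x_{cc}'x_{cc} + \sigma_n^2)[x_{cc}'\hat\beta_c - x_{cc}'\beta_c] \overset{p}{=} 0,
\end{aligned} \tag{VI.3}$$

and then combining equations to derive the restrictions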

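$$\begin{aligned}
&\underbrace{[x_{cs}'\hat\beta_c - x_{ss}'\beta_s]}_{PEPE_{cs}}\;\frac{\theta_s\theta_c\left[(x_{cs}'x_{cs})(x_{cc}'x_{cc}) - (x_{cs}'x_{cc})^2\right] + \sigma_n^2(\theta_s x_{cs}'x_{cs} + \theta_c x_{cc}'x_{cc}) + \sigma_n^4}{\theta_c x_{cc}'x_{cc} + \sigma_n^2} \overset{p}{=} 0,\\
&\underbrace{[x_{cc}'\hat\beta_c - x_{cc}'\beta_c]}_{PEPE_{cc}}\;\frac{\theta_s\theta_c\left[(x_{cs}'x_{cs})(x_{cc}'x_{cc}) - (x_{cs}'x_{cc})^2\right] + \sigma_n^2(\theta_s x_{cs}'x_{cs} + \theta_c x_{cc}'x_{cc}) + \sigma_n^4}{\theta_s x_{cs}'x_{cs} + \sigma_n^2} \overset{p}{=} 0,
\end{aligned} \tag{VI.4}$$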

where we make use of the fact that each PEPE is multiplied by a strictly positive number in at least one equation. By the Cauchy-Schwarz inequality, the numerator in the fraction in both equations is strictly positive, so these imply that both policy effects prediction errors are asymptotically zero, which in turn, from (VI.2c), implies that $\hat\beta_c \overset{p}{=} \beta_c$.⁸ Since the complex causal model is correctly specified, with minimal noise their parameter estimates are consistent and converge on the true parameter values.

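We now focus on simple beliefs, substituting into (VI.2s) using the dependence of policies on beliefs

$$\theta_s\frac{R_{\hat\beta_s'\hat\beta_s}}{\hat\beta_s'\hat\beta_s}\hat\beta_s[\hat\beta_s'\hat\beta_s - \hat\beta_s'\beta_s] + \theta_c\frac{R_{\beta'\beta}}{\beta'\beta}\beta_s[\beta_s'\hat\beta_s - \beta'\beta] \overset{p}{=} \sigma_n^2[\beta_s - \hat\beta_s], \tag{VI.5}$$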

where we have used the fact that complex beliefs converge on true parameter values and, to simplify and clarify matters, that $\beta_c'\beta_c = \beta'\beta$, as the policies the complex consider irrelevant do indeed have zero effects. If $\theta_s \overset{p}{=} 1$, we pre-multiply by $\hat\beta_s'$ to see

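$$R_{\hat\beta_s'\hat\beta_s}[\hat\beta_s'\hat\beta_s - \hat\beta_s'\beta_s] \overset{p}{=} \sigma_n^2[\hat\beta_s'\beta_s - \hat\beta_s'\hat\beta_s] \;\rightarrow\; \hat\beta_s'\hat\beta_s \overset{p}{=} \hat\beta_s'\beta_s \;\underset{\text{by (VI.5)}}{\rightarrow}\; \hat\beta_s \overset{p}{=} \beta_s. \tag{VI.6}$$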

If, with a probability approaching one, the simple are asymptotically always in power, their model is not misspecified (as only simple policies are systematically implemented) and, given negligible amounts of noise, their beliefs converge on true parameter values. In this case, however, we have $\hat\beta_c'\hat\beta_c \overset{p}{=} \beta'\beta > \beta_s'\beta_s \overset{p}{=} \hat\beta_s'\hat\beta_s$, i.e. complex intensity must be strictly greater than that of the simple, as simple beliefs exclude some relevant policies that have non-zero effects. If so, we have a contradiction, as asymptotically, with a probability approaching 1, the simple will not be in power, thereby contradicting the initial assumption that $\theta_s \overset{p}{=} 1$.

We now consider the case where $\theta_c \overset{p}{=} 1$, pre-multiplying (VI.5) by $\beta_s'$ and defining $\tau^* = (\beta'\beta/\beta_s'\beta_s)^{1/2}$; we find

⁸ As noted earlier, absent noise there are multiple equilibria consistent with the linear restrictions on the PEPEs.

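$$\frac{R_{\beta'\beta}}{\tau^{*2}}[\beta_s'\hat\beta_s - \beta'\beta] \overset{p}{=} \sigma_n^2\left[\frac{\beta'\beta}{\tau^{*2}} - \beta_s'\hat\beta_s\right] \;\rightarrow\; \beta_s'\hat\beta_s \overset{p}{=} \beta'\beta\,\frac{1 + \sigma_n^2/R_{\beta'\beta}}{1 + \tau^{*2}\sigma_n^2/R_{\beta'\beta}} \;\underset{\text{by (VI.5)}}{\rightarrow}\; \hat\beta_s \overset{p}{=} \frac{\tau^{*2} + \tau^{*2}\sigma_n^2/R_{\beta'\beta}}{1 + \tau^{*2}\sigma_n^2/R_{\beta'\beta}}\,\beta_s, \tag{VI.7}$$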

and see that simple beliefs are proportional to $\beta_s$. As the information provided by noise relative to that provided by complex policy, $\sigma_n^2/R_{\beta'\beta}$, goes to zero, these results imply that $\hat\beta_s'\hat\beta_s \overset{p}{=} \tau^{*4}\beta_s'\beta_s = \tau^{*2}\beta'\beta > \beta'\beta \overset{p}{=} \hat\beta_c'\hat\beta_c$, so the simple have strictly greater intensity than the complex. We conclude that if $\sigma_n^2/R_{\beta'\beta}$ is sufficiently small, asymptotically with a probability approaching 1 the complex are not in power, so the initial assumption that $\theta_c \overset{p}{=} 1$ is invalid. Combined with the previous paragraph, this establishes that asymptotically autarchic rule by a single type is not possible.
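The proportionality result (VI.7) can be checked by simulation with more than two policies. In this sketch, which assumes Gaussian policy noise, a unit-variance Gaussian outcome shock and hypothetical parameters, three policies are relevant, the simple omit the third, and the complex govern permanently. Both simple coefficients should be inflated by the same factor.

```python
import math
import random

def simple_beliefs_under_complex_rule(beta, simple_idx, R, sigma2, periods=200_000, seed=1):
    """OLS of y on the two policies the simple deem relevant, with permanent complex rule.

    beta lists the true effects of all policies (the complex model); simple_idx
    holds the two indices the simple include.
    """
    rng = random.Random(seed)
    bb = sum(b * b for b in beta)
    x = [b * math.sqrt(R / bb) for b in beta]       # complex policies on every instrument
    sd_noise = math.sqrt(sigma2)
    i, j = simple_idx
    s_ii = s_ij = s_jj = s_iy = s_jy = 0.0
    for _ in range(periods):
        h = [xk + rng.gauss(0.0, sd_noise) for xk in x]
        y = sum(bk * hk for bk, hk in zip(beta, h)) + rng.gauss(0.0, 1.0)
        s_ii += h[i] * h[i]
        s_ij += h[i] * h[j]
        s_jj += h[j] * h[j]
        s_iy += h[i] * y
        s_jy += h[j] * y
    det = s_ii * s_jj - s_ij * s_ij                 # 2 x 2 normal equations by Cramer's rule
    return (s_jj * s_iy - s_ij * s_jy) / det, (s_ii * s_jy - s_ij * s_iy) / det

beta = [1.0, 0.5, 1.0]                              # hypothetical; the simple omit policy 3
R, sigma2 = 1.0, 0.1
ts2 = sum(b * b for b in beta) / (beta[0] ** 2 + beta[1] ** 2)   # tau*^2
a = sigma2 / R
factor = (ts2 + ts2 * a) / (1 + ts2 * a)            # multiplier of beta_s in (VI.7)
b1, b2 = simple_beliefs_under_complex_rule(beta, (0, 1), R, sigma2)
print(factor, b1 / beta[0], b2 / beta[1])
```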

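For $\theta_c \overset{p}{>} 0$ and $\theta_s \overset{p}{>} 0$ both to be true, asymptotically the intensity of the simple and the complex must be the same, i.e. $\hat\beta_s'\hat\beta_s \overset{p}{=} \hat\beta_c'\hat\beta_c \overset{p}{=} \beta'\beta$. Using this, we substitute for equilibrium policies in (VI.5)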

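$$\begin{aligned}
&\theta_s\frac{R_{\beta'\beta}}{\beta'\beta}\hat\beta_s[\beta'\beta - \hat\beta_s'\beta_s] + \theta_c\frac{R_{\beta'\beta}}{\beta'\beta}\beta_s[\beta_s'\hat\beta_s - \beta'\beta] \overset{p}{=} \sigma_n^2[\beta_s - \hat\beta_s]\\
&\rightarrow \hat\beta_s\left[\theta_s R_{\beta'\beta}\frac{\beta'\beta - \hat\beta_s'\beta_s}{\beta'\beta} + \sigma_n^2\right] \overset{p}{=} \beta_s\left[\theta_c R_{\beta'\beta}\frac{\beta'\beta - \beta_s'\hat\beta_s}{\beta'\beta} + \sigma_n^2\right]\\
&\rightarrow \hat\beta_s \overset{p}{=} \tau^*\beta_s, \quad\text{where}\quad \tau^* = \frac{\theta_c R_{\beta'\beta}(\beta'\beta - \beta_s'\hat\beta_s)/\beta'\beta + \sigma_n^2}{\theta_s R_{\beta'\beta}(\beta'\beta - \hat\beta_s'\beta_s)/\beta'\beta + \sigma_n^2}.
\end{aligned} \tag{VI.8}$$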

Simple beliefs asymptotically are proportional to true parameter values and, given that simple intensity equals that of the complex, the factor of proportionality must be $\tau^*$. Using this, we can solve for $\theta_s$ by substituting $\hat\beta_s \overset{p}{=} \tau^*\beta_s$ in the expression for $\tau^*$ in (VI.8)

$$\tau^* = \frac{\theta_c R_{\beta'\beta}(1 - 1/\tau^*) + \sigma_n^2}{\theta_s R_{\beta'\beta}(1 - 1/\tau^*) + \sigma_n^2} \;\rightarrow\; \theta_s \overset{p}{=} \frac{1 - \tau^*\sigma_n^2/R_{\beta'\beta}}{1 + \tau^*}, \tag{VI.9}$$

which is exactly the same result as in the 1 vs 2 policy model of the last section. Finally, we note that complex policies yield an average outcome of $\sqrt{R_{\beta'\beta}\,\beta'\beta}$, simple policies an average outcome of $(1/\tau^*)\sqrt{R_{\beta'\beta}\,\beta'\beta}$, and the simple over- and underestimate the effectiveness of their own and complex policies by

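$$\underbrace{x_{ss}'\hat\beta_s - x_{ss}'\beta_s}_{PEPE_{ss}} \overset{p}{=} \sqrt{R_{\beta'\beta}\,\beta'\beta}\left(1 - \frac{1}{\tau^*}\right) \;\;\&\;\; \underbrace{x_{sc}'\hat\beta_s - x_{cc}'\beta_c}_{PEPE_{sc}} \overset{p}{=} \sqrt{R_{\beta'\beta}\,\beta'\beta}\left(\frac{1}{\tau^*} - 1\right), \tag{VI.10}$$

which also mirror the results of the previous section.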
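(VI.10) can be verified directly from the steady-state policy vectors. The sketch below builds $x_{ss}$, $x_{cc}$ and $x_{sc}$ for a hypothetical four-policy example in which the simple omit one relevant policy and entertain one irrelevant policy, assuming resources $R_{\beta'\beta}$ are the same under both types, as equal intensity implies.

```python
import math

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def simple_pepes(beta, simple_mask, R):
    """Steady-state PEPE_ss and PEPE_sc computed from the policy vectors behind (VI.10).

    beta: true effects on all k policies; simple_mask: True where the simple deem
    a policy relevant.
    """
    bb = dot(beta, beta)
    beta_s = [b if m else 0.0 for b, m in zip(beta, simple_mask)]   # true effects on the simple's set
    tau_star = math.sqrt(bb / dot(beta_s, beta_s))
    scale = math.sqrt(R / bb)
    belief_s = [tau_star * b for b in beta_s]                        # simple beliefs, tau* beta_s
    x_ss = [tau_star * b * scale for b in beta_s]                    # simple policy
    x_cc = [b * scale for b in beta]                                 # complex policy
    x_sc = [x if m else 0.0 for x, m in zip(x_cc, simple_mask)]      # complex policy, simple's view
    pepe_ss = dot(x_ss, belief_s) - dot(x_ss, beta)
    pepe_sc = dot(x_sc, belief_s) - dot(x_cc, beta)
    return pepe_ss, pepe_sc, tau_star, math.sqrt(R * bb)

# Hypothetical example: policy 3 is relevant but omitted by the simple,
# policy 4 is irrelevant (beta = 0) but entertained by the simple.
pepe_ss, pepe_sc, tau_star, y_c = simple_pepes([1.0, 0.5, 1.0, 0.0],
                                               [True, True, False, True], 1.0)
print(pepe_ss, y_c * (1 - 1 / tau_star))
print(pepe_sc, y_c * (1 / tau_star - 1))
```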

To summarize, generalized political competition between types which correctly and incorrectly specify the causal model for $y$ produces an equilibrium which matches in all relevant respects that of the 1 vs 2 policy example of the previous section. The beliefs of the type with the correctly specified causal model converge on the true parameter values and asymptotically they implement a broad and "complex" set of policies across all relevant policy instruments, while setting irrelevant policy instruments to zero. The beliefs of the type with an incorrectly specified model converge on a multiple of the true parameter values for the policy instruments they believe are relevant. They correctly set policy to zero in areas where policy is actually irrelevant, but implement an exaggerated policy agenda in a narrow and "simple" set of relevant policies, systematically overestimating the effectiveness of their preferred policies and underestimating the effectiveness of the complex policy agenda. Neither perpetual rule by the simple nor (with sufficiently small noise) by the complex is possible, and the asymptotic equilibrium involves policy cycles where periods of complex rule increase simple bias, recurrently returning the simple to power, where they implement systematically less effective versions of complex policies, thereby disappointing their intensely motivated voters and ensuring their eventual electoral defeat.

VII. Local Dynamics: Random Outcomes and the Political Cycle

A peculiar characteristic of political life seems to be that random outcomes benefit or harm incumbent parties. In this section we show that this feature arises in our model through the fully rational Bayesian updating of beliefs. Random shocks change estimates of the effectiveness of policy, but these effects are stronger for the incumbent party, which is implementing its desired policy combination. Specifically, we show that

(R4) Although the long run equilibrium involves cycles with types alternating in power, a random negative shock to y lowers the voting intensity of incumbent groups relative to their opposition, hastening regime change, while random positive shocks to y strengthen the political position of incumbents, lengthening their stay in power in the current political cycle.

To allow an examination of period-by-period beliefs, we restore notation with respect to time, with the $t \times k$ matrix $H_t$ denoting the history of policy up to time $t$, the vector $h_t'$ the $t$th row thereof, and $H_{it}$ and $h_{it}'$ the corresponding histories and $t$th period policies that type $i$ deems relevant. We focus on outcomes in the vicinity of the steady state and, to simplify the analysis, with negligible amounts of policy noise. As elsewhere, formal proofs of probability limits stated below are given in the on-line appendix.

The formula for mean Bayesian beliefs, based as it is upon regression coefficients, allows a simple representation of the updating of beliefs from period $t$ to $t+1$

$$\begin{aligned}
\hat\beta_{it+1} &= (H_{it+1}'H_{it+1})^{-1}H_{it+1}'y_{t+1} = (H_{it}'H_{it} + h_{it+1}h_{it+1}')^{-1}(H_{it}'y_t + h_{it+1}y_{t+1})\\
&= \left[(H_{it}'H_{it})^{-1} - \frac{(H_{it}'H_{it})^{-1}h_{it+1}h_{it+1}'(H_{it}'H_{it})^{-1}}{1 + h_{it+1}'(H_{it}'H_{it})^{-1}h_{it+1}}\right](H_{it}'y_t + h_{it+1}y_{t+1})\\
&= \hat\beta_{it} + \frac{(H_{it}'H_{it})^{-1}h_{it+1}}{1 + h_{it+1}'(H_{it}'H_{it})^{-1}h_{it+1}}\left[y_{t+1} - h_{it+1}'\hat\beta_{it}\right],
\end{aligned} \tag{VII.1}$$

where in the second line we make use of the Sherman-Morrison formula for the rank one update of a matrix inverse.⁹ The term in brackets [] in the last line is the period $t+1$ prediction error based upon beliefs at the end of period $t$. As the beliefs of the complex encompass all non-zero policy effects and converge on the true parameter values, their prediction error converges on the random component of $y$, $y_{t+1} - h_{ct+1}'\hat\beta_{ct} = h_{ct+1}'(\beta_c - \hat\beta_{ct}) + \varepsilon_{t+1} \overset{p}{=} \varepsilon_{t+1}$, while asymptotically the prediction error of the simple contains both that random component and the systematic under- and overprediction of outcomes under each regime discussed earlier, $y_{t+1} - h_{st+1}'\hat\beta_{st} \overset{p}{=} h_{t+1}'\beta - \tau^* h_{st+1}'\beta_s + \varepsilon_{t+1}$. Our analysis in this section focuses on the impact of $\varepsilon_{t+1}$.
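The recursive form (VII.1) is the familiar recursive least squares update. The sketch below, in pure Python with hypothetical simulated data, applies the Sherman-Morrison step period by period and checks that it reproduces the batch regression; `rls_update`, `batch_ols` and the other helpers are illustrative names, not code from the paper.

```python
import random

def solve(M, v):
    """Solve M z = v by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(v)
    aug = [row[:] + [v[r]] for r, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (aug[r][n] - sum(aug[r][k] * z[k] for k in range(r + 1, n))) / aug[r][r]
    return z

def cross_products(H):
    k = len(H[0])
    return [[sum(h[a] * h[b] for h in H) for b in range(k)] for a in range(k)]

def batch_ols(H, y):
    """(H'H)^{-1} H'y by solving the normal equations."""
    k = len(H[0])
    Hty = [sum(h[a] * yt for h, yt in zip(H, y)) for a in range(k)]
    return solve(cross_products(H), Hty)

def inverse(M):
    n = len(M)
    cols = [solve(M, [1.0 if r == c else 0.0 for r in range(n)]) for c in range(n)]
    return [[cols[c][r] for c in range(n)] for r in range(n)]

def rls_update(P, beta_hat, h, y):
    """One step of (VII.1). P = (H'H)^{-1} is symmetric; returns updated (P, beta_hat)."""
    k = len(h)
    Ph = [sum(P[r][c] * h[c] for c in range(k)) for r in range(k)]   # (H'H)^{-1} h
    denom = 1.0 + sum(h[r] * Ph[r] for r in range(k))                 # 1 + h'(H'H)^{-1} h
    err = y - sum(h[r] * beta_hat[r] for r in range(k))               # prediction error
    new_beta = [beta_hat[r] + Ph[r] * err / denom for r in range(k)]
    new_P = [[P[r][c] - Ph[r] * Ph[c] / denom for c in range(k)] for r in range(k)]
    return new_P, new_beta

rng = random.Random(3)
beta = [1.0, -0.5, 0.25]                      # hypothetical true effects
H, y = [], []
for _ in range(60):
    h = [rng.gauss(0.0, 1.0) for _ in beta]
    H.append(h)
    y.append(sum(b * x for b, x in zip(beta, h)) + rng.gauss(0.0, 0.1))

start = 10                                    # initialise with a batch fit on the first rows
P = inverse(cross_products(H[:start]))
beta_hat = batch_ols(H[:start], y[:start])
for h, yt in zip(H[start:], y[start:]):
    P, beta_hat = rls_update(P, beta_hat, h, yt)

print(beta_hat)
print(batch_ols(H, y))                        # agrees with the recursive estimate up to rounding
```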

As $H_{it}'H_{it} = X_{it}'X_{it} + X_{it}'N_{it} + N_{it}'X_{it} + N_{it}'N_{it}$, we use the fact that policies and the share of time each type spends in power converge to steady state values, that noise shocks are mutually independent, and that the averaged cross-products of desired policy and policy noise converge on 0, to calculate the following probability limits

⁹ Specifically, $(V + xx')^{-1} = V^{-1} - V^{-1}xx'V^{-1}/(1 + x'V^{-1}x)$.

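$$\begin{aligned}
&\frac{H_{ct}'H_{ct}}{t} \overset{p}{=} \begin{bmatrix} A & B\\ B' & C\end{bmatrix} \quad\text{and}\quad \frac{H_{st}'H_{st}}{t} \overset{p}{=} \begin{bmatrix} A & 0_{k_b,k_{\sim c}}\\ 0_{k_{\sim c},k_b} & \sigma_n^2 I_{k_{\sim c}}\end{bmatrix},\\
&\text{with}\quad A = (\theta_c + \theta_s\tau^{*2})\frac{R_{\beta'\beta}}{\beta'\beta}\beta_b\beta_b' + \sigma_n^2 I_{k_b}, \quad B = \theta_c\frac{R_{\beta'\beta}}{\beta'\beta}\beta_b\beta_{\sim s}', \quad C = \theta_c\frac{R_{\beta'\beta}}{\beta'\beta}\beta_{\sim s}\beta_{\sim s}' + \sigma_n^2 I_{k_{\sim s}},\\
&\text{while}\quad h_{it+1}'(H_{it}'H_{it})^{-1}h_{it+1} \overset{p}{=} 0,
\end{aligned} \tag{VII.2}$$

where we use the subscript $\sim i$ to denote policies that each type deems irrelevant, subscript $b$ to denote the policies both deem relevant, $k_{\sim i}$ and $k_b$ the number of such policies, and make use of the fact that simple beliefs and policies in areas the complex deem irrelevant ($\sim c$) converge to the true parameter values of 0. These results allow us to express the asymptotic change in beliefs as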

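$$\hat\beta_{ct+1} - \hat\beta_{ct} \overset{p}{=} \frac{1}{t}\begin{bmatrix} A & B\\ B' & C\end{bmatrix}^{-1} h_{ct+1}\varepsilon_{t+1}, \qquad \hat\beta_{st+1} - \hat\beta_{st} \overset{p}{=} \frac{1}{t}\begin{bmatrix} A & 0_{k_b,k_{\sim c}}\\ 0_{k_{\sim c},k_b} & \sigma_n^2 I_{k_{\sim c}}\end{bmatrix}^{-1} h_{st+1}\left[h_{t+1}'\beta - \tau^* h_{st+1}'\beta_s + \varepsilon_{t+1}\right]. \tag{VII.3}$$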

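(VII.3) shows that in the limit changes in beliefs are of order $O(1/t)$ times a random variable with a finite variance, so we can asymptotically approximate the change in the intensity of each type as:

$$\hat\beta_{it+1}'\hat\beta_{it+1} - \hat\beta_{it}'\hat\beta_{it} = (\hat\beta_{it+1} + \hat\beta_{it})'(\hat\beta_{it+1} - \hat\beta_{it}) \approx 2\hat\beta_{it}'(\hat\beta_{it+1} - \hat\beta_{it}). \tag{VII.4}$$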

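Finally, we note the formula for a block matrix inverse and calculate the limits of some useful quadratic forms as policy noise goes to zero:

$$\begin{aligned}
&\begin{bmatrix} A & B\\ B' & C\end{bmatrix}^{-1} = \begin{bmatrix} (A - BC^{-1}B')^{-1} & -(A - BC^{-1}B')^{-1}BC^{-1}\\ -C^{-1}B'(A - BC^{-1}B')^{-1} & C^{-1} + C^{-1}B'(A - BC^{-1}B')^{-1}BC^{-1}\end{bmatrix},\\
&\lim_{\sigma_n^2\to 0}\beta_b'A^{-1}\beta_b = \lim_{\sigma_n^2\to 0}\frac{\beta_b'\beta_b}{\sigma_n^2 + (\theta_c + \theta_s\tau^{*2})(R_{\beta'\beta}/\beta'\beta)\beta_b'\beta_b} = \frac{\beta'\beta}{(\theta_c + \theta_s\tau^{*2})R_{\beta'\beta}},\\
&\lim_{\sigma_n^2\to 0}\beta_{\sim s}'C^{-1}\beta_{\sim s} = \lim_{\sigma_n^2\to 0}\frac{\beta_{\sim s}'\beta_{\sim s}}{\sigma_n^2 + \theta_c(R_{\beta'\beta}/\beta'\beta)\beta_{\sim s}'\beta_{\sim s}} = \frac{\beta'\beta}{\theta_c R_{\beta'\beta}},\\
&\lim_{\sigma_n^2\to 0}\beta_b'(A - BC^{-1}B')^{-1}\beta_b = \frac{\beta'\beta}{\theta_s\tau^{*2}R_{\beta'\beta}}, \qquad \lim_{\sigma_n^2\to 0}\beta_c'\begin{bmatrix} A & B\\ B' & C\end{bmatrix}^{-1}\beta_c = \frac{\beta'\beta}{\theta_c R_{\beta'\beta}}.
\end{aligned} \tag{VII.5}$$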

Since we are considering the limit as the variance of policy noise goes to zero, we also take $h_{t+1}$ in (VII.3) as equal to $x_{t+1}$, the intentional policy vector of that time period. Asymptotically the simple implement policies $\tau^*\beta_s\sqrt{R_{\beta'\beta}/\beta'\beta}$ for the policies they believe are relevant and $0_{k_{\sim s}\times 1}$ for those they believe are irrelevant, so, using the preceding results, the change in the intensity of both types when the simple are in power is

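$$\begin{aligned}
&\lim_{\sigma_n^2\to 0}\operatorname{plim}_{t\to\infty} t\left(\hat\beta_{st+1}'\hat\beta_{st+1} - \hat\beta_{st}'\hat\beta_{st}\right) = \lim_{\sigma_n^2\to 0} 2\tau^{*2}\sqrt{\frac{R_{\beta'\beta}}{\beta'\beta}}\,\beta_b'A^{-1}\beta_b\left[\sqrt{R_{\beta'\beta}\,\beta'\beta}\left(\frac{1}{\tau^*} - 1\right) + \varepsilon_{t+1}\right]\\
&\qquad = \frac{2\tau^*}{\theta_c + \theta_s\tau^{*2}}\left[(1 - \tau^*)\beta'\beta + \tau^*\sqrt{\frac{\beta'\beta}{R_{\beta'\beta}}}\,\varepsilon_{t+1}\right],\\
&\lim_{\sigma_n^2\to 0}\operatorname{plim}_{t\to\infty} t\left(\hat\beta_{ct+1}'\hat\beta_{ct+1} - \hat\beta_{ct}'\hat\beta_{ct}\right) = \lim_{\sigma_n^2\to 0} 2\tau^*\sqrt{\frac{R_{\beta'\beta}}{\beta'\beta}}\,\beta_c'\begin{bmatrix} A & B\\ B' & C\end{bmatrix}^{-1}\begin{bmatrix}\beta_b\\ 0_{k_{\sim s}\times 1}\end{bmatrix}\varepsilon_{t+1}\\
&\qquad \underset{\text{by (VII.5)}}{=} \lim_{\sigma_n^2\to 0} 2\tau^*\sqrt{\frac{R_{\beta'\beta}}{\beta'\beta}}\left(1 - \theta_c\frac{R_{\beta'\beta}}{\beta'\beta}\beta_{\sim s}'C^{-1}\beta_{\sim s}\right)\beta_b'(A - BC^{-1}B')^{-1}\beta_b\,\varepsilon_{t+1} = 0.
\end{aligned} \tag{VII.6}$$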

The first term for the simple represents the systematic tendency for their intensity to decline when in power, as they respond to the overprediction of average outcomes. The $\varepsilon_{t+1}$ term, for the simple and the complex, represents the effect of random shocks to $y$. Here, a negative shock reduces the intensity of the simple, as their belief in the effectiveness of the policies they deem relevant falls. Complex beliefs in these same policies also fall, but the complex belief in the efficacy of policies the simple deem irrelevant, and hence do not implement, rises, as the poor outcome under simple rule convinces the complex that these neglected policies are more effective than previously thought. These two effects offset each other, and complex intensity remains constant. In sum, a negative shock lowers the relative political intensity of the simple, hastening the transfer of power, with positive shocks having the opposite effect.

When the complex are in power, asymptotically they implement policies $\beta_c\sqrt{R_{\beta'\beta}/\beta'\beta}$ for the policies they believe are relevant and $0_{k_{\sim c}\times 1}$ for those they believe are irrelevant, so the changes in intensity are seen to be

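$$\begin{aligned}
&\lim_{\sigma_n^2\to 0}\operatorname{plim}_{t\to\infty} t\left(\hat\beta_{st+1}'\hat\beta_{st+1} - \hat\beta_{st}'\hat\beta_{st}\right) = \lim_{\sigma_n^2\to 0} 2\tau^*\sqrt{\frac{R_{\beta'\beta}}{\beta'\beta}}\,\beta_b'A^{-1}\beta_b\left[\sqrt{R_{\beta'\beta}\,\beta'\beta}\left(1 - \frac{1}{\tau^*}\right) + \varepsilon_{t+1}\right]\\
&\qquad = \frac{2}{\theta_c + \theta_s\tau^{*2}}\left[(\tau^* - 1)\beta'\beta + \tau^*\sqrt{\frac{\beta'\beta}{R_{\beta'\beta}}}\,\varepsilon_{t+1}\right],\\
&\lim_{\sigma_n^2\to 0}\operatorname{plim}_{t\to\infty} t\left(\hat\beta_{ct+1}'\hat\beta_{ct+1} - \hat\beta_{ct}'\hat\beta_{ct}\right) = \lim_{\sigma_n^2\to 0} 2\sqrt{\frac{R_{\beta'\beta}}{\beta'\beta}}\,\beta_c'\begin{bmatrix} A & B\\ B' & C\end{bmatrix}^{-1}\beta_c\,\varepsilon_{t+1} \underset{\text{by (VII.5)}}{=} \frac{2}{\theta_c}\sqrt{\frac{\beta'\beta}{R_{\beta'\beta}}}\,\varepsilon_{t+1}.
\end{aligned} \tag{VII.7}$$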

Once again, the change in simple beliefs contains a systematic component, this time consisting of the gradual increase in bias and intensity as outcomes under the complex are consistently better than expected. Both simple and complex respond to the realization of the output shock $\varepsilon$, but the impact on the intensity of the complex is greater as, given that $\theta_c/\theta_s \rightarrow \tau^*$ as $\sigma_n^2 \rightarrow 0$, we have

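$$\lim_{\sigma_n^2\to 0}\frac{\tau^*\theta_c}{\theta_c + \theta_s\tau^{*2}} = \frac{\tau^*\theta_c}{\theta_c + \tau^*\theta_c} = \frac{\tau^*}{1 + \tau^*} < 1. \tag{VII.8}$$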

A negative shock reduces the belief in the effectiveness of policies of both types, but the

effects on intensity are greater for the complex, for whom intensity depends upon a wider

range of policies, all of which are seen to be failing. Consequently, negative shocks

accelerate regime change, ushering in further negative outcomes as the simple implement

misguidedly narrow and intense policies, while positive shocks lengthen the time the

complex hold onto power and the polity continues to benefit from a full range of

moderate policy actions.
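The systematic drift in simple beliefs under complex rule is an instance of omitted variable bias. The following sketch is hypothetical and uses no calibration from the paper. It assumes two policies with equal true effects and a simple type that regresses the outcome on policy 1 alone. Complex-rule data are generated with both policies moving together, and simple-rule data with policy 2 held at 0. The helper `simple_slope` and all data-generating numbers are illustrative.

```python
import numpy as np


def simple_slope(x1, y):
    """OLS slope of y on x1 alone (no intercept): the misspecified
    regression of a type that treats policy 2 as irrelevant."""
    return float(x1 @ y / (x1 @ x1))


rng = np.random.default_rng(0)
beta = np.array([1.0, 1.0])        # hypothetical true effects of two policies
T = 10_000
eps = rng.normal(0.0, 1.0, T)      # output shocks

# Complex rule (hypothetical): both policies move together.
level = rng.normal(1.0, 0.5, T)
X_complex = np.column_stack([level, level + rng.normal(0.0, 0.1, T)])
y_complex = X_complex @ beta + eps

# Simple rule (hypothetical): policy 1 only, policy 2 held at 0.
X_simple = np.column_stack([rng.normal(1.0, 0.5, T), np.zeros(T)])
y_simple = X_simple @ beta + eps

b_under_complex = simple_slope(X_complex[:, 0], y_complex)
b_under_simple = simple_slope(X_simple[:, 0], y_simple)
print(round(b_under_complex, 2), round(b_under_simple, 2))
```

Under complex rule, the simple coefficient absorbs the effect of the omitted policy and comes out near 2, which overstates the payoff to an intense use of policy 1. Data generated while the simple govern carry no such bias, so the estimate drawn from those data sits near the true value of 1.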

VIII. Conclusion

Our analysis has shown how simplistic beliefs can persist in political competition

against a more accurate and complex view of the world, delivering sub-par outcomes on

each outing in power and yet returning to dominate the political landscape over and over

again. In the framework presented above, simplistic beliefs arise as a consequence of a

primitive assumption of misspecification, but we recognize that there are deeper

questions to explore. A recent examination of European Social Survey data by Guiso et

al. (2017) finds that the responsiveness of the electorate to populist ideas and the supply of

populist politicians increases in periods of economic insecurity. Social and economic

transformation, and the insecurity and inequality it can engender, may create


environments in which opportunistic politicians are able to plant erroneously simplistic

world views into the electorate. Linking belief formation, at its most fundamental level,

to ongoing economic and political events allows a richer characterization of political

cycles, and is something we intend to explore in future work.


Bibliography

Acemoglu, Daron, Georgy Egorov and Konstantin Sonin (2013). "A Political Theory of Populism." Quarterly Journal of Economics: 771-805.

Anderson, T.W. (2003). An Introduction to Multivariate Statistical Analysis. Third Edition. New Jersey: John Wiley & Sons.

Arrow, Kenneth J. and Jerry R. Green (1973). "Notes on Expectations Equilibria in Bayesian Settings." Working paper #33, Institute for Mathematical Studies in the Social Sciences, Stanford University.

Bernhardt, Dan, Stefan Krasa, and Mehdi Shadmehr (2019). "Demagogues and the Fragility of Democracy." Manuscript, University of Illinois, Urbana.

Bohren, J. Aislinn and Daniel N. Hauser (2019). "Social Learning with Model Misspecification: A Framework and a Characterization." Manuscript, 2019.

Bray, Margaret (1982). "Learning, Estimation, and the Stability of Rational Expectations." Journal of Economic Theory 26: 318-339.

Dahrendorf, Ralf (2007). "Acht Anmerkungen zum Populismus" [Eight Remarks on Populism]. Eurozine 2007 (first published in Transit 25, 2003).

Di Tella, Rafael and Julio J. Rotemberg (2016). "Populism and the Return of the 'Paranoid Style': Some Evidence and a Simple Model of Demand for Incompetence as Insurance against Elite Betrayal." Manuscript, Harvard Business School.

Dornbusch, Rudiger and Sebastian Edwards (1991). "The Macroeconomics of Populism." In Rudiger Dornbusch and Sebastian Edwards, eds., The Macroeconomics of Populism in Latin America. Chicago: University of Chicago Press.

Eliaz, Kfir and Ran Spiegler (2019). "A Model of Competing Narratives." Manuscript, 2019.

Esponda, Ignacio (2008). "Behavioral Equilibrium in Economies with Adverse Selection." American Economic Review 98 (4): 1269-1291.

Esponda, Ignacio and Demian Pouzo (2016). "Berk-Nash Equilibrium: A Framework for Modeling Agents with Misspecified Models." Econometrica 84 (3): 1093-1130.

Esponda, Ignacio, Demian Pouzo and Yuichi Yamamoto (2019). "Asymptotic Behavior of Bayesian Learners with Misspecified Models." Manuscript, 2019.

Frick, Mira, Ryota Iijima and Yuhta Ishii (2019). "Stability and Robustness in Misspecified Learning Models." Manuscript, 2019.


Gidron, Noam and Bart Bonikowski (2013). "Varieties of Populism: Literature and Research Agenda." Manuscript, Weatherhead Center for International Affairs, Harvard University.

Guiso, Luigi, Helios Herrera, Massimo Morelli, and Tommaso Sonno (2017). "Demand and Supply of Populism." Manuscript, Einaudi Institute for Economics and Finance.

Heidhues, Paul, Botond Kőszegi and Philipp Strack (2018). "Convergence in Misspecified Learning Models with Endogenous Actions." Manuscript, 2018.

Mailath, George J. and Larry Samuelson. "Learning under Diverse World Views: Model-Based Inference." PIER Working Paper 19-018, University of Pennsylvania.

Molavi, Pooya (2019). "Macroeconomics with Learning and Misspecification: A General Theory and Applications." Manuscript, 2019.

Montiel Olea, José Luis, Pietro Ortoleva, Mallesh Pai and Andrea Prat (2019). "Competing Models." Manuscript, 2019.

Morelli, Massimo, Antonio Nicolò and Paolo Roberti (2020). "Populism as Simplistic Commitment." Manuscript.

Mudde, Cas and Cristóbal Rovira Kaltwasser (2017). Populism: A Very Short Introduction. Oxford: Oxford University Press.

Nyarko, Yaw. "Learning in Mis-specified Models and the Possibility of Cycles." Journal of Economic Theory 55 (2): 416-427.

Schmidt, Peter (1976). Econometrics. New York: Marcel Dekker.

Zellner, Arnold (1971). An Introduction to Bayesian Inference in Econometrics. New Jersey: John Wiley & Sons.
