
Mass Purges: Top-down Accountability in Autocracy∗

B. Pablo Montagnes

Emory University

[email protected]

Stephane Wolton

London School of Economics

[email protected]

This version: July 16, 2016

Most recent version accessible here

Abstract

This paper contends that mass purges are a salient method of top-down accountability

used by totalitarian regimes to increase party performance and shape party membership. In

our theoretical framework, party members work on independent projects. Their fate, however,

is linked through the purge, and a member’s effort depends on the activism of all others via

what we call the pool size effect. In turn, the autocrat’s incentive to purge depends on the

informativeness of different performance indicators, a function of all members’ effort via what

we term the pool makeup effect. These novel pool effects emerge from the many (party

members) to one (autocrat) accountability problem faced by the principal. Our approach

also highlights how violence affects top-down accountability in autocracy. Greater intensity

of violence increases effort, but can impede selection. The autocrat thus cannot escape a

trade-off between love (less unity) and fear (more activism).

∗We thank Scott Ashworth, Dan Bernhardt, Alessandra Casella, Ali Cirone, Torun Dewan, Tiberiu Dragu, Scott

Gehlbach, Thomas Groll, Haifeng Huang, Navin Kartik, Roger Myerson, Salvatore Nunnari, Carlo Prato, Arturas

Rozenas, Milan Svolik, Scott Tyson, conference and seminar participants at the Sixth LSE-NYU Conference, Priorat

Workshop in Theoretical Political Science, the Third Formal Theory & Comparative Politics Conference at Chicago,

3rd PECO Conference in Washington, D.C., Bocconi University, Columbia University for helpful comments and

suggestions. All remaining errors are the authors’ responsibility.


http://ssrn.com/abstract=2810511

1 Introduction

Party struggles lend a party strength and vitality; the greatest proof of a party’s

weakness is its diffuseness and the blurring of clear demarcations; a party becomes

stronger by purging itself

From a letter of Lassalle to Marx, of June 24, 1852

In 1901, when writing his revolutionary agenda What is to be done?, Vladimir Ilyich Ulyanov

(alias Lenin) chose one particular sentence as an epigraph. This sentence, reproduced above, calls

for the use of purges as a means of shaping the membership of communist parties. Often repeated

(e.g., by Stalin at the 13th Party Congress and by Pravda in 1949; see Brzezinski, 1956, 207, 148), this

sentence has been used to justify the many purges of rank-and-file party members experienced by

the Communist Party of the Soviet Union (CPSU). These mass purges—by the sheer number of

individuals concerned—are not just a Soviet phenomenon. They have been recurrent in Communist

China and occurred in both Fascist Italy (in 1931) and Nazi Germany (in 1938).

In a single party system, the autocrat faces the twinned issues of motivating and selecting her

agents. We model this concern, common to many hierarchical organizations (the military, bureau-

cracy, firm), as a top-down accountability problem. Mass purges, we contend, provide a means

of addressing it.1 Unlike elite purges, which increase an autocrat’s hold on power (De Mesquita

and Smith, 2011), mass purges both “produce necessary momentum and dynamism for great social

changes” and “cleanse the system in anticipation” (Brzezinski, 1956, 19) of the ideologically

passive members that the party attracts due to the economic benefits of membership.

Because they affect all party members, mass purges illustrate that top-down accountability

can take the form of a many-to-one problem where a single principal seeks to hold accountable a

mass of agents. The autocrat evaluates a party member based on his performance relative to his

peers. We show that this particular feature generates novel strategic interdependencies that are not

accounted for in models where one or many principals interact with a single agent (e.g., electoral or

ministerial accountability). Mass purges also highlight a specific feature of autocracies, namely the

ever present possibility of violence. Even though the autocrat faces few formal constraints on her

actions, the use of violence sometimes generates an inescapable trade-off between fear (motivation)

and love (selection) as her agents strategically respond.

1 The autocrat is thus the principal, for whom we reserve the pronoun ‘she’ to follow the tradition in agency models.

We are, however, well aware that, with the possible exception of Indira Gandhi in 1975-77, all autocrats have been

male.

In our theoretical framework, party members can be either ideologues (sharing the autocrat’s

ideological objectives) or opportunists (entering the party for its associated benefits). They exert

effort on individual projects where they can either succeed (e.g., meet the quotas) or fail. The

autocrat only observes the outcomes of members’ projects. She then decides the purge breadth,

defined as the proportion of party members being removed from the party. The autocrat also

decides the intensity of violence, defined as the cost of being purged for party members. All

purged members are replaced by new members from an available replacement pool (e.g., the pool

of candidates to the party in the USSR and Communist China).

Even though party members work on independent tasks, their fate is linked through the purge.

A member’s probability of being purged depends on the overall performance of the party. Hence,

a member’s incentive to exert effort on his project depends on all other members’ level of activity.

We refer to this strategic interdependency as the ‘pool size effect.’ This pool size effect is positive

(i.e., higher party performance encourages greater effort)

when only members who fail in their project are purged—defined as a ‘discriminate purge’—since

failure becomes riskier. It is negative in a ‘semi-indiscriminate purge’ when some successful party

members are purged because higher effort by other members decreases the probability of surviving

the purge even if successful.

The autocrat carries out a purge to increase the ideological unity of the party, but the screen-

ing is never perfect. In any purge, some ideologues are eliminated, whereas some opportunists

survive. The autocrat’s incentive to purge is a function of the informativeness of different perfor-

mance indicators which is affected by all party members’ activism. We term this second strategic

interdependency the ‘pool makeup effect.’ We show that as party members anticipate higher purge

threat, the pool makeup effect is weakly positive, leading to weakly greater purge breadth.

The autocrat also selects the intensity of violence which affects both members’ effort, what

we term the ‘fear effect,’ and the future composition of the party, what we call the ‘love effect.’

The fear effect is positive: increased intensity of violence always improves present performance by

increasing the cost of being purged. However, greater intensity of violence can impede the screening

of party members and limits gains in ideological unity induced by the purge: the love effect can

be negative. The autocrat may therefore face a trade-off between short-term (higher effort) and

long-term (party unity) benefits of the purge.


Finally, we benchmark our theoretical results against historical evidence on mass purges. Our

theory predicts that the nature of the purge depends critically on the intensity of violence with

violent purges more likely to be semi-indiscriminate and (relatively) mild purges discriminate. In

turn, violent purges are not necessarily broader as the purge breadth is non-monotonic in violence.

Our results are in line with the differences and similarities between Maoist purges in the fifties

and Stalinist purges of the thirties. As Teiwes (1993, 25-27) describes, Chinese campaigns were

less violent and more targeted than Stalinist purges, during which “flouting commands court danger, but

even enthusiastic compliance is no guarantee of safety.” The proportion of party members purged,

however, appears to have been relatively similar in both countries (see Teiwes, 1993; Getty, 1987).

Related Formal Literature

Our paper advances the idea that mass purges are a salient method of accountability in autocracy.

In contrast, the literature has put great emphasis on elite purges: the shaping of an autocrat’s

close circle and contests for power. Bueno de Mesquita and Smith (2015) consider under which

conditions a leader might purge the selectorate to increase his payoff and probability of survival.

Svolik (2009) examines how autocrats can use elite purges to acquire more power. Acemoglu et

al. (2008, 2009) show that there exist limits to elite purges: members of the autocrat’s winning

coalition would oppose too much purging as it increases the risk of their being removed in the

future. Egorov and Sonin (2015) consider how elite purges in the present breed more elite purges

in the future as winners of a contest for power anticipate the risk of sparing losing contenders.

In contrast, our analysis highlights the importance of motivating and selecting a mass of party

members.

We thus provide a framework for analyzing many-to-one accountability problems. Dozens of

studies focus on one-to-many accountability such as electoral accountability (see Ashworth, 2012, for

a review). More recently, several papers have examined the specificity of one-to-one accountability.

In democratic politics, Dewan and Myatt (2007, 2010) study a prime minister’s choice whether

to retain or fire a minister following a scandal. Their analysis shares some similarities with ours,

including (as we will see) the effect of the quality of the replacement pool on the principal’s

decision.2 Other works on one-to-one accountability focus on the difficulty for an autocrat to hold

her officials accountable. Egorov and Sonin (2011) consider the optimal contract an autocrat can

offer to her close circle to avoid betrayal. They show that the autocrat can find it optimal to

recruit incompetent viziers as it can be too expensive to induce loyalty from competent ones (see

also Zakharov, 2016). Egorov et al. (2009) and Lorentzen (2014) examine under which conditions

autocrats can allow a (relatively) free press to limit corruption in the bureaucracy. Gehlbach and

Simpser (2015) analyze how an autocrat can use and manipulate elections to increase bureaucrats’

effort and her chance of survival. These works study moral hazard or adverse selection problems,

whereas our theoretical framework includes both. Further, due to the nature of the accountability

problem studied, these papers are unable to identify the pool size or pool makeup effects we uncover

and their impact on agents’ effort and selection.

2 Many models of bureaucracy can be interpreted as one-to-one accountability. See Gailmard and Patty (2012)

for a review.

Our paper also innovates by letting the autocrat choose the intensity of violence and joins

a small literature on the role of punishment and violence. In the context of industrial organization,

Bernhardt and Mongrain (2010) show how the threat of being fired can lead to inefficient

investment in firm-specific human capital. Bernhardt and Mongrain, however, consider a set-up

where the principal is fully informed when she makes her decision whether to retain an employee so

the pool size and pool makeup effects we identify are entirely absent. In a principal-agent set-up,

Acemoglu and Wolitzky (2011) show how coercion (in the form of slavery) can minimize an agent’s

moral hazard rents by reducing his outside option. Because a principal (slave-owner) only deals

with a single agent (slave), an agent’s risk of being punished does not depend on all other agents’

behaviors, unlike in our set-up. Bloch and Rao (2002) and Dal Bo et al. (2006) highlight how violence

or the threat thereof increases a player’s bargaining position and thus improves his payoff in various

bargaining settings. None of these studies includes adverse selection and so cannot examine the

effect of violence on the screening of agents, as we do in our paper.

2 Evidence on mass purges

Before presenting our theory, we provide some evidence justifying why mass purges are best un-

derstood as a salient method of top-down accountability in autocracy. Mass purges have been

extensively used in communist regimes under Lenin and Stalin (in the USSR) and Mao (in China).

The Communist Party of the Soviet Union (CPSU) experienced mass purges (“chistka”: a sweeping,

cleansing) in 1919, 1921-23, 1924, 1928, 1929, 1931, 1933-34, 1935, 1936 (Getty, 1987), 1949

(Brzezinski, 1956), 1951-53 (Rigby, 1968), and 1971 (Schapiro, 1977). In China, mass purges

(“qingchu”: to weed out) were part of the rectification campaigns (which included a reeducation

component absent from Soviet purges) which occurred in 1947-48, 1950, 1951-54, 1953, 1957,

1957-58, 1959-60, 1960-61, 1962-63, 1964-65. The list of mass purges, you will notice, does not

include the Great Terror (1936-38) or the Cultural Revolution (1966-76). We exclude these two

important events for two reasons. First, both had some elements of elite purges (Red Army purge in

the USSR, “bombard the headquarters” in China), which are absent from our theory. More importantly,

they were characterized by broad terror against the general population which is not explained by

any model of purges.3 Mass purges are not just a communist phenomenon, but are less frequent in

other totalitarian systems. In Fascist Italy, the party experienced a mass purge in 1931 (Morgan,

2012). In Germany, the NSDAP was purged in 1938 (Orlow, 1969, 241-2).

According to Brzezinski (1956, 9), purges in totalitarian systems are a “technique of government.”

Elite purges serve to resolve power struggles, but mass purges have different purposes. Getty

(1987, 38, emphasis in text) asserts that “[i]n the majority of purges, political crimes or deviations

pertained to a minority of those expelled” from the CPSU.4 Similarly, Teiwes (1993, 5) argues that

most rectification movements and associated purges in China were responses to problems arising

outside the top party leadership.5

The goals of mass purges are twofold. First, they provide the necessary momentum to accom-

plish the grand designs of the totalitarian regimes (Brzezinski, 1956, 19), to sustain a high level

of activity (Teiwes, 1993, 37). Second, they are meant to cleanse the party ranks from the op-

portunists who join for the social and economic benefits (e.g., reserved positions, special shops)

associated with membership (Brzezinski, 1956, 21, 148; Getty, 1987, 41, 89, 169; Teiwes, 1993, 6-7).

By removing a proportion of party members, mass purges also provide for the influx of new members

(Brzezinski, 1956, 9, 131, 168; Teiwes, 1993, 42-43) drawn from the pool of candidates to the

party (Rigby, 1968, 52-53). Consequently, while elite purges target specific individuals, the target

of mass purges is more diffuse: it is the mass of party members, composed of potentially millions

of individuals (Teiwes, 1993, 5).6 Further, while elite purges depend on circumstances, communist

leaders sought to regulate the periodicity of mass purges (Getty, 1987, 38, 41), with Mao prescribing

rectification campaigns twice every five years (Teiwes, 1993, 224). On this dimension, mass

purges thus share more similarities with elections than with elite purges.

3 There obviously exist non-formal theories of totalitarian terror, in particular Arendt (1973), to which we cannot

do justice here.

4 Rigby (1968), on the other hand, argues that the purges of the CPSU in the 1930s paved the way for the Great

Terror and the show trials of 1936-38. However, he still admits that the 1933-34, 1951-53, and especially 1921-23

purges were not caused by conflict between leaders (see, e.g., 282-83). Far from generating purges, the contest for

power in the USSR between Stalin, Trotsky, Zinoviev, and Bukharin led to an increase in party membership as Stalin

tried to recruit allies (Rigby, 1968, 131).

5 Chinese rectification movements were also meant to educate the rank and file, a concern generally absent from

Soviet purges. Interestingly, the few elite purges of the Chinese leadership did not trigger mass purges even when

the purged leaders, like Gao Gang and Rao Shushi, were accused of establishing independent kingdoms (Teiwes,

1993, chapter 5, especially 142 and 162). The dismissal of Peng Dehuai in 1959, however, was concomitant with the

1959-60 Rectification Campaign. Nonetheless, Teiwes argues that the two events were uncorrelated (ibid., 339 and

341).

3 Set-up

We study a two-period (t ∈ {1, 2}) model with an autocrat (A) and a [0, 1] continuum of party

members, indexed by the superscript m. Each party member is characterized by a type τ ∈ {i, o},

where τ = i corresponds to an ideologue and τ = o to an opportunist.7 A party member’s type is

his private information; however, it is common knowledge that there is a proportion λ of ideologues

among party members.

Each period, party member m exerts effort $e^m \in [0, 1]$ at cost $(e^m)^2/2$ on an individual project.

The probability that member m’s project is successful is equal to $e^m$.8 While a member’s effort is not

observed by the autocrat, the outcome of his project is (e.g., whether he has fulfilled his quota).

This assumption corresponds to historical evidence that officials in charge of the purges had little

information about local circumstances and could only judge according to how successfully problem

cases or certain projects were handled (Teiwes, 1993, 28, 42; Rigby, 1968, 96). At the end of period

1, the autocrat decides to purge a proportion κ of party members, which we refer to as the ‘purge

breadth.’

Mass purges entail a loss in terms of human capital and organizational knowledge as well as

the cost of potentially deporting party members or delays in finding suitable replacements for

purged party members. This cost is captured by the cost function c(κ) with c(0) = 0 and (for ease

of exposition) marginal cost $c'(\kappa) = c_0 + c_1 \kappa$, with $c_0 \geq 0$ and $c_1 > 0$. When a party member is purged, a

new member replaces him (e.g., from the pool of candidates). The proportion of ideologues among

the replacement pool is $r_i$.

6 As described by Weinberg (1993, 23) in his case study of mass purges in Birobidzhan, leaders do not have a “detailed

list of individuals to be purged.”

7 We use the term opportunist for simplicity to encompass members attracted to the party for the benefits attached

to it (Getty, 1987, 32-33), members who lacked “a wholehearted commitment to the Party’s cause” (Teiwes, 1993,

114-115), and members not in line with the current Party’s policy (see the quote from Stalin in Gregory et al., 2011,

36).

8 Alternatively, we could assume that effort is translated into project success via some concave function.

Being purged has two distinct consequences for a party member. First, the party member is

expelled from the party and cannot enjoy the benefit associated with party membership in the

second period. Second, he suffers a direct loss L which corresponds to the “intensity of violence”

of the purge. The loss L can be relatively low if a party member is only fined or very large if a

party member is killed, her or his spouse deported, and their children sent to an orphanage, as was

commonplace in Stalin’s USSR (Brzezinski, 1956, 110).9 The autocrat determines the intensity of

violence at the beginning of the game (e.g., investment in the security apparatus) at a cost ζ(L)

with ζ(0) = 0 and marginal cost $\zeta'(L) = \zeta_0 + \zeta_1 L$, $\zeta_0 \geq 0$ and $\zeta_1 > 0$.
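For reference, the total cost functions implied by the two linear marginal costs can be recovered by integrating and applying c(0) = 0 and ζ(0) = 0. This is a direct consequence of the stated assumptions, not an additional restriction; the integration variable s is introduced here only for illustration.

```latex
\begin{align*}
c(\kappa) &= \int_0^{\kappa} (c_0 + c_1 s)\, ds = c_0\,\kappa + \frac{c_1}{2}\,\kappa^2, \\
\zeta(L)  &= \int_0^{L} (\zeta_0 + \zeta_1 s)\, ds = \zeta_0\, L + \frac{\zeta_1}{2}\, L^2 .
\end{align*}
```

Both functions are strictly convex because $c_1 > 0$ and $\zeta_1 > 0$.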

In period 1, a party member enjoys a benefit R ≥ 0, which captures all the special privileges

accorded to party members. In addition, if he is not purged from the party ($k^m = 0$), an ideologue

obtains b > 0 when his project is successful, whereas an opportunist gets 0 regardless of the outcome

of his project. When purged ($k^m = 1$), a party member suffers the loss L > 0.10 Party member m’s

first-period payoff thus assumes the following form:

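$$u_1^m(e; \tau) = R + (1 - k^m) \times \begin{cases} \mathbb{1}_{\{\tau = i\}}\, b & \text{if project succeeds} \\ 0 & \text{otherwise} \end{cases} + k^m(-L) - \frac{e^2}{2} \tag{1}$$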

In period 2, if m survives the purge, given that there is no subsequent purge, his payoff can be

expressed as the sum of the party membership benefit (R) and the net gain from a successful project:

$$u_2^m(e; \tau) = R + \begin{cases} \mathbb{1}_{\{\tau = i\}}\, b & \text{if project succeeds} \\ 0 & \text{otherwise} \end{cases} - \frac{e^2}{2} \tag{2}$$

To simplify the exposition, we assume throughout that a party member does not discount the

future.

The autocrat gets a positive payoff, normalized to 1, when a party member’s project is successful,

and 0 otherwise. The autocrat thus wants to maximize the proportion of successful projects,

which is equal to party members’ average effort in each period. In the first period, the autocrat

also bears the cost of investing in the intensity of violence and the cost of purging. Denoting by $e_t$ the

average effort in period t ∈ {1, 2}, we can thus express the autocrat’s first-period and second-period

payoffs as, respectively:

$$u_1^A(\kappa, L) = e_1 - c(\kappa) - \zeta(L) \tag{3}$$

$$u_2^A = e_2 \tag{4}$$

The autocrat has a discount factor of β ∈ (0, 1), which captures, among other things, the risk

(perceived or real) of losing power between the two periods.

9 For example, the law of June 1934 in the USSR established that all family members are legally responsible for the

illegal acts of one of them (Wolton, 2015, 271).

10 All our results hold if an ideologue enjoys the payoff from a successful project even after being purged.

To summarize, the timing of the game is:

Period 1:

1. Autocrat chooses the intensity of violence L ≥ 0;

2. Member m chooses effort $e_1^m$;

3. Project outcome (success/failure) is determined and observed by autocrat. Autocrat chooses

the purge breadth κ;

4. Purged members are replaced by new party members, and first-period payoffs are realized;

Period 2:

1. (Surviving and new) member m chooses effort $e_2^m$;

2. Project outcome is determined;

3. Game ends and second-period payoffs are realized.

Note that the assumption that the autocrat commits to an intensity of violence at the beginning of

the game is not innocuous. Without commitment, at the moment of the purge, the autocrat would

either choose no violence (if violence is costly) or the highest feasible intensity (if it is costless). This

is because once effort choices are made, violence has no effect on selection (and clearly none on effort).

This assumption, however, has some historical grounding. Funds for the Great Terror were earmarked

before its launch (Wolton, 2015, 317; unfortunately, there is no similar historical evidence for mass

purges). Observe further that the autocrat prefers to commit whenever her preferred intensity of

violence is positive.

The equilibrium concept is Perfect Bayesian Equilibrium (PBE), which implies that each party

member correctly anticipates the autocrat’s purging decision and other members’ effort when choos-

ing his own effort, and, in turn, the autocrat correctly anticipates the level of effort by each type

when determining her investment in violence and purging strategy. For simplicity, we impose that

agents are anonymous, so all agents with a successful (failed) project face the same probability of

being purged. Finally, to deal with measurability issues, we assume that when the autocrat observes

an out-of-equilibrium event, she treats the deviation as a mistake and does not distinguish between

the party member who deviated and the rest of the party which followed the prescribed strategy.11

If after these restrictions, multiple PBE arise, we select the one which maximizes the autocrat’s

ex-ante expected welfare (henceforth autocrat welfare). In what follows, “equilibrium” refers to

this class of equilibria.

Throughout, we use the following notation. v(τ) corresponds to a party member’s flow payoff

if his project is successful; i.e., v(i) = b and v(o) = 0. V2(τ) denotes a party member m’s expected

payoff from being in the party in period 2 as a function of his type. Simple algebra yields

$V_2(i) = R + b^2/2$ and $V_2(o) = R$. The (ex-ante) average payoffs are denoted by $\bar{v} = \lambda v(i) + (1 - \lambda) v(o)$

and $\bar{V}_2 = \lambda V_2(i) + (1 - \lambda) V_2(o)$. For the autocrat, denote by $W_2(\tau)$ her second-period expected payoff

induced by a type τ ∈ {i, o}. It can be checked that $W_2(i) = b$ and $W_2(o) = 0$. The gain from

replacing an opportunist by an ideologue is $D_{i,o} := W_2(i) - W_2(o)$.
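The "simple algebra" can be sketched as follows. The sketch assumes, as in equation (2), that no purge follows period 2, so a type-τ member simply trades off the expected flow payoff against the quadratic effort cost. It also assumes b ≤ 1 so that the effort choice is interior.

```latex
\begin{align*}
e_2(\tau) &= \arg\max_{e \in [0,1]} \Big\{ R + e\, v(\tau) - \tfrac{e^2}{2} \Big\} = v(\tau), \\
V_2(\tau) &= R + v(\tau)^2 - \tfrac{v(\tau)^2}{2} = R + \tfrac{v(\tau)^2}{2}, \\
W_2(\tau) &= e_2(\tau) = v(\tau), \qquad D_{i,o} = W_2(i) - W_2(o) = b .
\end{align*}
```

With v(i) = b and v(o) = 0, this gives a continuation value of R + b²/2 for an ideologue and R for an opportunist, and an expected second-period payoff to the autocrat of b and 0, respectively.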

We also impose the following restrictions on parameter values:

βriDi,o < c0 + c1 ≤ βriDi,o + c11 + v

2− βλ

1−v(i)2− v(i)−v

2− (V (i)− V2)

1−v2

Di,o (5)

The left-hand side states that it is never optimal for the autocrat to purge the whole party even if

there is no ideologue among current party members (e.g., due to the risk of popular rebellion if the

party work is too disrupted). The right-hand side is a technical condition meant to limit the number

of cases to be considered. All results hold substantially when this inequality is relaxed. Finally, we

assume that the highest feasible intensity of violence, denoted $\bar{L}$, satisfies $\bar{L} := 1 - v(i) - V_2(i)$.12

The analysis proceeds in three steps. First, we consider the optimal purge breadth for exogenous

levels of violence and uncover the ‘pool size’ and ‘pool makeup’ effects. Then, we examine the

conflicting effects of violence on effort and selection, which we term the love-fear trade-off. Finally,

we characterize the optimal solution to the love-fear dilemma and how it is affected by underlying

fundamentals.

11 Alternatively, we could define two subsets of party members ξ0 and ξ1 who always exert (respectively) effort 0

and 1, implying that there is no out-of-equilibrium event of measure 0.

12 This assumption guarantees that v(i) + V2(i) + L ≤ 1 so there is no corner solution in effort (which only

complicates the analysis). A party member cannot work all the time for a long period of time so his effort is

naturally bounded above by the number of hours available in a day.

4 Purge Breadth

Due to our equilibrium refinement, there is no equilibrium in which agents exert zero effort.13

When they exert effort, party members endogenously sort into pools of failure and success, which

constitute, with the proportion of ideologues among existing party members (λ) and potential

replacements (ri), the only information available to the autocrat at the time of her purging decision.

However, given that ideologues receive an intrinsic benefit from a successful project, they always

have more incentive to exert effort than opportunists and are more likely to belong to the success

pool (i.e., the single-crossing condition holds). Consequently, the autocrat always first targets

unsuccessful party members.

When choosing their effort, party members take into account both the intrinsic value of success

and the risk of being purged after failure or success. As success on a project provides full or partial

inoculation from a purge, all party members have incentive to exert effort in order to survive

the purge. The benefit from a successful project, however, depends on the relative probability

of being purged when in the success and failure pools, which we refer to as the (success/failure)

pool incidences. These pool incidences can take three qualitatively distinct forms which determine

the nature of the purge. When only a portion of the failure pool is purged, we say that the

purge is “partially discriminate.” When the entire failure pool is purged, we label the purge “fully

discriminate.” Finally, when even some successful members are purged, we use the qualifier “semi-

indiscriminate.” In turn, the purge incidence faced by a member m is a function of two factors:

the breadth of the purge and the efforts of other party members.
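The three cases can be made concrete with a short sketch. It assumes that the failure pool has size 1 − e1 (average effort equals the share of successes), that the autocrat purges failures first, and that members within a pool are treated anonymously. The function name `pool_incidences` and its arguments are hypothetical, introduced only for illustration.

```python
import math


def pool_incidences(kappa: float, e1: float) -> tuple[str, float, float]:
    """Return (purge type, failure-pool incidence, success-pool incidence).

    kappa: purge breadth, the share of members removed, in [0, 1].
    e1:    average first-period effort, i.e. the size of the success pool,
           assumed strictly between 0 and 1 so both pools are non-empty.
    """
    if not (0.0 <= kappa <= 1.0 and 0.0 < e1 < 1.0):
        raise ValueError("need 0 <= kappa <= 1 and 0 < e1 < 1")
    failure_pool = 1.0 - e1
    if math.isclose(kappa, failure_pool):
        # Exactly the whole failure pool is purged, and nobody else.
        return "fully discriminate", 1.0, 0.0
    if kappa < failure_pool:
        # Only a share of the failure pool is purged.
        return "partially discriminate", kappa / failure_pool, 0.0
    # The whole failure pool is purged; the remainder comes from successes.
    return "semi-indiscriminate", 1.0, (kappa - failure_pool) / e1


if __name__ == "__main__":
    for kappa in (0.2, 0.5, 0.6):
        print(kappa, pool_incidences(kappa, e1=0.5))
```

Holding the breadth fixed and raising e1 shrinks the failure pool, which is how other members' effort moves a given member's purge incidence.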

Suppose that the purge breadth and all other efforts were exogenous. How would a party

member respond to either quantity? A party member’s effort depends critically on the nature of

the purge. In a partially or fully discriminate purge, the payoff from belonging to the success

pool is large: a successful party member obtains his flow payoff and is inoculated against the

purge. As such, a party member exerts high effort when anticipating a discriminate purge. In a

semi-indiscriminate purge, the benefit from belonging to the success pool is relatively low. Even if

successful, there is a risk a party member’s effort is wasted as he may be purged anyway.

13 Absent our restrictions, there would exist an equilibrium in which all party members exert 0 effort and the

autocrat would purge any successful party member with probability 1. This equilibrium would only be sustained by

the arguably unreasonable out-of-equilibrium belief that a successful member is likely to be an opportunist even

though only ideologues are intrinsically motivated to exert effort.

In turn, holding the purge breadth constant, a change in other party members’ level of activity

affects a party member’s incentive to exert effort by altering the relative size of the success and

failure pools. This is the pool size effect, which we say is positive when a member’s effort increases

with other members’ level of activity and negative otherwise. In a discriminate purge, higher

activism by other members reduces the size of the failure pool, which increases the failure pool

incidence and therefore encourages effort. Thus, in a discriminate purge, efforts by party members

are strategic complements and the pool size effect is positive. In contrast, in a semi-indiscriminate

purge, efforts are strategic substitutes and the pool size effect is negative. Increased level of activity

by other members depresses a member’s effort because the benefit of being in the success pool is

reduced as the failure pool becomes thinner and the success pool incidence increases. Finally, notice

that the nature of the purge depends on all party members’ effort. As the failure pool becomes

too thin relative to the breadth of the purge (i.e., 1 − e1 < κ), a discriminate purge becomes

semi-indiscriminate.

Lemma 1 summarizes the reasoning above noting that in a discriminate purge, a party member

considers the flow payoff from a successful project (v(τ)) as well as the expected loss from being

purged ($\frac{\kappa}{1-e_1}(V_2(\tau) + L)$), whereas in a semi-indiscriminate purge, he takes into account the benefit from

success ($v(\tau) + V_2(\tau) + L$) weighted by the probability of surviving the purge ($1 - \frac{\kappa - (1-e_1)}{e_1}$).

Lemma 1. A type τ ∈ {i, o} party member m chooses effort:

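$$e_1^m(\tau) = \begin{cases} v(\tau) + \frac{\kappa}{1-e_1}\,(V_2(\tau) + L) & \text{if } 1 - e_1 \geq \kappa \\ \left(1 - \frac{\kappa - (1-e_1)}{e_1}\right)(v(\tau) + V_2(\tau) + L) & \text{if } 1 - e_1 < \kappa \end{cases} \tag{6}$$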
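The effort rule in Lemma 1 can be transcribed directly. The sketch below takes the average effort of all other members (`e_bar`, standing in for e1) as given and returns one member's effort; it does not solve for the equilibrium fixed point. The function and argument names are hypothetical, and the parameter values in the usage lines are arbitrary illustrations rather than calibrated numbers.

```python
def best_response(v: float, V2: float, L: float, kappa: float, e_bar: float) -> float:
    """Effort chosen by a member with flow payoff v and continuation value V2.

    L:     intensity of violence (loss when purged).
    kappa: purge breadth.
    e_bar: average effort of the party, taken as given, strictly in (0, 1).
    """
    if 1.0 - e_bar >= kappa:
        # Discriminate purge: success gives v and full protection,
        # failure is purged with probability kappa / (1 - e_bar).
        return v + kappa / (1.0 - e_bar) * (V2 + L)
    # Semi-indiscriminate purge: failures are purged for sure, and a
    # successful member survives with probability 1 - (kappa - (1 - e_bar)) / e_bar.
    survive = 1.0 - (kappa - (1.0 - e_bar)) / e_bar
    return survive * (v + V2 + L)


if __name__ == "__main__":
    # Discriminate purge (kappa = 0.1): effort rises with others' effort.
    print(best_response(0.2, 0.3, 0.1, kappa=0.1, e_bar=0.5))
    print(best_response(0.2, 0.3, 0.1, kappa=0.1, e_bar=0.6))
    # Semi-indiscriminate purge (kappa = 0.6): effort falls with others' effort.
    print(best_response(0.2, 0.3, 0.1, kappa=0.6, e_bar=0.5))
    print(best_response(0.2, 0.3, 0.1, kappa=0.6, e_bar=0.8))
```

The two pairs of calls illustrate the sign of the pool size effect: positive in the first branch and negative in the second.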

As explained above, in a discriminate purge, party members’ efforts are strategic complements,

whereas they are strategic substitutes in a semi-indiscriminate purge. These effects, however, are

secondary compared to the direct impact of the purge breadth. Consequently, the nature of the

purge (discriminate or semi-indiscriminate) is determined solely by the purge breadth and the

intensity of violence, and is thus fully in the autocrat’s hands.

Lemma 2. A purge is semi-indiscriminate if and only if $\kappa > \bar{\kappa}(L) := 1 - \bar{v} - \bar{V}_2 - L$.
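A heuristic way to recover this threshold is sketched below. It assumes that average effort e1 is obtained by averaging the discriminate branch of equation (6) over types with weights λ and 1 − λ, and it writes $\bar{v}$ and $\bar{V}_2$ for the ex-ante average payoffs λv(i) + (1 − λ)v(o) and λV2(i) + (1 − λ)V2(o) defined in the set-up. The formal proof may proceed differently.

```latex
\begin{align*}
e_1 &= \bar{v} + \frac{\kappa}{1 - e_1}\,(\bar{V}_2 + L)
  && \text{(average of the discriminate branch)} \\
1 - e_1 = \kappa \;&\Longrightarrow\; e_1 = \bar{v} + \bar{V}_2 + L
  && \text{(the whole failure pool is purged)} \\
&\Longrightarrow\; \kappa = 1 - \bar{v} - \bar{V}_2 - L .
\end{align*}
```

The threshold falls as L rises, which is consistent with the claim that more violent purges are more likely to be semi-indiscriminate.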

When deciding whom to purge, the autocrat observes only the outcome of a party member’s

project, and forms a posterior that a party member is an ideologue based on success (denoted

µS(e1)) or failure (µF (e1)). The autocrat’s posteriors incorporate her conjectures (correct in equi-

librium) of the different levels of effort exerted by ideologues and opportunists. Due to ideo-

logues’ intrinsic motivation, a successful project is a positive signal of ideological alignment, so

µF (e1) < λ < µS(e1). However, this signal is never perfect: ideologues sometimes fail and opportunists

sometimes succeed. Consequently, in any purge, some ideologues are purged and some

opportunists survive.

Consider the autocrat’s incentives to purge a member after observing his project is unsuccessful.

Recall that W2(τ) is the autocrat’s second period expected payoff induced by a type τ ∈ {i, o}

party member. If the autocrat retains the party member after failure, her expected payoff is:

µF (e1)W2(i) + (1− µF (e1))W2(o). Since the proportion of ideologues in the replacement pool is ri,

the autocrat’s payoff from purging an unsuccessful party member is: riW2(i) + (1− ri)W2(o). The

autocrat’s expected benefit from purging a member after failure is thus:

$W_F = [r_i - \mu_F(e_1)]\,D_2^{i,o}$, (7)

By a similar reasoning, the expected benefit from purging a successful party member is:

$W_S = [r_i - \mu_S(e_1)]\,D_2^{i,o}$. (8)
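The posteriors and the two purge benefits can be illustrated with a short numerical sketch. It assumes, as our reading of the model, that a member's effort equals his probability of success, so that Bayes' rule gives the posteriors below. The function names, the effort levels, and the value of `D` (standing in for the payoff difference in (7) and (8)) are hypothetical.

```python
def posteriors(lam, e_i, e_o):
    """Posterior probability of being an ideologue after success and after failure.

    lam is the prior share of ideologues; e_i and e_o are the (conjectured)
    efforts of ideologues and opportunists, read as success probabilities.
    """
    e_bar = lam * e_i + (1.0 - lam) * e_o
    mu_S = lam * e_i / e_bar
    mu_F = lam * (1.0 - e_i) / (1.0 - e_bar)
    return mu_S, mu_F


def purge_benefits(r_i, mu_S, mu_F, D):
    """Expected benefit of purging a failed (W_F) and a successful (W_S) member."""
    W_F = (r_i - mu_F) * D
    W_S = (r_i - mu_S) * D
    return W_F, W_S


# lam and r_i follow the parameter values reported for the figures;
# the efforts and D = 1 are illustrative assumptions.
mu_S, mu_F = posteriors(1 / 3, 0.6, 0.3)
W_F, W_S = purge_benefits(2 / 3, mu_S, mu_F, 1.0)
```

Because ideologues exert more effort in this example, the posterior after failure lies below the prior and the posterior after success lies above it, so purging a failed member is worth more than purging a successful one.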

The autocrat’s incentive to purge thus depends critically on the informativeness of her posteriors,

which are a function of party members’ endogenous sorting into the success and failure pools. Thus,

in addition to the pool size effect, resulting from the interdependence between party members’

efforts, there exists a pool makeup effect which captures how changes in party members’ efforts

affect the autocrat’s learning. We say that the pool makeup effect is positive when the target pool

becomes more tainted and the autocrat’s incentive to purge increases. As party members’ effort

depends on the purge breadth, the pool makeup effect is a function of κ and the nature of the purge.

To make sense of it, we first examine how an exogenous increase in the purge breadth affects the

autocrat’s relevant posteriors.

Anticipating a discriminate purge (relatively low κ), both ideologues and opportunists increase

their effort and exit the failure pool in response to higher purge threat. Ideologues, however, have

more to lose from being purged and so increase their effort relatively more. Since ideologues are

also less likely to fail to start with, they exit the failure pool at a higher rate, and the target pool

becomes more tainted. The pool makeup effect is thus positive.

In a semi-indiscriminate purge (relatively high κ), the autocrat’s benefit from purging a party

member is determined at the margin by the informativeness of success (see (8)). The sign of the pool

makeup effect is thus a function of the differential exit rate between ideologues and opportunists

from the success pool (since an increase in κ decreases effort). Ideologues are more responsive to

a change in purge breadth, but (relatively) more likely to belong to the success pool before it. In

our set-up, the two effects compensate each other because party members’ cost of effort exhibits

constant elasticity. Both types exit the success pool at the same rate, resulting in a null pool

makeup effect.

In equilibrium, the purge breadth, pool size and pool makeup effects are jointly determined.

However, because all players correctly anticipate each other’s strategy, we can simply compare the

marginal cost of purging an additional member, given by c′(κ) = c0 + c1κ, with the marginal benefit

(determined by (7) and (8)), which incorporates the pool size and makeup effects. Recalling that

κ(L) = 1− v − V2 − L, we obtain the following lemma.

Lemma 3. There exists a unique equilibrium purge breadth κ∗(L). Further, if at κ = κ(L),

(i) c0 + c1κ > WF , the purge is partially discriminate;

(ii) WS ≤ c0 + c1κ ≤ WF , the purge is fully discriminate;

(iii) c0 + c1κ < WS, the purge is semi-indiscriminate.
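The three cases of Lemma 3 amount to comparing the marginal cost of purging at the threshold breadth with the two marginal benefits. A minimal sketch of that comparison follows. The function name and the numerical inputs are hypothetical, and the benefits are passed in as given numbers rather than computed in equilibrium.

```python
def classify_purge(c0, c1, kappa_threshold, W_F, W_S):
    """Classify the purge as in Lemma 3, with all quantities evaluated at kappa = kappa(L).

    Assumes W_S < W_F, as holds when success is a positive signal of type.
    """
    if not W_S < W_F:
        raise ValueError("expected W_S < W_F")
    marginal_cost = c0 + c1 * kappa_threshold
    if marginal_cost > W_F:
        return "partially discriminate"
    if marginal_cost < W_S:
        return "semi-indiscriminate"
    return "fully discriminate"


# c0 = 0 and c1 = 0.17 follow the parameter values reported for the figures;
# the threshold of 0.6 and the benefit values are illustrative assumptions.
cheap = classify_purge(0.0, 0.17, 0.6, W_F=0.44, W_S=0.17)
middle = classify_purge(0.0, 0.17, 0.6, W_F=0.20, W_S=0.05)
costly = classify_purge(0.0, 0.17, 0.6, W_F=0.05, W_S=0.01)
```

With a marginal cost of 0.102, the three calls return a semi-indiscriminate, a fully discriminate, and a partially discriminate purge respectively. The fully discriminate case covers a whole interval of costs because W_S is strictly below W_F.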

Lemma 3 indicates that, somewhat unsurprisingly, the autocrat chooses a partially discriminate

purge if the cost of carrying out a fully discriminate purge is too high (point (i)). In turn, she prefers

a semi-indiscriminate purge whenever the cost is sufficiently low (point (iii)). In all other cases, the

purge is fully discriminate (point (ii)).14 Notice that since the success pool has a greater proportion

of ideologues (i.e., µS(e1) > µF (e1)), the benefit of purging a successful party member is strictly

lower than the benefit of purging a failed one (WS < WF ) and a fully discriminate purge is not

a knife-edge result. Further, since a party member’s success is always an (imperfect) signal of

ideological congruence with the autocrat, Lemma 3 implies that semi-indiscriminate purges can

occur only if the replacement pool is better on average than the current party members.

Corollary 1. If ri > λ, there exists a non-measure zero set of parameter values such that the

equilibrium purge is semi-indiscriminate.

14Due to the strategic complementarity, in a partially discriminate purge, a member’s effort is convex in the

(anticipated) purge breadth. The pool makeup effect is thus increasing and so is the marginal benefit of purging.

The marginal benefit WF may thus intersect the marginal cost more than once in the range [0, κ(L)]. Despite this

complication, points (i) and (ii) hold because the autocrat always prefers the highest feasible purge breadth to take

advantage of the higher marginal benefit from purging.

5 The love-fear tradeoff

In the previous analysis, we fixed the intensity of violence. Here, we examine how changes in L

affect the autocrat’s and party members’ strategies and reserve the analysis of the optimal intensity

of violence to the next section.

We first consider the effect of greater intensity of violence on the performance of the party in

the first period, which we term the fear effect. This effect is positive whenever an increase in L increases

average effort. The next proposition establishes that the fear effect is always positive. As the cost

of being purged increases, all party members have greater incentives to exert effort.

Fear, however, motivates ideologues and opportunists differentially, since any change in efforts

triggers the pool size and makeup effects identified above and thus affects the purge breadth. While

the cost of being purged is similar for all types, the indirect pool effects on the probability a member

survives are weighted by a member’s second-period payoff, greater for ideologues. Consequently,

the strength of the fear effect depends on the nature of the purge.

In a discriminate purge, the greater level of activity caused by higher L generates a positive

pool size effect. As ideologues are more responsive to the pool size effect, they exit the failure

pool faster than opportunists, producing a positive pool makeup effect. The autocrat then purges more

failures. Since greater purge breadth is also conducive to more effort, all effects go in the same

direction. Equilibrium effort by all party members increases at a (relatively) high rate with the

intensity of violence.

When the purge is semi-indiscriminate, the pool size effect is negative (i.e., party members’ efforts

are strategic substitutes) so greater activism depresses the incentives to exert effort, especially for

ideologues. Consequently, ideologues increase their effort less than opportunists, and the success

pool becomes more tainted. The pool makeup effect is thus positive, and the purge breadth in-

creases, further reducing the benefit of effort. The equilibrium effort thus increases at a (relatively)

low rate with the intensity of violence.15

The fear effect is summarized in Proposition 1 and illustrated in Figure 2a below.

15Importantly, the indirect pool effects only reduce the positive impact of greater intensity of violence on effort in

a semi-indiscriminate purge. To see that, suppose to the contrary that the fear effect is negative so average effort

decreases with L. Since the pool size effect is negative, members have greater incentive to exert effort, especially

ideologues. This would lead to a negative pool makeup effect (the autocrat has less incentive to purge the success

pool), reducing the proportion of successful members being purged and increasing the value of success. Consequently,

all members would increase their effort, contradicting the assumption of decreased average effort.

Proposition 1. The first-period equilibrium average level of effort increases with the intensity of

violence. Average effort increases at a faster rate in a fully discriminate purge than in a semi-

indiscriminate purge.

Our next result establishes that the nature of the purge, but not its breadth, is determined by

the intensity of violence. In a partially discriminate purge, as L increases, the failure pool becomes

more tainted and this positive pool make-up effect leads to an increase in purge breadth. But

greater breadth increases effort and reduces the failure pool. As L continues to increase, the failure

pool becomes so thin that all failures are purged: the purge becomes fully discriminate for intensity

of violence above some Lfull. In a fully discriminate purge, although the failure pool becomes more

tainted, the pool makeup effect has no effect on the breadth as the entire pool is already being

purged. The only effect remaining is the fear effect which decreases the size of the failure pool and

thus the purge breadth.

As violence increases further, two effects lead a purge to move from fully discriminate to semi-

indiscriminate. First, as relatively more opportunists join the success pool, the latter becomes more

tainted and purging successful party members is more attractive for the autocrat. Second, as the

failure pool shrinks and the breadth of the purge declines, the (marginal) cost of purging successful

members decreases. Once the purge becomes semi-indiscriminate (above some intensity Lind), the

positive pool make-up effect induced by greater intensity of violence again implies an increase in

purge breadth. Overall, the purge breadth is non-monotonic in violence because of the coarseness

of the information available to the autocrat.16

Proposition 2 summarizes these findings and Figure 1 illustrates them.

Proposition 2. There exist unique Lfull < L and Lind ∈ (Lfull, L] such that:

(i) For L < Lfull, the purge is partially discriminate and breadth strictly increases with violence;

(ii) For L ∈ [Lfull, Lind], the purge is fully discriminate and breadth strictly decreases with violence;

(iii) For L > Lind, the purge is semi-indiscriminate and breadth strictly increases with violence.

Our theory thus predicts that violent purges (L > Lind) are semi-indiscriminate, whereas (rel-

atively) mild purges are discriminate. This prediction seems in line with historical evidence. As

described by Teiwes (1993, 25-27), Chinese rectification campaigns were characterized by low intensity

of violence and by a high level of predictability and so resemble discriminate purges. In contrast,

16This non-monotonicity result would hold in any setting in which the autocrat’s information is discrete (but not

necessarily binary).

Figure 1: Equilibrium purge breadth and intensity of violence

Parameter values: λ = 1/3, ri = 2/3, R = 0, b = 1/4, β = 0.9, c0 = 0, c1 = 0.17.

during the purges of the thirties in the USSR, the intensity of violence was high and the target of the

purges less delimited as “flouting commands court danger, but even enthusiastic compliance is no

guarantee of safety” (ibid., 25). Stalinist purges are thus examples of semi-indiscriminate purges

(we provide some reasons why it may have been so below).

The intensity of violence, however, does not determine the purge breadth due to the non-monotonic

relationship uncovered in Proposition 2. This again appears to correspond to historical

facts. Despite the differences in intensity of violence and nature, Soviet and Chinese purges had

similar breadth. During the 1930s Stalinist purges, the proportion of purged members varied from

5% in 1930, 1931, and 1937 to 22% in 1933-34 (see Table 1 in Appendix A for more details). In

turn, the expulsion rate in rectification campaigns in China fluctuated between 9% in 1957-58 and

23% in 1947-48 (see Table 2).17

Having examined the effect of violence on effort and the nature and breadth of the purge, we now

consider its impact on selection. Observe that since a new party member is always better on average

(from the autocrat’s perspective) than a purged party member, the second-period ideological unity

(defined by the proportion of ideologues) of the party is always greater following the purge. An

increase in the intensity of violence, however, affects the selection benefit for the autocrat. This is

what we refer to as the love effect, which is positive (resp. negative) if greater violence improves

(worsens) selection. The next proposition establishes conditions under which the love effect is

negative for the survivors of the purges and for all party members; Figure 2b illustrates this result.

17Given the differences in size and population, our preferred measure is the proportion of party members purged.

The number of expelled was always much larger in China.

Proposition 3.

(i) The proportion of ideologues among surviving members of the purge strictly increases with L if

and only if L < Lfull, and strictly decreases otherwise.

(ii) The proportion of ideologues in the party in the second period weakly increases with L for all

L if and only if λ ≥ ri.

(iii) If ri ∈ (λ, 2λ], the proportion of ideologues in the party in the second period strictly increases

with L for L < Lfull and strictly decreases otherwise.
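The two selection measures in Proposition 3 (ideologues among survivors and ideologues in the second-period party) can be separated with a simple accounting sketch. This is an illustration under our own assumptions rather than the paper's proof: efforts are held fixed instead of responding to L, members within a pool are purged at random, and every purged member is replaced by a draw from the replacement pool. All names and values are hypothetical.

```python
def ideologue_shares(kappa, lam, r_i, e_i, e_o):
    """Share of ideologues among purge survivors and in the second-period party."""
    if not 0.0 <= kappa < 1.0:
        raise ValueError("purge breadth must lie in [0, 1)")
    e_bar = lam * e_i + (1.0 - lam) * e_o
    ideologue_successes = lam * e_i
    ideologue_failures = lam * (1.0 - e_i)
    failures = 1.0 - e_bar
    if kappa <= failures:
        # Discriminate purge: a random fraction of the failure pool is removed.
        kept_fraction = 1.0 - kappa / failures
        surviving_ideologues = ideologue_successes + kept_fraction * ideologue_failures
    else:
        # Semi-indiscriminate purge: all failures go, plus some successes at random.
        kept_fraction = (1.0 - kappa) / e_bar
        surviving_ideologues = kept_fraction * ideologue_successes
    among_survivors = surviving_ideologues / (1.0 - kappa)
    in_party = surviving_ideologues + kappa * r_i
    return among_survivors, in_party


# lam and r_i follow the parameter values reported for the figures;
# the efforts and purge breadths are illustrative assumptions.
partial = ideologue_shares(0.3, 1 / 3, 2 / 3, 0.6, 0.3)
full = ideologue_shares(0.6, 1 / 3, 2 / 3, 0.6, 0.3)
semi = ideologue_shares(0.8, 1 / 3, 2 / 3, 0.6, 0.3)
```

Once all failures are purged, the survivors are drawn entirely from the success pool, so the first measure equals the posterior after success and can only change through the composition of that pool. The second measure also depends on how many members are replaced.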

The first part of the proposition highlights that the autocrat’s ability to screen party members

decreases with L unless the purge is partially discriminate. For L < Lfull, as the intensity of

violence increases, more party members exit the failure pool and more failures are purged. Among

surviving members, a greater proportion of party members thus belongs to the success pool, which

is of higher quality than the failure pool. The love effect is then positive. When the purge is fully

discriminate or semi-indiscriminate, surviving members all belong to the success pool. Screening

then worsens because the success pool becomes more tainted as more opportunists enter the pool

relative to the stocks of both types. The love effect is then necessarily negative.

Even though purged party members are replaced by (in expectation) more ideological members,

the love effect can still be negative when it comes to (second-period) party membership. Maybe

surprisingly, the love effect is always negative for high enough intensity of violence (L ≥ Lfull) when

the replacement pool is better than the existing pool of party members (Proposition 3(ii)). This

occurs because in a fully discriminate purge, a lower proportion of party members are replaced

by better new party members (Proposition 2(ii)). A decreasing purge breadth, however, is not

necessary to produce a negative love effect for second-period party membership. Indeed, whenever

the difference between the replacement pool and existing party members is not too large (ri ≤ 2λ, a

sufficient condition), the deterioration of the pool of survivors discussed above dominates increased

replacement, and the love effect is again negative.

This section therefore establishes that under certain circumstances (such as a better replace-

ment pool), the autocrat faces a trade-off between fear (better first-period performance) and love

(lower second-period ideological unity), not too dissimilar from the dilemma identified long ago

by Machiavelli (2005, Chapter 17). While there is little historical evidence on the effect of purges

on selection, the trade-off we identify might explain why Stalin, who understood that “saboteurs

Figure 2: (a) Fear effect; (b) Love effect

In Figure 2b, the solid line corresponds to the proportion of ideologues among all party members in period 2, the dashed

line to the proportion of ideologues among survivors of the purge. Parameter values: λ = 1/3, ri = 2/3, R = 0,

b = 1/4, β = 0.9, c0 = 0, c1 = 0.17.

disguise themselves by over-fulfilling the plan” (cited in Dallin and Breslauer, 1970, 57), allegedly

asserted that it is better to induce loyalty by fear than by conviction.18

6 Intensity of violence

In this section, we consider how the autocrat’s choice of violence balances the (positive) fear and

(potentially negative) love effects. When L is low (L < Lfull), the love effect is positive. Hence,

the autocrat has strong incentive to increase the intensity of violence. A purge, therefore, will be

partially discriminate only if the cost of investing in the security apparatus is very large. When L is

large (L > Lind), the love effect is negative. Further, the fear effect is relatively small (Proposition

1). A purge is thus semi-indiscriminate only if the cost of investing in the security apparatus is

low.19

18Dallin and Breslauer (1970, 42 footnote 37) reproduce an anecdote circulating among Moscow party members

in 1931. “Yagoda was alleged to have asked Stalin: ‘Which would you prefer Comrade Stalin: that party members

should be loyal to you from conviction or from fear?’ Stalin is alleged to have replied: ‘From fear.’ Whereupon

Yagoda asked, ‘Why?’ To which Stalin replied: ‘Because convictions can change: fear remains.’”

19At $L = L_{full}$, by definition $\kappa(L) = 1 - e_1$. Effort thus satisfies $e_1^m(\tau) = v(\tau) + V_2(\tau) + L$ (see Lemma 1). The

posterior $\mu_F = \lambda\,\frac{1-e_1^m(i)}{1-e_1}$ thus depends only on model parameters and L. For L > Lfull, the posterior µS does not

depend on κ (see the discussion prior to Lemma 3). Hence, the quantities used in the text of Proposition 4 only

depend on the intensity of violence and model parameters.

Proposition 4. There exists an (almost always) unique equilibrium intensity of violence L∗. Further,

(i) If at $L = L_{full}$, $\zeta_0 + \zeta_1 L \ge 1 + \beta D^{i,o}(\lambda - \mu_F(e_1))$, then $L^* \le L_{full}$;

(ii) If at $L = L_{ind}$, $\zeta_0 + \zeta_1 L < \frac{1}{2}\left(1 + \beta D^{i,o}\,\frac{\lambda-\mu_S(e_1)}{c_1(v+V_2+L)}\right) + \beta D^{i,o}(\lambda - \mu_S(e_1))$, then $L^* > L_{ind}$.

For low L, point (i) establishes formally that the fear effect (equal to 1) and the love effect

(equal to $\beta D^{i,o}(\lambda-\mu_F)$) are both positive. In contrast, for high intensity of violence (point (ii)),

the fear effect, while still positive, is relatively low ($\frac{1}{2}\left(1+\beta D^{i,o}\,\frac{\lambda-\mu_S}{c_1(v+V_2+L)}\right)$) and the love effect is

negative ($\beta D^{i,o}(\lambda-\mu_S) < 0$).
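The two conditions of Proposition 4 compare the marginal cost of violence, read here as ζ0 + ζ1L, with the sum of the fear and love effects. The sketch below evaluates both sides for illustrative numbers. The function names are hypothetical, `D` stands in for the payoff difference D^{i,o}, and the posteriors, thresholds, and cost parameters are assumptions chosen for the example rather than equilibrium values.

```python
def marginal_cost_of_violence(zeta0, zeta1, L):
    """Marginal cost of the security apparatus, taken to be zeta0 + zeta1 * L."""
    return zeta0 + zeta1 * L


def marginal_benefit_low(beta, D, lam, mu_F):
    """Fear effect (equal to 1) plus love effect, as in point (i)."""
    return 1.0 + beta * D * (lam - mu_F)


def marginal_benefit_high(beta, D, lam, mu_S, c1, v_bar, V2_bar, L):
    """Reduced fear effect plus (negative) love effect, as in point (ii)."""
    love = beta * D * (lam - mu_S)
    fear = 0.5 * (1.0 + love / (c1 * (v_bar + V2_bar + L)))
    return fear + love


# beta, lam and c1 follow the parameter values reported for the figures;
# every other number below is an illustrative assumption.
mb_low = marginal_benefit_low(0.9, 1.0, 1 / 3, 2 / 9)
mb_high = marginal_benefit_high(0.9, 0.5, 1 / 3, 0.5, 0.17, 0.2, 0.1, 0.6)

# Point (i): violence stays at or below L_full when its marginal cost is high enough.
stays_low = marginal_cost_of_violence(0.5, 2.0, 0.4) >= mb_low
# Point (ii): violence exceeds L_ind when its marginal cost is low enough.
goes_high = marginal_cost_of_violence(0.0, 0.1, 0.6) < mb_high
```

In this example the marginal benefit at high violence is far smaller than at low violence, since the fear effect shrinks and the love effect turns negative. Only a cheap security apparatus satisfies the second condition.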

Given the tradeoff between love and fear for high intensity of violence, it remains to determine

under which conditions a semi-indiscriminate purge occurs. The next corollary lists three necessary

and sufficient conditions. As explained above, the replacement pool must be of sufficiently good

quality, and in particular better on average than existing party members (condition 1.). Further,

the cost of purging (c0, c1) must be sufficiently low to compensate for the relatively low marginal

benefit of purging a successful party member (condition 2.). Finally, in line with Proposition 4,

the cost of investing in the security apparatus must also be relatively small.20

Corollary 2. A purge is semi-indiscriminate if and only if:

1. The proportion of ideologues in the replacement pool ri is strictly higher than some ri ≥ λ;

2. The cost parameters c0 and c1 are respectively strictly below some c0(ri) and c1(ri, c0);

3. The cost parameters ζ0 and ζ1 are respectively strictly below some ζ0(ri) and ζ1(ri, ζ0).

As noted above, the Stalinist purges of the thirties resembled semi-indiscriminate purges. Historical

evidence does not permit us to evaluate whether the conditions of the corollary were satisfied.

Interestingly, however, it strongly suggests that the pool of candidates to the CPSU was markedly

different from existing party members. While the 1920s were marked by a divide between ideologically

committed and technically proficient officials, starting in 1928, Stalin took great interest

in “training a new generation of cadres that would be both Red and experts” (Fitzpatrick, 1979,

382). Around 170,000 of these new cadres graduated between 1928-32 and 370,000 between 1933

and 1938 (ibid, 398).21 These cadres were among the main beneficiaries of the purges of the 1930s.

While their relative productivity compared to old cadres is still debated (e.g., Dallin and Breslauer,

20The thresholds described in Corollary 2 depend on all (other) parameter values. We have reduced notation to

facilitate the exposition.

21Brzezinski (1956, 90-91) puts the total number of engineers emerging from the Stalinist state schools between

1933 and 1938 at 1 million.

1970, 37), there is clear evidence that the new cadres were more loyal to Stalin (Wolton, 2015,

267-68).22

Observe that Proposition 4 does not characterize the intensity of violence when conditions (i)

and (ii) do not hold. Due to the positive fear and love effects as well as the complementarity in

party members’ effort in a partially discriminate purge, the marginal benefit of violence is strictly

convex for L ≤ Lfull. This means that the marginal benefit may intersect the marginal cost more

than once, and the autocrat must choose between the lowest intersection and L = Lfull.23 This

implies that predicting the optimal intensity of violence is difficult even though it is generically

unique for the autocrat. Further, small changes in the underlying fundamentals can be associated

with a large increase in the intensity of violence.

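Proposition 5. There exists a set of parameter values $P^d$ of non-zero measure such that if $(\lambda, r_i, b, c_0, \zeta_0) \in P^d$,

there exist $c_1^d$ and $\zeta_1^d$ satisfying $\lim_{c_1 \uparrow c_1^d} L^* < \lim_{c_1 \downarrow c_1^d} L^*$ and $\lim_{\zeta_1 \uparrow \zeta_1^d} L^* < \lim_{\zeta_1 \downarrow \zeta_1^d} L^*$.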

The unpredictability in violence outlined in Proposition 5 implies that the purge breadth

may also seem random (by Proposition 2). While there are few ways to test this prediction, it

should be noted that there was great variation in the number of party members affected by mass

purges both in the USSR and Communist China (see Tables 1 and 2 in Appendix A).

7 Mass Purges and Top-Down Accountability in Autocracies

In our theoretical framework, mass purges are a salient method of top-down accountability in

autocracy. In totalitarian regimes, the political and professional responsibilities of party members

are fused and “[f]ailure in the latter thus automatically becomes a case of political accountability”

(Brzezinski, 1956, 86). The autocrat’s top-down accountability problem thus bears similarities

with bottom-up accountability, where voters evaluate their representative based on economic

indicators (for example, Besley, 2007, chapter 4).

There exists, however, a fundamental difference between the autocrat’s and voters’ accountability

problems. While many voters hold a single representative accountable, so one is accountable to

22Notice that all our results hold as long as the replacement pool is expected to be more active (on average) than

existing party members. It is of little consequence whether the difference is due to greater loyalty to the leader,

greater productivity, or both.

23The second intersection is a local minimum.

many, the autocrat faces a mass of party members so many are accountable to one. Our analysis

highlights forces specific to many-to-one accountability. In our setting, there is no team production

problem and no yardstick competition as in a tournament; party members work on independent

projects. Nonetheless, their fate is linked through the purge. This generates strategic interdepen-

dence in party members’ activism—the pool size effect. In turn, the pool size effect changes the

autocrat’s inference problem as her ability to identify an agent’s type depends on the behavior

of all party members—the pool makeup effect. Consequently, top-down accountability cannot be

apprehended by studying a single or few agents in isolation; researchers must take a comprehensive

perspective. Top-down accountability is, almost by definition, a large-N problem.

While the many-to-one feature we identify is not unique to top-down accountability in autoc-

racy, the critical role of violence due to the autocrat’s monopoly in the political and judicial areas

is. We show that the intensity of violence determines the nature of the purge, but not necessarily

its breadth (Proposition 2). Further, even though the autocrat faces little constraint on her ac-

tions, due to party members’ strategic response, she cannot escape a trade-off between fear (higher

performance) and love (less unity). This trade-off implies that a rational autocrat does not use

violence bluntly. We should observe greater violence when the need to mobilize the autocrat’s

agents is high,24 and more restraint when the autocrat is interested in selection.

This result may also explain why the Fascist and Nazi regimes used purges parsimoniously

(as documented above) despite opportunists entering the parties (Orlow, 1969, 45-50, Morgan,

2012, 312). In both countries, the party only played a secondary role. “[T]he Nazi party at no

point attained the effective supremacy over the administration of the State which the Soviet party

established in the early year of the Bolshevik revolution” (Unger, 1965, 459). As Orlow (1969,

176) describes, “party life [in the NSDAP] in early 1936 was, in a word, boring.” In Italy, the

Fascist Party legally became subordinate to the State after the Gran Consiglio of 1928 (Gentile,

1984). Relatedly, right-wing totalitarian regimes resorted to top-down accountability in the form

of purges only when party members were given a more active role in politics. In Italy in 1931, the

purge was concomitant with a conflict over the rights of the Catholic Church (Morgan, 2012). In

Germany in 1938, the purge occurred when the NSDAP acquired a greater role in judicial, church,

and economic affairs after the invasion of Austria (Orlow, 1969, 241-2).

24This particular comparative statics is in line with historical evidence on Chinese rectification campaigns (Teiwes,

1993).

8 Conclusion

This paper provides a theoretical framework to understand mass purges as a method of top-down

accountability in autocracy. Our theory highlights that the nature of top-down accountability is

often many-to-one with a mass of agents accountable to a single principal. The fate of all party

members is linked through the purge generating new strategic interdependency which affects both

agents’ efforts (pool size effect) and principal’s learning (pool makeup effect).

The many-to-one nature of top-down accountability is not limited to autocracy. Many hierar-

chical institutions share this feature. In large firms, in the military, or the bureaucracy, a principal

is faced with a mass of agents whose individual efforts are difficult to ascertain. The principal’s

methods of accountability there take the form of mass layoffs or performance-based up-or-out

promotion standards, which are not too dissimilar from purges. The approach developed in this

paper thus has a large range of applications for the study of public and private institutions.

Autocracies, however, are specific in their use of violence, as the autocrat faces few restraints

on her actions due to her monopoly in the political and judicial areas. Even so, because party

members respond strategically to the threat of a purge, she cannot escape

a trade-off between fear (more effort) and love (less unity).

We, however, focus exclusively on one particular form of violence, mass purges. Researchers

would do well to consider other means available to the autocrat and explore why mass purges have

been used by only a limited number of autocrats (Lenin, Stalin, Mao, Mussolini, and Hitler). Other

important questions include: the effectiveness of mass purges at achieving other objectives of the

autocrat such as rooting out corruption; their dynamic consequences; and their use in conjunction

with competitions for power and resources.

References

Acemoglu, Daron and Alexander Wolitzky, “The economics of labor coercion,” Econometrica,

2011, 79 (2), 555–600.

Acemoglu, Daron, Georgy Egorov, and Konstantin Sonin, “Coalition formation in non-democracies,” The

Review of Economic Studies, 2008, 75 (4), 987–1009.

Acemoglu, Daron, Georgy Egorov, and Konstantin Sonin, “Do Juntas Lead to Personal Rule?,” The American Economic Review: Papers &

Proceedings, 2009, 99, 298–303.

Arendt, Hannah, The origins of totalitarianism, Vol. 244, Houghton Mifflin Harcourt, 1973.

Ashworth, Scott, “Electoral accountability: recent theoretical and empirical work,” Annual Re-

view of Political Science, 2012, 15, 183–201.

Bernhardt, Dan and Steeve Mongrain, “The Layoff Rat Race,” The Scandinavian Journal of

Economics, 2010, 112 (1), 185–210.

Besley, Timothy, “Principled agents?: The political economy of good government,” OUP Cata-

logue, 2007.

Bloch, Francis and Vijayendra Rao, “Terror as a bargaining instrument: A case study of

dowry violence in rural India,” The American Economic Review, 2002, 92 (4), 1029–1043.

Bo, Ernesto Dal, Pedro Dal Bo, and Rafael Di Tella, “Plata o Plomo?: Bribe and Pun-

ishment in a Theory of Political Influence,” American Political Science Review, 2006, 100 (01),

41–53.

Brzezinski, Zbigniew K, The permanent purge, Harvard University Press Cambridge, MA, 1956.

Dallin, Alexander and George W Breslauer, Political terror in communist systems, Stanford

University Press, 1970.

de Mesquita, Bruce Bueno and Alastair Smith, “Political Succession: A Model of Coups,

Revolution, Purges and Everyday Politics,” 2015.

Dewan, Torun and David P Myatt, “Scandal, protection, and recovery in the cabinet,” Amer-

ican Political Science Review, 2007, 101 (01), 63–77.

Dewan, Torun and David P Myatt, “The declining talent pool of government,” American Journal of Political Science, 2010,

54 (2), 267–286.

Economist, The, “No ordinary Zhou,” 2014.

Egorov, Georgy and Konstantin Sonin, “Dictators And Their Viziers: Endogenizing The

Loyalty–Competence Trade-Off,” Journal of the European Economic Association, 2011, 9 (5),

903–930.

Egorov, Georgy and Konstantin Sonin, “The Killing Game: Reputation and knowledge in politics of succession,” Research in

Economics, 2015, Forthcoming.

Egorov, Georgy, Sergei Guriev, and Konstantin Sonin, “Why resource-poor dictators allow freer media:

A theory and evidence from panel data,” American Political Science Review, 2009, 103 (04),

645–668.

Fitzpatrick, Sheila, “Stalin and the Making of a New Elite, 1928-1939,” Slavic Review, 1979, 38

(3), 377–402.

Gailmard, Sean and John W Patty, “Formal models of bureaucracy,” Annual Review of

Political Science, 2012, 15, 353–377.

Gehlbach, Scott and Alberto Simpser, “Electoral manipulation as bureaucratic control,”

American Journal of Political Science, 2015, 59 (1), 212–224.

Gentile, Emilio, “The problem of the party in Italian Fascism,” Journal of Contemporary History,

1984, 19 (2), 251–274.

Getty, John Arch, Origins of the great purges: the Soviet Communist Party reconsidered, 1933-

1938, Vol. 43, Cambridge University Press, 1987.

Gregory, Paul R, Philipp JH Schroder, and Konstantin Sonin, “Rational dictators and

the killing of innocents: Data from Stalin’s archives,” Journal of Comparative Economics, 2011,

39 (1), 34–42.

Lorentzen, Peter, “China’s Strategic Censorship,” American Journal of Political Science, 2014,

58 (2), 402–414.

Lorentzen, Peter and Xi Lu, “Factional Struggle or Efforts in Saving the Party: Evidence from AntiCorruption

Campaign in China,” MPSA Conference 2016, 2016.

Machiavelli, Niccolo, The prince, OUP Oxford, 2005.

Mesquita, Bruce Bueno De and Alastair Smith, The dictator’s handbook: why bad behavior

is almost always good politics, PublicAffairs, 2011.

Morgan, Philip, “The Trash Who are Obstacles in Our Way: the Italian Fascist Party at the

Point of Totalitarian Lift Off, 1930–31,” The English Historical Review, 2012, 127 (525), 303–344.

Orlow, Dietrich, The history of the Nazi party: 1933-1945, Vol. 2, [Pittsburgh]: University of

Pittsburgh Press, 1969.

Rigby, Thomas Henry, “Communist Party membership in the USSR,” Princeton: Princeton

University Press, 1968, 178, 369–491.

Schapiro, Leonard Bertram, The government and politics of the Soviet Union, Taylor & Francis,

1977.

Svolik, Milan W, “Power sharing and leadership dynamics in authoritarian regimes,” American

Journal of Political Science, 2009, 53 (2), 477–494.

Teiwes, Frederick C, Politics and Purges in China: Rectification and the Decline of Party Norms,

1950-1965, ME Sharpe, 1993.

Unger, Aryeh L, “Party and state in Soviet Russia and Nazi Germany,” The Political Quarterly,

1965, 36 (4), 441–459.

Weinberg, Robert, “Purge and Politics in the Periphery: Birobidzhan in 1937,” Slavic Review,

1993, 52 (1), 13–27.

Wolton, Thierry, Histoire mondiale du communisme, tome 1: Les bourreaux, Grasset, 2015.

Zakharov, Alexei V, “The Loyalty-Competence Trade-Off in Dictatorships and Outside Options

for Subordinates,” The Journal of Politics, 2016, 78 (2), 457–466.


A Historical data

Table 1: Proportion of party members purged in the USSR

Year       Proportion (%)   Source
1919       10-15            Getty (1987, Table 2.1); Rigby (1968, 76)
1921-23    25               Getty (1987, Table 2.1); Rigby (1968, 97)
1924       3                Getty (1987, Table 2.1)
1925       4                Getty (1987, Table 2.1)
1926       3                Getty (1987, Table 2.1)
1927       6                Rigby (1968, 127)
1928       13               Getty (1987, Table 2.1)
1929       11               Getty (1987, Table 2.1)
1930       5                Getty (1987, Table 2.1)
1931       5                Getty (1987, Table 2.1)
1933-34    17-22            Getty (1987, 55); Rigby (1968, 204)
1935       9-13             Getty (1987, Table 7.1); Rigby (1968, 209)+
1936       10               Rigby (1968, 209)+
1937       5                Getty (1987, Table 7.1)
1951-53    5                Rigby (1968, 281)+

All proportions are approximations. + denotes authors’ calculation using Rigby (1968, 52).


Table 2: Proportion of party members purged in China

Year       Proportion (%)   Source
1947-48    23               Teiwes (1993, 75)+
1951-54    10               Teiwes (1993, 110)
1957-58    9                Teiwes (1993, 268)
1959-60    20               Teiwes (1993, 339)
1964-65    10               Teiwes (1993, 425)

All proportions are approximations. + denotes authors’ calculation.

B Proofs

As second-period actions are subsumed in V2(τ) (for a party member) and W2(τ) for the autocrat,

we ignore the first-period time subscript in all the proofs.

Lemma 4. There is no equilibrium in which the autocrat purges a strictly positive proportion of

successful party members and no failed party members.

Proof. The proof proceeds by contradiction. Suppose there is such an equilibrium. In such an equilibrium, opportunists exert no effort: they gain nothing from effort, and success only increases their probability of being purged. Consider now ideologues. If they exert strictly positive effort, then the success pool is composed only of ideologues, whereas the failure pool contains all opportunists. The autocrat would then have a profitable deviation: purge first from the failure pool and not from the success pool, a contradiction. If ideologues exert no effort, then no party member exerts effort. But under our equilibrium restriction, there is no equilibrium in which no party member exerts effort (see the reasoning in the text). We have thus reached a contradiction.

Corollary 3. There is no equilibrium in which a party member faces a higher probability of being

purged after success than after failure.

Proof. The proof proceeds by contradiction. Suppose there is such an equilibrium. By Lemma 4, it must be that the autocrat is indifferent between purging from the success and failure pools (if she strictly prefers to purge from the success pool, the reasoning in Lemma 4 applies). But by a reasoning similar to the proof of Lemma 4, it can easily be checked that we reach a contradiction.


Proof of Lemma 1

Denote by γ and ρ the probability that a party member is purged when his project is a failure and a success, respectively. Since we have a continuum of party members, party member m takes γ and ρ as given. When he anticipates the breadth of the purge to be no greater than the number of unsuccessful members (κ ≤ 1 − e), γ ≥ 0 and ρ = 0, so his maximization problem is:

$$\max_{e \in [0,1]} \; R + e\big(v(\tau) + V_2(\tau)\big) + (1-e)\big[\gamma(-L) + (1-\gamma)V_2(\tau)\big] - \frac{e^2}{2} \tag{9}$$

If his project is successful (probability e), he receives a flow payoff v(τ) and is not purged, so he also gets his second-period expected payoff. If his project fails, he is purged with probability γ and suffers a loss L. Taking the first-order condition and replacing γ by κ/(1 − e), we obtain $e_m(\tau) = v(\tau) + \frac{\kappa}{1-e}\big(V_2(\tau) + L\big)$.

When he anticipates the breadth of the purge to be greater than the number of unsuccessful members (κ > 1 − e), γ = 1 and ρ > 0, so his maximization problem is:

$$\max_{e \in [0,1]} \; R + e(1-\rho)\big(v(\tau) + V_2(\tau)\big) + \big(e\rho + (1-e)\big)(-L) - \frac{e^2}{2} \tag{10}$$

If his project is successful (probability e), party member m still faces a risk of being purged. This occurs with probability ρ. If he is not purged, he receives a flow payoff v(τ) and his second-period expected payoff. If his project is unsuccessful, he is always purged and suffers a loss L. Taking the first-order condition and replacing ρ by (κ − (1 − e))/e, we obtain $e_m(\tau) = \frac{1-\kappa}{e}\big(v(\tau) + V_2(\tau) + L\big)$.
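As a numerical sanity check on the two first-order conditions, the following Python sketch maximizes each objective on a grid while holding γ and ρ fixed, as an individual member does. All parameter values (R, v, V2, L, γ, ρ) are illustrative assumptions and are not taken from the paper.

```python
# Grid-search check of the first-order conditions behind (9) and (10).
# All numbers are hypothetical, chosen only so that effort lies in [0, 1].

def argmax_on_grid(payoff, steps=1000):
    """Return the grid point in [0, 1] that maximizes payoff."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=payoff)

R, v, V2, L = 0.0, 0.2, 0.3, 0.4   # rent, flow payoff, continuation value, loss
gamma, rho = 0.5, 0.25             # purge probabilities, taken as given

def payoff_discriminate(e):
    # Objective (9): only failures are purged, with probability gamma.
    return R + e * (v + V2) + (1 - e) * (gamma * (-L) + (1 - gamma) * V2) - e**2 / 2

def payoff_semi_indiscriminate(e):
    # Objective (10): failures are always purged, successes with probability rho.
    return R + e * (1 - rho) * (v + V2) + (e * rho + (1 - e)) * (-L) - e**2 / 2

e_disc = argmax_on_grid(payoff_discriminate)
e_semi = argmax_on_grid(payoff_semi_indiscriminate)

foc_disc = v + gamma * (V2 + L)       # e = v + gamma (V2 + L)
foc_semi = (1 - rho) * (v + V2 + L)   # e = (1 - rho)(v + V2 + L)

print(e_disc, foc_disc)
print(e_semi, foc_semi)
```

The grid maximizers coincide with the closed-form conditions before γ and ρ are replaced by their equilibrium values κ/(1 − e) and (κ − (1 − e))/e.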

For the next lemma, it is useful to denote $L_m := \frac{1-v}{2} - V_2$ and $\kappa_m(L) := \frac{(1-v)^2}{4(V_2+L)}$.

Lemma 5. All ideological (resp. opportunistic) party members exert the same level of effort. If L ≤ Lm, there exists a unique feasible level of effort for all κ. Otherwise, there exists a unique feasible level of effort for κ ∉ [1 − v − V2 − L, κm(L)], three feasible levels of effort for κ ∈ (1 − v − V2 − L, κm(L)), and two feasible levels of effort at κ = 1 − v − V2 − L and κ = κm(L).

Proof. Anticipating future notation, denote κ(L) := 1 − v − V2 − L. The proof proceeds in five

steps. First, we characterize the equation which solves for the feasible average level of effort. We

then show uniqueness when κ ≤ κ(L) for all L. Third, we show uniqueness for the case when

L ≤ Lm. Step 4 considers the case when L > Lm and κ > κ(L). Step 5 shows that there is no

variation in effort within types.

Step 1. Suppose 1 − e ≥ κ. By Lemma 1, members’ efforts are determined by the following system of equations (dropping subscripts and superscripts whenever possible):

$$e(i) = v(i) + \frac{\kappa}{1 - \big(\lambda e(i) + (1-\lambda)e(o)\big)}\big(V_2(i) + L\big) \tag{11}$$

$$e(o) = v(o) + \frac{\kappa}{1 - \big(\lambda e(i) + (1-\lambda)e(o)\big)}\big(V_2(o) + L\big) \tag{12}$$

Multiplying (11) by λ and (12) by 1 − λ and summing the resulting equations, we obtain:

$$e = v + \frac{\kappa}{1-e}(V_2 + L) \tag{13}$$

Proceeding in similar steps for the case 1 − e < κ, we obtain:

$$e = \frac{1-\kappa}{e}(v + V_2 + L) \tag{14}$$

By (13) and (14), the average effort of party members is thus the solution to the equation e = H(e), with

$$H(e) = \begin{cases} v + \dfrac{\kappa}{1-e}(V_2 + L) & \text{for } e \le 1-\kappa \\[2mm] \dfrac{1-\kappa}{e}(v + V_2 + L) & \text{for } e > 1-\kappa \end{cases} \tag{15}$$

Notice that the function H(·) is increasing and convex for e ∈ [0, 1 − κ) and decreasing and convex for e ∈ (1 − κ, 1].

Step 2. Suppose κ ≤ κ(L). Then the solution e∗(κ) is unique. Using the properties of H(·), it can be checked that there is no solution in (1 − κ, 1]. Given H(0) > 0, if the solution is not unique, there must be at least three solutions. But by (13), e must be a solution to the quadratic equation $e^2 - (v+1)e + v + \kappa(V_2+L) = 0$, which has at most two distinct solutions. Hence, a contradiction. The reasoning also implies that the equilibrium effort e∗ is the smallest solution to this quadratic equation. Indeed, both solutions are positive, so if the highest solution, denoted eh(κ, L), is an equilibrium effort (i.e., eh ≤ 1 − κ), then the smallest solution, denoted el(κ, L), is also an equilibrium effort, which contradicts uniqueness. Hence, when κ ≤ κ(L), average effort is unique and equal to

$$e^*(\kappa, L) = \frac{(v+1) - \sqrt{\Delta(\kappa,L)}}{2}, \quad \text{with } \Delta(\kappa,L) = (v+1)^2 - 4\big(\kappa(V_2+L) + v\big).$$
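The uniqueness argument of Step 2 can be illustrated numerically. The sketch below, under purely hypothetical parameter values satisfying κ ≤ 1 − v − V2 − L, computes both roots of the quadratic and checks which of them are fixed points of the map H in (15).

```python
import math

# Hypothetical values with kappa <= 1 - v - V2 - L (= 0.6 here).
v, V2, L, kappa = 0.2, 0.1, 0.1, 0.3

def H(e):
    """Piecewise map (15); its fixed points are the feasible average efforts."""
    if e <= 1 - kappa:
        return v + kappa / (1 - e) * (V2 + L)
    return (1 - kappa) / e * (v + V2 + L)

delta = (v + 1) ** 2 - 4 * (kappa * (V2 + L) + v)
e_low = ((v + 1) - math.sqrt(delta)) / 2    # smallest root of the quadratic
e_high = ((v + 1) + math.sqrt(delta)) / 2   # highest root of the quadratic

# Candidate on the decreasing branch; it is feasible only if it exceeds 1 - kappa.
e_upper_branch = math.sqrt((1 - kappa) * (v + V2 + L))

print(e_low, H(e_low))            # e_low is a fixed point of H
print(e_high, 1 - kappa)          # e_high exceeds 1 - kappa, so it is not feasible
print(e_upper_branch, 1 - kappa)  # below 1 - kappa, so no fixed point on that branch
```

With these values only the smallest root survives, in line with the statement that average effort is unique when κ ≤ κ(L).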

Step 3. Suppose now that κ > κ(L). Since H(·) is strictly decreasing on the interval (1 − κ, 1] and H(1) = (1 − κ)(v + V2 + L) < 1, there is a unique average effort satisfying e∗(κ, L) > 1 − κ. In [0, 1 − κ], notice that there must be zero or two solutions since H(1 − κ) = v + V2 + L > 1 − κ and H(0) > 0. Consider $e^h(\kappa, L) = \frac{(v+1) + \sqrt{\Delta(\kappa,L)}}{2}$, the highest solution to $e^2 - (v+1)e + v + \kappa(V_2+L) = 0$. eh(κ, L) is a fixed point of H(·) only if eh(κ, L) < 1 − κ. We now show that this condition is never satisfied for L < Lm. Notice that eh(κ, L) is decreasing and concave in κ since $\partial e^h(\kappa,L)/\partial\kappa \propto -\Delta(\kappa,L)^{-1/2}$ and $\partial^2 e^h(\kappa,L)/\partial\kappa^2 \propto -\Delta(\kappa,L)^{-3/2}$. In contrast, 1 − κ is linearly decreasing in κ. Further notice that eh(0, L) = 1, that eh(κ, L) is minimized at κ = κm(L), and that

$$e^h(\kappa_m(L), L) = \frac{v+1}{2} < 1 - \kappa_m(L) \iff 0 < 2(V_2 + L) - (1-v) \iff L > L_m.$$

Given the properties of eh(κ, L) and 1 − κ, we thus obtain that for all L < Lm, eh(κ, L) > 1 − κ and so there is no solution to Equation 15 in [0, 1 − κ]. Finally, it can be checked that at L = Lm, κm(L) = κ(L) and el(κ(L), L) = eh(κ(L), L). Hence the equilibrium effort is unique and equal to el(κ(L), L) if κ = κ(L) and to $\sqrt{(1-\kappa)(v + V_2 + L)}$ if κ > κ(L).

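To summarize, when L ≤ Lm, the average effort is unique and equal to:

$$e^*(\kappa, L) = \begin{cases} \dfrac{(v+1) - \sqrt{\Delta(\kappa,L)}}{2} & \text{if } 1-\kappa \ge v + V_2 + L \\[2mm] \sqrt{(1-\kappa)(v + V_2 + L)} & \text{if } 1-\kappa < v + V_2 + L \end{cases} \tag{16}$$

with $\Delta(\kappa,L) = (v+1)^2 - 4\big(\kappa(V_2+L) + v\big)$.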

Step 4. Suppose now L > Lm. We show that eh(κ, L) < 1 − κ for all κ ∈ (κ(L), κm(L)). First, it can be checked that at κ = κ(L), eh(κ, L) = 1 − κ when L > Lm. To see this, notice that at κ = 1 − (v + V2 + L),

$$\Delta(\kappa, L) = (1-v)^2 - 4(1-v)(V_2+L) + 4(V_2+L)^2 = \big(1 - v - 2(V_2+L)\big)^2$$

and

$$e^h(\kappa, L) = \frac{(1+v) + |1 - v - 2(V_2+L)|}{2} = v + V_2 + L \quad \text{if } L > L_m,$$

which implies the claim by the properties of eh(·) (see Step 3). This directly implies that el(κ, L) < 1 − κ on this interval. Therefore, there are three feasible solutions when L > Lm and κ ∈ (κ(L), κm(L)). At κ = κ(L), eh converges to v + V2 + L; at κ = κm(L), eh converges to el. There are thus two feasible levels of effort in these two cases.
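The multiplicity described in Steps 3 and 4 can be reproduced with a short Python sketch that enumerates the fixed points of H in (15). The function name `feasible_efforts` and all parameter values are hypothetical, chosen so that Lm = 0.3.

```python
import math

def feasible_efforts(v, V2, L, kappa):
    """Fixed points of the map H in (15) for given (hypothetical) parameters."""
    solutions = []
    # Increasing branch: roots of e^2 - (v+1)e + v + kappa(V2+L) = 0 with e <= 1 - kappa.
    delta = (v + 1) ** 2 - 4 * (kappa * (V2 + L) + v)
    if delta >= 0:
        roots = {((v + 1) - math.sqrt(delta)) / 2, ((v + 1) + math.sqrt(delta)) / 2}
        solutions += sorted(e for e in roots if 0 <= e <= 1 - kappa)
    # Decreasing branch: e = sqrt((1-kappa)(v+V2+L)), valid only if e > 1 - kappa.
    e_ind = math.sqrt((1 - kappa) * (v + V2 + L))
    if 1 - kappa < e_ind <= 1:
        solutions.append(e_ind)
    return solutions

v, V2 = 0.2, 0.1
L_m = (1 - v) / 2 - V2   # threshold intensity of violence, 0.3 with these values

# L = 0.5 > L_m and kappa = 0.25 lies between 1 - v - V2 - L = 0.2 and kappa_m(L) = 0.2667.
high_violence = feasible_efforts(v, V2, L=0.5, kappa=0.25)

# L = 0.1 <= L_m and kappa = 0.7 exceeds 1 - v - V2 - L = 0.6.
low_violence = feasible_efforts(v, V2, L=0.1, kappa=0.7)

print(high_violence)   # three feasible average efforts
print(low_violence)    # a single feasible average effort
```

The first call returns three feasible levels (the two roots of the quadratic plus the effort on the decreasing branch), whereas the second returns only the effort on the decreasing branch.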

Step 5. While there might be multiple solutions to the average effort, a member’s effort is uniquely

defined for each average effort by (6). Hence, the claim holds.

In the case when L ≤ Lm, we get:

$$e^*(\kappa, L; \tau) = \begin{cases} v(\tau) + \dfrac{2\kappa}{1 - v + \sqrt{\Delta(\kappa,L)}}\big(V_2(\tau) + L\big) & \text{if } 1-\kappa \ge v + V_2 + L \\[2mm] \dfrac{\sqrt{1-\kappa}}{\sqrt{v + V_2 + L}}\big(v(\tau) + V_2(\tau) + L\big) & \text{if } 1-\kappa < v + V_2 + L \end{cases} \tag{17}$$

From the proof of Lemma 5, recall that eh(κ, L) (el(κ, L)) is the highest (smallest) solution to $e^2 - (v+1)e + v + \kappa(V_2+L) = 0$. Further denote $e^{ind}(\kappa, L) = \sqrt{1-\kappa}\,\sqrt{v + V_2 + L}$. Denote by el(κ, L; τ), eh(κ, L; τ), and eind(κ, L; τ) the associated efforts of a type τ ∈ {i, o} party member.

Proof of Lemma 2

Necessity follows directly from the proof of Lemma 5. When L ≤ Lm, sufficiency follows from the

proof of Lemma 5. When L > Lm, we claim:


Claim 1. When L > Lm and κ > κ(L), under the assumption, the unique equilibrium average

effort satisfies e∗(κ, L) = eind(κ, L) so 1− e∗(κ, L) < κ.

We prove the claim in the proof of Lemma 3.

Lemma 6. (i) If κ < κ(L), party members’ efforts are strictly increasing and convex in κ.

(ii) If κ > κ(L), party members’ efforts are strictly decreasing and convex in κ whenever e∗(κ, L) =

eind(κ, L).

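Proof. By (17), a party member’s effort is continuous in 1 − κ. It is differentiable in κ for κ ∈ [0, κ(L)) and κ ∈ (κ(L), 1]. For κ < κ(L), we get using (16):

$$\frac{\partial e^*(\kappa,L;\tau)}{\partial\kappa} = \frac{\partial}{\partial\kappa}\left[\frac{\kappa}{1 - e^*(\kappa,L)}\right]\big(V_2(\tau) + L\big) = \frac{\big(1 - e^*(\kappa,L)\big) + \kappa\frac{\partial e^*(\kappa,L)}{\partial\kappa}}{\big(1 - e^*(\kappa,L)\big)^2}\big(V_2(\tau) + L\big) \tag{18}$$

Notice that $\Delta(\kappa,L) = (v+1)^2 - 4(\kappa(V_2+L) + v)$ and so $\partial\Delta(\kappa,L)/\partial\kappa = -4(V_2+L) < 0$, and by (16), $\partial e^*(\kappa,L)/\partial\kappa > 0$. This directly implies $\partial e^*(\kappa,L;\tau)/\partial\kappa > 0$. To see the convexity, notice that:

$$\frac{\partial^2 e^*(\kappa,L;\tau)}{\partial\kappa^2} = \frac{\partial^2}{\partial\kappa^2}\left[\frac{\kappa}{1 - e^*(\kappa,L)}\right]\big(V_2(\tau) + L\big) = \frac{\kappa\frac{\partial^2 e^*(\kappa,L)}{\partial\kappa^2}\big(1 - e^*(\kappa,L)\big) + 2\frac{\partial e^*(\kappa,L)}{\partial\kappa}\Big(\kappa\frac{\partial e^*(\kappa,L)}{\partial\kappa} + \big(1 - e^*(\kappa,L)\big)\Big)}{\big(1 - e^*(\kappa,L)\big)^3}\big(V_2(\tau) + L\big) \tag{19}$$

By (16),

$$\frac{\partial^2 e^*(\kappa,L)}{\partial\kappa^2} = \frac{-\frac{\partial^2\Delta(\kappa,L)}{\partial\kappa^2}\Delta(\kappa,L) + \frac{1}{2}\Big(\frac{\partial\Delta(\kappa,L)}{\partial\kappa}\Big)^2}{4\,\Delta(\kappa,L)^{3/2}} > 0$$

since $\partial^2\Delta(\kappa,L)/\partial\kappa^2 = 0$. Therefore, $\partial^2 e^*(\kappa,L;\tau)/\partial\kappa^2 > 0$ as claimed.

Point (ii). Under the assumption of the lemma, the result follows from direct examination of (17).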

For the next lemma, we consider the effect of an exogenous increase in purge breadth on the

autocrat’s posterior after failure and success. While the autocrat does not observe the members’

effort, she correctly anticipates their effort in equilibrium. For ease of exposition, in the next

lemma, we characterize the property of the autocrat’s posterior assuming that the purge breadth

is exogenous, and the average efforts characterized in Lemma 5 are always feasible. We relax all

these assumptions and characterize the proper equilibrium conditions in subsequent lemmas and

propositions.

Lemma 7. (i) If average effort is el(κ, L), the autocrat’s posterior after failure is strictly decreasing and concave in purge breadth for κ < κm(L).

(ii) If average effort is eh(κ, L), the autocrat’s posterior after failure is strictly increasing in purge breadth for κ < κm(L).

(iii) If average effort is eind(κ, L), the autocrat’s posterior after success is constant in purge breadth for κ > κ(L).

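Proof. Point (i). By Bayes’ rule, $\mu^F(e^l(\kappa,L)) = \lambda\frac{1 - e^l(\kappa,L;i)}{1 - e^l(\kappa,L)}$. The relevant comparative statics is then (omitting superscripts):

$$\begin{aligned} \frac{\partial\mu^F(e^l(\kappa,L))}{\partial\kappa} &= \frac{\lambda}{(1 - e^l(\kappa,L))^2}\left[-\frac{\partial e^l(\kappa,L;i)}{\partial\kappa}\big(1 - \lambda e^l(\kappa,L;i) - (1-\lambda)e^l(\kappa,L;o)\big) + \big(1 - e^l(\kappa,L;i)\big)\left(\lambda\frac{\partial e^l(\kappa,L;i)}{\partial\kappa} + (1-\lambda)\frac{\partial e^l(\kappa,L;o)}{\partial\kappa}\right)\right] \\ &= \frac{\lambda(1-\lambda)\big(1 - e^l(\kappa,L;i)\big)\big(1 - e^l(\kappa,L;o)\big)}{(1 - e^l(\kappa,L))^2}\left[\frac{\partial e^l(\kappa,L;o)/\partial\kappa}{1 - e^l(\kappa,L;o)} - \frac{\partial e^l(\kappa,L;i)/\partial\kappa}{1 - e^l(\kappa,L;i)}\right] \end{aligned} \tag{20}$$

By examination of (17), el(κ, L; i) > el(κ, L; o). By examination of Equation 18, $\partial e^l(\kappa,L;i)/\partial\kappa > \partial e^l(\kappa,L;o)/\partial\kappa$. This directly implies

$$\frac{\partial e^l(\kappa,L;o)/\partial\kappa}{1 - e^l(\kappa,L;o)} - \frac{\partial e^l(\kappa,L;i)/\partial\kappa}{1 - e^l(\kappa,L;i)} < 0 \quad \text{and} \quad \frac{\partial\mu^F(e^l(\kappa,L))}{\partial\kappa} < 0$$

as claimed.

To see that the posterior is strictly concave in κ, notice that:

$$\begin{aligned} \frac{\partial^2\mu^F(e^l(\kappa,L))}{\partial\kappa^2} \propto{} & \left[\frac{\partial^2 e^l(\kappa,L;o)}{\partial\kappa^2}\big(1 - e^l(\kappa,L;i)\big) - \frac{\partial^2 e^l(\kappa,L;i)}{\partial\kappa^2}\big(1 - e^l(\kappa,L;o)\big)\right]\big(1 - e^l(\kappa,L)\big) \\ & + 2\frac{\partial e^l(\kappa,L)}{\partial\kappa}\left[\frac{\partial e^l(\kappa,L;o)}{\partial\kappa}\big(1 - e^l(\kappa,L;i)\big) - \frac{\partial e^l(\kappa,L;i)}{\partial\kappa}\big(1 - e^l(\kappa,L;o)\big)\right] \end{aligned} \tag{21}$$

By examination of (19), $\partial^2 e^l(\kappa,L;i)/\partial\kappa^2 > \partial^2 e^l(\kappa,L;o)/\partial\kappa^2$. This directly implies $\partial^2\mu^F(e^l(\kappa,L))/\partial\kappa^2 < 0$ since both terms in brackets in (21) are negative.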
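Point (i) can be checked numerically. The sketch below evaluates the posterior after failure at the low average effort on a grid of purge breadths, using the first lines of (16) and (17). The type-specific values v(τ), V2(τ), the share λ, and L are hypothetical numbers chosen so that all efforts lie in [0, 1].

```python
import math

lam = 0.5                        # share of ideologues (hypothetical)
v = {"i": 0.3, "o": 0.1}         # flow payoffs by type (hypothetical)
V2 = {"i": 0.15, "o": 0.05}      # second-period payoffs by type (hypothetical)
L = 0.1                          # intensity of violence (hypothetical)

v_avg = lam * v["i"] + (1 - lam) * v["o"]
V2_avg = lam * V2["i"] + (1 - lam) * V2["o"]

def posterior_after_failure(kappa):
    """mu^F at the low average effort e^l, via (16), (17) and Bayes' rule."""
    delta = (v_avg + 1) ** 2 - 4 * (kappa * (V2_avg + L) + v_avg)
    e_avg = ((v_avg + 1) - math.sqrt(delta)) / 2
    gamma = kappa / (1 - e_avg)              # purge probability after failure
    e_i = v["i"] + gamma * (V2["i"] + L)     # ideologues' effort
    return lam * (1 - e_i) / (1 - e_avg)

kappas = [0.1, 0.2, 0.3, 0.4, 0.5]           # all below 1 - v - V2 - L = 0.6
posteriors = [posterior_after_failure(k) for k in kappas]
first_diffs = [b - a for a, b in zip(posteriors, posteriors[1:])]

print(posteriors)    # falls as the purge widens
print(first_diffs)   # negative and shrinking, i.e. a concave decline
```

On this grid the posterior falls with κ at an accelerating rate, matching the monotonicity and concavity stated in the lemma.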

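Point (ii). Using Point (i), we only need to consider the sign of

$$\frac{\partial e^h(\kappa,L;o)/\partial\kappa}{1 - e^h(\kappa,L;o)} - \frac{\partial e^h(\kappa,L;i)/\partial\kappa}{1 - e^h(\kappa,L;i)}.$$

By examination of (17), we still obtain eh(κ, L; i) > eh(κ, L; o). Further, $\partial e^h(\kappa,L;\tau)/\partial\kappa$ has the same sign as $\frac{\partial}{\partial\kappa}\Big[\frac{\kappa}{1 - v - \sqrt{\Delta(\kappa,L)}}\Big]$, with

$$\begin{aligned} \frac{\partial}{\partial\kappa}\left[\frac{\kappa}{1 - v - \sqrt{\Delta(\kappa,L)}}\right] &\propto 1 - v - \sqrt{\Delta(\kappa,L)} + \frac{1}{2}\Delta(\kappa,L)^{-1/2}\big(-4(V_2+L)\kappa\big) \\ &\propto (1-v)\Delta(\kappa,L)^{1/2} - (1-v)^2 + 2\kappa(V_2+L), \end{aligned}$$

using $\Delta(\kappa,L) = (1-v)^2 - 4\kappa(V_2+L)$. Notice that $\big((1-v)\Delta(\kappa,L)^{1/2}\big)^2 = (1-v)^4 - 4\kappa(1-v)^2(V_2+L)$ and $\big((1-v)^2 - 2\kappa(V_2+L)\big)^2 = (1-v)^4 - 4\kappa(1-v)^2(V_2+L) + 4\kappa^2(V_2+L)^2$, where $(1-v)^2 - 2\kappa(V_2+L) > 0$ for κ ≤ κm(L). Hence, $\frac{\partial}{\partial\kappa}\Big[\frac{\kappa}{1 - v - \sqrt{\Delta(\kappa,L)}}\Big] < 0$. By examination of (18) (with the proper substitution), this implies $\partial e^h(\kappa,L;i)/\partial\kappa < \partial e^h(\kappa,L;o)/\partial\kappa < 0$. Consequently,

$$\frac{\partial e^h(\kappa,L;o)/\partial\kappa}{1 - e^h(\kappa,L;o)} - \frac{\partial e^h(\kappa,L;i)/\partial\kappa}{1 - e^h(\kappa,L;i)} > 0,$$

so µF(eh(κ, L)) is strictly increasing with κ.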


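Point (iii). By Bayes’ rule, the autocrat’s posterior after success is $\mu^S(e^*(\kappa,L)) = \lambda\frac{e^{ind}(\kappa,L;i)}{e^{ind}(\kappa,L)}$. By a similar reasoning as above, we obtain:

$$\begin{aligned} \frac{\partial\mu^S(e^{ind}(\kappa,L))}{\partial\kappa} &= \frac{\lambda}{e^{ind}(\kappa,L)^2}\left[\frac{\partial e^{ind}(\kappa,L;i)}{\partial\kappa}\big(\lambda e^{ind}(\kappa,L;i) + (1-\lambda)e^{ind}(\kappa,L;o)\big) - e^{ind}(\kappa,L;i)\left(\lambda\frac{\partial e^{ind}(\kappa,L;i)}{\partial\kappa} + (1-\lambda)\frac{\partial e^{ind}(\kappa,L;o)}{\partial\kappa}\right)\right] \\ &= \frac{\lambda(1-\lambda)e^{ind}(\kappa,L;i)\,e^{ind}(\kappa,L;o)}{e^{ind}(\kappa,L)^2}\left[\frac{\partial e^{ind}(\kappa,L;i)/\partial\kappa}{e^{ind}(\kappa,L;i)} - \frac{\partial e^{ind}(\kappa,L;o)/\partial\kappa}{e^{ind}(\kappa,L;o)}\right] \end{aligned} \tag{22}$$

From (17), it can be checked that

$$\frac{\partial e^{ind}(\kappa,L;\tau)/\partial\kappa}{e^{ind}(\kappa,L;\tau)} = \frac{-\frac{1}{2}\,\frac{v(\tau) + V_2(\tau) + L}{\sqrt{1-\kappa}\,\sqrt{v + V_2 + L}}}{\frac{\sqrt{1-\kappa}\,(v(\tau) + V_2(\tau) + L)}{\sqrt{v + V_2 + L}}} = -\frac{1}{2(1-\kappa)},$$

which does not depend on τ. Therefore, $\partial\mu^S(e^{ind}(\kappa,L))/\partial\kappa = 0$ as claimed.25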
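The invariance in Point (iii) is easy to confirm numerically. The following sketch evaluates the posterior after success at several purge breadths above the threshold 1 − v − V2 − L, using the second line of (17). All parameter values and the helper names are hypothetical.

```python
import math

lam = 0.5                        # share of ideologues (hypothetical)
v = {"i": 0.3, "o": 0.1}         # flow payoffs by type (hypothetical)
V2 = {"i": 0.15, "o": 0.05}      # second-period payoffs by type (hypothetical)
L = 0.5                          # intensity of violence (hypothetical)

v_avg = lam * v["i"] + (1 - lam) * v["o"]
V2_avg = lam * V2["i"] + (1 - lam) * V2["o"]

def effort_ind(kappa, t):
    """Effort of type t in a semi-indiscriminate purge, second line of (17)."""
    return math.sqrt(1 - kappa) / math.sqrt(v_avg + V2_avg + L) * (v[t] + V2[t] + L)

def posterior_after_success(kappa):
    """mu^S by Bayes' rule: share of ideologues among successful members."""
    e_i, e_o = effort_ind(kappa, "i"), effort_ind(kappa, "o")
    e_avg = lam * e_i + (1 - lam) * e_o
    return lam * e_i / e_avg

# Closed form implied by (17); it does not involve kappa.
closed_form = lam * (v["i"] + V2["i"] + L) / (v_avg + V2_avg + L)

# Purge breadths above the threshold 1 - v - V2 - L = 0.2.
values = [posterior_after_success(k) for k in (0.25, 0.4, 0.6)]
print(values, closed_form)
```

The common factor involving 1 − κ cancels between the numerator and the denominator, so every evaluation returns the same posterior.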

Recall $\kappa_m(L) = \frac{(1-v)^2}{4(V_2+L)}$.

Lemma 8. For all (κ′, κ′′) ∈ (0, κm(L))², µF(eh(κ′, L)) < µF(el(κ′′, L)).

Proof. By definition of κm(L), el(κm(L), L) = eh(κm(L), L) so µF(eh(κm(L), L)) = µF(el(κm(L), L)). By Lemma 7, µF(eh(κ, L)) is strictly increasing in κ so µF(eh(κ, L)) < µF(eh(κm(L), L)) for all κ < κm(L), and µF(el(κ, L)) is strictly decreasing in κ so µF(el(κ, L)) > µF(el(κm(L), L)) for all κ < κm(L), which proves the claim.

Lemma 9. Fixing the purge breadth, µF (el(κ, L)) and µS(eind(κ, L)) are decreasing in L.

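Proof. A similar reasoning as in Lemma 1 yields:

$$\frac{\partial\mu^F(e^l(\kappa,L))}{\partial L} = \frac{\lambda(1-\lambda)\big(1 - e^l(\kappa,L;i)\big)\big(1 - e^l(\kappa,L;o)\big)}{(1 - e^l(\kappa,L))^2}\left[\frac{\partial e^l(\kappa,L;o)/\partial L}{1 - e^l(\kappa,L;o)} - \frac{\partial e^l(\kappa,L;i)/\partial L}{1 - e^l(\kappa,L;i)}\right] \tag{23}$$

Given the definition of el(κ, L), it can be checked that $\partial e^l(\kappa,L;i)/\partial L > \partial e^l(\kappa,L;o)/\partial L$. Hence, $\partial\mu^F(e^l(\kappa,L))/\partial L < 0$.

Regarding the autocrat’s posterior after success, a similar reasoning as in Lemma 1 yields:

$$\frac{\partial\mu^S(e^{ind}(\kappa,L))}{\partial L} = \frac{\lambda(1-\lambda)e^{ind}(\kappa,L;i)\,e^{ind}(\kappa,L;o)}{e^{ind}(\kappa,L)^2}\left[\frac{\partial e^{ind}(\kappa,L;i)/\partial L}{e^{ind}(\kappa,L;i)} - \frac{\partial e^{ind}(\kappa,L;o)/\partial L}{e^{ind}(\kappa,L;o)}\right] \tag{24}$$

Using the definition of eind(κ, L) and Lemma 5, we obtain

$$\frac{\partial e^{ind}(\kappa,L;\tau)}{\partial L} = \sqrt{1-\kappa}\;\frac{(v + V_2 + L) - \frac{1}{2}\big(v(\tau) + V_2(\tau) + L\big)}{(v + V_2 + L)^{3/2}}, \quad \tau \in \{i, o\}.$$

Therefore,

$$\frac{\partial e^{ind}(\kappa,L;\tau)/\partial L}{e^{ind}(\kappa,L;\tau)} = \frac{1}{v(\tau) + V_2(\tau) + L} - \frac{1}{2(v + V_2 + L)}.$$

Since v(i) > v(o) and V2(i) > V2(o), this directly implies

$$\frac{\partial e^{ind}(\kappa,L;i)/\partial L}{e^{ind}(\kappa,L;i)} - \frac{\partial e^{ind}(\kappa,L;o)/\partial L}{e^{ind}(\kappa,L;o)} < 0 \quad \text{and} \quad \frac{\partial\mu^S(e^{ind}(\kappa,L))}{\partial L} < 0$$

as claimed.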

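25 The proof of Lemma 7 highlights the general computation. Under the condition of the lemma, it can easily be checked that $\mu^S(e^{ind}(\kappa,L)) = \lambda\,\frac{\sqrt{1-\kappa}\,\big(v(i) + V_2(i) + L\big)/\sqrt{v + V_2 + L}}{\sqrt{1-\kappa}\,\sqrt{v + V_2 + L}} = \lambda\,\frac{v(i) + V_2(i) + L}{v + V_2 + L}$, which does not depend on κ.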


For the proof of Lemma 3, denote $P^F(\kappa; L, \kappa, e(\kappa,L)) = \kappa\beta[r_i - \mu^F(e(\kappa,L))]D_{i,o} - c(\kappa)$ (respectively, $P^S(\kappa; L, \kappa, e) = \kappa\beta[r_i - \mu^S(e(\kappa,L))]D_{i,o} - c(\kappa)$) the autocrat’s gross expected payoff from purging κ unsuccessful (successful) party members when the intensity of violence is L, average effort is e(·), and the purge breadth anticipated by party members is denoted κ. Notice that while in the autocrat’s maximization problem we take the purge breadth anticipated by party members as given, it is correct in equilibrium. In what follows, recall that $L_m = \frac{1-v}{2} - V_2$ and $\kappa_m(L) = \frac{(1-v)^2}{4(V_2+L)}$.

Proof of Lemma 3

The proof proceeds in two steps. First, we consider the case when L ≤ Lm so there is a unique

feasible equilibrium level of effort for all κ (Lemma 5). Second, we consider the case when L > Lm.

Step 1: L ≤ Lm. From the reasoning in the text, the autocrat’s marginal benefit from purging a successful (unsuccessful) party member is WS = [ri − µS(e∗(κ, L))]Di,o (WF = [ri − µF(e∗(κ, L))]Di,o). Further, minκ WF > maxκ WS since µS(e∗(κ, L)) > λ > µF(e∗(κ, L)) for all κ. So whenever c0 + c1κ < WS at κ = κ(L), the equilibrium purge breadth satisfies κ∗(L) > κ(L) and the purge is semi-indiscriminate. Conversely, if c0 + c1κ ≥ WS, then the equilibrium purge breadth satisfies κ∗(L) ≤ κ(L) and the purge is discriminate.

It remains to consider two cases at κ = κ(L): (a) c0 + c1κ > WF and (b) c0 + c1κ ≤ WF. In case (a), since WF is convex in κ due to the concavity of µF(el(κ, L)) in κ (Lemma 7), WF intersects c0 + c1κ at most once. The equilibrium purge breadth is then the unique solution to c0 + c1κ = WF if it exists, in which case it satisfies κ∗(L) < κ(L), and 0 otherwise. The purge is partially discriminate as claimed. In case (b), if c0 + c1κ ≤ WF for all κ ≤ κ(L), then κ∗(L) = κ(L). Otherwise, there exist κ1(L), κ2(L) ∈ (0, κ(L)) (ignoring corners, which only complicate the exposition) such that c0 + c1κj(L) = β[ri − µF(e(κj(L), L))]Di,o for j ∈ {1, 2}. There are then three possible purge breadths: κ1(L), κ2(L), and κ(L) = 1 − e∗(κ, L). Notice that the autocrat’s equilibrium welfare satisfies

$$P^F(\kappa_1(L); L, \kappa_1(L), e^*) = \kappa_1(L)\beta[r_i - \mu^F(e(\kappa_1(L), L))]D_{i,o} - \Big(c_0\kappa_1(L) + \frac{c_1}{2}\kappa_1(L)^2\Big) = \frac{c_1}{2}\kappa_1(L)^2$$

(using the FOC above). Therefore, $P^F(\kappa_1(L); L, \kappa_1(L), e^*(\kappa_1(L), L)) < P^F(\kappa_2(L); L, \kappa_2(L), e^*(\kappa_2(L), L)) < P^F(\kappa(L); L, \kappa(L), e^*(\kappa(L), L))$. Given our equilibrium selection, the equilibrium purge breadth is then κ∗(L) = κ(L) and the purge is fully discriminate.

Step 2: L > Lm. Notice that at L = Lm and κ = κ(Lm) = κm(Lm),

$$\mu^F(e^l(\kappa, L)) = \lambda\,\frac{\frac{1 - v(i)}{2} - \frac{v(i) - v}{2} - \big(V_2(i) - V_2\big)}{\frac{1-v}{2}}$$

since el(κ(L), L) = v + V2 + L = 1 − κ(L) = eh(κ(L), L), el(κ(L), L; i) = v(i) + V2(i) + L = eh(κ(L), L; i), and $L_m = \frac{1-v}{2} - V_2$. Under the assumption (Equation 5), c0 + c1κ(Lm) ≤ β[ri − µF(el(κ(Lm), Lm))]Di,o = β[ri − µF(eh(κ(Lm), Lm))]Di,o. Assuming c0 + c1κ > WS at κ = κ(L), we now show that the equilibrium purge breadth and effort satisfy respectively κ∗(L) = κ(L) and e∗(κ(L), L) = eh(κ(L), L).

Observe that by Lemma 5, eh(κ(L), L) = 1 − κ(L) = v + V2 + L for all L > Lm (whereas el(κ(L), L) < 1 − κ(L)) and $\mu^F(e^h(\kappa(L), L)) = \lambda\,\frac{1 - v(i) - V_2(i) - L}{1 - v - V_2 - L}$ is strictly decreasing with L. Hence for all L > Lm, c0 + c1κ(L) < β[ri − µF(eh(κ(L), L))]Di,o. Further, $P^F(\kappa(L); L, \kappa(L), e^h(\kappa(L), L))$ is strictly increasing with L. When e = eh(κ(L), L), the autocrat chooses κ = κ(L), so (κ(L), eh(κ(L), L)) is part of a PBE. We now consider different cases.

Suppose that at L = Lm, for all κ ≤ κ(Lm), the following inequality holds: c0 + c1κ ≤ β[ri − µF(el(κ, L))]W2(i). By Lemma 9, µF(el(κ, L)) is strictly decreasing in L. Therefore, c0 + c1κ < β[ri − µF(el(κ, L))]W2(i) for all L > Lm and κ ≤ κm(L). The autocrat’s best response is thus to purge all failures with average effort el(·) or eh(·) (given Lemma 8). Therefore, κ∗(L) = κ(L) and e∗(κ(L), L) are the unique equilibrium solution since 1 − el(κ, L) > κ for all κ ≤ κm(L).

Suppose now that there exists L′ ≥ Lm such that for L ≤ L′, there exist (κ1(L), κ2(L)) ∈ (0, κm(L))² such that c0 + c1κj(L) = β[ri − µF(el(κj(L), L))]Di,o for j ∈ {1, 2}. By Step 1, we obtain that $P^F(\kappa_2(L); L, \kappa_2(L), e^l(\kappa_2(L), L)) = \frac{c_1\kappa_2(L)^2}{2} > P^F(\kappa_1(L); L, \kappa_1(L), e^l(\kappa_1(L), L))$. Since µF(el(κ, L)) is strictly decreasing with L (Lemma 9) and β[ri − µF(el(κ, L))]Di,o crosses c0 + c1κ from below at κ = κ2(L), κ2(L) is strictly decreasing with L and so is $P^F(\kappa_2(L); L, \kappa_2(L), e^l(\kappa_2(L), L))$. Using the following properties: (i) at L = Lm, $P^F(\kappa_2(L); L, \kappa_2(L), e^l(\kappa_2(L), L)) < P^F(\kappa(L); L, \kappa(L), e^h(\kappa(L), L))$, and (ii) $P^F(\kappa(L); L, \kappa(L), e^h(\kappa(L), L))$ is strictly increasing with L, we obtain that for all L > Lm, $P^F(\kappa_2(L); L, \kappa_2(L), e^l(\kappa_2(L), L)) < P^F(\kappa(L); L, \kappa(L), e^h(\kappa(L), L))$. Given our equilibrium selection criterion, κ2(L) cannot be an equilibrium purge breadth. Further, this implies el(κ, L) cannot be an equilibrium effort when L > Lm.

Suppose now that there exists L″ > Lm such that for L ≥ L″, there exists κ3(L) ∈ (κ(L), κm(L)] such that c0 + c1κ3(L) = β[ri − µF(eh(κ3(L), L))]Di,o (observe that we consider average effort eh(κ, L)).26 Using a similar reasoning as in point (ii) of Lemma 7, it can be checked that µF(eh(κ, L)) is decreasing with L. Further, since β[ri − µF(eh(κ, L))]Di,o crosses c0 + c1κ from above, κ3(L) is strictly decreasing in L. Hence, $P^F(\kappa_3(L); L, \kappa_3(L), e^l(\kappa_3(L), L)) = \frac{c_1\kappa_3(L)^2}{2}$ is strictly decreasing with L. Using this property and (i) $\frac{c_1\kappa_3(L)^2}{2} \le \frac{c_1\kappa_m(L)^2}{2}$, (ii) $\frac{c_1\kappa_m(L)^2}{2}$ strictly decreasing with L by definition of κm(L), (iii) $\frac{c_1\kappa_m(L_m)^2}{2} \le P^F(\kappa_m(L_m); L_m, \kappa_m(L_m), e^h(\kappa_m(L_m), L_m))$ (recall κ(Lm) = κm(Lm)), and (iv) $P^F(\kappa(L); L, \kappa(L), e^h(\kappa(L), L))$ strictly increasing with L, we obtain $P^F(\kappa(L); L, \kappa(L), e^h(\kappa(L), L)) > P^F(\kappa_3(L); L, \kappa_3(L), e^l(\kappa_3(L), L))$ for all L > Lm. Hence, by our equilibrium restriction, κ3(L) cannot be an equilibrium purge breadth.

26 L″ > Lm because the interval (κ(L), κm(L)] is empty otherwise.

By the reasoning above, for all L > Lm, the equilibrium purge breadth satisfies either (a) κ∗ = κ(L)

with e∗(κ(L), L) = eh(κ(L), L) = eind(κ(L), L) or (b) κ∗ > κ(L) with e∗(κ∗, L) = eind(κ∗, L). This

directly provides a proof of Claim 1.

Proof of Claim 1. By the reasoning above, when κ > κ(L), el(·) or eh(·) cannot be an equilibrium

effort. The equilibrium average effort is then necessarily e∗(κ, L) = eind(κ, L).

Finally, it remains to determine conditions under which κ∗(L) > κ(L). By a similar reasoning as

in point 1, if c0 +c1κ <WS at κ = κ(L), then the equilibrium purge breadth satisfies κ∗(L) > κ(L)

and the purge is semi-indiscriminate. Inversely, if c0 + c1κ ≥ WS, then the equilibrium purge

breadth satisfies κ∗(L) = κ(L) and the purge is fully discriminate. Notice that since by assumption

(Equation 5) WF ≥ c0 + c1κ at L = Lm and κ = κ(L), the purge is never partially discriminate as

claimed.

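To summarize, using Lemma 3, the equilibrium average effort satisfies

$$e^*(\kappa^*(L), L) = v + \frac{\kappa^*(L)}{1 - e^*(\kappa^*(L), L)}(V_2 + L) \tag{25}$$

$$\phantom{e^*(\kappa^*(L), L)} = \frac{(v+1) - \sqrt{\Delta(\kappa^*(L), L)}}{2} \quad \text{if } \kappa^*(L) < \kappa(L) \tag{26}$$

with $\Delta(\kappa, L) = (v+1)^2 - 4\big(\kappa(V_2+L) + v\big)$,

$$e^*(\kappa^*(L), L) = v + V_2 + L \quad \text{if } \kappa^*(L) = \kappa(L) \tag{27}$$

$$e^*(\kappa^*(L), L) = \sqrt{1 - \kappa^*(L)}\,\sqrt{v + V_2 + L} \quad \text{if } \kappa^*(L) > \kappa(L) \tag{28}$$

where κ∗(L) denotes the equilibrium purge breadth and κ(L) = 1 − v − V2 − L the threshold defined in the proof of Lemma 5.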

Proof of Corollary 1

By Lemma 3, the purge is semi-indiscriminate if and only if (i) $r_i > \mu^S(e^*) = \lambda\,\frac{v(i) + V_2(i) + L}{v + V_2 + L}$ and (ii) c1 and c0 satisfy

$$c_1 < \frac{\beta\Big(r_i - \lambda\,\frac{v(i) + V_2(i) + L}{v + V_2 + L}\Big)D_{i,o} - c_0}{1 - v - V_2 - L}.$$

Notice that as b → 0, µS(e∗) → λ. Therefore, whenever ri > λ, there exist b, c1, and c0 sufficiently small so that both conditions are satisfied.

In the remainder of the appendix, we assume that if there exists a solution to c0 + c1κ = β[ri − µF(el(κ, L))]W2(i) for L < Lm, this solution is unique. This restriction is meant to simplify the exposition of the proofs and is without loss of generality. We discuss its implications below (see Remark 1). Notice that this implies that at L = Lm, c0 + c1κ ≤ β[ri − µF(el(κ, L))]W2(i) for all κ ≤ κ(Lm).

To facilitate the exposition, we use subscript x to denote the partial derivative of some variable z with respect to x (i.e., ∂z/∂x = zx) and a similar notation for the second partial derivative. To alleviate notation, we then denote µF(κ∗(L), L) := µF(e∗(κ∗(L), L)) and µS(κ∗(L), L) := µS(e∗(κ∗(L), L)). Finally, we ignore superscripts and arguments whenever possible.

Lemma 10. The equilibrium purge breadth satisfies κ∗L(L) > 0 if either (i) κ∗(L) < κ(L) or (ii) κ∗(L) > κ(L).

Proof. Point (i). κ∗(L) is a solution to c0 + c1κ(L) = β[ri − µF(κ(L), L)]Di,o. By the Implicit Function Theorem,

$$\kappa_L(L)\big(c_1 + \beta\mu^F_\kappa D_{i,o}\big) = -\beta\mu^F_L D_{i,o}. \tag{29}$$

By Lemma 9, $\mu^F_L < 0$. Since we consider the unique interior solution, it must be that $c_1 + \beta\mu^F_\kappa D_{i,o} > 0$ (otherwise, the autocrat chooses κ∗(L) = κ(L)). Hence, the claim holds.

Point (ii). The equilibrium purge breadth is the solution to c0 + c1κ(L) = β[ri − µS(κ(L), L)]Di,o. Noting that $\mu^S_\kappa = 0$ by Lemma 7, we obtain:

$$\kappa_L(L)\,c_1 = -\beta\mu^S_L D_{i,o}, \tag{30}$$

which directly implies the claim since $\mu^S_L < 0$ (Lemma 9).

Proof of Proposition 1

We first prove that equilibrium effort is increasing in the intensity of violence. We obtain:

$$\frac{de(\kappa(L), L)}{dL} = \kappa_L(L)\,e_\kappa + e_L \tag{31}$$

When κ∗(L) < κ(L), by Lemma 6, eκ > 0. By Lemma 10, κL(L) > 0. By the proof of Lemma 9,

eL > 0. Hence de∗/dL > 0.

When κ∗(L) = κ(L), then e∗(κ∗(L), L) = v + V2 + L so de∗/dL = 1 > 0.

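When κ∗(L) > κ(L), $e^*(\kappa^*(L), L) = \sqrt{1 - \kappa^*(L)}\,\sqrt{v + V_2 + L}$. Hence,

$$e_\kappa = -\frac{1}{2}\,\frac{\sqrt{v + V_2 + L}}{\sqrt{1 - \kappa(L)}} < 0, \qquad e_L = \frac{1}{2}\,\frac{\sqrt{1 - \kappa(L)}}{\sqrt{v + V_2 + L}} > 0.$$

So

$$\begin{aligned} \kappa_L(L)e_\kappa + e_L &= \frac{1}{2}\,\frac{1}{\sqrt{1-\kappa(L)}\sqrt{v+V_2+L}}\Big(1 - \kappa(L) - \kappa_L(L)(v + V_2 + L)\Big) \\ &= \frac{1}{2}\,\frac{1}{\sqrt{1-\kappa(L)}\sqrt{v+V_2+L}}\Big(1 - \kappa(L) + \beta\,\frac{\mu^S_L W_2(i)}{c_1}\,(v + V_2 + L)\Big) \\ &= \frac{1}{2}\,\frac{1}{\sqrt{1-\kappa(L)}\sqrt{v+V_2+L}}\Big(1 - \kappa(L) - \beta(1-\lambda)\frac{W_2(i)}{c_1}\,\frac{v + V_2}{v + V_2 + L}\Big) \\ &= \frac{1}{2}\,\frac{1}{\sqrt{1-\kappa(L)}\sqrt{v+V_2+L}}\Big(1 - \beta\frac{W_2(i)}{c_1}(r_i - \mu^S) + \frac{c_0}{c_1} - \beta(1-\lambda)\frac{W_2(i)}{c_1}\,\frac{v + V_2}{v + V_2 + L}\Big) \\ &= \frac{1}{2}\,\frac{1}{\sqrt{1-\kappa(L)}\sqrt{v+V_2+L}}\Big(1 + \frac{c_0}{c_1} - \beta\frac{W_2(i)}{c_1}r_i + \beta\frac{W_2(i)}{c_1}\,\frac{v + V_2 + \lambda L}{v + V_2 + L} - \beta(1-\lambda)\frac{W_2(i)}{c_1}\,\frac{v + V_2}{v + V_2 + L}\Big) \\ &= \frac{1}{2}\,\frac{1}{\sqrt{1-\kappa(L)}\sqrt{v+V_2+L}}\Big(1 + \frac{c_0}{c_1} - \beta\frac{W_2(i)}{c_1}(r_i - \lambda)\Big) \end{aligned} \tag{32}$$

The second line comes from Equation 30, the third from $\mu^S(\kappa, L) = \lambda\,\frac{v(i) + V_2(i) + L}{v + V_2 + L}$ so $\mu^S_L = -(1-\lambda)\frac{v + V_2}{(v + V_2 + L)^2}$, the fourth from the definition of the equilibrium purge breadth, and the fifth from the definition of µS. Denote $\dot{\kappa}(L) := \beta\frac{W_2(i)}{c_1}(r_i - \lambda) - \frac{c_0}{c_1}$. Under the assumption (Equation 5), $\dot{\kappa}(L) < 1$ so

$$\kappa_L(L)e_\kappa + e_L = \frac{1}{2}\,\frac{1}{\sqrt{1-\kappa(L)}\sqrt{v+V_2+L}}\big(1 - \dot{\kappa}(L)\big) > 0.$$

To see the second point of the proposition, recall that in a semi-indiscriminate purge (κ∗(L) > κ(L)), 1 − e∗ < κ∗. Further, $\dot{\kappa}(L) > \kappa^*$ since µS > λ. This implies that

$$\frac{1}{2}\,\frac{1}{\sqrt{1-\kappa(L)}\sqrt{v+V_2+L}}\big(1 - \dot{\kappa}(L)\big) = \frac{1}{2}\,\frac{1 - \dot{\kappa}(L)}{e^*} < \frac{1}{2}.$$

Since average effort increases at rate 1 in a fully discriminate purge, the claim holds.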


By Proposition 1, 1 − e∗(κ∗(L), L) is strictly decreasing with L. By Lemma 10, for all κ∗(L) < κ(L), κ∗L(L) > 0. Under the assumption (Equation 5) and by Lemma 3, 1 − e∗(κ∗(Lm), Lm) ≤ κ∗(Lm). Hence, there exists a unique Lfull ∈ [0, Lm] such that 1 − e∗(κ∗(L), L) ≤ κ∗(L) if and only if L ≥ Lfull. By Lemma 10, κ∗L(L) > 0 for all L < Lfull.

For all L ≥ Lfull, κ∗(L) ≥ κ(L). By Lemma 9, µS(κ∗(L), L) is then strictly decreasing with L. Suppose c0 + c1κ(L) ≥ β[ri − µS(κ(L), L)]Di,o. In this case, it is never profitable for the autocrat to purge from the success pool. Denote Lind := L̄. Otherwise, given that µS is strictly decreasing with L and κ(L) = 1 − (v + V2 + L), there exists a unique Lind ∈ (Lfull, L̄) such that κ∗(L) = κ(L) (i.e., the purge is fully discriminate) if and only if L ∈ [Lfull, Lind] and κ∗(L) > κ(L) (i.e., the purge is semi-indiscriminate) if and only if L > Lind. By Lemma 10, κ∗L(L) > 0 for all L > Lind.

Finally, for L ∈ [Lfull, Lind), κ∗(L) = κ(L) = 1 − v − V2 − L so κ∗L(L) < 0.


Point (i). For L < Lfull, the proportion of ideologues in the pool of survivors is:

$$S(L) = \frac{\big((1-e) - \kappa(L)\big)\mu^F + e\mu^S}{1 - \kappa(L)} = \frac{\lambda - \kappa(L)\mu^F}{1 - \kappa(L)} \tag{33}$$

Consider the function $F(x) = \frac{\lambda - x\mu^F}{1-x}$. Its derivative is $F'(x) = \frac{-\mu^F(1-x) + \lambda - \mu^F x}{(1-x)^2} = \frac{\lambda - \mu^F}{(1-x)^2} > 0$. We obtain

$$\frac{dS(L)}{dL} = \kappa_L(L)F'(\kappa(L)) - \big(\mu^F_\kappa\kappa_L(L) + \mu^F_L\big)\frac{\kappa(L)}{1 - \kappa(L)} \tag{34}$$

Since $\mu^F_\kappa\kappa_L(L) + \mu^F_L < 0$, F′(x) > 0, and κL(L) > 0 (Lemmas 9 and 10), we obtain dS(L)/dL > 0.

For L ≥ Lfull, the proportion of ideologues in the pool of survivors is

$$S(L) = \mu^S \tag{35}$$

Since $\mu^S_L < 0$ (Lemma 9), dS(L)/dL < 0.
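The simplification in (33) rests on the identity (1 − e)µF + eµS = λ. A minimal Python sketch, with hypothetical efforts for the two types and the posteriors held fixed as in the definition of F, checks both the identity and the sign of F′.

```python
lam = 0.5                  # share of ideologues (hypothetical)
e_i, e_o = 0.40, 0.30      # hypothetical efforts, ideologues work harder
e_avg = lam * e_i + (1 - lam) * e_o

# Posteriors after failure and after success, by Bayes' rule.
mu_F = lam * (1 - e_i) / (1 - e_avg)
mu_S = lam * e_i / e_avg

def survivors_share(kappa):
    """Share of ideologues among survivors when a mass kappa of failures is purged."""
    return ((1 - e_avg - kappa) * mu_F + e_avg * mu_S) / (1 - kappa)

def F(x):
    """Simplified expression (lambda - x mu^F) / (1 - x) from (33)."""
    return (lam - x * mu_F) / (1 - x)

kappas = [0.1, 0.3, 0.5]   # all below the mass of failures 1 - e_avg = 0.65
for k in kappas:
    print(k, survivors_share(k), F(k))
```

Because µF < λ, removing failures raises the share of ideologues among survivors, which is the sign of F′ used in (34).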

Points (ii) and (iii). For L < Lfull, the proportion of ideologues in the party in the second period is:

$$P(L) = \kappa(L)r_i + \big((1-e) - \kappa(L)\big)\mu^F + e\mu^S = \kappa(L)(r_i - \mu^F) + \lambda \tag{36}$$

Since κL(L) > 0 (Lemma 10), $\mu^F_\kappa < 0$ (Lemma 7), and $\mu^F_L < 0$ (Lemma 9), dP(L)/dL > 0.

For L ∈ [Lfull, Lind), the proportion of ideologues in the party in the second period is:

$$P(L) = (1-e)r_i + e\mu^S \tag{37}$$

Given e = v + V2 + L and $\mu^S(\kappa^*(L), L) = \lambda\,\frac{e^*(\kappa^*(L), L; i)}{e^*(\kappa^*(L), L)} = \lambda\,\frac{v(i) + V_2(i) + L}{v + V_2 + L}$, we obtain $P(L) = \big(1 - (v + V_2 + L)\big)r_i + \lambda\big(v(i) + V_2(i) + L\big)$ and

$$\frac{dP(L)}{dL} = \lambda - r_i \tag{38}$$

so PL(L) ≥ 0 ⇔ λ ≥ ri. Since in this case Lind = L̄, point (ii) directly holds. To prove point (iii), suppose that Lind < L̄ (otherwise, the claim holds directly) and L ≥ Lind. The proportion of ideologues in the party in the second period is then:

$$P(L) = \kappa(L)r_i + \big(1 - \kappa(L)\big)\mu^S \tag{39}$$

Using c0 + c1κ(L) = β[ri − µS]Di,o and Equation 30, we obtain:

$$\begin{aligned} \frac{dP(L)}{dL} &= \kappa_L(L)(r_i - \mu^S) + \big(1 - \kappa(L)\big)\mu^S_L \\ &= \kappa_L(L)(r_i - \mu^S) - \big(1 - \kappa(L)\big)\frac{c_1\kappa_L(L)}{\beta D_{i,o}} \\ &= \frac{c_1\kappa_L(L)}{\beta D_{i,o}}\left(\frac{\beta(r_i - \mu^S)D_{i,o}}{c_1} - \big(1 - \kappa(L)\big)\right) \\ &= \frac{c_1\kappa_L(L)}{\beta D_{i,o}}\left(\frac{2\beta(r_i - \mu^S)D_{i,o}}{c_1} - \frac{c_0 + c_1}{c_1}\right) \end{aligned} \tag{40}$$

Under the assumption (Equation 5), c0 + c1 ≥ βriDi,o. Therefore,

$$\frac{dP(L)}{dL} \le \frac{c_1\kappa_L(L)}{\beta D_{i,o}}\cdot\frac{\beta(r_i - 2\mu^S)D_{i,o}}{c_1}.$$

Since µS > λ, the assumption ri ≤ 2λ guarantees that dP(L)/dL < 0.

Before determining the equilibrium intensity of violence, the next lemmas characterize some properties of the benefit of violence, denoted B(L), for the autocrat. Throughout we assume that Lind < L̄. The reasoning extends to the case when Lind = L̄. We also use the shorthand e(τ) := e∗(κ∗(L), L; τ). Recall that the probability a party member is purged is $\gamma(\kappa(L), L) = \frac{\kappa(L)}{1 - e(\kappa(L), L)}$ after failure in a partially discriminate purge and $\rho(\kappa(L), L) = 1 - \frac{1 - \kappa(L)}{e(\kappa(L), L)}$ after success in a semi-indiscriminate purge. Denote further $W_2^e = r_iW_2(i) + (1 - r_i)W_2(o)$ the expected payoff for the autocrat from a new party member.

Lemma 11. The benefit of violence is continuous in L. It is strictly increasing in L for L ≤ Lind.

Proof. For L ≤ Lfull, The benefit of violence is:

B(L) =λ[e(i)(1 + βW2(i)) + (1− e(i))

(0 + γβW e

2 + (1− γ)βW2(i))]

+ (1− λ)[e(o)(1 + βW2(o)) + (1− e(o))

(0 + γβW e

2 + (1− γ)βW2(o))]− c(κ(L))

B(L) =e+ βλW2(i) + βκ(L)(ri − µF )W2(i)− c(κ(L)) (41)

41

The last line uses γ = κ(L)1−e so λ(1 − e(i))γ = κ(L)µF . Further, notice that under the assumption

that there is a unique solution to β(ri − µF )W2(i) − cκ(κ(L)) = 0, we obtain: limL↑Lfull

B(L) =

e+ βλe(i)W2(i) + β(1− e)riW2(i)− c(κ(L)), with κ(L) = 1− e and e = v + V2 + L.

Taking the derivatives, we obtain:

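\begin{equation}
\frac{dB(L)}{dL} = \kappa_L(L)e_\kappa + e_L - \beta\kappa(L)\left(\mu^F_\kappa\kappa_L(L) + \mu^F_L\right)W_2(i) + \kappa_L(L)\left(\beta(r_i-\mu^F)W_2(i) - c'(\kappa(L))\right) \tag{42}
\end{equation}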

Given that the purge is partially discriminate, the purge breadth satisfies: $\beta(r_i-\mu^F)W_2(i) - c_\kappa(\kappa(L)) = 0$. By Lemmas 6, 7, 9, and Proposition 1, $\kappa_L(L) > 0$, $e_\kappa > 0$, $e_L > 0$, $\mu^F_\kappa < 0$, and $\mu^F_L < 0$ so $dB(L)/dL > 0$.

For L ∈ (Lfull, Lind], the benefit of violence is:

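\begin{align}
B(L) &= \lambda\left[e(i)(1+\beta W_2(i)) + (1-e(i))\beta W^e_2\right] + (1-\lambda)\left[e(o)(1+\beta W_2(o)) + (1-e(o))\beta W^e_2\right] - c(\kappa(L)) \notag\\
&= e + \beta\lambda e(i)W_2(i) + \beta(1-e)r_iW_2(i) - c(\kappa(L)), \tag{43}
\end{align}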

with $e = v+V_2+L$ and $\kappa(L) = 1-e$. Hence, $\lim_{L\downarrow L^{full}} B(L) = \lim_{L\uparrow L^{ind}} B(L) = e + \beta\lambda e(i)W_2(i) + \beta(1-e)r_iW_2(i) - c(\kappa(L))$.

Taking the derivative, we obtain:

\begin{equation}
\frac{dB(L)}{dL} = 1 + \beta(\lambda-r_i)W_2(i) + c'(1-e) > 0 \tag{44}
\end{equation}

Finally, when L ≥ Lind, the benefit of violence is:

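\begin{align}
B(L) &= \lambda\left[e(i)\left(1 + \rho\beta W^e_2 + (1-\rho)\beta W_2(i)\right) + (1-e(i))\beta W^e_2\right] \notag\\
&\quad + (1-\lambda)\left[e(o)\left(1 + \rho\beta W^e_2 + (1-\rho)\beta W_2(o)\right) + (1-e(o))\beta W^e_2\right] - c(\kappa(L)) \notag\\
&= e + \beta\kappa(L)r_iW_2(i) + \beta(1-\kappa(L))\mu^SW_2(i) - c(\kappa(L)) \tag{45}
\end{align}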

The last line uses $1-\rho = \frac{1-\kappa(L)}{e}$ so $\lambda e(i)(1-\rho) = (1-\kappa(L))\mu^S$ (by definition of $\mu^S$). Recall that at $L = L^{ind}$, $\kappa(L) = 1-e$, which implies $e = v+V_2+L$ (see Equation 16). Thus $\lim_{L\downarrow L^{ind}} B(L) = e + \beta(1-e)r_iW_2(i) + \beta\lambda e(i)W_2(i) - c(\kappa(L)) = \lim_{L\uparrow L^{ind}} B(L)$, which completes the proof that $B(L)$ is continuous in $L$.

Taking the derivative, we obtain:

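\begin{align}
\frac{dB(L)}{dL} &= \kappa_L(L)e_\kappa + e_L + \beta(1-\kappa(L))\mu^S_LW_2(i) + \kappa_L(L)\left(\beta(r_i-\mu^S)W_2(i) - c_\kappa(\kappa(L))\right) \notag\\
&= \kappa_L(L)e_\kappa + e_L + \beta(1-\kappa(L))\mu^S_LW_2(i) \tag{46}
\end{align}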

The second line uses the fact we consider a semi-discriminate purge. We discuss the sign of (46)

below.


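Lemma 12. The marginal benefit of violence satisfies:

\begin{equation*}
\lim_{L\uparrow L^{full}} \frac{dB(L)}{dL} > \frac{dB(L_1)}{dL} > \frac{dB(L_2)}{dL} \quad\text{for all } L_1 \in (L^{full}, L^{ind}] \text{ and } L_2 \in (L^{ind}, \bar{L}]
\end{equation*}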

Proof. The proof of a discontinuity in the marginal benefit at L = Lfull proceeds in three steps.

First, we show that eL > 1 as L ↑ Lfull. Second, we show that −κ(L)µFL > λ − µF as L ↑ Lfull.

Finally, we show that BL(L) = 1 + β(λ− µF )W2(i) as L ↓ Lfull.

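Step 1. By Equation 13, $e = v + \gamma(V_2+L)$. We thus obtain:

\begin{equation}
e_L = \gamma_L(V_2+L) + \gamma \tag{47}
\end{equation}

Moreover, $\gamma_L = \frac{e_L\kappa(L)}{(1-e)^2} > 0$ and $\gamma = \frac{\kappa(L)}{1-e} \to 1$ as $L \uparrow L^{full}$, so $e_L > 1$.

Step 2. Using the definition of $\mu^F$ and $\kappa(L) \to (1-e)$ as $L \uparrow L^{full}$, we obtain:

\begin{align*}
-\kappa(L)\mu^F_L &= (1-e)\lambda\frac{e_L(i)(1-e) - e_L(1-e(i))}{(1-e)^2} \\
&= \lambda e_L(i) - \mu^F e_L \\
&= \gamma(\lambda-\mu^F) + \gamma_L\left(\lambda(V_2(i)+L) - \mu^F(V_2+L)\right)
\end{align*}

The third line comes from Equation 47 applied to $e_L(i)$ and $e_L$. As $\gamma = 1$, $\gamma_L > 0$, and $\lambda > \mu^F$, we obtain $-\kappa(L)\mu^F_L > \lambda-\mu^F$ as claimed.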

Using Equation 42, notice that Steps 1 and 2 imply that $\frac{dB(L)}{dL} > 1 + \beta(\lambda-\mu^F)W_2(i)$ at $L = L^{full}$ (all the other terms are positive).

Step 3. As L ↓ Lfull, c′(κ(L)) = β(ri − µF )W2(i). Hence, we can rewrite Equation 44 as (slightly

abusing notation by using equalities):

\begin{align*}
\frac{dB(L)}{dL} &= 1 + \beta(\lambda-r_i)W_2(i) + \beta(r_i-\mu^F)W_2(i) \\
&= 1 + \beta(\lambda-\mu^F)W_2(i)
\end{align*}

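This directly implies $\lim_{L\uparrow L^{full}} \frac{dB(L)}{dL} > \lim_{L\downarrow L^{full}} \frac{dB(L)}{dL}$. Using Equation 44, we also obtain:

\begin{equation}
\frac{d^2B(L)}{dL^2} = -c''(1-e) < 0 \tag{48}
\end{equation}

This directly implies $\lim_{L\uparrow L^{full}} \frac{dB(L)}{dL} > \frac{dB(L_1)}{dL}$ for all $L_1 \in (L^{full}, L^{ind}]$.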
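Equation 48 follows from differentiating Equation 44 once more. A brief sketch, assuming that on $(L^{full}, L^{ind}]$ effort is $e = v+V_2+L$ (so $e_L = 1$) and that the cost satisfies $c_\kappa(\kappa) = c_0 + c_1\kappa$ with $c_1 > 0$, as used later in the appendix:

```latex
% Only the cost term of Equation 44 depends on L, through e:
\[
\frac{d}{dL}\Bigl[\,1 + \beta(\lambda - r_i)W_2(i) + c'(1-e)\Bigr]
= c''(1-e)\cdot\frac{d(1-e)}{dL}
= -\,c''(1-e)\,e_L
= -\,c''(1-e).
\]
% With c_\kappa(\kappa) = c_0 + c_1\kappa, this is simply
\[
\frac{d^2B(L)}{dL^2} = -c_1 < 0 .
\]
```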

We now consider the discontinuity in the marginal benefit around $L^{ind}$. By Equation 44, $\frac{dB(L_1)}{dL} > 1$ for all $L_1 \in (L^{full}, L^{ind}]$. By Equation 46, $\frac{dB(L_2)}{dL} < 1$ for all $L_2 > L^{ind}$ since $\kappa_L(L)e_\kappa + e_L < 1$ (Proposition 1) and the other term is negative.

Lemma 13. The benefit of violence is strictly convex in L for L ≤ Lfull.

Proof. The proof proceeds in four steps. First, we compute the second (total) derivative of the

marginal benefit of violence. Second, we look at the second (partial) derivatives of effort and

autocrat’s posterior with respect to L and κ. Third, we show that the equilibrium purge breadth

is convex in L. The last step proves the claim.

Step 1. Using Equation 42, we obtain:

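\begin{align}
\frac{d^2B(L)}{dL^2} &= \kappa_{LL}(L)e_\kappa + \kappa_L(L)e_{\kappa\kappa} + (\kappa_L(L)+1)e_{\kappa L} + e_{LL} - \beta\kappa_{LL}(L)\left(\mu^F_\kappa\kappa_L(L) + \mu^F_L\right)W_2(i) \notag\\
&\quad - \beta\kappa_L(L)\left(\kappa_{LL}(L)\mu^F_\kappa + \kappa_L(L)\mu^F_{\kappa\kappa} + (\kappa_L(L)+1)\mu^F_{\kappa L} + \mu^F_{LL}\right)W_2(i) \notag\\
&\quad + \kappa_{LL}(L)\left(\beta(r_i-\mu^F)W_2(i) - c_\kappa(\kappa(L))\right) + \kappa_L(L)\left(\beta\left(-\kappa_L(L)\mu^F_\kappa - \mu^F_L\right)W_2(i) - \kappa_L(L)c_{\kappa\kappa}(\kappa(L))\right) \tag{49}
\end{align}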

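Step 2. Using $\mu^F = \lambda\frac{1-e(i)}{1-e}$, we obtain for $j \in \{\kappa, L\}$:

\begin{equation}
\mu^F_{jj} = \frac{\lambda(1-\lambda)(1-e(i))(1-e(o))}{(1-e)^3} \times \left[\left(\frac{e_{jj}(o)}{1-e(o)} - \frac{e_{jj}(i)}{1-e(i)}\right)(1-e) + 2e_j\left(\frac{e_j(o)}{1-e(o)} - \frac{e_j(i)}{1-e(i)}\right)\right] \tag{50}
\end{equation}

\begin{align}
\mu^F_{\kappa L} &= \frac{\lambda(1-\lambda)(1-e(i))(1-e(o))}{(1-e)^3} \times \left[\left(\frac{e_{\kappa L}(o)}{1-e(o)} - \frac{e_{\kappa L}(i)}{1-e(i)}\right)(1-e) + e_L\left(\frac{e_\kappa(o)}{1-e(o)} - \frac{e_\kappa(i)}{1-e(i)}\right)\right] \notag\\
&\quad + \frac{\lambda}{(1-e)^3}\left[e_L\left(e_\kappa(o)(1-e(i)) - e_\kappa(i)(1-e(o))\right) + (1-e)\left(e_\kappa(i)e_L(o) - e_L(i)e_\kappa(o)\right)\right] \notag\\
&= \frac{\lambda(1-\lambda)(1-e(i))(1-e(o))}{(1-e)^3} \times \left[\left(\frac{e_{\kappa L}(o)}{1-e(o)} - \frac{e_{\kappa L}(i)}{1-e(i)}\right)(1-e) + e_L\left(\frac{e_\kappa(o)}{1-e(o)} - \frac{e_\kappa(i)}{1-e(i)}\right)\right] \notag\\
&\quad + \frac{\lambda(1-\lambda)}{(1-e)^3}\left[e_\kappa(o)\left(e_L(1-e(i)) - e_L(i)(1-e)\right) + e_\kappa(i)\left(e_L(o)(1-e) - e_L(1-e(o))\right)\right] \notag\\
&= \frac{\lambda(1-\lambda)(1-e(i))(1-e(o))}{(1-e)^3} \times \left[\left(\frac{e_{\kappa L}(o)}{1-e(o)} - \frac{e_{\kappa L}(i)}{1-e(i)}\right)(1-e) + e_L\left(\frac{e_\kappa(o)}{1-e(o)} - \frac{e_\kappa(i)}{1-e(i)}\right)\right] \notag\\
&\quad + \frac{\lambda(1-\lambda)(1-e(i))(1-e(o))}{(1-e)^3}\left[\left((1-\lambda)e_\kappa(o) + \lambda e_\kappa(i)\right)\left(\frac{e_L(o)}{1-e(o)} - \frac{e_L(i)}{1-e(i)}\right)\right] \tag{51}
\end{align}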

Using e(τ) = v(τ) + γ(V2(τ) + L), we obtain:

eκκ(τ) =γκκ(V2(τ) + L) (52)

eLL(τ) =γLL(V2(τ) + L) + 2γL (53)

eκL(τ) =γκL(V2(τ) + L) + γκ (54)


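Using the comparative statics on $e$, we get:

\begin{align*}
\gamma_{\kappa\kappa} &= \frac{e_\kappa}{(1-e)^2} + \frac{(e_{\kappa\kappa}\kappa(L) + e_\kappa)(1-e) + 2(e_\kappa)^2\kappa(L)}{(1-e)^3} \\
\gamma_{LL} &= \frac{e_{LL}\kappa(L)(1-e) + 2(e_L)^2\kappa(L)}{(1-e)^3} \\
\gamma_{\kappa L} &= \frac{(e_{\kappa L}\kappa(L) + e_L)(1-e) + 2e_\kappa e_L\kappa(L)}{(1-e)^3}
\end{align*}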
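These second derivatives can be recovered from the first derivatives of $\gamma$. As a sketch, treating $e$ as a function of $(\kappa, L)$ and differentiating $\gamma = \kappa/(1-e)$ directly (the convention the displayed expressions appear to follow):

```latex
% First derivatives of \gamma = \kappa / (1 - e), with e = e(\kappa, L):
\[
\gamma_\kappa = \frac{1}{1-e} + \frac{\kappa\, e_\kappa}{(1-e)^2},
\qquad
\gamma_L = \frac{\kappa\, e_L}{(1-e)^2}.
\]
% Differentiating \gamma_L once more in L (quotient rule, common denominator (1-e)^3):
\[
\gamma_{LL} = \frac{\kappa\, e_{LL}}{(1-e)^2} + \frac{2\kappa\,(e_L)^2}{(1-e)^3}
= \frac{e_{LL}\,\kappa\,(1-e) + 2 (e_L)^2 \kappa}{(1-e)^3}.
\]
% Differentiating \gamma_\kappa in L gives the cross term:
\[
\gamma_{\kappa L} = \frac{e_L}{(1-e)^2} + \frac{\kappa\, e_{\kappa L}(1-e) + 2\kappa\, e_\kappa e_L}{(1-e)^3}
= \frac{(e_{\kappa L}\kappa + e_L)(1-e) + 2 e_\kappa e_L \kappa}{(1-e)^3}.
\]
```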

By Lemma 6 (given $L \leq L^{full}$, $\kappa^*(L) \leq \kappa(L)$), $e_\kappa \geq 0$ and $e_{\kappa\kappa} \geq 0$. By a similar reasoning, $e_L > 0$, $e_{LL} > 0$, and $e_{\kappa L} > 0$ (by definition of $e^*(\cdot)$, see Equation 26), therefore $\gamma_{\kappa\kappa}$, $\gamma_{LL}$, and $\gamma_{\kappa L}$ are all strictly positive. Since $\gamma_L$ and $\gamma_\kappa$ are also positive, it implies that the quantities defined in Equations 52-54 are all strictly positive. Further, given $V_2(i) > V_2(o)$, $e_{jm}(i) > e_{jm}(o)$ for all $(j,m) \in \{\kappa, L\}^2$. Since $e(i) > e(o)$, this implies that $\mu^F_{jj} < 0$, $j \in \{\kappa, L\}$, and $\mu^F_{\kappa L} < 0$.

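Step 3. Using Equation 29 and the Implicit Function Theorem, we obtain (using $c_\kappa(\kappa) = c_0 + c_1\kappa$):

\begin{equation}
\kappa_{LL}(L)(c_1 + \mu^F_\kappa) = \beta\left(-\mu^F_{\kappa\kappa}\kappa_L(L) - \mu^F_{\kappa L}(\kappa_L(L)+1) - \mu^F_{LL}\right)D_{i,o} \tag{55}
\end{equation}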

By the properties of the equilibrium (partially discriminate) purge breadth, c1 + µFκ > 0. By the

reasoning above and κL(L) > 0 (Proposition 2), the right-hand-side of (55) is strictly positive.

Hence κLL(L) > 0.

Step 4. Using Equation 29 and the definition of κ∗(L), we can rewrite Equation 49 as:

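\begin{align*}
\frac{d^2B(L)}{dL^2} &= \kappa_{LL}(L)e_\kappa + \kappa_L(L)e_{\kappa\kappa} + (\kappa_L(L)+1)e_{\kappa L} + e_{LL} - \beta\kappa_{LL}(L)\left(\mu^F_\kappa\kappa_L(L) + \mu^F_L\right)W_2(i) \\
&\quad - \beta\kappa_L(L)\left(\kappa_{LL}(L)\mu^F_\kappa + \kappa_L(L)\mu^F_{\kappa\kappa} + (\kappa_L(L)+1)\mu^F_{\kappa L} + \mu^F_{LL}\right)W_2(i)
\end{align*}

By Steps 2 and 3, it can be checked that $\frac{d^2B(L)}{dL^2} > 0$ as claimed.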


Existence follows from the fact that $B(L)$ is continuous (Lemma 11) and the maximization problem is over a compact set $[0, \bar{L}]$. We now look at various cases using the computation of the marginal benefit of violence (Lemma 11) and the assumption that the marginal cost of violence is $\zeta_0 + \zeta_1L$.

Case 1. Suppose at $L = L^{full}$, $\zeta_0 + \zeta_1L \geq 1 + \beta D_{i,o}(\lambda-\mu^F)$ (point (i)). Then by Lemma 12, it must be that $\zeta_0 + \zeta_1L > \frac{dB(L)}{dL}$ for all $L \geq L^{full}$. Hence, $L^* < L^{full}$. Since $B(L)$ is convex over the interval $[0, L^{full}]$, we need to consider two cases at $L = L^{full}$: (a) $\zeta_0 + \zeta_1L \geq \frac{dB(L)}{dL}$ and (b) $\zeta_0 + \zeta_1L < \frac{dB(L)}{dL}$. In case (a), the solution is unique and equals either 0 or the unique solution to $\zeta_0 + \zeta_1L = \frac{dB(L)}{dL}$. In case (b), there may be two solutions to $\zeta_0 + \zeta_1L = \frac{dB(L)}{dL}$, call them $L^{Max} \geq 0$ and $L^{min}$ with $L^{Max} < L^{min}$ and $L^{Max}$ ($L^{min}$) a local maximum (minimum).$^{27}$ There is also a corner solution at $L = L^{full}$. The autocrat then compares $B(L^{Max})$ and $B(L^{full})$ and there generically exists a unique solution unless $B(L^{Max}) = B(L^{full})$.

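For the next case, notice that using Equation 32 and Equation 28, we can rewrite for all $L > L^{ind}$,

\begin{align}
\frac{dB(L)}{dL} &= \frac{1}{2}\frac{1}{e}\left(1 + \frac{c_0}{c_1} - \frac{\beta D_{i,o}}{c_1}(r_i-\lambda)\right) + (1-\kappa(L))\mu^S_L\beta W_2(i) \notag\\
&= \frac{1}{2}\frac{1}{e}\left(1 + \frac{c_0}{c_1} - \frac{\beta D_{i,o}}{c_1}(r_i-\mu^S) + \frac{\beta D_{i,o}}{c_1}(\mu^S-\lambda)\right) + (1-\kappa(L))\mu^S_L\beta W_2(i) \notag\\
&= \frac{1}{2}\frac{1}{e}\frac{\beta D_{i,o}}{c_1}(\mu^S-\lambda) + (1-\kappa(L))\left(\frac{1}{2e} + \mu^S_L\beta W_2(i)\right) \tag{56}
\end{align}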

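Further, by Equation 56 and taking the limits $L \downarrow L^{ind}$, we obtain given $1-\kappa(L) \to e$, $e \to v+V_2+L$, and $\mu^S = \lambda\frac{e(i)}{e}$ (slightly abusing notation by using equalities):

\begin{align}
\frac{dB(L)}{dL} &= \frac{1}{2}\frac{1}{e}\frac{\beta D_{i,o}}{c_1}(\mu^S-\lambda) + \frac{1}{2} + e\lambda\frac{e-e(i)}{e^2}\beta W_2(i) \notag\\
&= \frac{1}{2}\left(1 + \beta\frac{D_{i,o}}{ec_1}(\mu^S-\lambda)\right) + \beta D_{i,o}(\lambda-\mu^S) \tag{57}
\end{align}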

The last line comes from W2(o) = 0.
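The limit behind Equation 57 can be traced term by term. The sketch below assumes, as the text states for $L \downarrow L^{ind}$, that $1-\kappa(L) \to e$ and that efforts take the form $e(\tau) = v(\tau)+V_2(\tau)+L$ at the limit, so that $e_L = e_L(i) = 1$; it also uses $W_2(o) = 0$, hence $D_{i,o} = W_2(i)$.

```latex
% Posterior after success and its derivative in L (using e_L = e_L(i) = 1):
\[
\mu^S = \lambda\frac{e(i)}{e},
\qquad
\mu^S_L = \lambda\,\frac{e - e(i)}{e^{2}} .
\]
% With 1 - \kappa(L) \to e, the selection term of Equation 56 becomes
\[
(1-\kappa(L))\,\mu^S_L\,\beta W_2(i)
\;\to\; e\,\lambda\frac{e-e(i)}{e^{2}}\,\beta W_2(i)
= \Bigl(\lambda - \lambda\frac{e(i)}{e}\Bigr)\beta W_2(i)
= \beta D_{i,o}(\lambda - \mu^S),
\]
% and the effort term becomes
\[
\frac{1-\kappa(L)}{2e} \;\to\; \frac{1}{2}.
\]
```

Since $e(i) > e$, the selection term is negative: just above $L^{ind}$, more violence raises effort but worsens the composition of survivors.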

Case 2. Suppose at $L = L^{full}$, $\zeta_0 + \zeta_1L < 1 + \beta D_{i,o}(\lambda-\mu^F)$ and that $\max_{L\in(L^{ind},\bar{L}]} \zeta_0 + \zeta_1L - \frac{dB(L)}{dL} < 0$.$^{28}$ In this case, by Lemma 12, the equilibrium intensity of violence is unique and is the solution to $\zeta_0 + \zeta_1L = 1 + \beta D_{i,o}(\lambda-\mu^F)$ if it exists or $L = L^{ind}$ otherwise.

Case 3. Suppose at $L = L^{ind}$, $\zeta_0 + \zeta_1L < \max_{L\in(L^{ind},\bar{L}]} \frac{dB(L)}{dL}$. The equilibrium intensity is $L = L^{ind}$ if the equation $\zeta_0 + \zeta_1L = \frac{dB(L)}{dL}$, with $\frac{dB(L)}{dL}$ defined by Equation 46, has no solution. The equilibrium intensity is the solution to $\zeta_0 + \zeta_1L = \frac{dB(L)}{dL}$ if it is unique. It is either the smallest solution to $\zeta_0 + \zeta_1L = \frac{dB(L)}{dL}$ or $L = \bar{L}$ if there are multiple solutions.

Finally, notice that the condition of point (ii) of the proposition is contained in Case 3 (using Equation 57) so $L^* > L^{ind}$.

Lemma 14. If $\mu^S < r_i$ at $L = L^{ind}$ (assuming $L^{ind} < \bar{L}$), there exists $c_1(r_i, c_0) > 1/2$ such that the marginal benefit of violence is strictly positive for $L > L^{ind}$ whenever $c_1 < c_1(r_i, c_0)$.

Proof. Using Equation 28 and Equation 32, for all $L \geq L^{ind}$, we can rewrite Equation 46 as (recall $W_2(o) = 0$ so $D_{i,o} = W_2(i)$):

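\begin{equation}
\frac{dB(L)}{dL} = \frac{1}{2}\frac{1}{e}\left(1 + \frac{c_0}{c_1} - \frac{\beta W_2(i)}{c_1}r_i\right) + \beta W_2(i)\left(\frac{\lambda}{2ec_1} + (1-\kappa(L))\mu^S_L\right) \tag{58}
\end{equation}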

27 If there is a unique solution, it is necessarily $L^{min}$, i.e., the local minimum.

28 Notice that we do not compute the sign of $\frac{d^2B(L)}{dL^2}$ for $L > L^{ind}$. Simulations suggest that the sign is ambiguous. However, it is not critical for our argument.

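By assumption, $1 + \frac{c_0}{c_1} - \frac{\beta W_2(i)}{c_1}r_i > 0$ so we focus on the second parenthesis.

\begin{align*}
\frac{\lambda}{2ec_1} + (1-\kappa(L))\mu^S_L &= \frac{\lambda}{2ec_1} - (1-\kappa(L))(1-\lambda)\frac{v+V_2}{(v+V_2+L)^2} \\
&= \frac{\lambda}{e}\left(\frac{1}{2c_1} - (1-\lambda)\frac{1-\kappa(L)}{v+V_2+L}\sqrt{\frac{1-\kappa(L)}{v+V_2+L}}\,(v(i)+V_2(i))\right)
\end{align*}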

Denote $\Gamma(c_1) = \frac{1}{2c_1} - (1-\lambda)\frac{1-\kappa(L)}{v+V_2+L}\sqrt{\frac{1-\kappa(L)}{v+V_2+L}}\,(v(i)+V_2(i))$. $\Gamma(c_1)$ is strictly decreasing with $c_1$, strictly negative as $c_1 \to \infty$, strictly positive as $c_1 \to 0$. Hence there exists a unique solution, denoted $c_1(r_i, c_0; L)$, such that $\Gamma(c_1) > 0$ for all $c_1 < c_1(r_i, c_0; L)$ (dependence on $r_i$ and $c_0$, and other parameter values, follows from observation of Equation 58). The solution satisfies $c_1(r_i, c_0; L) > 1/2$. To see this, notice that $\sqrt{\frac{1-\kappa(L)}{v+V_2+L}}\,(v(i)+V_2(i)) < e(i) \leq 1$ (by (28) properly arranged) and $\frac{1-\kappa(L)}{v+V_2+L} = (1-\rho)^2 < 1$ (using the definition of $\rho$ and (28)). Hence $\Gamma(1/2) > 0$.
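The bound $\Gamma(1/2) > 0$ can be made explicit. This is a short sketch in which $A$ and $B$ are shorthand introduced here for the two factors bounded in the text, the square root is read as covering the whole fraction, and $\lambda \in [0,1]$ with $A, B \geq 0$ is assumed.

```latex
% Shorthand introduced for this sketch:
\[
A := \frac{1-\kappa(L)}{v+V_2+L} = (1-\rho)^2 < 1,
\qquad
B := \sqrt{\frac{1-\kappa(L)}{v+V_2+L}}\,\bigl(v(i)+V_2(i)\bigr) < e(i) \le 1 .
\]
% Evaluate \Gamma at c_1 = 1/2, where 1/(2 c_1) = 1:
\[
\Gamma\!\left(\tfrac12\right) = 1 - (1-\lambda)\,A\,B > 0,
\qquad\text{because } 0 \le (1-\lambda)\,A\,B < 1 .
\]
% \Gamma is strictly decreasing in c_1, so its unique root lies strictly above 1/2.
```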

Define $c_1(r_i, c_0) := \min_{L\in[L^{ind},\bar{L}]} c_1(r_i, c_0; L)$. By the reasoning above, $c_1(r_i, c_0) > 1/2$. It can then be checked that there exists a non-empty set of parameter values such that $c_1 < c_1(r_i, c_0)$ and Equation 5 are simultaneously satisfied.

Proof of Corollary 2

We just prove necessity. Sufficiency follows from the usual argument.

Denote $\underline{r}_i := \mu^S(\kappa(\bar{L}), \bar{L})$. Using the proof of Proposition 2 and $\mu^S(\cdot)$ decreasing with $L$ (Lemma 9), if $r_i \leq \underline{r}_i$, $L^{ind} = \bar{L}$ for all $c_0$, $c_1$ and a purge is never semi-indiscriminate. So $r_i > \underline{r}_i$ is a necessary condition. This is condition 1.

Supposing condition 1 holds, define $c_0(r_i) = \beta[r_i - \mu^S(\kappa(L), L)]D_{i,o}$ evaluated at $L = \bar{L}$. If $c_0 \geq c_0(r_i)$ then for all $c_1 > 0$, the purge cannot be semi-indiscriminate as the marginal cost is always greater than the marginal benefit. Given $c_0 < c_0(r_i)$, define $c_1(r_i, c_0)$ such that at $L = \bar{L}$, $c_0 + c_1\kappa(L) = \beta[r_i - \mu^S(\kappa(L), L)]D_{i,o}$. Similarly, if $c_1 \geq c_1(r_i, c_0)$, a purge can never be semi-indiscriminate. Using Lemma 14, take the upper bound on $c_1$ to be the minimum of this bound and the threshold defined in Lemma 14 (assuming the upper bound in Equation 5 does not bind, otherwise the condition can be arranged appropriately). This is condition 2.

Finally define $\zeta_0(r_i) := \max_{L\in[L^{ind},\bar{L}]} \frac{dB(L)}{dL}$ (by Lemma 14 and condition 2 above $\frac{dB(L)}{dL} > 0$ over this range). If $\zeta_0 \geq \zeta_0(r_i)$, for all $\zeta_1 > 0$, the marginal cost of investing in violence is such that the autocrat never chooses $L > L^{ind}$. Assuming $\frac{dB(L)}{dL}$ is weakly decreasing over $[L^{ind}, \bar{L}]$, denote $\zeta_1(r_i, \zeta_0) := \frac{\frac{dB(L)}{dL} - \zeta_0}{L}$ at $L = L^{ind}$. If $\zeta_1 \geq \zeta_1(r_i, \zeta_0)$, then the equilibrium intensity of violence satisfies $L^* \leq L^{ind}$ and the purge is not semi-indiscriminate. If $\frac{dB(L)}{dL}$ is not weakly decreasing over $[L^{ind}, \bar{L}]$, define $L_0 := \arg\max_{L\in[L^{ind},\bar{L}]} \frac{\frac{dB(L)}{dL} - \zeta_0}{L}$ (assuming uniqueness). $\zeta_1(r_i, \zeta_0)$ then must satisfy $\zeta_1(r_i, \zeta_0) \leq \frac{\frac{dB(L)}{dL} - \zeta_0}{L}$ at $L = L_0$ and $B(L) - \zeta(L) \geq B(L^{ind}) - \zeta(L^{ind})$. This is condition 3.


The procedure is as follows. Step 1: Pick $(\lambda', r_i', b', c_0', \zeta_0') \in [0,1]\times[0,1]\times\mathbb{R}^3_+$. Step 2: Check whether there exist $c_1^d$ satisfying Equation 5 and $\zeta_1^d \in \mathbb{R}_+$ such that (i) there exists a local maximum of $B(L) - \zeta(L)$ in $[0, L^{full}]$, denoted $L^{Max}$ as in Proposition 4, and (ii) $B(L^{Max}) = B(L^{full})$ (notice that $c_1^d$ and $\zeta_1^d$ are unique if they exist). Step 3: If conditions (i) and (ii) hold then $(\lambda', r_i', b', c_0', \zeta_0') \in P^d$, if not $(\lambda', r_i', b', c_0', \zeta_0') \notin P^d$. Repeat the steps for all possible $(\lambda, r_i, b, c_0, \zeta_0)$. $P^d$ is non-empty as we can always pick $c_1$ such that a fully discriminate purge is possible and $\zeta_0$ and $\zeta_1$ such that conditions (i) and (ii) hold by convexity of the marginal benefit (Lemma 13). $P^d$ is not measure 0 as we can always perturb the parameters slightly and adjust $\zeta_0$ and $\zeta_1$. Due to the convexity of the marginal benefit of violence and conditions (i) and (ii), the claim holds directly.$^{29}$

Remark 1. Suppose there exist multiple solutions to c0 + c1κ = β[ri − µF (el(κ, L))]W2(i) for

L ≥ L, L < Lm. Then B(L) is not continuous in L, but there exists a generically unique equilibrium

intensity of violence.

Proof. Denote κ1(L) the minimum solution to c0 + c1κ = β[ri−µF (el(κ, L))]W2(i). By the proof of

Lemma 3, for all L < L, κ∗(L) = κ1(L) < κ(L), but there exists L1 > L such that for all L ∈ [L, L1],

κ∗(L) = κ(L). Hence, the purge breadth and, consequently, B(L) are not continuous in L. Further,

limL↑L

B(L) < B(L). It can be checked that all the other properties of B(L) described in Lemmas

11-14, Propositions 4 and 5, Corollary 2 hold. This implies that all proofs carry through, but the

proof of existence and uniqueness of an equilibrium intensity of violence. It needs to be amended as

such. Suppose there exists $L'$ such that $\zeta_0 + \zeta_1L' = \frac{dB(L')}{dL}$ and $L' < L$. If there exists $L'' > L$ such that $\zeta_0 + \zeta_1L'' = \frac{dB(L'')}{dL}$, then the equilibrium intensity of violence satisfies $L^* = \arg\max_{L\in\{L',L''\}} B(L)$ and is generically unique. If there is no such $L''$, then the equilibrium intensity of violence satisfies $L^* = \arg\max_{L\in\{L',L\}} B(L)$ and is generically unique.

29It is important to observe that for all (λ, ri, b, c0, ζ0) ∈ Pd, the condition described in the text of the proposition

is knife-edge. However, the properties of Pd indicate that this knife edge condition can arise for a non-trivial set of

parameter values.
