8/3/2019 Ertan - Who to Punish - EER 2009
Who to punish? Individual decisions and majority rule in mitigating the free rider problem$
Arhan Ertan a, Talbot Page b, Louis Putterman c
a Tufts University, MA, USA
b Brown University (Emeritus), RI, USA
c Department of Economics, Box B, Brown University, Providence, RI 02912, USA
Article info
Article history:
Received 2 August 2006
Accepted 21 September 2008
Available online 17 October 2008
JEL classification:
C91
C92
D71
H41
Keywords:
Public goods
Collective action
Punishment
Voting
Institutions
Abstract
We study a voluntary contributions mechanism in which punishment may be allowed,
depending on subjects' voted rules. We found that out of 160 group votes, even when
groups had no prior experience with unrestricted punishment, no group ever voted to
allow unrestricted punishment and no group ever allowed punishment of high
contributors. Over a series of votes and periods of learning we found a distinct
reluctance to allow any punishment at the beginning, with a gradual but clear evolution
toward allowing punishment of low contributors. And groups allowing punishment of
only low contributors achieved levels of cooperation and efficiency that are among the
highest in the literature on social dilemmas.
© 2008 Elsevier B.V. All rights reserved.
0. Introduction
Organizations such as teams, firms, and military units depend on cooperative effort to succeed, and organizational
leadership often attempts to increase cooperative contributions and/or reduce free riding by instituting rewards and
sanctions and by building a culture or norms of cooperation. Problems of cooperation, for example, in efforts to limit
greenhouse gases or depletion of fisheries, and free riding in efforts to provide public goods share a common characteristic:
incentives for the individual that lead to inefficiency in the group. Such problems are often called social dilemmas and have been the focus of numerous studies using the method of the laboratory decision-making experiment.
In one key social dilemma experiment, Ostrom et al. (1992) found for a model of overuse of a commons, that allowing
face-to-face communication and allowing the subjects to sanction (punish) each other led to a significant increase in
cooperative behavior. In another influential experiment, Fehr and Gachter (2000a) found that, in a voluntary contributions
mechanism1 (VCM), the opportunity for punishment had a dramatic positive effect on contributions, but this finding did
European Economic Review
ARTICLE IN PRESS
0014-2921/$ - see front matter © 2008 Elsevier B.V. All rights reserved.
doi:10.1016/j.euroecorev.2008.09.007
$ The research reported here was supported by N.S.F. Grant SES-0001769. We are indebted to two anonymous referees for helpful comments and
suggestions.
Corresponding author. Tel.: +1 401 863 3837; fax: +1 401 863 1970.
E-mail address: [email protected] (L. Putterman).
1 The basic voluntary contributions mechanism without punishment is a particularly sharp social dilemma, in which each individual maximizes his
payoff when others contribute their full endowments but he himself contributes nothing; yet when everyone contributes nothing, efficiency is minimized.
European Economic Review 53 (2009) 495–511
not extend to average efficiency. In both experiments, punishment was made possible by allowing a subject to pay out of
his/her earnings to reduce by a larger amount the earnings of another. Since punishment is costly to both the punisher and
the punished, it was not surprising to observe that punishment had a less positive effect on efficiency than on contributions
in VCMs or overuse in commons problems.
But at the same time practically everyone who studied the role of punishment noticed a curious phenomenon. While
most punishment was targeted at low contributors in VCMs (and overusers in commons problems), a considerable amount
of punishment was targeted at cooperators (high contributors in VCMs, low extractors in commons problems). The
frequency of punishing high contributors in VCMs was too high to be explained as mistakes. Cinyabuguma et al. (2006) estimated that about 15% of punishment in several experiments2 of this type was targeted at the highest contributor in a
group, and about 25% at those who contributed more than their group's average. Researchers suggested several possible
explanations: for example, revenge, harming others more than oneself to win relatively (tournament style), and moral
resentment.3 These possible explanations suggested multiple preference types, including other-directed preferences
(revenge, etc.) in addition to the self-interested preference for maximizing earnings found in most economic models. It
seemed to us that the phenomenon of punishing high contributors in VCMs was more frequent than commonly recognized
and likely to have adverse effects on contributions and efficiency. In practical life, if decentralized punishment of high
contributors by resentful free riders occurred with comparably high frequency, it would be a serious problem.4 We called the
punishment of high contributors "perverse punishment" because of its seeming inconsistency with self-interested earnings
maximization.
Because the directing of a significant fraction of punishment at high contributors appears to limit the usefulness of
decentralized punishment as a mechanism or institution, we asked whether the problem might be corrected if groups
of individuals were provided with the opportunity to choose their own rules governing the application of punishment.
We conducted an experiment in which rules determining who can be punished are chosen by a series of votes, in order to
see how the choices of rules evolved over time and how these choices affected cooperation and efficiency. In our
experiment, subjects voted on three ballot items determining independently whether group members could reduce the
earnings of low (below average), of average, and of high (above average) contributors to their group account (public good).
We found that out of 160 group votes, no group ever voted to allow punishment of high contributors. Over a series of
votes and periods of learning we found a distinct reluctance to allow any punishment at the beginning, with a gradual but
clear evolution toward allowing punishment of low contributors. And groups adopting this rule of controlling perverse
punishment achieved levels of contributions and efficiency that are among the highest in the literature on social dilemmas.
Our main contributions are: to show how rules of punishment can evolve endogenously to address free rider problems,
within the opportunities of institutional choice presented to the experimental subjects; and to show that perverse
punishment can have strong negative effects on contributions and efficiency but is amenable to group control.
These contributions, listed more specifically in Results 1–4, are based on the observed behaviors in the experiment and
rely on direct counts or non-parametric tests using fully independent observations at the group or session level. Toward the end of the results section, we also discuss regressions estimated using individual-level observations, here using group and
period fixed effects to partially address the possible interdependence among observations.
The paper is organized as follows. Section 1 reviews the theoretical outlook that informs our own and related research,
then discusses the related literature. Section 2 presents the experimental design, Section 3 presents the analysis, and
Section 4 discusses interpretative issues.
1. Theoretical intuitions and literature
1.1. Theory
Several social dilemmas have an iterated dominant strategy equilibrium, which implies a unique Nash equilibrium
without any cooperation. The finitely repeated prisoner's dilemma (Kreps et al., 1982), the centipede game (McKelvey and
Palfrey, 1992), and VCMs are examples having a unique Nash equilibrium with no cooperation. (One of the assumptions
that leads to this result is that of a single preference type of payoff maximizers, all of whom believe that all the players are
payoff maximizers.) Kreps et al. found this equilibrium result disturbing because many experiments on the prisoner's
dilemma showed a pattern of substantial cooperation. A little later, McKelvey and Palfrey (1992) developed an exponential
version of the centipede game for which there are large benefits of cooperation, a unique Nash equilibrium with no
cooperation, and substantial cooperation in experimental observations. McKelvey and Palfrey thought the centipede game
2 In particular, Fehr and Gachter (2000a), Page et al. (2005), and Bochet et al. (2006).
3 A low contributing individual may be made uncomfortable by a high contributor's action, feel moral resentment and want to get even by
punishing the high contributor. An experimental subject gave us this explanation in a debriefing statement.
4 Cinyabuguma et al. find support for the idea that most punishment of high contributors by low ones may reflect retaliatory motives. For an
experiment on retaliatory punishment, see Nikiforakis (2008). Recently, the on-line auction site eBay announced a clamp-down on tit-for-tat feedback
to prevent sellers from leaving negative feedback on buyers. "Today, the biggest issue with the system is that buyers are more afraid than ever to leave
honest, accurate feedback because of the threat of retaliation," explained eBay North America president Bill Cobb in his January 29, 2008 announcement (Bangeman, 2008).
was an even simpler and ... more compelling example of the Nash equilibrium's predictive failure than is the prisoner's
dilemma.
In response to the Nash equilibrium's predictive failure under the assumption of payoff maximizing as the only preference
type, Kreps et al. and McKelvey and Palfrey modeled the two social dilemmas as (different) games of incomplete
information with multiple preference types. Kreps et al. used two types: payoff maximizers and tit-for-tat players.
McKelvey and Palfrey used two types: payoff maximizers and altruists. With multiple types and incomplete information,
iterated dominance is no longer implied. Instead, the researchers solved for Bayes–Nash equilibria that more accurately
predicted substantial cooperation until near the end of the game, as observed experimentally.
It is easy to check that for the VCM with a punishment opportunity and voting in our experiment, under the assumption
of payoff maximizers as the single type, iterated dominance implies a Nash equilibrium predicting no cooperation and no
punishment (and any voting pattern, including 100% abstentions). But this implication no longer holds when there are
multiple preference types. This non-implication is suggestive because in numerous experiments researchers found
substantial contributions in finitely repeated VCMs without punishment (see Davis and Holt, 1993; Ledyard, 1995, for
surveys). And in VCM experiments with punishment but without voting, Fehr and Gachter (2000a, b, 2002), Carpenter and
Matthews (2002), Masclet et al. (2003), Page et al. (2005), and Sefton et al. (2002) found substantial contributions and
substantial punishment. These studies and the non-implication suggest the presence of multiple preference types in our
experiment and other VCMs.
Comparison with the prisoner's dilemma, the centipede game, and other Bayesian games points toward several
predictions. Payoff maximizers are likely to mimic cooperators to encourage their cooperation, because this is a reasonable
strategy for increasing their payoffs. Cooperators are likely to punish low contributors because they dislike free riding
(see Gintis et al., 2005), and this signals and warns free riders to contribute more. Perverse punishers appear, however, to be the opposite of cooperators. Fehr and Gachter (2000a, b) interpreted their results primarily in terms of the interaction of
two preference types: purely selfish players (what we have called payoff maximizers) and a conditional cooperator type
(see also Hoffman et al., 1998). Fischbacher et al. (2001) and Fischbacher and Gachter (2006) used a strategy method
protocol to estimate that about 50% of those in their subject pools were of this second type.5 Further, punishment of high
contributors, observed by Gachter and Herrmann (2005), Gachter et al. (2005), and Cinyabuguma et al. (2006), suggests
that when punishment is an available option, the presence of a third type, whom we call "perverse punishers," should also
be taken into account. Based on the work mentioned above, we expected perverse punishers to account for not more than
25% of our subjects.6
A word of caution: we believe that these preference types are somewhat stylized interpretations rather than sharply
fixed, non-overlapping characteristics. With this in mind, intuitively the interaction of the three types in our experiment
leads to predictions regarding voting. It seems likely that conditional cooperators would vote to allow punishment of low
contributors and prohibit punishment of high contributors, and payoff maximizers might also vote similarly.7 It also seems
likely that perverse punishers would vote to allow punishment of high contributors. But being in a minority, they would likely be outvoted, although by chance they might form a majority in a few out of a large number of randomly formed
groups.
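The chance that perverse punishers form a majority in a randomly formed group can be illustrated with a rough calculation. The sketch below is only an illustration under assumptions that are ours, not results of the experiment: types are drawn independently, perverse punishers make up 25% of subjects (the upper bound expected above), every member votes according to type, and a majority means at least three of four members. The function name `prob_majority` is hypothetical.

```python
from math import comb


def prob_majority(share: float, group_size: int = 4) -> float:
    """Probability that strictly more than half of a randomly formed group
    belongs to a type that makes up `share` of the subject pool.

    Assumes independent draws of types (an illustrative assumption)."""
    threshold = group_size // 2 + 1  # strict majority; a 2-2 split is not enough
    return sum(
        comb(group_size, k) * share**k * (1 - share) ** (group_size - k)
        for k in range(threshold, group_size + 1)
    )


# 25% is the upper bound on perverse punishers expected in the text.
p = prob_majority(0.25)
print(f"P(perverse punishers hold a majority in one group) = {p:.4f}")

# 40 groups = 10 sessions x 4 groups, taken from the design description.
print(f"Expected number of such groups among 40 = {40 * p:.2f}")
```

Under these assumptions such a majority would be expected in only a handful of groups, which is the sense in which a minority type is "likely outvoted" yet not certain to be.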
Considering multiple preference types has been useful in explaining results in a large number of basic VCMs and VCMs
with punishment. But VCMs are more complicated than the prisoner's dilemma or the centipede game, and to our
knowledge, solving even the basic VCM for Bayes–Nash equilibria has so far been intractable. We attempt here only to use
the intuitions developed above to guide interpretation of observed behaviors, hopefully contributing both to a practical
understanding of social dilemmas and to future refinements of theory.
1.2. Related literature
While our paper is the first to directly address effects of perverse punishment by allowing or prohibiting intermediate
restrictions on punishment, there are related papers on the endogenous choice of institutional rules that allow or prohibit
punishment altogether, or exogenously affect the role of punishment. Gurerk et al. (2005, 2006) designed two experiments that allowed subjects to "vote with their feet" in choosing between two groups, one allowing unrestricted punishment and
the other no punishment. Subjects initially avoided the group with punishment, but with repeated opportunities to choose,
almost all eventually chose the group with punishment, as a result achieving high contributions and efficiency. Their
5 In a different experimental setup, a VCM with endogenous group formation, Page et al. (2005) estimated a 59% proportion of conditional
cooperators.
6 When subjects from a population with this rough demography of types are randomly assigned to play a VCM in small groups, the groups may
differ from one another in cooperation levels due to random differences in which types are represented and with what frequencies. Ones and Putterman
(2007) grouped together on the one hand subjects displaying more cooperative behaviors and on the other hand subjects displaying less cooperation and
more perverse punishing. They found, predictably, that the former achieved higher contributions and earnings than the latter.
7 Incentives in voting of course differ from those in a private action. For example, a payoff maximizer may prefer free riding to contributing, but at the
same time find it in his interest to vote to allow punishment of low contributors. In his calculation he may believe that by such a rule he would lose the
benefit from his own free riding, but be more than compensated by many erstwhile free riders who will contribute more in response to the threatened
punishment of free riding. And in a population of mixed preference types, a payoff maximizer's calculation of the net advantage from the rule depends on
his beliefs about whether there will be a sufficient number willing to punish free riders and make the threat of punishment effective.
experiments differ from ours in that their subjects choose groups with either no punishment or unrestricted punishment,
while our subjects have fixed groups and vote over alternative restrictions on punishment.
Botelho et al. (2005) designed an experiment that allowed subjects to choose between an institution with unrestricted
punishment and another without any punishment. They found that the subjects voted overwhelmingly for the institution
without punishment. In a related experiment, Sutter et al. (2005) found that subjects most often voted to allow rewards
rather than punishment even though the latter raised contributions more. These experiments differed from ours by
allowing only one vote for each group, and not allowing choices of partially restricted punishment.
Botelho et al. (2005) also analyzed Fehr and Gachter's (2000a, 2002) data, finding lower earnings when punishment was allowed than when it was not allowed.8 In contrast, Gurerk et al. (2005, 2006) found earnings (efficiency) as high or higher
in VCMs with unrestricted punishment than in VCMs without punishment opportunities. Masclet et al. (2003) also found
higher earnings with unrestricted punishment compared with no punishment allowed. By varying the ratio of
punishment's cost to the punisher versus the target of punishment, Nikiforakis and Normann (2008) and Egas and Riedl
(2005) shed light on the conditions under which the unrestricted opportunity to punish does and does not increase
efficiency.
Noting the detrimental effects of the punishment of high contributors, Cinyabuguma et al. (2006) designed a procedure
they believed might reduce its incidence. The first two stages of the experiment were an ordinary VCM followed by a
punishment opportunity. But in a third stage, each subject learned the frequency of each other subject's punishment of
high, average, and low contributors, and each was given an opportunity to punish on the basis of this information. The
authors found that this incentive system led to less perverse punishment in the second stage, but fairly frequent perverse
punishment in the third stage; for example, subjects who punished free riders in the second stage were then severely
punished in the third stage, undermining the incentives in the first stages.
Gachter and Herrmann (2005) used population groupings (young rural Russians, older rural Russians, young urban
Russians, older urban Russians) to study the effects of unrestricted punishment. They found large variations among the
groups in frequency of punishing high contributors and the harmful effects of this perverse punishment which, they wrote,
"can undermine the positive impact of punishment for cooperation and thereby limit the success of self-governance." Like
Cinyabuguma et al. and our paper, Gachter and Herrmann emphasized the detrimental effect of perverse punishment on
efficiency.
Casari and Luini (2005) compared effects of exogenously imposed punishment rules, including a rule requiring a subject
to be targeted for punishment by at least two group members (in a group of five) before the punishment takes effect. They
found that the restriction decreased punishment of high contributors and raised efficiency, but in this treatment the
average contribution was quite low, not exceeding half of the endowment.
2. Design and predictions
2.1. Basic design
Our design extends the basic VCM in which subjects are randomly assigned to groups that remain fixed (a partners
design) for a finite and known number of periods. Each subject in a group is provided with an initial endowment that he or
she is asked to divide between a private account and a group account. Any funds placed in the group account are scaled up
by the experimenter and divided equally among the subjects in the group without regard to individual contribution. To this
basic VCM we added punishment and voting opportunities in two designs to study how rules restricting or allowing
punishment might emerge initially and evolve over a series of votes. In the experiment, individuals act anonymously and
without communication.
We initially conducted a pilot experiment in which there were four partner groups with four subjects in each group. At
the beginning of the 1st period, the subjects received instructions for playing a basic VCM without punishment, and each
group played 10 periods of this repeated game (details of the basic VCM and its payoff function (1) are shown below). At the
beginning of the 11th period the subjects received instructions for playing a VCM with unrestricted punishment, and each
group played 10 periods of this repeated game (details and payoff function (2) shown below). So far, this design is similar to
Fehr and Gachter (2000a). But following these first 20 periods, each group voted on who if anyone could be punished in a
final 10 periods (details of the ballot process are shown below). Of the four group votes, all four voted to prohibit punishment
of higher-than-average contributors; one group prohibited all punishment and the other three groups voted to allow
punishment of low contributors.9
8 Cinyabuguma et al. (2004) found similar results for Fehr and Gachter (2000a) and in public goods and sanctions experiments by Carpenter and
Matthews (2002), Sefton et al. (2002), Page et al. (2005), and Bochet et al. (2006). In their working paper, Cinyabuguma et al. (2004) used regression to
study the impact of punishment upon changes in the punished subject's contribution, and found that each dollar of punishment of a group's highest
contributor substantially decreased his or her next period contribution. The authors concluded that a major reason why punishment reduces efficiency in
the experiments mentioned is the punishment of high contributors. Their calculations showed that in the related public goods and sanctions experiments
by Bochet et al. and Page et al., earnings would have been higher with punishment than without it but for the presence of perverse punishment.
9 Due to a computer problem, the voted rules were not properly implemented; nonetheless, decisions up to and including the vote remain uncompromised, allowing us to make inferences from this pilot experiment occasionally in what follows.
Following this pilot, we wanted to see not only what rules are chosen initially but also what voting patterns would
emerge with further experience. In the first of two designs, we increased the number of votes to three votes for each group,
and correspondingly shortened the number of periods under which a voted rule governed before the next vote from
10 periods to 8. To keep the total number of periods to 30, we shortened the introductory experiences of VCMs with and
without punishment from 10 periods each to 3 periods each. This became our 3-Vote design (see Fig. 1A).
As in the pilot treatment, subjects in the 3-Vote design were given instructions describing the basic VCM, and then
participated in the basic VCM (this time for 3 periods), then received their second instructions about the opportunity of
voluntary punishment, unrestricted except for some budgetary constraints (see below), then played for 3 periods under
this condition, all before learning of the voting opportunities and items to be voted on. At the beginning of the 7th period,
the subjects received their third instructions, which explained the voting process, and took their first vote on the rules governing who, if anyone, could be punished for the next 8 periods. At the beginning of the 15th period a second vote was
taken and new rules regulating punishment were chosen. Then the subjects participated in 8 periods of the VCM with
punishment (if any) governed by the second chosen rules. At the beginning of the 23rd period the third and final vote was
taken, and the remaining 8 periods were conducted with possible punishment governed by this last vote. As in the pilot, we
included practice exercises in each of the three sets of instructions.
Surprised to find that out of 60 group votes none allowed punishment of high contributors and that the majority of
groups seemed to be converging towards allowing punishment of low contributors, we added a 5-Vote design (Fig. 1B)
which differed from the 3-Vote design in that (a) there was no play, whether with or without punishment, before the
determination of rules by vote, and (b) the sequences of play between votes were reduced from 8 to 6 periods, to allow for
five votes and play phases in a session of similar duration. As Fig. 1B shows, the first and only instructions were given at the
beginning of the experiment. They explained the basic VCM mechanism without punishment, possible rules governing
punishment, and the opportunity to vote on them. Subjects then voted to allow or restrict punishment (without any hands-on
experience of punishment or its restrictions). Then they participated for 6 periods in the VCM, governed by the chosen rules of punishment. At the beginning of the 7th period, the subjects voted again, and then participated in 6 periods of the
VCM, governed by the chosen rules of punishment. The same process repeated three more times, as shown in the figure.
The 5-Vote design had the same number of periods (30) as the 3-Vote design.
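The timing of the two designs can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function names `vote_periods` and `total_periods` are hypothetical, and the period counts are those stated in the text (6 introductory periods and 8-period phases in the 3-Vote design, no introductory play and 6-period phases in the 5-Vote design).

```python
def vote_periods(intro_periods: int, phase_length: int, n_votes: int) -> list[int]:
    """Period at whose beginning each vote is taken.

    intro_periods: periods played before the first vote
    phase_length:  periods governed by each voted rule
    """
    return [intro_periods + 1 + k * phase_length for k in range(n_votes)]


def total_periods(intro_periods: int, phase_length: int, n_votes: int) -> int:
    """Total number of periods in a session."""
    return intro_periods + n_votes * phase_length


# 3-Vote design: 3 periods without punishment + 3 with unrestricted punishment,
# then three 8-period phases.
print(vote_periods(6, 8, 3), total_periods(6, 8, 3))

# 5-Vote design: no play before the first vote, then five 6-period phases.
print(vote_periods(0, 6, 5), total_periods(0, 6, 5))
```

Both calls give a session length of 30 periods, with votes at the start of periods 7, 15, and 23 in the 3-Vote design and periods 1, 7, 13, 19, and 25 in the 5-Vote design.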
The 5-Vote design functioned as a stress test for the results of the 3-Vote design in several ways. First, the task of
learning and familiarization was harder, since the first choice of rules occurred before subjects had any experience
interacting in a VCM with or without punishment. Second, the possibility that experiences such as annoyance with free
riders or with receiving punishment could influence the first vote was eliminated. These differences permitted a test of
whether the 3-Vote design's results were driven by the 3-Vote design's more gradual, hands-on learning. Third, in the 5-Vote
design there were 100 group votes; thus, with 160 group votes in total, unanimity in prohibiting perverse punishment
would be very unlikely unless there were strong factors leading in this direction. Finally, with each group voting 5 times
instead of 3, the monotone increase in votes for the rule allowing punishment of low but not high contributors would be less likely unless there
were strong factors leading to this pattern.
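The remark that unanimity over 160 group votes would be very unlikely by chance can be made concrete with a simple calculation. This is a hedged illustration, not part of the analysis: it assumes each group vote independently allows punishment of high contributors with some probability `p_allow`, a purely hypothetical parameter, whereas votes after the first are not fully independent in the experiment. The function name is also hypothetical.

```python
def prob_unanimous_prohibition(p_allow: float, n_votes: int = 160) -> float:
    """Probability that none of n_votes group votes allows perverse punishment,
    if each vote independently allowed it with probability p_allow
    (independence and p_allow are illustrative assumptions)."""
    return (1.0 - p_allow) ** n_votes


# Compare the 60 votes of the 3-Vote design alone with all 160 votes.
for p in (0.01, 0.05, 0.10):
    print(
        f"p_allow={p:.2f}: "
        f"60 votes -> {prob_unanimous_prohibition(p, 60):.4g}, "
        f"160 votes -> {prob_unanimous_prohibition(p, 160):.4g}"
    )
```

Even a small per-vote chance of allowing perverse punishment makes unanimity much less likely over 160 votes than over 60, which is the sense in which the 5-Vote design strengthens the test.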
Fig. 1. (A) The 3-Vote design and (B) the 5-Vote design. [Timeline figure; horizontal axis in both panels: Periods, marked from 3 to 30. Panel A: 1st instructions, 3 periods with no punishment, 2nd instructions, 3 periods with punishment, 3rd instructions and 1st vote, then three phases of 8 periods with the chosen rule, separated by the 2nd and 3rd votes. Panel B: instructions and 1st vote, then five phases of 6 periods with the chosen rule, separated by the 2nd to 5th votes.]
In both the 3- and 5-Vote designs, sessions had 16 subjects assigned randomly to four groups of four subjects who remained together throughout the session. Each subject knew there were 16 subjects in the experiment room but could not
tell which among the others in the session belonged to her group. Contribution and punishment choices (if any) were
announced to other group members under randomly changing labels B, C, and D, for one's fellow members, so that the
behaviors of individuals could not be tracked from period to period, except by conjecture. A subject learned the total
amount of punishment she had received, but not which group members punished her or by how much.
Just before the second and later votes of both designs, each subject was informed of the punishment rule chosen in the
preceding votes of each of the four groups in their session along with each group's average contributions and earnings
during the periods the rule governed (the information was new for the most recently taken vote, and was repeated for the
earlier votes). This information was included to speed the adjustment process, if there is one; moreover, learning from the examples of others occurs in many real-world settings. The downside of providing this information, in terms of the
number of fully independent observations, is substantial, but our main results are statistically significant, in spite of this.
Also, the first vote of each group remains a strictly independent observation, since no information about other groups was
shared until immediately before the second vote.
2.2. Payoffs
All periods shared the same underlying structure. In each period, each subject had to decide on a division of 10
experimental dollars, in integer amounts, between a private account and a group account, before observing the choices of
fellow group members. In a period, subject i earned
$$y_i = (10 - C_i) + 0.4 \sum_{j=1}^{4} C_j \qquad (1)$$
where $C_i$ is $i$'s contribution to the public account and the summation is taken over all members of $i$'s group, including $i$. After
all four made their decisions, each was informed of the contribution choices of the others. When punishment was
permitted, it cost a subject 0.25 experimental dollars to reduce the earnings of another person by 1.00 experimental dollar.
Subject $i$'s earnings after punishment were thus
$$y_i = (10 - C_i) + 0.4 \sum_{j=1}^{4} C_j - 0.25 \sum_{j \neq i} R_{ij} - \sum_{j \neq i} R_{ji} \qquad (2)$$
where $R_{ij}$ is the number of dollars by which $i$ reduced $j$'s earnings, and conversely for $R_{ji}$. General constraints on punishment
in all treatments were: (i) a subject could not spend more than her/his pre-punishment earnings for the period on reducing
the earnings of other subjects, (ii) a subject's post-punishment earnings for a period would be set to zero if earnings $y_i$ in
Eq. (2) were negative, and (iii) a subject $i$ could not spend more on reducing the earnings of a subject $j$ in any period than
would single-handedly reduce $j$'s earnings according to (2) to less than zero.10 The Appendix shows the screen design for entering an individual's contribution and punishment decisions.
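Payoff functions (1) and (2) can be sketched in code to make the incentives concrete. The sketch below is a minimal illustration, not the experimental software: the names `ENDOWMENT`, `MPCR` (our label for the 0.4 multiplier on the group account), `PUNISH_COST`, `vcm_payoff`, and `payoff_with_punishment` are hypothetical. Only constraint (ii) is modeled; constraints (i) and (iii) limited what subjects could enter and are assumed to be satisfied by the inputs.

```python
ENDOWMENT = 10.0    # experimental dollars per period
MPCR = 0.4          # return to each member per dollar in the group account
PUNISH_COST = 0.25  # cost to the punisher per dollar of reduction


def vcm_payoff(contributions: list[float], i: int) -> float:
    """Eq. (1): earnings of subject i before any punishment."""
    return (ENDOWMENT - contributions[i]) + MPCR * sum(contributions)


def payoff_with_punishment(
    contributions: list[float], reductions: list[list[float]], i: int
) -> float:
    """Eq. (2), with constraint (ii): negative earnings are set to zero.

    reductions[a][b] is R_ab, the dollars by which a reduces b's earnings.
    """
    n = len(contributions)
    spent = PUNISH_COST * sum(reductions[i][j] for j in range(n) if j != i)
    received = sum(reductions[j][i] for j in range(n) if j != i)
    return max(vcm_payoff(contributions, i) - spent - received, 0.0)


# Three full contributors and one free rider (subject 3).
c = [10, 10, 10, 0]
print(vcm_payoff(c, 0), vcm_payoff(c, 3))

# Subject 0 reduces the free rider's earnings by 4 at a cost of 1.
R = [[0, 0, 0, 4], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(payoff_with_punishment(c, R, 0), payoff_with_punishment(c, R, 3))
```

In this example the free rider earns more than each contributor before punishment, which is the social dilemma; punishment narrows the gap but lowers total group earnings, since it is costly to both parties.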
2.3. Voting
In a voting stage, each subject checked off one of three boxes beside each of three ballot items, on a screen set up as
follows:
I vote to allow a person's earnings to be reduced if
(a) that person assigns less than the average amount11 to the group account    [ ] Yes   [ ] No   [ ] No preference
(b) that person assigns the average amount to the group account    [ ] Yes   [ ] No   [ ] No preference
(c) that person assigns more than the average amount to the group account    [ ] Yes   [ ] No   [ ] No preference
In each group of four subjects, of those expressing a preference in ballot item (a), if there was a majority or tie of No
votes against punishment of low contributors, then punishment of low contributors would be prohibited for the next 8
periods in the 3-Vote design and 6 periods in the 5-Vote design; and if a majority voted Yes, punishment of low
10 The purpose of (i) and (ii) was to keep all decisions financially independent of each other while maintaining a guaranteed minimum payment for recruiting reasons. The purpose of (iii) was to help subjects to avoid pointless spending on punishment in view of constraint (ii). Note, however, that it remained possible for subjects to overspend on punishing in the sense that both subject i and, say, subject k might each spend enough to reduce j's earnings for the period to zero, although only one subject's punishment would actually be effective in that case, given (ii). This could happen because subjects did not learn of punishment not carried out by or aimed at them, and the design (as in Fehr and Gächter, 2000a) keeps such information private so as not to encourage free riding on punishment.
11 As explained in the instructions, "average amount" meant the average over the four members of the group in the contribution stage of the period in question. It could vary among groups and within a given group from one period to the next. Note that a vote to allow punishment of those contributing less than the group average of 4 players is the same event as a vote to allow punishment of those contributing less than the average of the 3 others.
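The equivalence stated at the end of footnote 11 can be checked with a short rearrangement. Here $S_{-i} = \sum_{j\neq i} C_j$ is shorthand introduced only for this illustration; it is not notation used elsewhere in the paper.

```latex
C_i < \frac{C_i + S_{-i}}{4}
\iff 4C_i < C_i + S_{-i}
\iff 3C_i < S_{-i}
\iff C_i < \frac{S_{-i}}{3}
```

The same steps go through with the inequality reversed or replaced by equality, so the three ballot categories (below, at, and above average) coincide under either definition of the average.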
A. Ertan et al. / European Economic Review 53 (2009) 495–511
contributors would be allowed; and correspondingly for ballot items (b) and (c).12 After the vote, each group's members
received a message reporting the voting outcome, which was one of $2^3 = 8$ possible punishment rules (i.e., combinations of
the three ballot item choices).13
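The voting procedure can be sketched as follows. This is a minimal illustration, not the experimental software: the names `item_passes`, `group_rule`, and `BALLOT_ITEMS` are hypothetical, and the treatment of a ballot item on which nobody expresses a preference (counted here as a tie, so the item fails) is an assumption.

```python
from itertools import product

# Labels for ballot items (a), (b), (c); the strings are illustrative.
BALLOT_ITEMS = ("below average", "average", "above average")


def item_passes(votes):
    """An item passes only if Yes votes strictly outnumber No votes.

    "No preference" votes are ignored, so a Yes/No tie means the
    corresponding punishment is prohibited.
    """
    yes = sum(1 for v in votes if v == "Yes")
    no = sum(1 for v in votes if v == "No")
    return yes > no


def group_rule(ballots):
    """ballots: one dict per subject, mapping each ballot item to a vote."""
    return tuple(item_passes([b[item] for b in ballots])
                 for item in BALLOT_ITEMS)


# Every pass/fail combination over the three items: 2**3 = 8 rules.
ALL_RULES = list(product((False, True), repeat=len(BALLOT_ITEMS)))

# Hypothetical group: punishment of low contributors passes 2-1,
# the other two items fail.
ballots = [
    {"below average": "Yes", "average": "No", "above average": "No"},
    {"below average": "Yes", "average": "No", "above average": "No"},
    {"below average": "No", "average": "No", "above average": "Yes"},
    {"below average": "No preference", "average": "No", "above average": "No"},
]
rule = group_rule(ballots)
```

The resulting tuple `(True, False, False)` corresponds to the rule of allowing punishment of lower-than-average contributors and no other punishment.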
When a group voted to restrict punishment, a fixed zero appeared in the punishment box14 for all individuals to whom
the restriction applied during the punishment stages that followed each contribution stage. For example, members of a
group that had voted to prohibit all punishment saw the standard punishment stage screen with fixed 0's in all the
punishment boxes, indicating that no punishment was allowed in this case.
We conducted five sessions of each design using a total of 160 subjects (see Table 1).15 All of the sessions
were conducted by computer in a computer lab at Brown University. At the end of each session, cumulative earnings
for the 30 periods were totaled and converted to real money at the rate of 25 experimental dollars to one real dollar, and $5 was added as a participation fee. Sessions typically lasted a little less than 2 hours including instructions, and subjects'
overall earnings averaged approximately $25. Instructions for both designs are similar and available in our Working
Paper.16
3. Results
3.1. The voting pattern
In the 3-Vote design there were 720 individual votes (80 subjects each voting 3 times on 3 ballot items), and in the
5-Vote design 1200 individual votes. Table 2 shows the number of individual votes on each ballot item. The table shows that a
substantial number of individuals voted to allow punishment of higher-than-average contributors, but many more voted
to allow punishment of less-than-average contributors.
In the 3-Vote design there were 60 group votes (see Table 1), and in the 5-Vote design there were 100 group votes. In the 160 group votes altogether, only 4 of the 8 possible combinations of rules were ever chosen by majority rule. These were to
allow: (i) no punishment, 56 group votes; (ii) punishment of lower-than-average contributors and no other punishment,
98 votes; (iii) punishment of low-or-equal-to-average contributors and no other punishment, 4 votes; and (iv) punishment
of equal-to-average contributors and no other punishment, 2 votes. Conspicuously absent from this list is any rule
allowing punishment of higher-than-average contributors.
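The vote counts in this subsection follow directly from the design. The short check below, with variable names chosen here for illustration, reproduces them from the numbers of subjects, groups, votes, and ballot items reported in the text and Table 1.

```python
SUBJECTS_PER_DESIGN = 80
GROUPS_PER_DESIGN = 5 * 4          # 5 sessions x 4 groups per session
BALLOT_ITEMS = 3
VOTES_PER_DESIGN = {"3-Vote": 3, "5-Vote": 5}

# Individual votes: every subject votes on every item at every vote.
individual_votes = {
    design: SUBJECTS_PER_DESIGN * n_votes * BALLOT_ITEMS
    for design, n_votes in VOTES_PER_DESIGN.items()
}

# Group votes: every group produces one rule at every vote.
group_votes = {
    design: GROUPS_PER_DESIGN * n_votes
    for design, n_votes in VOTES_PER_DESIGN.items()
}

# The four rules actually chosen, with counts as reported in the text.
chosen_rules = {
    "no punishment": 56,
    "low only": 98,
    "low or equal to average": 4,
    "equal to average only": 2,
}
```

The four chosen rules account for all 160 group votes, so none of the four remaining rule combinations, each of which would permit punishment of higher-than-average contributors, was ever selected.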
Result 1. No group ever voted to allow punishment of higher-than-average contributors, so perverse punishment was ruled out
from the first opportunity to vote.
In ruling out perverse punishment, every group also ruled out unrestricted punishment from the beginning. The two rules
"punishment of lower-than-average contributors and no other punishment" and "punishment of low-or-equal-to-average
Table 1
Numbers of groups, subjects, and votes.
Session design | Number of sessions | Number of groups in each session | Number of subjects in each group | Total number of subjects | Total number of group votes on rules
3-Vote | 5 | 4 | 4 | 80 | 60
5-Vote | 5 | 4 | 4 | 80 | 100
12 We expected few cases where someone was exactly an average contributor, but for symmetry we treated the average contributor on a separate ballot item.
13 Only "Yes" and "No" votes were counted in determining majorities; for example, if 2 voted "Yes" and 2 voted "No," the proposal did not pass, but if 2 voted "Yes," 1 "No" and 1 "No preference," the proposal passed. Subjects were informed of whether a ballot item passed or not, but not by how many votes or who voted which way.
14 See the boxes labeled b, c and d on the lower left portion of the diagram in the appendix showing the screen design.
15 Subjects were Brown undergraduates, recruited by (a) distribution of flyers in the mailboxes of all undergraduates, (b) distribution of flyers in a large introductory economics course, (c) distribution of table slips at a student dining hall, and (d) advertising under the heading of employment in an on-line campus magazine, the Brown Daily Jolt. Analysis of information provided in the post-experiment debriefing shows that the subjects majored in a large range of concentrations, with the economics concentration being that of only 15%, about 5% more than the proportion in the overall student body. A little less than half the subjects had taken no economics courses at the college level. A total of 67% of the subjects were female, somewhat higher than the 53% share in the general student body. Brown's undergraduate population numbers about 5500, so students participating in a given session tended not to know one another.
16 See Ertan et al. (2005). In the instructions and experiment we used neutral language and did not use words like "free riding," "punishment," and "perverse punishment." See also Cinyabuguma et al. (2006), where we point out that punishment is most clearly perverse when aimed at a group's highest contributor. Here as in that experiment we distinguish between punishment of above average, average, and below average contributors, rather than between punishment of highest and of other contributors, because this seems more symmetrical and less likely to convey a biased framing of the problem to subjects.
contributors and no other punishment" are similar and we grouped them together under the heading of allowing
punishment of low-but-not-high contributors. Fig. 2 shows how the group voting evolved over the sequence of votes for
the 3- and 5-Vote designs. Result 2 summarizes the voting pattern over time.
Result 2. In both designs, a plurality of groups voted in their first vote to prohibit all punishment, with a substantial minority of
groups voting to allow punishment of low-but-not-high contributors. Over the sequence of votes, this ordering reversed, so that in
the final vote, nearly all groups voted to allow punishment of low-but-not-high contributors, with only a few remaining groups voting to prohibit all punishment.
Table 2
Numbers of individual votes to allow punishment of high, average, and low contributors, both designs.
Ballot item | Yes | No | No preference
Allow punishment of less than average contributors | 410 | 211 | 19
Allow punishment of average contributors | 46 | 577 | 17
Allow punishment of above average contributors | 111 | 493 | 36
[Figure 2: two panels, "3-VOTE DESIGN" (1st to 3rd vote) and "5-VOTE DESIGN" (1st to 5th vote). Vertical axis: number of groups voting for the rule (0 to 20). Series: prohibit all punishment; punish low-but-not-high; punish equal-to-average.]
Fig. 2. Evolution of the voting rules: (A) 3-Vote design and (B) the 5-Vote design.
[Figure 3: two panels, "3-VOTE DESIGN" and "5-VOTE DESIGN". Vertical axis: average contributions (0 to 10). Horizontal axis: periods (1 to 30). Series: unrestricted (exogenous); no punishment (exogenous); no punishment; low-but-not-high.]
Fig. 3. Average contributions for the two designs, by period and punishment rule: (A) the 3-Vote design and (B) the 5-Vote design.
3.2. Contributions and efficiency
Fig. 3 shows period-by-period contributions of groups for the two composite rules most frequently chosen. In both the
3- and 5-Vote designs, groups that allowed punishment of low-but-not-high contributors achieved substantially higher
levels of contributions than did groups that prohibited punishment altogether. We tested the hypothesis that contributions
are higher for groups choosing the punish low-but-not-high rule than for those choosing the rule of no punishment, against
the null hypothesis of no difference, in two ways. First, to avoid the possible statistical dependence from one period to
another, and from group to group in a session because of the information provided from the second vote onward, we set
aside observations from the second vote onward, and then averaged contributions in the periods between the first and second vote (with a probable loss of power). Comparing contributions under no punishment with those under punish low-but-not-high
at the group level, we found, in a one-tailed Mann–Whitney test, significance at the 0.1% level for the 3-Vote
design (11 group observations without and 9 with punishment) and at the 5% level (p = 0.034) for the 5-Vote design
(13 observations without and 7 with punishment). Second, we tested differences in behaviors from the second vote onward
in Wilcoxon matched-pair tests at the session level, with fewer observations but similar results.17 In both the 3- and 5-Vote
designs, contributions in groups that permitted punishment of low-but-not-high contributors tended to increase over time
until the end-game fall off. In contrast, 3-Vote design groups that prohibited all punishment had falling levels of
contributions over time, replicating earlier results on basic VCMs without punishment, and in the 5-Vote design
contributions had a slightly increasing trend in the middle periods.18
Fig. 4 shows period-by-period efficiency of groups that voted to prohibit all punishment and groups that voted to
prohibit perverse punishment while allowing punishment of low contributors. In Fig. 4A, average period efficiency was
[Figure 4: two panels, "3-VOTE DESIGN" and "5-VOTE DESIGN". Vertical axis: efficiency (0 to 1.0). Horizontal axis: periods (1 to 30). Series: unrestricted (exogenous); no punishment (exogenous); no punishment; low-but-not-high.]
Fig. 4. Efficiency for the two designs, by period and punishment rule: (A) the 3-Vote design and (B) the 5-Vote design.
17 For these tests, we have at most one paired observation from each session, namely the average contribution per subject in all groups in the session that chose one rule, and the corresponding average in all groups that chose the other. This yields up to five paired averaged observations in groups allowing no punishment and in groups allowing punishment of low contributors in each design and set of periods, although there are fewer observations for sets of periods when only one rule is observed in one or more sessions. For example, if in a certain session and set of periods two groups allowed no punishment and two groups allowed punishment of low contributors, we averaged the contributions over the relevant periods in the first two groups and likewise for the second two groups, giving us one pair of observations for that session; if all four groups follow the same rule, the session offered no observation for this test. We performed Wilcoxon matched pair (ranked sum) tests on these data with the following results, beginning with the 3-Vote design: for periods 7–14, only 3 sessions have observations of both rules, and although in all cases contributions are higher in the groups allowing punishment, the p-value of the one-tailed test is 0.055; for periods 15–22, with 4 valid sets of observations, the one-tailed test p-value is 0.034; for periods 23–30, only two sessions still have groups not using punishment, so although the ordering remains consistent, the one-tailed test p-value is 0.09. Turning to the 5-Vote design, we have for periods 1–6, 3 pairs of observations with one-tailed test p-value of 0.055; for periods 7–12, 4 pairs with one-tailed test p-value 0.034; for periods 13–18, 4 pairs, one with contrary ordering, hence one-tailed test p-value 0.072; periods 19–24, 4 pairs including one tie, and one-tailed test p-value of 0.055; periods 25–30, only 2 pairs, both with the usual order, but two-tailed test p-value 0.09.
18 In Fig. 3A and especially 3B contributions under the endogenously chosen rule of no punishment are more sustained and decline more slowly than is typical in a VCM without punishment. But endogenous choice includes its process, including repeated voting and the ability to change rules, possibly leading to commitment effects (see Sutter et al.), restart effects (see the dashed vertical lines in Fig. 3), and selection effects as groups change rules in response to free-riding behavior (i.e., groups with the lowest levels of free riding are less likely to adopt a rule allowing punishment).
always higher under the rules allowing punishment of low contributors, and similarly in Fig. 4B, except in 6 periods.19 As
with contributions, we performed both Mann–Whitney and Wilcoxon tests of the hypothesis that earnings are higher
under restricted punishment than under no punishment at different levels of aggregation, with the resulting significance
levels varying from 0.1% to 10% in the 3-Vote design and from the 10% level to insignificant in the 5-Vote design due to the
similarity of earnings under the two rules in some groups of periods.20
Table 3 compares contributions and efficiency under the two most frequently chosen rules and under the exogenously imposed conditions
of unrestricted punishment (periods 4–6 of the 3-Vote design) and no punishment (periods 1–3).21 The results of the five
tests of Table 3 are summarized in Result 3:
Result 3. For each of the Wilcoxon matched pair tests on contributions, contributions are higher under the rule of punish low-but-not-high
than under the rule of unrestricted punishment, and contributions are higher under the rule of unrestricted
punishment than under the rule of no punishment, and this ordering is transitive. Correspondingly, efficiency is higher under
punish low-but-not-high than under no punishment, and efficiency is higher under no punishment than under unrestricted
punishment, and this ordering is transitive.
Table 3
Effects of the punishment rule on contribution and efficiency.
Test | Ranks of contributions by the punishment rule | Test | Ranks of efficiency by the punishment rule
1 | Punish low > unrestricted | 4 | Punish low > no punishment
2 | Unrestricted > no punishment | 5 | Punish low > no punishment
3 | Unrestricted > no punishment | 2 | No punishment > unrestricted
4 | Punish low > no punishment | 3 | No punishment ~ unrestricted
5 | Punish low > no punishment | 1 | Punish low > unrestricted
Notes: One-tailed Wilcoxon matched pair tests. Tests 1–4 are for groups in the 3-Vote design. Test 1 compares the average contributions of periods 7–9 in groups that chose punish low in their first vote matched with the average contributions of the same group in periods 4–6 of unrestricted punishment (the number of distinct groups matched and compared is n = 9); and correspondingly for efficiency. Test 2 compares contributions (efficiency) for groups in periods 1–3 with contributions for the same groups in periods 4–6, n = 20. Test 3 compares periods 4–6 with 7–9, for the groups that chose no punishment in periods 7–9, n = 10. Test 4 compares members of the first groups in each session that switched from a voted rule of no punishment to a voted rule of punish low, comparing the 8-period averages before and after the switch; if two or more groups in a session switched at the same time, the behaviors of all of their members are averaged; n = 5. Test 5 is the same as Test 4, except it is for the 5-Vote design and 6-period averages are compared before and after the switch, n = 5. A less stringent version of Test 1 considers the first three periods of play in any group that adopted punish low, even if after the 2nd or 3rd vote. This test has n = 17 and a p-value < 0.1% for contributions and < 1% for earnings. We also considered less stringent versions of Tests 4 and 5 that compare each group that switched from a voted rule of no punishment to one of punish low, regardless of whether this was the first time such a switch had occurred among groups in their session. For Test 4, there are 9 paired observations and the p-value of the test statistic is < 1% for both contributions and earnings. For Test 5, there are 17 paired observations and the p-value is < 0.1% for contributions and < 5% for earnings.
"Punish low" indicates punish low-but-not-high. Significance at the 1% level. Significance at the 5% level. Significance at the 10% level, and ~ insignificant, in one-tailed tests.
19 The difference in earnings between groups with no punishment and those with the punish-low-but-not-high rule (Fig. 4) is smaller than the difference in contributions (Fig. 3) because (a) an extra E$1 (one experimental dollar) of contribution raises efficiency by only E$0.60, and (b) punish-low-but-not-high groups achieve higher contributions but incur some punishment costs (E$1.25 per E$1 of punishment assigned). Experimenters with the voluntary contribution mechanism occasionally see in the lab a group that achieves high contributions without punishment or other aids, and the two groups that resisted voting for punishment in the 5-Vote design were of this type, their members perhaps priding themselves on being able to earn as much as those in other groups even without having recourse to the punishment threat.
20 As with contributions, we begin with Mann–Whitney tests using group level observations from the periods between the 1st and 2nd votes only. For the 3-Vote design, the one-tailed test p-value in this case is 0.001; for the 5-Vote design, the test finds no difference based on punishment rule, consistent with what Fig. 4B shows in periods 1–6. Next, we performed Wilcoxon matched pair tests for each set of periods with a maximum of one pair of observations per session. For the 3-Vote design, there are 3 valid pairs for periods 7–14, all showing higher earnings with punishment, with one-tailed test p-value of 0.055; for periods 15–22, 4 pairs, p-value 0.034; and for periods 23–30, 2 pairs, ordered as expected, p-value of 0.090. For the 5-Vote design, periods 1–6 have 3 paired observations but the difference, as with the corresponding Mann–Whitney test, is not significant; for periods 7–12, 4 pairs, again no difference; for periods 13–18, 4 pairs, with those with punishment earning more in 3 of 4 cases, thus p-value 0.072; for periods 19–24, 4 pairs, again 3 favoring those allowing punishment but this time one tie, thus p-value 0.055; for periods 25–30, only 2 valid pairs are left, with one session displaying one order, the other the other order, hence no significant difference. Although violating the requirement of full independence of observations, it may nevertheless help to put these results into perspective and convey a sense of the statistical power lost due to the dissemination of information if we report also the results for tests using all group level observations for all periods: for the 3-Vote design, the p-value of a one-tailed test would be less than 0.001; for the 5-Vote design, the p-value of the corresponding test is 0.01.
21 For example, in comparing contributions under the rule of punish low-but-not-high with contributions under the (exogenous) rule of unrestricted punishment in Test 1, we considered the 17 groups of the 3-Vote design that eventually chose the rule allowing punishment of low-but-not-high contributors (see Fig. 2A). For each of these groups we calculated the average group contribution over the first 3 periods that the group was governed by this rule. We matched this average with the same group's average contribution over the 3 periods of unrestricted punishment (periods 4–6 of the 3-Vote design). In the 17 matched pairs, 14 groups had higher contributions under the rule of punish low-but-not-high, 2 groups had higher contributions under unrestricted punishment, and 1 group was tied. The difference is significant (p = 0.001) in a two-tailed Wilcoxon matched pair test.
Because of the difference in the orderings for contributions and efficiency, the sequence of tests in Table 3 for efficiency
is rearranged to show the transitivity. The difference in the orderings of contributions and efficiency is likely due to the
cost of punishment.
3.3. Mitigating the free rider problem
In the literature on public goods games, it is common practice to use the term "free rider" loosely to denote any
individual who contributes less than the socially optimal amount. It is worth noting, however, that in the absence of
punishment anyone who contributes less than others earns more than these others and thus obtains a "free ride" on others'
contributions; but when punishment is possible a low contributor may fail to earn more, and therefore fail to free ride in
actuality. To compare how successfully different sets of rules address free riding, we adopt in this section a definition that
considers the full outcome, not simply the contribution decision.
Specifically, the symmetric design of this and other VCM experiments suggests a simple definition of free riding: a
subject A experiences free riding when someone else in his group, B, contributes less to the public good but earns more
than A does.22 For a specified punishment rule, sequence of periods, and collection of groups, we define the frequency of
free riding as the number of cases of free riding divided by the number of observations, and an observation as a pairing in a
group, where one subject in a group has a higher contribution than the other subject of the pair. By the design of a basic
VCM without punishment and its payoff Eq. (1), every time someone contributes more than someone else, there is a case of free riding because the higher contributor always has lower earnings than the lower contributor. Thus, in this definition
of free riding, the frequency of free riding for the basic VCM is 100% (as shown in the first bar of Fig. 5). But the frequency of
free riding may decrease when sufficient punishment is directed at low contributors.
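The definition above can be expressed as a short counting routine. The sketch below is illustrative only: the function names are hypothetical, `vcm_earnings` assumes that Eq. (1) is Eq. (2) without the punishment terms, and the punished earnings are made-up numbers rather than data from the experiment.

```python
from itertools import combinations


def vcm_earnings(contributions, endowment=10.0, mpcr=0.4):
    """Basic VCM payoff, assumed to be Eq. (2) without punishment terms."""
    group_return = mpcr * sum(contributions)
    return [endowment - c + group_return for c in contributions]


def free_riding_counts(contributions, earnings):
    """Return (cases, observations) for one group in one period.

    An observation is a pair of subjects with unequal contributions;
    it is a case of free riding if the lower contributor earns more.
    """
    cases = observations = 0
    for a, b in combinations(range(len(contributions)), 2):
        if contributions[a] == contributions[b]:
            continue  # free riding is only defined for unequal pairs
        observations += 1
        if contributions[a] > contributions[b]:
            high, low = a, b
        else:
            high, low = b, a
        if earnings[low] > earnings[high]:
            cases += 1
    return cases, observations


C = [10, 6, 6, 0]
no_punishment = free_riding_counts(C, vcm_earnings(C))

# Hypothetical earnings after the top contributor punishes the zero
# contributor heavily; illustrative numbers only.
punished = free_riding_counts(C, [5.8, 12.8, 12.8, 6.8])
```

Without punishment every unequal pair is a case of free riding, matching the 100% frequency noted in the text. In the hypothetical punished period the zero contributor no longer out-earns the middle contributors, but the costly punisher is still out-earned by everyone who contributed less.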
For the rule of unrestricted punishment, over all 20 groups in periods 4–6 of the 3-Vote design, there were 205
observations of pairs of unequal contributions by subjects in a group, and 148 cases of free riding, for a frequency of 72%
(see the middle bar). In comparison, the frequency of free riding in the first 3 periods after a group voted for the rule of
punishing low-but-not-high contributors was 35% of the 103 observed unequal pairs. This is a striking reduction,
considering that the rule of punish low-but-not-high does not prevent a higher-than-average contributor from free riding
on a still higher contributor. The difference in free riding between unrestricted punishment and punish low-but-not-high
contributors is significant (p < 0.0001) in a Fisher exact test.23
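The comparison of the two frequencies can be illustrated with a from-scratch two-sided Fisher exact test on a 2x2 table. This is a sketch under assumptions, not the authors' computation: the function name is hypothetical, and the count of 36 free-riding cases under punish low-but-not-high is inferred here from the reported 35% of 103 pairs, since the exact count is not stated in the text.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_ways = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total_ways

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance so equally likely tables are included.
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-9))


# Unrestricted punishment: 148 cases among 205 pairs (from the text).
# Punish low-but-not-high: 36 cases among 103 pairs (36 is inferred
# from the reported 35%, an assumption of this sketch).
p_value = fisher_exact_two_sided(148, 205 - 148, 36, 103 - 36)
```

Under these assumed counts the p-value falls well below the 0.0001 threshold reported in the text.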
Result 4. In comparing VCMs with rules governing punishment, we found the highest frequency of free riding in groups
operating with no punishment, less free riding in groups with unrestricted punishment, and least free riding in groups allowing punishment of low-but-not-high contributors.
A regression analysis of incentives to free ride finds the same ordering as in Result 4. In the regressions below, we follow
Fehr and Gächter in defining subject i's absolute negative and positive deviations from the average of others' contributions
as
$$\text{Absolute Negative Deviation} = \begin{cases} |C_i - \bar{C}_{-i}| & \text{if } C_i < \bar{C}_{-i} \\ 0 & \text{otherwise} \end{cases}$$
and
$$\text{Positive Deviation} = \begin{cases} |C_i - \bar{C}_{-i}| & \text{if } C_i > \bar{C}_{-i} \\ 0 & \text{otherwise} \end{cases}$$
where $\bar{C}_{-i} = \sum_{j\neq i} C_j / 3$ is the average of others' contributions.
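For concreteness, the two deviation variables can be computed as in the following sketch. The function name and the example contributions are assumptions chosen for illustration.

```python
def deviations(contributions, i):
    """Return (average of others, absolute negative deviation,
    positive deviation) for subject i in one group and period."""
    others = [c for j, c in enumerate(contributions) if j != i]
    avg_others = sum(others) / len(others)
    gap = contributions[i] - avg_others
    # At most one of the two deviations is nonzero for a given subject.
    abs_negative = max(0.0, -gap)
    positive = max(0.0, gap)
    return avg_others, abs_negative, positive


# Hypothetical group of four contributions.
C = [10, 6, 6, 0]
top = deviations(C, 0)      # highest contributor
bottom = deviations(C, 3)   # lowest contributor
```

In this example the highest contributor has a positive deviation of 6 and the lowest contributor has an absolute negative deviation of about 7.33, each measured against the average of the other three group members.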
[Figure 5: bar chart with bars for no punishment, unrestricted punishment, and punish low-but-not-high. Vertical axis: frequency of cases of free riding (0 to 1.0). Total number of observations for a rule in parentheses: (205), (103), (229).]
Fig. 5. Frequency of cases of free riding, by punishment rule.
22 Under this definition, if everyone in a group contributed the same low amount, there would be no free riding (it is only defined for unequal contributors).
23 We also did a Wilcoxon matched pair test, which is also significant; see the Working Paper for details.
Using Fehr and Gächter's specification (see their Table 5, p. 991), we first consider behavior in the 3 periods of the
exogenously imposed rule of unrestricted punishment (periods 4–6 of the 3-Vote design, see column (1) of Table 4), and
compare this with the first 3 periods of the endogenously chosen rule allowing punishment of low-but-not-high in both the
3- and 5-Vote designs (columns (2) and (3)).24 Then we consider behavior for the punish low-but-not-high rule over all the
periods which it governs in the 3- and 5-Vote designs (columns (4) and (5)).
In each regression of Table 4 the dependent variable is each subject i's punishment received in each period (3 periods for
regressions (1), (2), and (3), and up to 24 and 30 periods in regressions (4) and (5) respectively). The independent variables
are the Average Contribution of Others, i's Absolute Negative Deviation, i's Positive Deviation, and period and group
dummies (not shown).25,26
The results in Table 4 are consistent with those of Fehr and Gächter in that Absolute Negative Deviation always obtains a
significant positive coefficient. The coefficient on the Positive Deviation term in column (1), however, suggests that when it is allowed, perverse punishment exacerbates the incentive problem for high contributors.27 Table 5 re-organizes Table 4's
Table 4
Determinants of punishment received.
Dependent variable: experimental dollars of punishment. Columns (1)–(3): first three periods of the rule. Columns (4)–(5): all periods of the rule.
Independent variables | (1) Unrestricted punishment, 3-Vote design | (2) Punish low-but-not-high, 3-Vote design | (3) Punish low-but-not-high, 5-Vote design | (4) Punish low-but-not-high, 3-Vote design | (5) Punish low-but-not-high, 5-Vote design
Constant | 0.74 | 4.086 | 19.754 | 0.587 | 11.483
(standard error) | (1.067) | (2.353) | (4.587) | (2.222) | (4.367)
p-value | p = 0.490 | p = 0.088 | p < 0.001 | p = 0.792 | p = 0.010
Average contribution by others | 0.388 | 0.230 | 1.090 | 0.228 | 0.654
(standard error) | (0.175) | (0.244) | (0.405) | (0.206) | (0.269)
p-value | p = 0.028 | p = 0.350 | p = 0.009 | p = 0.269 | p = 0.016
Positive deviation | 0.377 | n.a. | n.a. | n.a. | n.a.
(standard error) | (0.152) | | | |
p-value | p = 0.014 | | | |
Absolute negative deviation | 0.888 | 1.217 | 1.039 | 1.054 | 0.967
(standard error) | (0.221) | (0.148) | (0.122) | (0.138) | (0.095)
p-value | p < 0.001 | p < 0.001 | p < 0.001 | p < 0.001 | p < 0.001
R2 | 0.54 | 0.91 | 0.86 | 0.75 | 0.78
Observations | 240 | 82 | 92 | 241 | 176
Notes: Punishment received as a function of deviation from group average in unrestricted and restricted punishment conditions. OLS regressions with period and group fixed effects, not shown. Unrestricted punishment, in Column 1, is observed in periods 4–6, where each observation is for one subject and one period. Columns 2–5 include one observation per subject under the rule allowing punishment of low-but-not-high contributors. In Columns 2 and 3, only the first three periods in which a group adopted the rule for the first time are included, while Columns 4 and 5 include all periods of restricted punishment. Numbers in parentheses are White heteroskedasticity-consistent standard errors. Significance at the 1% level. Significance at the 5% level. Significance at the 10% level.
24 We include observations for only the first 3 periods under a rule in columns (2) and (3) to achieve comparability with the regression for periods 4–6 (column (1)), in view of the possibility that learning or other factors might change behaviors with more repetitions.
25 In both the unrestricted (Column 1) and restricted (Columns 2–5) punishment regressions, only the observations of individuals who could potentially be punished are included. The difference is that under unrestricted punishment, anyone can be punished.
26 The regressions were also estimated by the Tobit method, treating 0 punishment observations as potentially left-censored. Resulting coefficients are similar and similarly significant except in the case corresponding to Column (1), where they are not significant at conventional levels.
27 In fact there was considerable perverse punishment in periods 4–6 of the 3-Vote design. Of the 129 events of punishment, 28% were punishments aimed at higher-than-average contributors for the period and group in question, 19% at the highest contributor for the period and group in question and 11% at individuals who contributed their full endowment. These percentages are calculated by counting each event (rather than dollar amount) of someone punishing someone else. They may be atypically high due to the short duration of the unrestricted punishment portion of our experiment. Yet similarly large amounts of perverse punishment are found in some other studies; see for example Anderson and Putterman (2006), Gächter and Herrmann (2005), and for a regression result similar to column (1), in which the absolute positive deviation term also has a positive significant coefficient, Ones and Putterman (2007), Table 2.
findings in a manner that makes this clearer. In Column (1) of Table 4 the coefficient for Absolute Negative Deviation is
$0.89, the estimated punishment for a $1 reduction in contribution for a less-than-average contributor, under the rule of unrestricted punishment, in the first 3 periods of the 3-Vote design, and shown as a negative gain of $0.89 in Column (1)
of Table 5. In Column (2) of Table 4 the coefficient for Absolute Negative Deviation is $1.22, the estimated punishment for a
$1 reduction in contribution for a less-than-average contributor, under the rule of punish low-but-not-high contributors, in
the first 3 periods of the 3-Vote design, and shown as a negative gain of $1.22 in Column (2) of Table 5, etc.
The +$0.60 throughout Table 5 is the $1 gain in the private account from reducing one's contribution by $1, minus the
$0.40 loss in the individual's earnings from the group account. In Column (1) of Table 4 the coefficient for Positive Deviation
is $0.38, the estimated punishment for each $1 of additional contribution for a higher-than-average contributor, under the
rule of unrestricted punishment, in periods 4–6 of the 3-Vote design. The +$0.38 in Column (1) of Table 5 is the positive
gain from contributing $1 less and avoiding $0.38 in perverse punishment, for a higher-than-average contributor. The cases
labeled n.a. in Table 5 are for the rule of punish low-but-not-high in Columns (2)–(5), in which case punishment of higher-
than-average contributors is not allowed.
Table 5 shows that for less-than-average contributors the net gain from contributing $1 less is negative for each of the
cases in Columns (1)–(5). The -$0.29 in Column (1) suggests that unrestricted punishment can reverse a subject's incentive to free ride, for a subject contributing less than average, replicating Fehr and Gächter's earlier finding for the case of less-
than-average contributors. But the net gain for less-than-average contributors is even more negative in Columns
(2)–(5), suggesting that the incentive against free riding is strengthened for less-than-average contributors under the rule
of punish low-but-not-high.
Table 5 suggests that the incentive to contribute $1 less for higher-than-average contributors is not reversed under
unrestricted punishment or the rule of punish low-but-not-high. In Column (1) under unrestricted punishment, a subject
with a higher-than-average contribution makes an estimated net gain of $0.98 from contributing $1 less (a gain of $0.38
from reduced perverse punishment added to the $0.60 gain from shifting away from the group account). In Columns
(2)–(5), under the rule of punish low-but-not-high, a higher-than-average contributor bears no punishment, but still gains
the $0.60 from a $1 shift from the public account. While neither rule reverses the incentive for a higher-than-average
contributor to contribute less, the incentive toward free riding is less under the rule of punish low-but-not-high than under
unrestricted punishment.28
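The arithmetic behind Table 5 can be sketched in a few lines. The sketch assumes, as the text describes, that each $1 moved out of the group account yields $+0.60 (the $1 private gain less the $0.40 lost from the group account), that the Table 4 punishment coefficients enter as losses for less-than-average contributors, and that avoided perverse punishment enters as a gain for higher-than-average contributors. All variable and function names are illustrative and do not come from the paper.

```python
# Sketch of the Table 5 arithmetic; names are illustrative assumptions.
ACCOUNT_SHIFT = 1.00 - 0.40  # $1 kept privately minus $0.40 lost from the group account

# Estimated punishment per $1 of negative deviation (Table 4, Columns 1-5).
ABS_NEG_DEVIATION = [0.89, 1.22, 1.04, 1.05, 0.97]
# Estimated punishment per $1 of positive deviation; None where punishing
# high contributors is not allowed (punish low-but-not-high, Columns 2-5).
POSITIVE_DEVIATION = [0.38, None, None, None, None]


def net_gain_low(punishment_coef):
    """Net gain from contributing $1 less, less-than-average contributor."""
    return round(ACCOUNT_SHIFT - punishment_coef, 2)


def net_gain_high(punishment_coef):
    """Net gain from contributing $1 less, higher-than-average contributor."""
    avoided = punishment_coef if punishment_coef is not None else 0.0
    return round(ACCOUNT_SHIFT + avoided, 2)


low = [net_gain_low(c) for c in ABS_NEG_DEVIATION]
high = [net_gain_high(c) for c in POSITIVE_DEVIATION]
print("less-than-average:  ", low)
print("higher-than-average:", high)
```

Every entry of the first list is negative and every entry of the second is positive, which is the contrast the two panels of Table 5 are meant to show.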
3.4. Do subjects vote according to their type?
We conjectured that even though some subjects use opportunities to perversely punish (when punishment is
unrestricted) and would likely vote to allow perverse punishment in our experiment, punishment of high contributors
might nonetheless be ruled out since few groups would have a majority of members of this type. Results at group level are
consistent with this conjecture. Is there also evidence at the level of individuals, however, that subjects tended to vote
according to type? Logit regressions provide some affirmative evidence.
We estimated regressions in which the dependent variable is 1 if a subject voted to permit punishment specified by a
particular rule and 0 otherwise. Explanatory variables included the subjects' contributions relative to their group averages
Table 5
Incentives to contribute $1 less.

Columns: (1) Unrestricted punishment; (2)–(5) Punish low-but-not-high.

                          (1)      (2)      (3)      (4)      (5)
Less-than-average contributors, subject to punishment
Abs. neg. deviation     $-0.89    -1.22    -1.04    -1.05    -0.97
$1 account shift         +0.60    +0.60    +0.60    +0.60    +0.60
Net gain                $-0.29    -0.62    -0.44    -0.45    -0.37
Higher-than-average contributors, subject to punishment only in Column (1)
Positive deviation       +0.38     n.a.     n.a.     n.a.     n.a.
$1 account shift         +0.60    +0.60    +0.60    +0.60    +0.60
Net gain                 +0.98    +0.60    +0.60    +0.60    +0.60

Note: Net gain is the change in earnings from contributing $1 less.
28 When subjects make their contribution decision, they do not know what the other subjects' contributions will be, and are uncertain of what will be
the average and its boundary line of punishment risk. This uncertainty creates an incentive toward higher contributions to be on the safe side of the unknown boundary.
A. Ertan et al. / European Economic Review 53 (2009) 495–511 507
during the periods preceding each vote, measures of how much punishment they had given and received, and period (i.e.
vote) and group dummies. The coefficients on the subjects' relative contribution were positive in the regressions on voting
to allow punishment of low contributors, significant at the 5% level or better for both the 3- and the 5-Vote designs, and
negative in the regressions on voting to allow punishment of high contributors, significant at the 10% level in the regression
for the 3-Vote but not in that for the 5-Vote design.
This evidence suggests that the subjects were more (less) likely to vote to allow punishment of less-(greater-)than-average
contributors the higher on average was their contribution above their group's average contribution in the 8 (6) previous
periods. Details are in the Working Paper (Ertan et al., 2005).
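The form of such a regression can be illustrated with a deliberately stripped-down sketch. It fits a logit with a single regressor (relative contribution) by Newton-Raphson, using only the Python standard library. The observations are hypothetical toy values invented for illustration and are not the experimental data; the paper's regressions also include punishment given and received and period and group dummies, all of which are omitted here. The names `TOY_VOTES` and `fit_logit` are our own.

```python
from math import exp

# Hypothetical toy observations, invented for illustration only (NOT the paper's data).
# Each pair is (contribution minus group average, 1 if voted to allow punishing low contributors).
TOY_VOTES = [(-3, 0), (-2, 0), (-2, 1), (-1, 0), (-1, 0), (0, 1),
             (0, 0), (1, 1), (1, 0), (2, 1), (2, 1), (3, 1)]


def fit_logit(observations, iterations=25):
    """Fit P(y=1) = 1 / (1 + exp(-(intercept + slope * x))) by Newton-Raphson."""
    intercept, slope = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in observations:
            p = 1.0 / (1.0 + exp(-(intercept + slope * x)))
            w = p * (1.0 - p)
            g0 += y - p            # score with respect to the intercept
            g1 += (y - p) * x      # score with respect to the slope
            h00 += w               # entries of the (negated) Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * step = score.
        intercept += (h11 * g0 - h01 * g1) / det
        slope += (h00 * g1 - h01 * g0) / det
    return intercept, slope


intercept, slope = fit_logit(TOY_VOTES)
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

A positive slope in this toy setting corresponds to the pattern reported above, in which higher relative contributors are more likely to vote to allow punishment of low contributors.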
4. Discussion and interpretation
We discuss our results and interpretation under the following headings: (a) A rough calculation on the plausibility that
no group would ever allow punishment of high contributors in the 160 group votes of the combined 3- and 5-Vote designs;
(b) Institutional choice and its evolution with and without information on other groups' performance; (c) Distaste for
punishment and the role of opportunities to reconsider; (d) Variability of experimental results; (e) Implementation;
(f) What the experiment appears to tell us about models of heterogeneous preference types; (g) Heterogeneous preferences
in other voting models.
(a) On the plausibility of unanimously prohibiting perverse punishment in 160 group votes. Even if only a quarter of subjects
are prone to perversely punishing, it might seem implausibly rare that not a single group vote produced a majority for
allowing it. As an anonymous referee commented: "[t]he fact that no group ever allowed punishment of high contributors
will make readers suspicious, since results of such clarity are quite rare." How improbable is the unanimity result? Simple
calculations suggest a wide range in the assessment of probability.
Consider the following composite hypothesis: (i) about 25% of punishment is targeted on higher-than-average
contributors when punishment is unrestricted (see Section 1.1 and footnote 27); (ii) an individual who has a preference
toward punishing high contributors is just as likely to punish as an individual who has a preference against such
punishment (i.e., the proportion of subjects of given preference is the same as the proportion of corresponding punishment
observations); (iii) perverse punishers are likely to vote their preference type to allow punishing high contributors, and
similarly normal punishers are likely to vote their preference type to prohibit punishing high contributors (evidence for
this comes from the logit analysis in Section 3.4); and (iv) the preference types are stable and randomly distributed.
With these rough assumptions the binomial probability that a group of four subjects chooses to allow perverse
punishment by a majority of 3 or 4 votes for the third ballot item is 0.0508 (we are setting aside complications from
abstentions), the probability of prohibiting perverse punishment is 0.9492, and the expected number of group votes
prohibiting perverse punishment is 152 out of the combined 160 group votes in the 3- and 5-Vote designs. This calculation
roughly suggests that the vast majority of group votes will be to prohibit perverse punishment. But the binomial
probability of unanimity, the event that 160 out of 160 votes prohibit perverse punishment, is small, 0.0002.
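The calculation in this paragraph can be reproduced with a short sketch. It assumes, following (i)–(iv), that each of the four group members is independently a perverse punisher with probability 0.25 and votes according to type, and that at least 3 of 4 votes are needed to allow perverse punishment. Abstentions are ignored, as in the text. The function and variable names are illustrative.

```python
from math import comb


def prob_majority_allows(p_perverse, group_size=4, votes_needed=3):
    """Binomial probability that at least `votes_needed` of `group_size`
    independently drawn members vote to allow perverse punishment."""
    return sum(
        comb(group_size, k) * p_perverse**k * (1 - p_perverse) ** (group_size - k)
        for k in range(votes_needed, group_size + 1)
    )


p_allow = prob_majority_allows(0.25)   # assumption (i): 25% perverse types
p_prohibit = 1 - p_allow
n_votes = 160                          # combined group votes, 3- and 5-Vote designs
expected_prohibit = n_votes * p_prohibit
p_unanimous = p_prohibit ** n_votes    # all 160 votes prohibit, assuming independence

print(round(p_allow, 4), round(p_prohibit, 4))
print(round(expected_prohibit), round(p_unanimous, 4))
```

The last quantity relies on treating all 160 group votes as statistically independent, which is the assumption questioned in the next paragraph.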
However, this calculation depends on the assumption of statistical independence in type from period to period even for
the same individual, and this is an unrealistically strong assumption. Consider another simple but unrealistic assumption in
the other direction: that preference types and beliefs are so stable that they remain fixed from period to period. Then it is as
though there were only 40 independent group-level observations in the 10 sessions of the experiment and the same votes
and other decisions are repeated many times. Then the expected number of votes prohibiting perverse punishment is 38
out of the combined 40 group votes in the 3- and 5-Vote designs, and the binomial probability of unanimity, that 40 out of 40
votes prohibit perverse punishment, is much larger, 0.12.
A glance at Fig. 2 shows that this second assumption on statistical dependence is unrealistically strong in the
other direction, i.e., views change over time. Presumably the probability of unanimity is somewhere between 0.0002
and 0.12, likely pretty far from the two extreme calculations. The calculations serve to remind us of the sensitivity
of such results to assumptions on statistical independence, when there are aggregations over many periods, and of the other uncertainties
in (i)–(iv).
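How strongly the unanimity probability depends on these assumptions can be seen in a small sensitivity sketch. The 40 and 160 independent votes are the two extremes discussed above; the intermediate value of 80 and the alternative shares of perverse punishers (0.15 and 0.35) are hypothetical values chosen only for illustration, and the function names are our own.

```python
from math import comb


def p_group_prohibits(p_perverse, group_size=4, votes_needed=3):
    """Probability that a group vote prohibits perverse punishment."""
    p_allow = sum(
        comb(group_size, k) * p_perverse**k * (1 - p_perverse) ** (group_size - k)
        for k in range(votes_needed, group_size + 1)
    )
    return 1 - p_allow


def p_unanimity(p_perverse, n_independent_votes):
    """Probability that every one of the independent group votes prohibits."""
    return p_group_prohibits(p_perverse) ** n_independent_votes


# 0.25 is assumption (i); 0.15 and 0.35 are hypothetical alternatives.
# 40 and 160 are the extremes in the text; 80 is a hypothetical midpoint.
for share in (0.15, 0.25, 0.35):
    row = [round(p_unanimity(share, n), 4) for n in (40, 80, 160)]
    print(f"share={share}: {row}")
```

Under these assumptions the probability falls both as the number of effectively independent votes rises and as the assumed share of perverse punishers rises, so either source of uncertainty can move the answer by orders of magnitude.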
(b) On institutional choice, learning and evolution: Our experiment is one of several recent ones in which institutions are
chosen by subjects through voting. Despite its stylized character, we think it suggests the considerable potential that the
experimental method has for contributing to our understanding of how institutions emerge and evolve. We note again our
choice of promoting a more accelerated and informed evolution of institutions by sharing information about outcomes
among groups in given sessions, despite some cost to statistical independence. We would argue that when real-world
groups decide on rules and practices, they often have access to information about the experience of similar groups, so the
information spill-over in the experiment has a real-to-life quality. We want to emphasize, however, that 40 first votes were
taken by the 160 subjects in our core treatments, and 4 more by the 16 subjects in our pilot experiment, and that each of
these votes occurred with no information about others' choices or outcomes. Apart from the evolution towards more use of
punish low-but-not-high with additional votes, our findings (unanimous rejection of allowing punishment of high
contributors in the initial vote, higher contributions and earnings with than without punishment of low contributors, lower
earnings with unrestricted punishment, and lower frequency of successful free riding under the punish low rule)
are all supported by tests using only decisions taken prior to information dissemination, as well as by tests using the full data set.
(c) Distaste for punishment and opportunities to reconsider. In one treatment of a set of 4-person VCM interactions similar
to those in this paper, Bochet et al. (2006) let subjects communicate in a chat room before the 1st, 4th, and 7th periods of
ten rounds of play. A noticeable finding was that out of 12 groups in the chat room treatment with punishment
opportunities, not a single group discussed an explicit strategy of punishing low contributors, and in some groups,
members' messages expressed the view that the punishment option was a trap set by the experimenters to reduce
earnings. A distaste for punishment may help to account for the rejection of all forms of punishment in many of the initial
votes in our experiment, for the rejection of punishment by most groups in Sutter et al. (2005) and Botelho et al. (2005),
and for the initial preference shown for being in the group without punishment by most subjects in Gürerk et al. (2005, 2006).
While eschewing the punishment idea in their deliberations, however, many of Bochet et al.'s subjects engaged in costly
punishment when group members defected from their verbal agreements to contribute. And subjects in the present
experiment seem to warm to the idea of allowing punishment of low contributors as they experience the sense of
resentment of or anger at free riders and as they learn that groups permitting punishment tend to have higher earnings.
The institutional choices made in our paper and in those of Sutter et al. (2005), Botelho et al. (2005), and Gürerk et al.
(2005, 2006) might seem at first glance to be at odds, since our subjects and Gürerk et al.'s subjects seem to show a greater
overall preference for punishment than do those of Sutter et al. and Botelho et al. However, all share a common reluctance
to adopt punishment rules at the outset, and much of the difference in overall outcomes may be attributed to the fact that
our subjects and Gürerk et al.'s subjects have many opportunities to change rules or groups, while Sutter et al.'s and Botelho
et al.'s subjects have only one opportunity to vote on rules. Also, our subjects might have voted more like Botelho et al.'s had
they been required to choose between no punishment and unrestricted punishment only, since the results of periods 4–6 of
our 3-Vote design are consistent with Botelho et al.'s point that subjects may be worse, not better, off with (unrestricted) punishment.
(d) On variability: At the same time, even a brief review of the literature of punishment in social dilemmas shows a large
variability in experimental results. Experimentalists are well aware that small changes in experimental design and
wording of instructions can affect experimental results, not just for experiments on punishment but quite generally. Still,
the literature on punishment in social dilemmas seems to yield an especially large variability in results. Our suspicion is
that this variability is partly due to punishment behavior itself being scattershot and variable. Thus there may not be a
simple general answer to the question of whether punishment in social dilemmas is a good or bad thing. The effects of
punishment may vary so much with the specific conditions that there is no general answer.29
(e) On implementation: In the experiment, once a rule of punishment is chosen by vote, it is easily implemented by
the experiment's computer software. In the real world, there is no such easy implementation. Nonetheless, in the
practical world most organizations are hierarchical or a blend of hierarchy and symmetric volunteer elements, and
organizations often find ways of managing, albeit imperfectly, who gets punished. For example, in hierarchical
organizations if managers were more aware of the possibly high frequencies of perverse punishers and high costs in
efficiency, they might focus more on mitigation. Once aware, managers could work to limit decentralized punishment
and attempt to instill norms of cooperation in much the same manner that managers attempt to control bullying behavior
and harassment.
(f) On heterogeneous preferences: There is a continuing discussion about keeping the standard model which limits the
type of preferences to self-regarding (individual profit maximizing) preferences. In favor of this approach is that it is
parsimonious and often leads to specific predictions, which in turn are often consistent with experimental results.
However, in this experiment, we don't see how we can interpret the results without positing some form of other-regarding
preference types (e.g. conditional cooperators, perverse punishers). Other experiments on social dilemmas also suggest the
need for modeling heterogeneous preference types, including both self-regarding and other-regarding or reciprocating
types. Our experiment adds to the interpretation of heterogeneity, in a particularly striking way.
One argument for modeling only homogeneous self-regarding preferences is that introducing heterogeneous preferences is
"too mushy," allowing almost any prediction and rationalizing almost any observed result. But our experiment has a strong
and consistent pattern to it, suggesting that the existence of heterogeneous preferences need not always lead to
indeterminate results.
(g) On heterogeneous preferences in other voting models: Our analysis suggests that the presence of multiple preference
types may be important to predicting voting outcomes, and this may be true for other instances of public choice as well.
Pork barrel politics provides an example. Ordeshook's (1986, pp. 210–215) model of pork barrel politics is one of a social
dilemma where what is good for an individual legislator is bad for society as a whole. For example, Senator Stevens benefits
29 The fact that Gürerk et al.'s subjects earn more with than without unrestricted punishment while the comparison goes the opposite way for our
subjects and Botelho et al.'s illustrates this variability. In personal communication, Simon Gächter reported that he and his collaborators found large
differences in the frequency of perverse punishment and, correspondingly, in the benefit or lack of benefit of introducing a punishment option across
subject pools in different countries and settings (a finding documented shortly before our paper went to press in the remarkable study by Herrmann et al., 2008).
by bringing pork to his district (the "bridge to nowhere"), while other Senators lose because their districts end up paying
for the bridge, even when the net benefits of the bridge are negative. Why then don't the other Senators outvote Stevens?
Ordeshook's answer is that in a pork bill, there can easily be an equilibrium where there are just enough ear-marked pork
projects to form a winning coalition, even when each of the projects has negative net benefits.
Ordeshook's analysis depends heavily on the assumption that each legislator is narrowly self-interested (the self-interest
may be in the form of an increased probability of re-election). In fact, the assumption of a single preference type of
self-interest is still common in voting models in the political science literature.
Our experiment and others on VCMs, the dictator game, and the centipede game (McKelvey and Palfrey, 1992) suggest
that the assumption of homogeneous preference types can be misleading. If one allows for the possibility of heterogeneous
preference types in Ordeshook's model, the equilibrium can shift and the predicted outcomes are not always as dire as
Ordeshook's original model suggests. For example, some senators may care about doing the right thing, or some voters
may choose not to reward a senator who joins a pork coalition, so the situation may be more fluid than it appears in
Ordeshook's model.
But if the situation is this fluid, can anything happen? To deal with this possibility we focused on observed
behavior under the specific experimental conditions, and then interpreted the specific results in terms of heterogeneous
preferences. We believe that this approach can work in experimental studies of other voting models, such as Ordeshook's,
even when there are signs of heterogeneity and odd behavior, as there were in our study of voting and perverse
punishment.
As another example, Meltzer and Richard's (1981) model of the level of redistributive taxation uses a median voter
solution assuming strictly self-regarding preferences. More accurate explanations of the level of redistribution and its
variation over time and place would consider the strength of preferences for greater equality, on the parts of some citizens,
and resentment of the undeserving poor, on the parts of others (see, for instance, Benabou and Tirole, 2005). Such an
addition of two almost opposite social preference types alongside self-interested types resembles the situation studied in
this paper, where self-interested subjects co-exist with both cooperation-preferring and cooperation-resisting types, with
the associated demographic leading to predictable voting outcomes.30
Appendix
Fig. A1 is the screen design for an individual to enter her contribution to the group account (box a), to learn of others'
contributions (boxes b, c, and d), to enter her punishment decisions (boxes b', c', and d'), and to observe the computer's
calculation of net earnings for a period.
[Figure A1: screen layout with columns You, B, C, D and the following boxes.]
Put in group account: a, b, c, d
Total in group account: e = a + b + c + d
Reduce others' earnings: a', b', c', d'
Sum of your reductions of others' earnings: h = a' + b' + c' + d'
Earnings from private account: g = 10.0 - a
Earnings from group account: f = 0.4e
Cost of your reductions of others' earnings: i = 0.25h
Total of others' reductions of your earnings: j
Net earnings this round: k = g + f - i - j
Fig. A1. Screen design for entering contribution and punishment decisions, receiving information, and calculating net earnings.
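The earnings calculation on this screen can be written out as a short sketch. It takes the box labels from Fig. A1 and assumes that the operators lost in reproduction are subtractions, i.e. g = 10.0 - a and k = g + f - i - j, which is consistent with the $0.40 group-account return and the $0.60 account shift discussed in Section 3. The function name, argument names, and example values are illustrative and not part of the experimental software.

```python
ENDOWMENT = 10.0      # private account starts at 10.0 (g = 10.0 - a)
GROUP_RETURN = 0.4    # each member earns 0.4 per unit in the group account (f = 0.4e)
PUNISH_COST = 0.25    # cost to the punisher per unit of reduction imposed (i = 0.25h)


def net_earnings(own_contribution, others_contributions,
                 reductions_given, reductions_received):
    """Net earnings k for one period, following the boxes of Fig. A1."""
    e = own_contribution + sum(others_contributions)  # total in group account
    f = GROUP_RETURN * e                              # earnings from group account
    g = ENDOWMENT - own_contribution                  # earnings from private account
    h = sum(reductions_given)                         # sum of reductions of others' earnings
    i = PUNISH_COST * h                               # cost of those reductions
    j = reductions_received                           # total of others' reductions of own earnings
    return g + f - i - j


# Hypothetical period: contribute 6, others contribute 10, 4 and 8,
# reduce subject C's earnings by 2, and receive reductions totalling 1.
k = net_earnings(6.0, [10.0, 4.0, 8.0], [0.0, 2.0, 0.0], 1.0)
print(round(k, 2))
```

Holding everything else fixed, lowering one's own contribution by 1 raises k by 0.60, which is the "$1 account shift" entry in Table 5.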
30 See Camerer and Fehr (2004) for other applications of other-regarding preferences to the study of public choice.
A. Ertan et al. / European Economic Revie