Ertan - Who to Punish - EER 2009


Who to punish? Individual decisions and majority rule in mitigating the free rider problem$

Arhan Ertan a, Talbot Page b, Louis Putterman c

a Tufts University, MA, USA
b Brown University (Emeritus), RI, USA
c Department of Economics, Box B, Brown University, Providence, RI 02912, USA

Article info

Article history:
Received 2 August 2006
Accepted 21 September 2008
Available online 17 October 2008

JEL classification:
C91
C92
D71
H41

Keywords:
Public goods
Collective action
Punishment
Voting
Institutions

Abstract

We study a voluntary contributions mechanism in which punishment may be allowed,

depending on subjects' voted rules. We found that out of 160 group votes, even when

groups had no prior experience with unrestricted punishment, no group ever voted to

allow unrestricted punishment and no group ever allowed punishment of high

contributors. Over a series of votes and periods of learning we found a distinct

reluctance to allow any punishment at the beginning, with a gradual but clear evolution

toward allowing punishment of low contributors. And groups allowing punishment of

only low contributors achieved levels of cooperation and efficiency that are among the

highest in the literature on social dilemmas.

© 2008 Elsevier B.V. All rights reserved.

0. Introduction

Organizations such as teams, firms, and military units depend on cooperative effort to succeed, and organizational

leadership often attempts to increase cooperative contributions and/or reduce free riding by instituting rewards and

sanctions and by building a culture or norms of cooperation. Problems of cooperation, for example, in efforts to limit

greenhouse gases or depletion of fisheries, and free riding in efforts to provide public goods share a common characteristic:

incentives for the individual that lead to inefficiency in the group. Such problems are often called social dilemmas and have been the focus of numerous studies using the method of the laboratory decision-making experiment.

In one key social dilemma experiment, Ostrom et al. (1992) found for a model of overuse of a commons, that allowing

face-to-face communication and allowing the subjects to sanction (punish) each other led to a significant increase in

cooperative behavior. In another influential experiment, Fehr and Gachter (2000a) found that, in a voluntary contributions mechanism1 (VCM), the opportunity for punishment had a dramatic positive effect on contributions, but this finding did not extend to average efficiency. In both experiments, punishment was made possible by allowing a subject to pay out of his/her earnings to reduce by a larger amount the earnings of another. Since punishment is costly to both the punisher and the punished, it was not surprising to observe that punishment had a less positive effect on efficiency than on contributions in VCMs or overuse in commons problems.

European Economic Review 53 (2009) 495–511

0014-2921/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.euroecorev.2008.09.007

$ The research reported here was supported by N.S.F. Grant SES-0001769. We are indebted to two anonymous referees for helpful comments and suggestions.

Corresponding author. Tel.: +1 401 863 3837; fax: +1 401 863 1970.

E-mail address: [email protected] (L. Putterman).

1 The basic voluntary contributions mechanism without punishment is a particularly sharp social dilemma, in which each individual maximizes his payoff when others contribute their full endowments but he himself contributes nothing; yet when everyone contributes nothing, efficiency is minimized.

But at the same time practically everyone who studied the role of punishment noticed a curious phenomenon. While

most punishment was targeted at low contributors in VCMs (and overusers in commons problems), a considerable amount

of punishment was targeted at cooperators (high contributors in VCMs, low extractors in commons problems). The

frequency of punishing high contributors in VCMs was too high to be explained as mistakes. Cinyabuguma et al. (2006) estimated that about 15% of punishment in several experiments2 of this type was targeted at the highest contributor in a

group, and about 25% at those who contributed more than their group's average. Researchers suggested several possible

explanations: for example, revenge and harming others more than oneself to win relatively (tournament style), and moral

resentment.3 These possible explanations suggested multiple preference types, including other-directed preferences

(revenge, etc.) in addition to the self-interested preference for maximizing earnings found in most economic models. It

seemed to us that the phenomenon of punishing high contributors in VCMs was more frequent than commonly recognized

and likely to have adverse effects on contributions and efficiency. In practical life, if decentralized punishment of high

contributors by resentful free riders has comparably high frequencies it would be a serious problem.4 We called the

punishment of high contributors "perverse punishment" because of its seeming inconsistency with self-interested earnings

maximization.

Because the directing of a significant fraction of punishment at high contributors appears to limit the usefulness of

decentralized punishment as a mechanism or institution, we asked whether the problem might be corrected if groups

of individuals were provided with the opportunity to choose their own rules governing the application of punishment.

We conducted an experiment in which rules determining who can be punished are chosen by a series of votes, in order to

see how the choices of rules evolved over time and how these choices affected cooperation and efficiency. In our

experiment, subjects voted on three ballot items determining independently whether group members could reduce the

earnings of low (below average), of average, and of high (above average) contributors to their group account (public good).

We found that out of 160 group votes, no group ever voted to allow punishment of high contributors. Over a series of

votes and periods of learning we found a distinct reluctance to allow any punishment at the beginning, with a gradual but

clear evolution toward allowing punishment of low contributors. And groups adopting this rule of controlling perverse

punishment achieved levels of contributions and efficiency that are among the highest in the literature on social dilemmas.

Our main contributions are: to show how rules of punishment can evolve endogenously to address free rider problems,

within the opportunities of institutional choice presented to the experimental subjects; and to show that perverse

punishment can have strong negative effects on contributions and efficiency but is amenable to group control.

These contributions, listed more specifically in Results 1–4, are based on the observed behaviors in the experiment and

rely on direct counts or non-parametric tests using fully independent observations at the group or session level. Toward the end of the results section, we also discuss regressions estimated using individual-level observations, here using group and

period fixed effects to partially address the possible interdependence among observations.

The paper is organized as follows. Section 1 reviews the theoretical outlook that informs our own and related research,

then discusses the related literature. Section 2 presents the experimental design, Section 3 presents the analysis, and

Section 4 discusses interpretative issues.

1. Theoretical intuitions and literature

1.1. Theory

Several social dilemmas have an iterated dominant strategy equilibrium, which implies a unique Nash equilibrium

without any cooperation. The finitely repeated prisoner's dilemma (Kreps et al., 1982), the centipede game (McKelvey and

Palfrey, 1992), and VCMs are examples having a unique Nash equilibrium with no cooperation. (One of the assumptions

that leads to this result is that of a single preference type of payoff maximizers, all of whom believe that all the players are

payoff maximizers.) Kreps et al. found this equilibrium result disturbing because many experiments on the prisoner's

dilemma showed a pattern of substantial cooperation. A little later, McKelvey and Palfrey (1992) developed an exponential

version of the centipede game for which there are large benefits of cooperation, a unique Nash equilibrium with no

cooperation, and substantial cooperation in experimental observations. McKelvey and Palfrey thought the centipede game was an even simpler and more compelling example of the Nash equilibrium's predictive failure than is the prisoner's dilemma.

2 In particular, Fehr and Gachter (2000a), Page et al. (2005), and Bochet et al. (2006).

3 A low contributing individual may be made uncomfortable by a high contributor's action, feel moral resentment and want to get even by punishing the high contributor. An experimental subject gave us this explanation in a debriefing statement.

4 Cinyabuguma et al. find support for the idea that most punishment of high contributors by low ones may reflect retaliatory motives. For an experiment on retaliatory punishment, see Nikiforakis (2008). Recently, the on-line auction site eBay announced a clamp-down on tit-for-tat feedback to prevent sellers from leaving negative feedback on buyers. "Today, the biggest issue with the system is that buyers are more afraid than ever to leave honest, accurate feedback because of the threat of retaliation," explained eBay North America president Bill Cobb in his January 29, 2008 announcement (Bangeman, 2008).

In response to the Nash equilibrium's predictive failure under the assumption of payoff maximizing as the only preference

type, Kreps et al. and McKelvey and Palfrey modeled the two social dilemmas as (different) games of incomplete

information with multiple preference types. Kreps et al. used two types: payoff maximizers and tit-for-tat players.

McKelvey and Palfrey used two types: payoff maximizers and altruists. With multiple types and incomplete information,

iterated dominance no longer is implied. Instead, the researchers solved for Bayes–Nash equilibria that more accurately

predicted substantial cooperation until near the end of the game, as observed experimentally.

It is easy to check that for the VCM with a punishment opportunity and voting in our experiment, under the assumption

of payoff maximizers as the single type, iterated dominance implies a Nash equilibrium predicting no cooperation and no

punishment (and any voting pattern, including 100% abstentions). But this implication no longer holds when there are

multiple preference types. This non-implication is suggestive because in numerous experiments researchers found

substantial contributions in finitely repeated VCMs without punishment (see Davis and Holt, 1993; Ledyard, 1995, for

surveys). And in VCM experiments with punishment but without voting, Fehr and Gachter (2000a, b, 2002), Carpenter and

Matthews (2002), Masclet et al. (2003), Page et al. (2005), and Sefton et al. (2002) found substantial contributions and

substantial punishment. These studies and the non-implication suggest the presence of multiple preference types in our

experiment and other VCMs.

Comparison with the prisoner's dilemma, the centipede game, and other Bayesian games points toward several

predictions. Payoff maximizers are likely to mimic cooperators to encourage their cooperation, because this is a reasonable

strategy for increasing their payoffs. Cooperators are likely to punish low contributors because they dislike free riding

(see Gintis et al., 2005), and this signals and warns free riders to contribute more. Perverse punishers appear, however, to be the opposite of cooperators.

Fehr and Gachter (2000a, b) interpreted their results primarily in terms of the interaction of

two preference types: purely selfish players (what we have called payoff maximizers) and a conditional cooperator type

(see also Hoffman et al., 1998). Fischbacher et al. (2001) and Fischbacher and Gachter (2006) used a strategy method

protocol to estimate that about 50% of those in their subject pools were of this second type.5 Further, punishment of high

contributors, observed by Gachter and Herrmann (2005), Gachter et al. (2005), and Cinyabuguma et al. (2006), suggests

that when punishment is an available option, the presence of a third type, whom we call perverse punishers, should also

be taken into account. Based on the work mentioned above, we expected perverse punishers to account for not more than

25% of our subjects.6

A word of caution: we believe that these preference types are somewhat stylized interpretations rather than sharply

fixed, non-overlapping characteristics. With this in mind, intuitively the interaction of the three types in our experiment

leads to predictions regarding voting. It seems likely that conditional cooperators would vote to allow punishment of low

contributors and prohibit punishment of high contributors, and payoff maximizers might also vote similarly.7 It also seems

likely that perverse punishers would vote to allow punishment of high contributors. But being in a minority, they would likely be outvoted, although by chance they might form a majority in a few out of a large number of randomly formed

groups.
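
As a rough illustration of the last point, the chance that perverse punishers dominate a randomly formed group can be sketched with a binomial calculation. The sketch below is ours, not part of the experimental analysis, and rests on loudly simplifying assumptions: each subject is independently a perverse punisher with probability equal to the 25% upper bound mentioned above, every subject votes according to type, and nobody abstains. The function name `prob_perverse_majority` is hypothetical.

```python
from math import comb


def prob_perverse_majority(share: float, group_size: int = 4) -> float:
    """Probability that perverse punishers form a strict majority of a group.

    Assumes (hypothetically) that types are drawn independently with
    probability `share` and that every member votes according to type.
    """
    needed = group_size // 2 + 1  # strict majority: 3 of 4 members
    return sum(
        comb(group_size, k) * share**k * (1 - share) ** (group_size - k)
        for k in range(needed, group_size + 1)
    )


# Upper-bound share of 25% in groups of four.
print(round(prob_perverse_majority(0.25), 4))  # 0.0508
```

Under these assumptions roughly one randomly formed group in twenty would have a perverse majority, which is in line with the "few out of a large number" intuition; a smaller share of perverse punishers would make such groups rarer still.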

Considering multiple preference types has been useful in explaining results in a large number of basic VCMs and VCMs

with punishment. But VCMs are more complicated than the prisoner's dilemma or the centipede game, and to our

knowledge, solving even the basic VCM for Bayes–Nash equilibria has so far been intractable. We attempt here only to use

the intuitions developed above to guide interpretation of observed behaviors, hopefully contributing both to a practical

understanding of social dilemmas and to future refinements of theory.

1.2. Related literature

While our paper is the first to directly address effects of perverse punishment by allowing or prohibiting intermediate

restrictions on punishment, there are related papers on the endogenous choice of institutional rules that allow or prohibit

punishment altogether, or exogenously affect the role of punishment. Gurerk et al. (2005, 2006) designed two experiments that allowed subjects to "vote with their feet" in choosing between two groups, one allowing unrestricted punishment and

the other no punishment. Subjects initially avoided the group with punishment, but with repeated opportunities to choose,

almost all eventually chose the group with punishment, as a result achieving high contributions and efficiency. Their experiments differ from ours in that their subjects choose groups with either no punishment or unrestricted punishment, while our subjects have fixed groups and vote over alternative restrictions on punishment.

5 In a different experimental setup, a VCM with endogenous group formation, Page et al. (2005) estimated a 59% proportion of conditional cooperators.

6 When subjects from a population with this rough demography of types are randomly assigned to play a VCM in small groups, the groups may differ from one another in cooperation levels due to random differences in which types are represented and with what frequencies. Ones and Putterman (2007) grouped together on the one hand subjects displaying more cooperative behaviors and on the other hand subjects displaying less cooperation and more perverse punishing. They found, predictably, that the former achieved higher contributions and earnings than the latter.

7 Incentives in voting of course differ from those in a private action. For example, a payoff maximizer may prefer free riding to contributing, but at the same time find it in his interest to vote to allow punishment of low contributors. In his calculation he may believe that by such a rule he would lose the benefit from his own free riding, but be more than compensated by many erstwhile free riders who will contribute more in response to the threatened punishment of free riding. And in a population of mixed preference types, a payoff maximizer's calculation of the net advantage from the rule depends on his beliefs on whether there will be a sufficient number willing to punish free riders and make the threat of punishment effective.

Botelho et al. (2005) designed an experiment that allowed subjects to choose between an institution with unrestricted

punishment and another without any punishment. They found that the subjects voted overwhelmingly for the institution

without punishment. In a related experiment, Sutter et al. (2005) found that subjects most often voted to allow rewards

rather than punishment even though the latter raised contributions more. These experiments differed from ours by

allowing only one vote for each group, and not allowing choices of partially restricted punishment.

Botelho et al. (2005) also analyzed Fehr and Gachter's (2000a, 2002) data, finding lower earnings when punishment was allowed than when it was not allowed.8 In contrast, Gurerk et al. (2005, 2006) found earnings (efficiency) as high or higher

in VCMs with unrestricted punishment than in VCMs without punishment opportunities. Masclet et al. (2003) also found

higher earnings with unrestricted punishment compared with no punishment allowed. By varying the ratio of

punishment's cost to the punisher versus the target of punishment, Nikiforakis and Normann (2008) and Egas and Riedl

(2005) shed light on the conditions under which the unrestricted opportunity to punish does and does not increase

efficiency.

Noting the detrimental effects of the punishment of high contributors, Cinyabuguma et al. (2006) designed a procedure

they believed might reduce its incidence. The first two stages of the experiment were an ordinary VCM followed by a

punishment opportunity. But in a third stage, each subject learned the frequency of each other subject's punishment of

high, average, and low contributors, and each was given an opportunity to punish on the basis of this information. The

authors found that this incentive system led to less perverse punishment in the second stage, but fairly frequent perverse

punishment in the third stage; for example, subjects who punished free riders in the second stage were then severely

punished in the third stage, undermining the incentives in the first stages.

Gachter and Herrmann (2005) used population groupings (young rural Russians, older rural Russians, young urban

Russians, older urban Russians) to study the effects of unrestricted punishment. They found large variations among the

groups in frequency of punishing high contributors and the harmful effects of this perverse punishment which, they wrote,

"can undermine the positive impact of punishment for cooperation and thereby limit the success of self-governance." Like

Cinyabuguma et al. and our paper, Gachter and Herrmann emphasized the detrimental effect of perverse punishment on

efficiency.

Casari and Luini (2005) compared effects of exogenously imposed punishment rules, including a rule requiring a subject

to be targeted for punishment by at least two group members (in a group of five) before the punishment takes effect. They

found that the restriction decreased punishment of high contributors and raised efficiency, but in this treatment the

average contribution was quite low, not exceeding half of the endowment.

2. Design and predictions

2.1. Basic design

Our design extends the basic VCM in which subjects are randomly assigned to groups that remain fixed (a partners

design) for a finite and known number of periods. Each subject in a group is provided with an initial endowment that he or

she is asked to divide between a private account and a group account. Any funds placed in the group account are scaled up

by the experimenter and divided equally among the subjects in the group without regard to individual contribution. To this

basic VCM we added punishment and voting opportunities in two designs to study how rules restricting or allowing

punishment might emerge initially and evolve over a series of votes. In the experiment, individuals act anonymously and

without communication.

We initially conducted a pilot experiment in which there were four partner groups with four subjects in each group. At

the beginning of the 1st period, the subjects received instructions for playing a basic VCM without punishment, and each

group played 10 periods of this repeated game (details of the basic VCM and its payoff function (1) are shown below). At the

beginning of the 11th period the subjects received instructions for playing a VCM with unrestricted punishment, and each

group played 10 periods of this repeated game (details and payoff function (2) shown below). So far, this design is similar to

Fehr and Gachter (2000a). But following these first 20 periods, each group voted on who if anyone could be punished in a

final 10 periods (details of the ballot process are shown below). Of the four group votes, all four voted to prohibit punishment

of higher-than-average contributors; one group prohibited all punishment and the other three groups voted to allow

punishment of low contributors.9

8 Cinyabuguma et al. (2004) found similar results for Fehr and Gachter (2000a) and in public goods and sanctions experiments by Carpenter and Matthews (2002), Sefton et al. (2002), Page et al. (2005), and Bochet et al. (2006). In their working paper, Cinyabuguma et al. (2004) used regression to study the impact of punishment upon changes in the punished subject's contribution, and found that each dollar of punishment of a group's highest contributor substantially decreased his or her next period contribution. The authors concluded that a major reason why punishment reduces efficiency in the experiments mentioned is the punishment of high contributors. Their calculations showed that in the related public goods and sanctions experiments by Bochet et al. and Page et al., earnings would have been higher with punishment than without it but for the presence of perverse punishment.

9 Due to a computer problem, the voted rules were not properly implemented; nonetheless, decisions up to and including the vote remain uncompromised, allowing us to make inferences from this pilot experiment occasionally in what follows.


Following this pilot, we wanted to see not only what rules are chosen initially but also what voting patterns would

emerge with further experience. In the first of two designs, we increased the number of votes to three votes for each group,

and correspondingly shortened the number of periods under which a voted rule governed before the next vote from

10 periods to 8. To keep the total number of periods to 30, we shortened the introductory experiences of VCMs with and

without punishment from 10 periods each to 3 periods each. This became our 3-Vote design (see Fig. 1A).

As in the pilot treatment, subjects in the 3-Vote design were given instructions describing the basic VCM, and then

participated in the basic VCM (this time for 3 periods), then received their second instructions about the opportunity of

voluntary punishment, unrestricted except for some budgetary constraints (see below), then played for 3 periods under

this condition, all before learning of the voting opportunities and items to be voted on. At the beginning of the 7th period,

the subjects received their third instructions, which explained the voting process, and took their first vote on the rules governing who, if anyone, could be punished for the next 8 periods. At the beginning of the 15th period a second vote was

taken and new rules regulating punishment were chosen. Then the subjects participated in 8 periods of the VCM with

punishment (if any) governed by the second chosen rules. At the beginning of the 23rd period the third and final vote was

taken, and the remaining 8 periods were conducted with possible punishment governed by this last vote. As in the pilot, we

included practice exercises in each of the three sets of instructions.

Surprised to find that out of 60 group votes none allowed punishment of high contributors and that the majority of

groups seemed to be converging towards allowing punishment of low contributors, we added a 5-Vote design (Fig. 1B)

which differed from the 3-Vote design in that (a) there was no play, whether with or without punishment, before the

determination of rules by vote, and (b) the sequences of play between votes were reduced from 8 to 6 periods, to allow for

five votes and play phases in a session of similar duration. As Fig. 1B shows, the first and only instructions were given at the

beginning of the experiment. They explained the basic VCM mechanism without punishment, possible rules governing

punishment, and the opportunity to vote on them. Subjects then voted to allow or restrict punishment (without any hands-on experience of punishment or its restrictions). Then they participated for 6 periods in the VCM, governed by the chosen rules of punishment. At the beginning of the 7th period, the subjects voted again, and then participated in 6 periods of the VCM, governed by the chosen rules of punishment. The same process was repeated three more times, as shown in the figure.

The 5-Vote design had the same number of periods (30) as the 3-Vote design.
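
The two timelines can be summarized in a few lines of code. This is only an illustrative sketch of the schedule described above; the helper name `vote_periods` and its parameters are our own labels, not part of the experimental software.

```python
def vote_periods(first_vote_period: int, phase_length: int,
                 total_periods: int = 30) -> list[int]:
    """Periods at whose beginning a vote is taken.

    Each vote is followed by `phase_length` periods played under the
    chosen rule, until `total_periods` is reached.
    """
    return list(range(first_vote_period, total_periods + 1, phase_length))


# 3-Vote design: 3 + 3 introductory periods, then votes every 8 periods.
three_vote = vote_periods(first_vote_period=7, phase_length=8)
# 5-Vote design: no introductory play, votes every 6 periods.
five_vote = vote_periods(first_vote_period=1, phase_length=6)

print(three_vote)  # [7, 15, 23]
print(five_vote)   # [1, 7, 13, 19, 25]
```

In both designs the last play phase ends exactly at period 30 (23 + 8 - 1 and 25 + 6 - 1), which is why the sessions are of equal length.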

The 5-Vote design functioned as a stress test for the results of the 3-Vote design in several ways. First, the task of

learning and familiarization was harder, since the first choice of rules occurred before subjects had any experience

interacting in a VCM with or without punishment. Second, the possibility that experiences such as annoyance with free

riders or with receiving punishment could influence the first vote was eliminated. These differences permitted a test of

whether the 3-Vote design's results were driven by the 3-Vote design's more gradual, hands-on learning. Third, in the 5-Vote

design there were 100 group votes; thus, with 160 group votes in total, unanimity in prohibiting perverse punishment

would be very unlikely unless there were strong factors leading in this direction. Finally, with each group voting 5 times

instead of 3, the monotone increase in votes for the rule allowing punishment of low but not high contributors would be less likely unless there

were strong factors leading to this pattern.

Fig. 1. (A) The 3-Vote design and (B) the 5-Vote design. [Timeline figure; horizontal axis: Periods, marked 3 to 30. Panel A, 3-Vote design: 1st instructions; 3 periods with no punishment; 2nd instructions; 3 periods with punishment; 3rd instructions and 1st vote; 8 periods with the chosen rule; 2nd vote; 8 periods with the chosen rule; 3rd vote; 8 periods with the chosen rule. Panel B, 5-Vote design: instructions and 1st vote; then 6 periods with the chosen rule after each of the 1st to 5th votes.]

In both the 3- and 5-Vote designs, sessions had 16 subjects assigned randomly to four groups of four subjects who remained together throughout the session. Each subject knew there were 16 subjects in the experiment room but could not tell which among the others in the session belonged to her group. Contribution and punishment choices (if any) were

announced to other group members under randomly changing labels B, C, and D, for one's fellow members, so that the

behaviors of individuals could not be tracked from period to period, except by conjecture. A subject learned the total

amount of punishment she had received, but not which group members punished her or by how much.

Just before the second and later votes of both designs, each subject was informed of the punishment rule chosen in the

preceding votes of each of the four groups in their session along with each group's average contributions and earnings

during the periods the rule governed (the information was new for the most recently taken vote, and was repeated for the

earlier votes). This information was included to speed the adjustment process, if there is one, and of course learning from the examples of others occurs in many real-world settings. The downside of providing this information, in terms of the

number of fully independent observations, is substantial, but our main results are statistically significant, in spite of this.

Also, the first vote of each group remains a strictly independent observation, since no information about other groups was

shared until immediately before the second vote.

2.2. Payoffs

All periods shared the same underlying structure. In each period, each subject had to decide on a division of 10

experimental dollars, in integer amounts, between a private account and a group account, before observing the choices of

fellow group members. In a period, subject i earned

$$y_i = 10 - C_i + 0.4 \sum_{j=1}^{4} C_j \tag{1}$$

where C_i is i's contribution to the public account and the summation is taken over all members of i's group, including i. After

all four made their decisions, each was informed of the contribution choices of the others. When punishment was

permitted, it cost a subject 0.25 experimental dollars to reduce the earnings of another person by 1.00 experimental dollar.

Subject i's earnings after punishment were thus

$$y_i = 10 - C_i + 0.4 \sum_{j=1}^{4} C_j - 0.25 \sum_{j \neq i} R_{ij} - \sum_{j \neq i} R_{ji} \tag{2}$$

where R_ij is the number of dollars by which i reduced j's earnings, and conversely for R_ji. General constraints on punishment

in all treatments were: (i) a subject could not spend more than her/his pre-punishment earnings for the period on reducing

the earnings of other subjects, (ii) a subject's post-punishment earnings for a period would be set to zero if earnings y_i in

Eq. (2) were negative, and (iii) a subject i could not spend more on reducing the earnings of a subject j in any period than

would single-handedly reduce j's earnings according to (2) to less than zero.10 The Appendix shows the screen design for entering an individual's contribution and punishment decisions.
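
The payoff functions (1) and (2) can be illustrated with a short numerical sketch. The code below is our own illustration rather than the software used in the sessions; the function and constant names are hypothetical, and for brevity it applies only constraint (ii), the zero floor on earnings, and does not enforce the spending limits in (i) and (iii).

```python
ENDOWMENT = 10      # experimental dollars per period
MPCR = 0.4          # return to each member per dollar in the group account
PUNISH_COST = 0.25  # cost to the punisher per dollar of reduction


def basic_payoffs(contributions):
    """Eq. (1): earnings in the basic VCM without punishment."""
    total = sum(contributions)
    return [ENDOWMENT - c + MPCR * total for c in contributions]


def payoffs_with_punishment(contributions, reductions):
    """Eq. (2): reductions[i][j] is R_ij, the amount by which i reduces j."""
    base = basic_payoffs(contributions)
    n = len(contributions)
    result = []
    for i in range(n):
        spent = PUNISH_COST * sum(reductions[i][j] for j in range(n) if j != i)
        received = sum(reductions[j][i] for j in range(n) if j != i)
        # Constraint (ii): negative period earnings are set to zero.
        result.append(max(0.0, base[i] - spent - received))
    return result


# Three full contributors and one free rider.
c = [10, 10, 10, 0]
print([round(y, 2) for y in basic_payoffs(c)])  # [12.0, 12.0, 12.0, 22.0]

# Each contributor reduces the free rider's earnings by 2 dollars.
R = [[0, 0, 0, 2], [0, 0, 0, 2], [0, 0, 0, 2], [0, 0, 0, 0]]
print([round(y, 2) for y in payoffs_with_punishment(c, R)])  # [11.5, 11.5, 11.5, 16.0]
```

The first example shows the dilemma in Eq. (1): the free rider earns 22 while each contributor earns 12, although full contribution by all four would give each member 16 and zero contribution by all would give each 10. The second shows the efficiency cost of punishment in Eq. (2): group earnings fall by 7.5 dollars in total (6 removed from the target plus 1.5 spent by the punishers).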

2.3. Voting

In a voting stage, each subject checked off one of three boxes beside each of three ballot items, on a screen set up as

follows:

I vote to allow a person's earnings to be reduced if

(a) that person assigns less than the average amount11 to the group account: [ ] Yes  [ ] No  [ ] No preference

(b) that person assigns the average amount to the group account: [ ] Yes  [ ] No  [ ] No preference

(c) that person assigns more than the average amount to the group account: [ ] Yes  [ ] No  [ ] No preference

In each group of four subjects, of those expressing a preference in ballot item (a), if there was a majority or tie of No

votes against punishment of low contributors, then punishment of low contributors would be prohibited for the next 8

periods in the 3-Vote design and 6 periods in the 5-Vote design; and if a majority voted Yes, punishment of low

ARTICLE IN PRESS

10 The purpose of (i) and (ii) was to keep all decisions financially independent of each other while maintaining a guaranteed minimum payment for recruiting reasons. The purpose of (iii) was to help subjects to avoid pointless spending on punishment in view of constraint (ii). Note, however, that it remained possible for subjects to overspend on punishing in the sense that both subject i and, say, subject k might each spend enough to reduce j's earnings for the period to zero, although only one subject's punishment would actually be effective in that case, given (ii). This could happen because subjects did not learn of punishment not carried out by or aimed at them, and the design (as in Fehr and Gächter, 2000a) keeps such information private so as not to encourage free riding on punishment.

11 As explained in the instructions, "average amount" meant the average over the four members of the group in the contribution stage of the period in question. It could vary among groups and within a given group from one period to the next. Note that a vote to allow punishment of those contributing less than the group average of 4 players is the same event as a vote to allow punishment of those contributing less than the average of the 3 others.




contributors would be allowed; and correspondingly for ballot items (b) and (c).12 After the vote, each group's members received a message reporting the voting outcome, which was one of $2^3 = 8$ possible punishment rules (i.e., combinations of the three ballot item choices).13
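The counting rule described above and in footnote 13 (only "Yes" and "No" ballots count, and a tie fails) can be sketched as follows. The data layout, function names, and the example ballots are hypothetical; treating a ballot item on which nobody expresses a preference as a tie, and hence as not passing, is an assumption.

```python
from itertools import product

ITEMS = ("low", "average", "high")  # ballot items (a), (b), (c)


def item_passes(votes):
    """Return True if 'yes' votes strictly outnumber 'no' votes.

    'none' (no preference) ballots are ignored; a tie does not pass.
    """
    yes = sum(1 for v in votes if v == "yes")
    no = sum(1 for v in votes if v == "no")
    return yes > no


def group_rule(ballots):
    """ballots: one dict per subject mapping each item to 'yes', 'no' or 'none'."""
    return {item: item_passes([b[item] for b in ballots]) for item in ITEMS}


# The 2^3 = 8 possible punishment rules, as (low, average, high) permissions.
ALL_RULES = list(product([False, True], repeat=len(ITEMS)))

# Hypothetical ballots for one group of four subjects.
example = [
    {"low": "yes", "average": "no", "high": "no"},
    {"low": "yes", "average": "no", "high": "no"},
    {"low": "no", "average": "none", "high": "no"},
    {"low": "none", "average": "yes", "high": "yes"},
]
print(group_rule(example))
```

With these ballots only item (a) passes, so the group would be governed by the rule that allows punishment of low contributors and nothing else.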

When a group voted to restrict punishment, a fixed zero appeared in the punishment box14 for all individuals to whom the restriction applied during the punishment stages that followed each contribution stage. For example, members of a group that had voted to prohibit all punishment saw the standard punishment stage screen with fixed "0"s in all the punishment boxes, indicating that no punishment was allowed in this case.

We conducted five sessions of each design using a total of 160 subjects (see Table 1).15 All of the sessions were conducted by computer in a computer lab at Brown University. At the end of each session, cumulative earnings for the 30 periods were totaled and converted to real money at the rate of 25 experimental dollars to one real dollar, and $5 was added as a participation fee. Sessions typically lasted a little less than 2 hours including instructions, and subjects' overall earnings averaged approximately $25. Instructions for both designs are similar and available in our Working Paper.16

3. Results

3.1. The voting pattern

In the 3-Vote design there were 720 individual votes (80 subjects each voting 3 times on 3 ballot items), and in the 5-Vote design 1200 individual votes. Table 2 shows the number of individual votes on each ballot item. A substantial number of individuals voted to allow punishment of higher-than-average contributors, but many more voted to allow punishment of less-than-average contributors.

In the 3-Vote design there were 60 group votes (see Table 1), and in the 5-Vote design there were 100 group votes. In the 160 group votes altogether, only 4 of the 8 possible combinations of rules were ever chosen by majority rule. These were to allow: (i) no punishment, 56 group votes; (ii) punishment of lower-than-average contributors and no other punishment, 98 votes; (iii) punishment of low-or-equal-to-average contributors and no other punishment, 4 votes; and (iv) punishment of equal-to-average contributors and no other punishment, 2 votes. Conspicuously absent from this list is any rule allowing punishment of higher-than-average contributors.

Result 1. No group ever voted to allow punishment of higher-than-average contributors, so perverse punishment was ruled out

from the first opportunity to vote.

In ruling out perverse punishment, every group also ruled out unrestricted punishment from the beginning. The two rules

punishment of lower-than-average contributors and no other punishment and punishment of low-or-equal-to-average


Table 1

Numbers of groups, subjects, and votes.

| Session design | Number of sessions | Number of groups in each session | Number of subjects in each group | Total number of subjects | Total number of group votes on rules |
|---|---|---|---|---|---|
| 3-Vote | 5 | 4 | 4 | 80 | 60 |
| 5-Vote | 5 | 4 | 4 | 80 | 100 |

12 We expected few cases where someone was exactly an average contributor, but for symmetry we treated the average contributor on a separate ballot item.

13 Only "Yes" and "No" votes were counted in determining majorities; for example, if 2 voted "Yes" and 2 voted "No," the proposal did not pass, but if 2 voted "Yes," 1 "No" and 1 "No preference," the proposal passed. Subjects were informed of whether a ballot item passed or not, but not by how many votes or who voted which way.

14 See the boxes labeled b, c and d on the lower left portion of the diagram in the appendix showing the screen design.

15 Subjects were Brown undergraduates, recruited by (a) distribution of flyers in the mailboxes of all undergraduates, (b) distribution of flyers in a large introductory economics course, (c) distribution of table slips at a student dining hall, and (d) advertising under the heading of employment in an on-line campus magazine, the Brown Daily Jolt. Analysis of information provided in the post-experiment debriefing shows that the subjects majored in a large range of concentrations, with the economics concentration being that of only 15%, about 5% more than the proportion in the overall student body. A little less than half the subjects had taken no economics courses at the college level. A total of 67% of the subjects were female, somewhat higher than the 53% share in the general student body. Brown's undergraduate population numbers about 5500, so students participating in a given session tended not to know one another.

16 See Ertan et al. (2005). In the instructions and experiment we used neutral language and did not use words like "free riding," "punishment," and "perverse punishment." See also Cinyabuguma et al. (2006), where we point out that punishment is most clearly perverse when aimed at a group's highest contributor. Here as in that experiment we distinguish between punishment of above average, average, and below average contributors, rather than between punishment of highest and of other contributors, because this seems more symmetrical and less likely to convey a biased framing of the problem to subjects.




contributors and no other punishment are similar and we grouped them together under the heading of allowing

punishment of low-but-not-high contributors. Fig. 2 shows how the group voting evolved, over the sequence of votes for

the 3- and 5-Vote designs. Result 2 summarizes the voting pattern over time.

Result 2. In both designs, a plurality of groups voted in their first vote to prohibit all punishment, with a substantial minority of groups voting to allow punishment of low-but-not-high contributors. Over the sequence of votes, this ordering reversed, so that in the final vote, nearly all groups voted to allow punishment of low-but-not-high contributors, with only a few remaining groups voting to prohibit all punishment.


Table 2

Numbers of individual votes to allow punishment of high, average, and low contributors, both designs.

| Ballot item | Yes | No | No preference |
|---|---|---|---|
| Allow punishment of less than average contributors | 410 | 211 | 19 |
| Allow punishment of average contributors | 46 | 577 | 17 |
| Allow punishment of above average contributors | 111 | 493 | 36 |

[Figure 2: bar charts of the number of groups voting for each rule (vertical axis, 0 to 20) at each vote (horizontal axis: 1st to 3rd vote in the 3-Vote design, 1st to 5th vote in the 5-Vote design). Series: Prohibit all punishment; Punish low-but-not-high; Punish equal-to-average.]

Fig. 2. Evolution of the voting rules: (A) 3-Vote design and (B) the 5-Vote design.

[Figure 3: line charts of average contributions (vertical axis, 0 to 10) by period (horizontal axis, periods 1 to 30). Series: unrestricted (exogenous); no punishment (exogenous); no punishment; low-but-not-high.]

Fig. 3. Average contributions for the two designs, by period and punishment rule: (A) the 3-Vote design and (B) the 5-Vote design.




3.2. Contributions and efficiency

Fig. 3 shows period-by-period contributions of groups for the two composite rules most frequently chosen. In both the 3- and 5-Vote designs, groups that allowed punishment of low-but-not-high contributors achieved substantially higher levels of contributions than did groups that prohibited punishment altogether. We tested the hypothesis that contributions are higher for groups choosing the punish low-but-not-high rule than for those choosing the rule of no punishment, against the null hypothesis of no difference, in two ways. First, to avoid the possible statistical dependence from one period to another, and from group to group in a session because of the information provided from the second vote onward, we set aside observations from the second vote onward, and then averaged contributions in the periods between the first and second vote (with a probable loss of power). Comparing contributions under no punishment with those under punish low-but-not-high at the group level, we found, in a one-tailed Mann–Whitney test, significance at the 0.1% level for the 3-Vote design (11 group observations without and 9 with punishment) and at the 5% level (p = 0.034) for the 5-Vote design (13 observations without and 7 with punishment). Second, we tested differences in behaviors from the second vote onward in Wilcoxon matched-pair tests at the session level, with fewer observations but similar results.17 In both the 3- and 5-Vote designs, contributions in groups that permitted punishment of low-but-not-high contributors tended to increase over time until the end-game fall off. In contrast, 3-Vote design groups that prohibited all punishment had falling levels of contributions over time, replicating earlier results on basic VCMs without punishment, and in the 5-Vote design contributions had a slightly increasing trend in the middle periods.18
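To see how few session-level pairs limit the attainable significance of the Wilcoxon matched-pair tests, the sketch below computes one-tailed p-values from the large-sample normal approximation to the signed-rank statistic. The source does not state whether exact or asymptotic p-values were used, so the approximation (without continuity correction) is an assumption here, and the function name is hypothetical; the values it yields are close to those reported in footnote 17.

```python
import math


def wilcoxon_one_tailed_p(w_plus: float, n: int) -> float:
    """One-tailed p-value for the signed-rank sum w_plus over n non-tied pairs.

    Uses the normal approximation: under the null, W+ has mean n(n+1)/4
    and variance n(n+1)(2n+1)/24. No continuity correction is applied.
    """
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2))


# All pairs ordered the same way: W+ is the maximum rank sum n(n+1)/2.
for n in (2, 3, 4):
    print(n, round(wilcoxon_one_tailed_p(n * (n + 1) / 2, n), 3))

# Four pairs, with the smallest difference ordered the other way: W+ = 2+3+4.
print(round(wilcoxon_one_tailed_p(9, 4), 3))
```

Under this approximation even a unanimous ordering across three sessions cannot reach the 5% level, which is why the session-level tests are described as having fewer observations.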

Fig. 4 shows period-by-period efficiency of groups that voted to prohibit all punishment and groups that voted to

prohibit perverse punishment while allowing punishment of low contributors. In Fig. 4A, average period efficiency was


[Figure 4: line charts of efficiency (vertical axis, 0 to 1.0) by period (horizontal axis, periods 1 to 30). Series: unrestricted (exogenous); no punishment (exogenous); no punishment; low-but-not-high.]

Fig. 4. Efficiency for the two designs, by period and punishment rule: (A) the 3-Vote design and (B) the 5-Vote design.

17 For these tests, we have at most one paired observation from each session, namely the average contribution per subject in all groups in the session that chose one rule, and the corresponding average in all groups that chose the other. This yields up to five paired averaged observations in groups allowing no punishment and in groups allowing punishment of low contributors in each design and set of periods, although there are fewer observations for sets of periods when only one rule is observed in one or more sessions. For example, if in a certain session and set of periods two groups allowed no punishment and two groups allowed punishment of low contributors, we averaged the contributions over the relevant periods in the first two groups and likewise for the second two groups, giving us one pair of observations for that session; if all four groups follow the same rule, the session offered no observation for this test. We performed Wilcoxon matched pair (ranked sum) tests on these data with the following results, beginning with the 3-Vote design: for periods 7–14, only 3 sessions have observations of both rules, and although in all cases contributions are higher in the groups allowing punishment, the p-value of the one-tailed test is 0.055; for periods 15–22, with 4 valid sets of observations, the one-tailed test p-value is 0.034; for periods 23–30, only two sessions still have groups not using punishment, so although the ordering remains consistent, the one-tailed test p-value is 0.09. Turning to the 5-Vote design, we have for periods 1–6, 3 pairs of observations with one-tailed test p-value of 0.055; for periods 7–12, 4 pairs with one-tailed test p-value 0.034; for periods 13–18, 4 pairs, one with contrary ordering, hence one-tailed test p-value 0.072; periods 19–24, 4 pairs including one tie, and one-tailed test p-value of 0.055; periods 25–30, only 2 pairs, both with the usual order, but two-tailed test p-value 0.09.

18 In Fig. 3A and especially 3B contributions under the endogenously chosen rule of no punishment are more sustained and decline more slowly than is typical in a VCM without punishment. But endogenous choice includes its process, including repeated voting and the ability to change rules, possibly leading to commitment effects (see Sutter et al.), restart effects (see the dashed vertical lines in Fig. 3), and selection effects as groups change rules in response to free-riding behavior (i.e., groups with the lowest levels of free riding are less likely to adopt a rule allowing punishment).




always higher under the rules allowing punishment of low contributors, and similarly in Fig. 4B, except in 6 periods.19 As with contributions, we performed both Mann–Whitney and Wilcoxon tests of the hypothesis that earnings are higher under restricted punishment than under no punishment at different levels of aggregation, with the resulting significance levels varying from 0.1% to 10% in the 3-Vote design and from the 10% level to insignificant in the 5-Vote design due to the similarity of earnings under the two rules in some groups of periods.20

Table 3 compares contributions and efficiency under the two most voted rules, and the exogenously imposed conditions of unrestricted punishment (periods 4–6 of the 3-Vote design) and no punishment (periods 1–3).21 The results of the five tests of Table 3 are summarized in Result 3:

Result 3. For each of the Wilcoxon matched pair tests on contributions, contributions are higher under the rule of punish low-but-not-high than under the rule of unrestricted punishment, and contributions are higher under the rule of unrestricted punishment than under the rule of no punishment, and this ordering is transitive. Correspondingly, efficiency is higher under punish low-but-not-high than under no punishment, and efficiency is higher under no punishment than under unrestricted punishment, and this ordering is transitive.


Table 3

Effects of the punishment rule on contribution and efficiency.

| Test | Ranks of contributions by the punishment rule | Test | Ranks of efficiency by the punishment rule |
|---|---|---|---|
| 1 | Punish low > unrestricted | 4 | Punish low > no punishment |
| 2 | Unrestricted > no punishment | 5 | Punish low > no punishment |
| 3 | Unrestricted > no punishment | 2 | No punishment > unrestricted |
| 4 | Punish low > no punishment | 3 | No punishment ≈ unrestricted (insignificant) |
| 5 | Punish low > no punishment | 1 | Punish low > unrestricted |

Notes: One-tailed Wilcoxon matched pair tests. Tests 1–4 are for groups in the 3-Vote design. Test 1 compares the average contributions of periods 7–9 in groups that chose "punish low" in their first vote matched with the average contributions of the same group in periods 4–6 of unrestricted punishment (the number of distinct groups matched and compared is n = 9); and correspondingly for efficiency. Test 2 compares contributions (efficiency) for groups in periods 1–3 with contributions for the same groups in periods 4–6, n = 20. Test 3 compares periods 4–6 with 7–9, for the groups that chose no punishment in periods 7–9, n = 10. Test 4 compares members of the first groups in each session that switched from a voted rule of no punishment to a voted rule of "punish low," comparing the 8-period averages before and after the switch; if two or more groups in a session switched at the same time, the behaviors of all of their members are averaged; n = 5. Test 5 is the same as Test 4, except it is for the 5-Vote design and 6-period averages are compared before and after the switch, n = 5. A less stringent version of Test 1 considers the first three periods of play in any group that adopted "punish low," even if after the 2nd or 3rd vote. This test has n = 17 and a p-value < 0.1% for contributions and < 1% for earnings. We also considered less stringent versions of Tests 4 and 5 that compare each group that switched from a voted rule of no punishment to one of "punish low," regardless of whether this was the first time such a switch had occurred among groups in their session. For Test 4, there are 9 paired observations and the p-value of the test statistic is < 1% for both contributions and earnings. For Test 5, there are 17 paired observations and the p-value is < 0.1% for contributions and < 5% for earnings.

"Punish low" indicates punish low-but-not-high. Significance is assessed at the 1%, 5%, and 10% levels in one-tailed tests; ≈ marks an insignificant difference.

19 The difference in earnings between groups with no punishment and those with the punish-low-but-not-high rule (Fig. 4) is smaller than the difference in contributions (Fig. 3) because (a) an extra E$1 (one experimental dollar) of contribution raises efficiency by only E$0.60, and (b) punish-low-but-not-high groups achieve higher contributions but incur some punishment costs (E$1.25 per E$1 of punishment assigned). Experimenters with the voluntary contribution mechanism occasionally see in the lab a group that achieves high contributions without punishment or other aids, and the two groups that resisted voting for punishment in the 5-Vote design were of this type, their members perhaps priding themselves on being able to earn as much as those in other groups even without having recourse to the punishment threat.

20 As with contributions, we begin with Mann–Whitney tests using group level observations from the periods between the 1st and 2nd votes only. For the 3-Vote design, the one-tailed test p-value in this case is 0.001; for the 5-Vote design, the test finds no difference based on punishment rule, consistent with what Fig. 4B shows in periods 1–6. Next, we performed Wilcoxon matched pair tests for each set of periods with a maximum of one pair of observations per session. For the 3-Vote design, there are 3 valid pairs for periods 7–14, all showing higher earnings with punishment, with one-tailed test p-value of 0.055; for periods 15–22, 4 pairs, p-value 0.034; and for periods 23–30, 2 pairs, ordered as expected, p-value of 0.090. For the 5-Vote design, periods 1–6 have 3 paired observations but the difference, as with the corresponding Mann–Whitney test, is not significant; for periods 7–12, 4 pairs, again no difference; for periods 13–18, 4 pairs, with those with punishment earning more in 3 of 4 cases, thus p-value 0.072; periods 19–24, 4 pairs, again 3 favoring those allowing punishment but this time one tie, thus p-value 0.055; in periods 25–30 only 2 valid pairs are left, with one session displaying one order, the other the other order, hence no significant difference. Although violating the requirement of full independence of observations, it may nevertheless help to put these results into perspective and convey a sense of the statistical power lost due to the dissemination of information if we report also the results for tests using all group level observations for all periods: for the 3-Vote design, the p-value of a one-tailed test would be less than 0.001; for the 5-Vote design, the p-value of the corresponding test is 0.01.

21 For example, in comparing contributions under the rule of punish low-but-not-high with contributions under the (exogenous) rule of unrestricted punishment in Test 1, we considered the 17 groups of the 3-Vote design that eventually chose the rule allowing punishment of low-but-not-high contributors (see Fig. 2A). For each of these groups we calculated the average group contribution over the first 3 periods that the group was governed by this rule. We matched this average with the same group's average contribution over the 3 periods of unrestricted punishment (periods 4–6 of the 3-Vote design). In the 17 matched pairs, 14 groups had higher contributions under the rule of punish low-but-not-high, 2 groups had higher contributions under unrestricted punishment, and 1 group was tied. The difference is significant (p = 0.001) in a two-tailed Wilcoxon matched pair test.




Because of the difference in the orderings for contributions and efficiency, the sequence of tests in Table 3 for efficiency is rearranged to show the transitivity. The difference in the orderings of contributions and efficiency is likely due to the cost of punishment.

3.3. Mitigating the free rider problem

In the literature on public goods games, it is common practice to use the term "free rider" loosely to denote any individual who contributes less than the socially optimal amount. It is worth noting, however, that in the absence of punishment anyone who contributes less than others earns more than these others and thus obtains a "free ride" on others' contributions; but when punishment is possible a low contributor may fail to earn more, and therefore fail to free ride in actuality. To compare how successfully different sets of rules address free riding, we adopt in this section a definition that considers the full outcome, not simply the contribution decision.

Specifically, the symmetric design of this and other VCM experiments suggests a simple definition of free riding: a subject A experiences free riding when someone else in his group, B, contributes less to the public good but earns more than A does.22 For a specified punishment rule, sequence of periods, and collection of groups, we define the frequency of free riding as the number of cases of free riding divided by the number of observations, and an observation as a pairing in a group, where one subject in a group has a higher contribution than the other subject of the pair. By the design of a basic VCM without punishment and its payoff Eq. (1), every time someone contributes more than someone else, there is a case of free riding because the higher contributor always has lower earnings than the lower contributors. Thus, in this definition of free riding, the frequency of free riding for the basic VCM is 100% (as shown in the first bar of Fig. 5). But the frequency of free riding may decrease when sufficient punishment is directed at low contributors.
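The pairwise definition above can be made concrete with a short Python sketch. The function names and the example numbers are hypothetical, and the no-punishment payoff is assumed to be Eq. (2) without its punishment terms (an endowment of 10 and a return of 0.4 per dollar in the group account).

```python
from itertools import combinations


def vcm_earnings(contributions, endowment=10.0, mpcr=0.4):
    """Earnings in a basic VCM without punishment (assumed form of Eq. (1))."""
    total = sum(contributions)
    return [endowment - c + mpcr * total for c in contributions]


def free_riding_counts(contributions, earnings):
    """Return (cases, observations) for one group in one period.

    An observation is a pair of subjects with unequal contributions; it is a
    case of free riding if the lower contributor earns more than the higher.
    """
    cases = observations = 0
    for a, b in combinations(range(len(contributions)), 2):
        if contributions[a] == contributions[b]:
            continue  # equal contributors are not an observation
        observations += 1
        high, low = (a, b) if contributions[a] > contributions[b] else (b, a)
        if earnings[low] > earnings[high]:
            cases += 1
    return cases, observations


contributions = [0.0, 4.0, 10.0, 10.0]

# Without punishment every unequal pair is a case of free riding.
print(free_riding_counts(contributions, vcm_earnings(contributions)))

# Hypothetical post-punishment earnings: the two full contributors each pay
# 1.5 to reduce the zero contributor's earnings by 6, for 12 in total.
punished = [19.6 - 12.0, 15.6, 9.6 - 1.5, 9.6 - 1.5]
print(free_riding_counts(contributions, punished))
```

In the made-up punished period the zero contributor no longer free rides on anyone, but the subject who contributed 4 still out-earns the two full contributors, so two of the five observations remain cases of free riding.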

For the rule of unrestricted punishment, over all 20 groups in periods 4–6 of the 3-Vote design, there were 205 observations of pairs of unequal contributions by subjects in a group, and 148 cases of free riding, for a frequency of 72% (see the middle bar). In comparison, the frequency of free riding in the first 3 periods after a group voted for the rule of punishing low-but-not-high contributors was 35% of the 103 observed unequal pairs. This is a striking reduction, considering that the rule of punish low-but-not-high does not prevent a higher-than-average contributor from free riding on a still higher contributor. The difference in free riding between unrestricted punishment and punish low-but-not-high contributors is significant (p < 0.0001) in a Fisher exact test.23
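The order of magnitude of this comparison can be checked with a two-sided Fisher exact test written from the hypergeometric distribution. The text gives 148 cases out of 205 pairs under unrestricted punishment, but only a percentage (35% of 103) for the restricted rule; the count of 36 cases used below is therefore an assumption obtained by rounding, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def pmf(k: int) -> float:
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = pmf(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in (pmf(k) for k in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Rows: unrestricted punishment, punish low-but-not-high.
# Columns: cases of free riding, pairs without free riding.
unrestricted = (148, 205 - 148)
restricted = (36, 103 - 36)  # 36 is an assumed count (35% of 103, rounded)

p_value = fisher_exact_two_sided(*unrestricted, *restricted)
print(round(148 / 205, 2), p_value < 0.0001)
```

Under the assumed count the p-value falls well below 0.0001, in line with the significance level reported in the text.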

Result 4. In comparing VCMs with rules governing punishment, we found the highest frequency of free riding in groups operating with no punishment, less free riding in groups with unrestricted punishment, and least free riding in groups allowing punishment of low-but-not-high contributors.

A regression analysis of incentives to free ride finds the same ordering as in Result 4. In the regressions below, we follow Fehr and Gächter in defining subject i's absolute negative and positive deviations from the average of others' contributions as

\[ \text{Absolute Negative Deviation}_i = \begin{cases} |C_i - \bar{C}_{-i}| & \text{if } C_i < \bar{C}_{-i} \\ 0 & \text{otherwise} \end{cases} \]

and

\[ \text{Positive Deviation}_i = \begin{cases} |C_i - \bar{C}_{-i}| & \text{if } C_i > \bar{C}_{-i} \\ 0 & \text{otherwise} \end{cases} \]

where $\bar{C}_{-i} = \sum_{j \neq i} C_j / 3$ is the average of others' contributions.
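As a small illustration of these regressors, the following sketch computes the average of others' contributions and the two deviation terms for one subject. The function name and the example contributions are hypothetical.

```python
def deviation_terms(contributions, i):
    """Return (average of others, absolute negative deviation, positive deviation)."""
    others = [c for j, c in enumerate(contributions) if j != i]
    avg_others = sum(others) / len(others)
    gap = contributions[i] - avg_others
    abs_negative = abs(gap) if gap < 0 else 0.0  # i contributes less than others
    positive = abs(gap) if gap > 0 else 0.0      # i contributes more than others
    return avg_others, abs_negative, positive


# Hypothetical group: subject 0 contributes 2, the other three contribute 8.
group = [2.0, 8.0, 8.0, 8.0]
print(deviation_terms(group, 0))  # subject 0 is below the others' average
print(deviation_terms(group, 1))  # subject 1 is above the others' average
```

At most one of the two deviation terms is nonzero for any subject, which is what lets the regression estimate separate punishment responses below and above the others' average.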


[Figure 5: bar chart of the frequency of cases of free riding (vertical axis, 0 to 1.0) under three rules: no punishment; unrestricted punishment; punish low-but-not-high. Total number of observations for a rule shown in parentheses: (205), (103), (229).]

Fig. 5. Frequency of cases of free riding, by punishment rule.

22 Under this definition, if everyone in a group contributed the same low amount, there would be no free riding (it is only defined for unequal contributors).

23 We also did a Wilcoxon matched pair test, which is also significant; see the Working Paper for details.




Using Fehr and Gächter's specification (see their Table 5, p. 991), we first consider behavior in the 3 periods of the exogenously imposed rule of unrestricted punishment (periods 4–6 of the 3-Vote design, see column (1) of Table 4), and compare this with the first 3 periods of the endogenously chosen rule allowing punishment of low-but-not-high in both the 3- and 5-Vote designs (columns (2) and (3)).24 Then we consider behavior for the punish low-but-not-high rule over all the periods which it governs in the 3- and 5-Vote designs (columns (4) and (5)).

In each regression of Table 4 the dependent variable is each subject i's punishment received in each period (3 periods for regressions (1), (2), and (3), and up to 24 and 30 periods in regressions (4) and (5) respectively). The independent variables are the Average Contribution of Others, i's Absolute Negative Deviation, i's Positive Deviation, and period and group dummies (not shown).25,26

The results in Table 4 are consistent with those of Fehr and Gächter in that Absolute Negative Deviation always obtains a significant positive coefficient. The coefficient on the Positive Deviation term in column (1), however, suggests that when it is allowed, perverse punishment exacerbates the incentive problem for high contributors.27 Table 5 re-organizes Table 4's


Table 4

Determinants of punishment received.

Dependent variable: experimental dollars of punishment. Columns (1)–(3) cover the first three periods of the rule; columns (4) and (5) cover all periods of the rule.

| Independent variables | (1) Unrestricted punishment, 3-Vote design | (2) Punish low-but-not-high, 3-Vote design | (3) Punish low-but-not-high, 5-Vote design | (4) Punish low-but-not-high, 3-Vote design | (5) Punish low-but-not-high, 5-Vote design |
|---|---|---|---|---|---|
| Constant | 0.74 | 4.086 | 19.754 | 0.587 | 11.483 |
| | (1.067) | (2.353) | (4.587) | (2.222) | (4.367) |
| | p = 0.490 | p = 0.088 | p < 0.001 | p = 0.792 | p = 0.010 |
| Average contribution by others | 0.388 | 0.230 | 1.090 | 0.228 | 0.654 |
| | (0.175) | (0.244) | (0.405) | (0.206) | (0.269) |
| | p = 0.028 | p = 0.350 | p = 0.009 | p = 0.269 | p = 0.016 |
| Positive deviation | 0.377 | n.a. | n.a. | n.a. | n.a. |
| | (0.152) | | | | |
| | p = 0.014 | | | | |
| Absolute negative deviation | 0.888 | 1.217 | 1.039 | 1.054 | 0.967 |
| | (0.221) | (0.148) | (0.122) | (0.138) | (0.095) |
| | p < 0.001 | p < 0.001 | p < 0.001 | p < 0.001 | p < 0.001 |
| R2 | 0.54 | 0.91 | 0.86 | 0.75 | 0.78 |
| Observations | 240 | 82 | 92 | 241 | 176 |

Notes: Punishment received as a function of deviation from group average in unrestricted and restricted punishment conditions. OLS regressions with period and group fixed effects, not shown. Unrestricted punishment, in Column 1, is observed in periods 4–6, where each observation is for one subject and one period. Columns 2–5 include one observation per subject under the rule allowing punishment of low-but-not-high contributors. In Columns 2 and 3, only the first three periods in which a group adopted the rule for the first time are included, while Columns 4 and 5 include all periods of restricted punishment. Numbers in parentheses are White heteroskedasticity-consistent standard errors. Significance is reported at the 1%, 5%, and 10% levels.

24 We include observations for only the first 3 periods under a rule in columns (2) and (3) to achieve comparability with the regression for periods 4–6 (column (1)), in view of the possibility that learning or other factors might change behaviors with more repetitions.

25 In both the unrestricted (Column 1) and restricted (Columns 2–5) punishment regressions, only the observations of individuals who could potentially be punished are included. The difference is that under unrestricted punishment, anyone can be punished.

26 The regressions were also estimated by the Tobit method, treating 0 punishment observations as potentially left-censored. Resulting coefficients are similar and similarly significant except in the case corresponding to Column (1), where they are not significant at conventional levels.

27 In fact there was considerable perverse punishment in periods 4–6 of the 3-Vote design. Of the 129 events of punishment, 28% were punishments aimed at higher-than-average contributors for the period and group in question, 19% at the highest contributor for the period and group in question and 11% at individuals who contributed their full endowment. These percentages are calculated by counting each event (rather than dollar amount) of someone punishing someone else. They may be atypically high due to the short duration of the unrestricted punishment portion of our experiment. Yet similarly large amounts of perverse punishment are found in some other studies; see for example Anderson and Putterman (2006), Gächter and Herrmann (2005), and for a regression result similar to column (1), in which the absolute positive deviation term also has a positive significant coefficient, Ones and Putterman (2007), Table 2.




findings in a manner that makes this clearer. In Column (1) of Table 4 the coefficient for Absolute Negative Deviation is $0.89, the estimated punishment for a $1 reduction in contribution for a less-than-average contributor, under the rule of unrestricted punishment, in periods 4–6 of the 3-Vote design, and shown as a negative gain of $0.89 in Column (1) of Table 5. In Column (2) of Table 4 the coefficient for Absolute Negative Deviation is $1.22, the estimated punishment for a $1 reduction in contribution for a less-than-average contributor, under the rule of punish low-but-not-high contributors, in the first 3 periods of the 3-Vote design, and shown as a negative gain of $1.22 in Column (2) of Table 5, etc.

The +$0.60 throughout Table 5 is the $1 gain in the private account from reducing one's contribution by $1, minus the $0.40 loss in the individual's earnings from the group account. In Column (1) of Table 4 the coefficient for Positive Deviation is $0.38, the estimated punishment for each $1 of additional contribution for a higher-than-average contributor, under the rule of unrestricted punishment, in periods 4–6 of the 3-Vote design. The +$0.38 in Column (1) of Table 5 is the positive gain from contributing $1 less and avoiding $0.38 in perverse punishment, for a higher-than-average contributor. The cases labeled "n.a." in Table 5 are for the rule of punish low-but-not-high in Columns (2)–(5), in which case punishment of higher-than-average contributors is not allowed.

Table 5 shows that for less-than-average contributors the net gain from contributing $1 less is negative for each of the cases in Columns (1)–(5). The −$0.29 in Column (1) suggests that unrestricted punishment can reverse a subject's incentive to free ride, for a subject contributing less than average, replicating Fehr and Gächter's earlier finding for the case of less-than-average contributors. But the net gains for less-than-average contributors are even more negative in Columns (2)–(5), suggesting that the incentive against free riding is strengthened for less-than-average contributors under the rule of punish low-but-not-high.

Table 5 suggests that the incentive to contribute $1 less for higher-than-average contributors is not reversed under unrestricted punishment or the rule of punish low-but-not-high. In Column (1) under unrestricted punishment, a subject with a higher-than-average contribution makes an estimated net gain of $0.98 from contributing $1 less (a gain of $0.38 from reduced perverse punishment added to the $0.60 gain from shifting away from the group account). In Columns (2)-(5), under the rule of punish low-but-not-high, a higher-than-average contributor bears no punishment, but still gains the $0.60 from a $1 shift from the public account. While neither rule reverses the incentive for a higher-than-average contributor to contribute less, the incentive toward free riding is weaker under the rule of punish low-but-not-high than under unrestricted punishment.28
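The arithmetic behind Table 5 can be checked with a short sketch. The Python snippet below uses only the figures quoted in the text (the $1 private gain, the $0.40 group-account loss, and the per-dollar punishment estimates for each column); the function and variable names are illustrative and are not taken from the paper.

```python
# Sketch of the Table 5 arithmetic. Names are illustrative; values are those quoted in the text.

ACCOUNT_SHIFT_GAIN = 1.00 - 0.40  # $1 kept privately minus $0.40 lost from the group account


def net_gain_low(punishment_per_dollar: float) -> float:
    """Net gain from contributing $1 less for a less-than-average contributor,
    who incurs `punishment_per_dollar` of additional punishment."""
    return round(ACCOUNT_SHIFT_GAIN - punishment_per_dollar, 2)


def net_gain_high(perverse_punishment_per_dollar: float) -> float:
    """Net gain from contributing $1 less for a higher-than-average contributor,
    who avoids `perverse_punishment_per_dollar` of punishment (0 if not allowed)."""
    return round(ACCOUNT_SHIFT_GAIN + perverse_punishment_per_dollar, 2)


# Estimated punishment per $1 below average, Columns (1)-(5).
low_coefficients = [0.89, 1.22, 1.04, 1.05, 0.97]
# Estimated perverse punishment per $1 above average; prohibited in Columns (2)-(5).
high_coefficients = [0.38, 0.0, 0.0, 0.0, 0.0]

low_net = [net_gain_low(c) for c in low_coefficients]
high_net = [net_gain_high(c) for c in high_coefficients]

print("Less-than-average net gains:  ", low_net)
print("Higher-than-average net gains:", high_net)
```

The two printed lists correspond to the two "Net gain" rows of Table 5: negative in every column for less-than-average contributors, positive in every column for higher-than-average contributors.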

3.4. Do subjects vote according to their type?

We conjectured that even though some subjects use opportunities to perversely punish (when punishment is unrestricted) and would likely vote to allow perverse punishment in our experiment, punishment of high contributors might nonetheless be ruled out since few groups would have a majority of members of this type. Results at the group level are consistent with this conjecture. Is there also evidence at the level of individuals, however, that subjects tended to vote according to type? Logit regressions provide some affirmative evidence.

We estimated regressions in which the dependent variable is 1 if a subject voted to permit punishment specified by a particular rule and 0 otherwise. Explanatory variables included the subjects' contributions relative to their group averages


Table 5
Incentives to contribute $1 less.

Column                 (1)            (2)            (3)            (4)            (5)
Rule                   Unrestricted   Punish low-    Punish low-    Punish low-    Punish low-
                       punishment     but-not-high   but-not-high   but-not-high   but-not-high

Less-than-average contributors, subject to punishment
Abs. neg. deviation    -$0.89         -1.22          -1.04          -1.05          -0.97
$1 account shift       +0.60          +0.60          +0.60          +0.60          +0.60
Net gain               -$0.29         -0.62          -0.44          -0.45          -0.37

Higher-than-average contributors, subject to punishment only in Column (1)
Positive deviation     +0.38          n.a.           n.a.           n.a.           n.a.
$1 account shift       +0.60          +0.60          +0.60          +0.60          +0.60
Net gain               +0.98          +0.60          +0.60          +0.60          +0.60

Note: Net gain is the change in earnings from contributing $1 less.

28 When subjects make their contribution decision, they do not know what the other subjects' contributions will be, and are uncertain of what will be the average and its boundary line of punishment risk. This uncertainty creates an incentive toward higher contributions to be on the safe side of the unknown boundary.




during the periods preceding each vote, measures of how much punishment they had given and received, and period (i.e. vote) and group dummies. The coefficients on the subjects' relative contribution were positive in the regressions on voting to allow punishment of low contributors, significant at the 5% level or better for both the 3- and the 5-Vote designs, and negative in the regressions on voting to allow punishment of high contributors, significant at the 10% level in the regression for the 3-Vote but not in that for the 5-Vote design.

This evidence suggests that the subjects were more (less) likely to vote to allow punishment of less-(greater-)than-average contributors the higher on average was their contribution above their group's average contribution in the 8 (6) previous periods. Details are in the Working Paper (Ertan et al., 2005).
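As a hedged sketch, the logit specification described in this section could be written as follows. The notation is assumed here for illustration and does not appear in the paper: $v_{igt}$ equals 1 if subject $i$ in group $g$ votes at vote $t$ to permit the punishment in question, $\tilde{c}_{igt}$ is the subject's contribution relative to the group average over the preceding periods, $P^{\mathrm{giv}}_{igt}$ and $P^{\mathrm{rec}}_{igt}$ are measures of punishment given and received, and $\delta_t$ and $\mu_g$ are vote and group dummies.

```latex
\[
\Pr(v_{igt} = 1)
  = \Lambda\!\left(\alpha + \beta\,\tilde{c}_{igt}
      + \gamma_{1} P^{\mathrm{giv}}_{igt}
      + \gamma_{2} P^{\mathrm{rec}}_{igt}
      + \delta_{t} + \mu_{g}\right),
\qquad
\Lambda(z) = \frac{1}{1 + e^{-z}} .
\]
```

In this notation, the reported results correspond to $\beta > 0$ in the regressions on voting to allow punishment of low contributors and $\beta < 0$ in those on voting to allow punishment of high contributors.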

4. Discussion and interpretation

We discuss our results and interpretation under the following headings: (a) A rough calculation on the plausibility that no group would ever allow punishment of high contributors in the 160 group votes of the combined 3- and 5-Vote designs; (b) Institutional choice and its evolution with and without information on other groups' performance; (c) Distaste for punishment and the role of opportunities to reconsider; (d) Variability of experimental results; (e) Implementation; (f) What the experiment appears to tell us about models of heterogeneous preference types; (g) Heterogeneous preferences in other voting models.

(a) On the plausibility of unanimously prohibiting perverse punishment in 160 group votes. Even if only a quarter of subjects are prone to perversely punishing, it might seem implausibly rare that not a single group vote produced a majority for allowing it. As an anonymous referee commented: "[t]he fact that no group ever allowed punishment of high contributors will make readers suspicious, since results of such clarity are quite rare." How improbable is the unanimity result? Simple calculations suggest a wide range in the assessment of probability.

Consider the following composite hypothesis: (i) about 25% of punishment is targeted on higher-than-average contributors when punishment is unrestricted (see Section 1.1 and footnote 27); (ii) an individual who has a preference toward punishing high contributors is just as likely to punish as an individual who has a preference against such punishment (i.e., the proportion of subjects of given preference is the same as the proportion of corresponding punishment observations); (iii) perverse punishers are likely to vote their preference type to allow punishing high contributors, and similarly normal punishers are likely to vote their preference type to prohibit punishing high contributors (evidence for this comes from the logit analysis in Section 3.4); and (iv) the preference types are stable and randomly distributed.

With these rough assumptions the binomial probability that a group of four subjects chooses to allow perverse punishment by a majority of 3 or 4 votes for the third ballot item is 0.0508 (we are setting aside complications from abstentions), the probability of prohibiting perverse punishment is 0.9492, and the expected number of group votes prohibiting perverse punishment is 152 out of the combined 160 group votes in the 3- and 5-Vote designs. This calculation roughly suggests that the vast majority of group votes will be to prohibit perverse punishment. But the binomial probability of unanimity, the event that 160 out of 160 votes prohibit perverse punishment, is small, 0.0002.
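The calculation above can be reproduced with a minimal sketch, assuming (as the text does) that each of the four group members is independently a perverse-punisher type with probability 0.25, that a majority means 3 or 4 votes, and that abstentions are ignored. The function and variable names are illustrative only.

```python
from math import comb


def prob_majority_allows(p_perverse: float, group_size: int = 4, majority: int = 3) -> float:
    """Binomial probability that at least `majority` of `group_size` independent
    voters are perverse-punisher types, each with probability `p_perverse`."""
    return sum(
        comb(group_size, k) * p_perverse**k * (1 - p_perverse) ** (group_size - k)
        for k in range(majority, group_size + 1)
    )


p_allow = prob_majority_allows(0.25)        # a single group vote allows perverse punishment
p_prohibit = 1 - p_allow                    # a single group vote prohibits it
n_votes = 160                               # group votes, treated as independent
expected_prohibit = n_votes * p_prohibit    # expected number of prohibiting votes
p_unanimous = p_prohibit**n_votes           # all 160 votes prohibit

print(f"P(group allows)     = {p_allow:.4f}")
print(f"P(group prohibits)  = {p_prohibit:.4f}")
print(f"Expected prohibit   = {expected_prohibit:.1f} of {n_votes}")
print(f"P(all {n_votes} prohibit) = {p_unanimous:.4f}")
```

Changing `n_votes` shows how strongly the unanimity probability depends on how many group votes are treated as statistically independent.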

However, this calculation depends on the assumption of statistical independence in type from period to period even for the same individual, and this is an unrealistically strong assumption. Consider another simple but unrealistic assumption in the other direction: that preference types and beliefs are so stable that they remain fixed from period to period. Then it is as though there were only 40 independent group-level observations in the 10 sessions of the experiment and the same votes and other decisions are repeated many times. Then the expected number of votes prohibiting perverse punishment is 38 out of the combined 40 group votes in the 3- and 5-Vote designs, and the binomial probability of unanimity, that 40 out of 40 votes prohibit perverse punishment, is much larger, 0.12.
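As a rough worked check of the fixed-type case, write $q$ for the probability that a single group prohibits perverse punishment (the symbol $q$ is introduced here for illustration), using the same 25% type share and majority threshold of 3 out of 4:

```latex
\[
q = 1 - \left[\binom{4}{3}(0.25)^{3}(0.75) + (0.25)^{4}\right] \approx 0.9492,
\qquad
40\,q \approx 38,
\qquad
q^{40} \approx 0.12 ,
\]
\[
\text{compared with } q^{160} \approx 0.0002 \text{ when all 160 group votes are treated as independent.}
\]
```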

A glance at Fig. 2 shows that this second assumption on statistical dependence is unrealistically strong in the other direction, i.e., views change over time. Presumably the probability of unanimity is somewhere between 0.0002 and 0.12, likely well away from both extreme calculations. The calculations serve to remind us of the sensitivity of such results to assumptions on statistical independence when there are aggregations over many periods, and of the other uncertainties in (i)-(iv).

(b) On institutional choice, learning and evolution: Our experiment is one of several recent ones in which institutions are chosen by subjects through voting. Despite its stylized character, we think it suggests the considerable potential that the experimental method has for contributing to our understanding of how institutions emerge and evolve. We note again our choice of promoting a more accelerated and informed evolution of institutions by sharing information about outcomes among groups in given sessions, despite some cost to statistical independence. We would argue that when real-world groups decide on rules and practices, they often have access to information about the experience of similar groups, so the information spill-over in the experiment has a true-to-life quality. We want to emphasize, however, that 40 first votes were taken by the 160 subjects in our core treatments, and 4 more by the 16 subjects in our pilot experiment, and that each of these votes occurred with no information about others' choices or outcomes. Apart from the evolution towards more use of punish low-but-not-high with additional votes, our findings (unanimous rejection of allowing punishment of high contributors in the initial vote, higher contributions and earnings with than without punishment of low contributors, lower earnings with unrestricted punishment, lower frequency of successful free riding under the punish low rule) are all supported by tests using only decisions taken prior to information dissemination, as well as by tests using the full data set.


(c) Distaste for punishment and opportunities to reconsider. In one treatment of a set of 4-person VCM interactions similar to those in this paper, Bochet et al. (2006) let subjects communicate in a chat room before the 1st, 4th, and 7th periods of ten rounds of play. A noticeable finding was that out of 12 groups in the chat room treatment with punishment opportunities, not a single group discussed an explicit strategy of punishing low contributors, and in some groups, members' messages expressed the view that the punishment option was a trap set by the experimenters to reduce earnings. A distaste for punishment may help to account for the rejection of all forms of punishment in many of the initial votes in our experiment, for the rejection of punishment by most groups in Sutter et al. (2005) and Botelho et al. (2005), and for the initial preference shown for being in the group without punishment by most subjects in Gürerk et al. (2005, 2006).

While eschewing the punishment idea in their deliberations, however, many of Bochet et al.'s subjects engaged in costly punishment when group members defected from their verbal agreements to contribute. And subjects in the present experiment seem to warm to the idea of allowing punishment of low contributors as they experience the sense of resentment of or anger at free riders and as they learn that groups permitting punishment tend to have higher earnings.

The institutional choices made in our paper and in those of Sutter et al. (2005), Botelho et al. (2005), and Gürerk et al. (2005, 2006) might seem at first glance to be at odds, since our subjects and Gürerk et al.'s subjects seem to show a greater overall preference for punishment than do those of Sutter et al. and Botelho et al. However, all share a common reluctance to adopt punishment rules at the outset, and much of the difference in overall outcomes may be attributed to the fact that our subjects and Gürerk et al.'s subjects have many opportunities to change rules or groups, while Sutter et al.'s and Botelho et al.'s subjects have only one opportunity to vote on rules. Also, our subjects might have voted more like Botelho et al.'s had they been required to choose between no punishment and unrestricted punishment only, since the results of periods 4-6 of our 3-Vote design are consistent with Botelho et al.'s point that subjects may be worse, not better, off with (unrestricted) punishment.

(d) On variability: At the same time, even a brief review of the literature of punishment in social dilemmas shows a large variability in experimental results. Experimentalists are well aware that small changes in experimental design and wording of instructions can affect experimental results, not just for experiments on punishment but quite generally. Still, the literature on punishment in social dilemmas seems to yield an especially large variability in results. Our suspicion is that this variability is partly due to punishment behavior itself being scattershot and variable. Thus there may not be a simple general answer to the question of whether punishment in social dilemmas is a good or bad thing. The effects of punishment may vary so much with the specific conditions that there is no general answer.29

(e) On implementation: In the experiment, once a rule of punishment is chosen by vote, it is easily implemented by the experiment's computer software. In the real world, there is no such easy implementation. Nonetheless, in the practical world most organizations are hierarchical or a blend of hierarchy and symmetric volunteer elements, and organizations often find ways of managing, albeit imperfectly, who gets punished. For example, in hierarchical organizations if managers were more aware of the possibly high frequencies of perverse punishers and high costs in efficiency, they might focus more on mitigation. Once aware, managers could work to limit decentralized punishment and attempt to instill norms of cooperation in much the same manner that managers attempt to control bullying behavior and harassment.

(f) On heterogeneous preferences: There is a continuing discussion about keeping the standard model which limits the type of preferences to self-regarding (individual profit maximizing) preferences. In favor of this approach is that it is parsimonious and often leads to specific predictions, which in turn are often consistent with experimental results. However, in this experiment, we don't see how we can interpret the results without positing some form of other-regarding preference types (e.g. conditional cooperators, perverse punishers). Other experiments on social dilemmas also suggest the need for modeling heterogeneous preference types, including both self-regarding and other-regarding or reciprocating types. Our experiment adds to the interpretation of heterogeneity, in a particularly striking way.

An appeal of modeling only homogeneous self-regarding preferences is that introducing heterogeneous preferences is too mushy, allowing almost any prediction and rationalizing almost any observed result. But our experiment has a strong and consistent pattern to it, suggesting that the existence of heterogeneous preferences need not always lead to indeterminate results.

(g) On heterogeneous preferences in other voting models: Our analysis suggests that the presence of multiple preference types may be important to predicting voting outcomes, and this may be true for other instances of public choice as well. Pork barrel politics provides an example. Ordeshook's (1986, pp. 210-215) model of pork barrel politics is one of a social dilemma where what is good for an individual legislator is bad for society as a whole. For example, Senator Stevens benefits


29 The fact that Gürerk et al.'s subjects earn more with than without unrestricted punishment while the comparison goes the opposite way for our subjects and Botelho et al.'s illustrates this variability. In personal communication, Simon Gächter reported that he and his collaborators found large differences in the frequency of perverse punishment and, correspondingly, in the benefit or lack of benefit of introducing a punishment option across subject pools in different countries and settings (a finding documented shortly before our paper went to press in the remarkable study by Herrmann et al., 2008).




by bringing pork to his district (the "bridge to nowhere"), while other Senators lose because their districts end up paying for the bridge, even when the net benefits of the bridge are negative. Why then don't the other Senators outvote Stevens? Ordeshook's answer is that in a pork bill, there can easily be an equilibrium where there are just enough ear-marked pork projects to form a winning coalition, even when each of the projects has negative net benefits.

Ordeshook's analysis depends heavily on the assumption that each legislator is narrowly self-interested (the self-interest may be in the form of an increased probability of re-election). In fact, the assumption of a single preference type of self-interest is still common in voting models in the political science literature.

Our experiment and others on VCMs, the dictator game, and the centipede game (McKelvey and Palfrey, 1992) suggest that the assumption of homogeneous preference types can be misleading. If one allows for the possibility of heterogeneous preference types in Ordeshook's model, the equilibrium can shift and the predicted outcomes are not always as dire as Ordeshook's original model suggests. For example, some senators may care about doing the right thing, or some voters may choose not to reward a senator who joins a pork coalition, so the situation may be more fluid than it appears in Ordeshook's model.

But if the situation is this fluid, can anything happen? To deal with this possibility we focused on observed behavior under the specific experimental conditions, and then interpreted the specific results in terms of heterogeneous preferences. We believe that this approach can work in experimental studies of other voting models, such as Ordeshook's, even when there are signs of heterogeneity and odd behavior, as there were in our study of voting and perverse punishment.

As another example, Meltzer and Richard's (1981) model of the level of redistributive taxation uses a median voter solution assuming strictly self-regarding preferences. More accurate explanations of the level of redistribution and its variation over time and place would consider the strength of preferences for greater equality, on the parts of some citizens, and resentment of the undeserving poor, on the parts of others (see, for instance, Benabou and Tirole, 2005). Such an addition of two almost opposite social preference types alongside self-interested types resembles the situation studied in this paper, where self-interested subjects co-exist with both cooperation-preferring and cooperation-resisting types, with the associated demographic leading to predictable voting outcomes.30

Appendix

Fig. A1 is the screen design for an individual to enter her contribution to the group account (box a), to learn of others' contributions (boxes b, c, and d), to enter her punishment decisions (boxes b', c', and d'), and to observe the computer's calculation of net earnings for a period.


[Fig. A1 screen layout. Columns: You, B, C, D. Row "Put in group account": entries a, b, c, d; total in group account e = a+b+c+d; earnings from group account f = 0.4e; earnings from private account g = 10.0 - a. Row "Reduce others' earnings": primed entries (b', c', d' for the other group members); h = sum of your reductions of others' earnings; cost of your reductions of others' earnings i = 0.25h; j = total of others' reductions of your earnings; net earnings this round k = g + f - i - j.]

Fig. A1. Screen design for entering contribution and punishment decisions, receiving information, and calculating net earnings.
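The per-period earnings calculation on the Fig. A1 screen can be sketched as a small function. This is an illustrative reading of the screen labels, assuming that the private-account and net-earnings formulas are g = 10.0 - a and k = g + f - i - j (the minus signs are inferred from the labels, since they did not survive in the extracted figure); the function and parameter names are hypothetical.

```python
def net_earnings(
    own_contribution: float,
    others_contributions: list[float],
    reductions_given: list[float],
    reductions_received: float,
    endowment: float = 10.0,
    group_return: float = 0.4,
    reduction_cost: float = 0.25,
) -> float:
    """Net earnings for one period, following the labels on the Fig. A1 screen."""
    e = own_contribution + sum(others_contributions)  # total in group account
    f = group_return * e                              # earnings from group account
    g = endowment - own_contribution                  # earnings from private account
    h = sum(reductions_given)                         # sum of your reductions of others' earnings
    i = reduction_cost * h                            # cost of your reductions
    j = reductions_received                           # total of others' reductions of your earnings
    return g + f - i - j                              # net earnings this round (k)


# Hypothetical period: you contribute 6; B, C, D contribute 4, 8, 2;
# you reduce C's earnings by 3; others reduce your earnings by 2 in total.
k = net_earnings(6, [4, 8, 2], [0, 3, 0], 2)

# With punishment held fixed, contributing $1 less changes earnings by +$1 - $0.40.
shift_gain = net_earnings(5, [4, 8, 2], [0, 3, 0], 2) - k

print(f"Net earnings: {k:.2f}")
print(f"Gain from contributing $1 less (punishment unchanged): {shift_gain:.2f}")
```

The second quantity is the $0.60 account-shift gain that appears in every column of Table 5.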

30 See Camerer and Fehr (2004) for other applications of other-regarding preferences to the study of public choice.

A. Ertan et al. / European Economic Revie
