1. Introduction

A public good is characterized by non-excludability: once it is produced, all actors can enjoy its benefits regardless of their contribution to the provision of the good (Olson 1965 [1971]). Since public good provision is costly, this implies a tension between the individual and collective interest. While mutual cooperation leads to the best possible group outcome, individuals have an incentive to free-ride on the contributions of others.

Contributions to public goods can be supported by positive or negative peer sanctions, that is, the opportunity for actors to reward or punish each other. Experimental research established that high contributions are maintained when sanctioning is possible (Yamagishi 1986; Ostrom et al. 1992; Fehr and Gächter 2000, 2002; Sefton et al. 2007; Balliet et al. 2011). However, a challenge for research and policy is to design institutions that best enable heterogeneous actors to enforce cooperation (Ostrom 2010, 2012). In this respect, which method of implementing sanctions is most successful in increasing contributions remains an open question (Gächter and Thöni 2011). For example, it is unclear whether contributions are higher when the decision of whom to sanction is made individually or when it is made collectively. A sanctioning system with an individual decision rule (IDR) is a system in which every actor individually decides whom to sanction and pays the associated costs. A sanctioning system with a collective decision rule (CDR) is a system in which sanctions are executed only when multiple actors agree and pay the cost of sanctioning.

In real-life public good problems, actors often employ a sanctioning institution with a CDR. For example, Ostrom (1990) and Veszteg and Narhetali (2010) describe small communities where group members successfully enforce collective action through collective sanctioning decisions. Typically, members of the community regularly meet to identify free-riders and decide upon their punishment, for example in a vote. Also, in international cooperation, nations use collective sanctioning decision rules to ensure provision of global public goods such as international security and economic stability. Sanctioning decisions are usually taken by a variant of majority voting. Unanimity voting is uncommon, because it gives every individual nation the opportunity to veto a sanction, thereby making collective organizations ineffective decision makers (www.europa.eu, www.un.org).

So far, there is limited experimental research comparing the effect of sanctioning through IDRs and CDRs in public good problems.1 Casari and Luini (2009) find that, compared to an IDR, contributions to public goods are higher when punishment is only carried out if at least two out of four actors punish the fifth member of their group. However, they consider only one CDR and do not compare positive and negative sanctions. This leaves a number of unresolved issues, into which the current paper provides further insight.

First, it is unclear how the effect of a CDR on contribution depends upon the proportion of actors required to agree for a sanction to be implemented. On the one hand, the higher the proportion required, the less likely it will be that a sufficient number of actors agrees on the necessity of sanctioning and is willing to incur the associated costs (cf. Buchanan and Tullock 1962). Thus, while under an IDR all desired sanctions are carried out by definition, under CDRs there is a higher chance that free-riders remain unpunished or contributors unrewarded. On the other hand, under an IDR individuals might decide to use sanctions in ways that hurt contribution and thereby result in decreasing payoffs for the group, i.e. to reward free-riders or to punish contributors (Casari and Luini 2009; Ellingsen et al. 2012). Consequently, the more actors collectively agree that a certain group member should be sanctioned, the higher the chance that this sanction will be in the collective interest, that is, in accordance with enforcing contributions to the public good. In the current paper, we address the effect of the required proportion of consenting actors on contribution levels by comparing contributions under an IDR with those under a CDR that requires a majority and a CDR that requires unanimity.

Second, theoretical arguments and empirical results on punishment cannot be straightforwardly generalized to reward. For example, to maintain cooperation rewards have to be repeatedly allocated to contributors. Conversely, the mere threat of punishment can be enough to deter free-riding (Dari-Mattiacci and De Geest 2010). This suggests that punishments and rewards may differ in efficiency. Empirically, it has been shown that punishments and rewards might differ also in terms of efficacy (e.g. Sefton et al. 2007; Choi and Ahn 2013). We therefore study decision rules for assigning both punishment and reward.

The effects of the decision rules on macro-behavior such as aggregate contribution levels depend on assumptions about the micro-motives of individual actors (cf. Gächter and Thöni 2011). For example, these effects depend on which proportion of actors is willing to sanction, who is likely to be targeted, and how sanctions influence contribution decisions. We summarize existing knowledge on individual behavior in the PGG with sanctions. Subsequently, we apply this to predict macro-level behavior in the PGG with different decision rules, and with punishment or reward. We thus assess through which mechanisms our empirical extensions could result in different contribution levels between sanctioning systems.

The paper is structured as follows. In the theory section, we review the literature on behavior in public good problems with opportunities for sanctioning. Subsequently, we develop hypotheses about contribution and sanctioning behavior, and on how this behavior of individuals translates into different contribution levels under IDRs and CDRs. Individual-level and macro-level hypotheses are tested in an experiment where individual, majority, and unanimity decision rules for punishing and rewarding are employed in an incentivized manner.

2. Theory

2.1 The Public Goods Game

The linear Public Goods Game (PGG; also called Voluntary Contribution Mechanism, e.g. Isaac and Walker 1988) is used as a model of public good problems. It is played by n actors. All actors i receive an endowment w. They simultaneously and independently decide whether to keep this endowment for themselves or contribute an amount g_i ∈ [0, w] to a “group account”. The total amount contributed by all n actors together, g = Σ_i g_i, is multiplied by a number m, with 1 < m < n, and mg is divided equally among all actors. Because m < n, the individual return obtained from the amount contributed to the group account is smaller than when it would have been kept to oneself (m·g_i/n < g_i). Therefore, when the PGG is played once under standard game-theoretic assumptions (that is, when actors are rational in maximizing utility and selfish in that utility equals own payoff), contributing nothing is a dominant strategy, yielding the highest utility regardless of what others do. This results in the unique Nash equilibrium of no contributions. However, since m > 1 the joint group outcome nw − g + mg is maximized when everybody contributes the full endowment. Every player would then be better off compared to when all contribute nothing (mw > w). Thus, individually rational behavior leads to a Pareto-suboptimal outcome, making the PGG a social dilemma (Dawes 1980).
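
The payoff structure described in this section can be restated compactly. The following is only a summary of the definitions above; the symbol \pi_i for the payoff of actor i is our own notation and does not appear in the text.

```latex
% Payoff of actor i in the linear PGG (\pi_i is notation introduced here)
\pi_i = w - g_i + \frac{m}{n} \sum_{j=1}^{n} g_j , \qquad 1 < m < n .

% Each unit contributed changes the contributor's own payoff by m/n - 1 < 0,
% so contributing nothing is a dominant strategy:
\frac{\partial \pi_i}{\partial g_i} = \frac{m}{n} - 1 < 0 .

% The joint group outcome increases in total contributions g because m > 1:
\sum_{i=1}^{n} \pi_i = nw - g + mg = nw + (m - 1)\, g .
```

With g_i = w for all actors, each payoff equals mw, which exceeds the payoff w obtained when nobody contributes.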

2.2 Behavior in the PGG

The prediction of complete free-riding is typically refuted in experimental research employing the PGG. Instead, contributions averaging 50% of the endowment are consistently observed in one-shot PGGs (Walker and Halloran 2004; Kocher et al. 2007). Also in repeated PGGs where group composition changes after each round, as in our experiment, subjects initially contribute around 50% on average. However, in subsequent rounds contributions gradually decline to very low levels (Ledyard 1995).

Research explaining this declining contribution pattern focuses on non-standard utility as an alternative behavioral assumption. It has been empirically established that actors in the PGG can be classified into two main preference types (Ostrom 2000; Fehr and Gintis 2007; Ones and Putterman 2007). Actors of the first type are rational and selfish free-riders who never contribute to the public good. Actors of the second type are conditional cooperators who contribute more, the more they expect others to contribute (see Gächter 2007; Chaudhuri 2011 for an overview of empirical evidence). These actors are assumed to derive utility from reciprocating others’ expected contribution even in one-shot settings. Conditional cooperators are heterogeneous in the extent to which they match others’ contributions. Many are ‘imperfect’ reciprocators in that they contribute slightly below what they expect others to contribute on average. In an experiment specifically designed to identify preference types, Fischbacher et al. (2001) classify 50% of their subjects as (partial) conditional cooperators and 30% as free-riders.2 Others have roughly replicated this distribution of types in different subject pools (Kocher et al. 2008; Herrmann and Thöni 2009; Kamei 2012; Thöni et al. 2012). Conditionally cooperative behavior is consistent with a prosocial orientation (Van Lange 1999).

In repeated PGGs, conditional cooperators adapt their expectation of others’ contribution on the basis of their experience of the average group contribution in the previous rounds (Fischbacher and Gächter 2010). The more free-riders and imperfect conditional cooperators there are, the lower group contribution will be. Conditional cooperators decrease their contribution accordingly, which causes the average to further decline. This explains the decrease of cooperation over time.
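
The decline mechanism described above can be illustrated with a deliberately simple simulation. This is a sketch under our own assumptions, not a model taken from the literature cited: free-riders always contribute zero, all conditional cooperators share one belief, contribute a fixed fraction `match_rate` of it, and set next round's belief equal to the group average just observed. The function name and all parameter values are hypothetical.

```python
def simulate_decline(n_free_riders, n_cooperators, match_rate,
                     initial_belief, rounds):
    """Return the average group contribution in each round.

    Assumptions (ours, for illustration only):
    - free-riders contribute 0 in every round;
    - every conditional cooperator contributes match_rate * belief;
    - the belief for the next round equals the group average just observed.
    """
    n = n_free_riders + n_cooperators
    belief = initial_belief
    averages = []
    for _ in range(rounds):
        cooperator_contribution = match_rate * belief
        average = n_cooperators * cooperator_contribution / n
        averages.append(average)
        belief = average  # adapt expectation to experienced average
    return averages


# Hypothetical group: one free-rider and three imperfect reciprocators.
path = simulate_decline(n_free_riders=1, n_cooperators=3,
                        match_rate=0.9, initial_belief=10, rounds=10)
```

In this stylized setting the average shrinks by the constant factor n_cooperators × match_rate / n per round (0.675 in the example), so contributions decay whenever there is at least one free-rider or reciprocation is imperfect.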

2.3 The PGG with sanctions

Sanctioning can be modeled by adding a second stage to the standard PGG. After all actors i have determined their contribution and observed the contributions of the other group members, they decide for every other group member j whether to pay an amount to punish and/or reward this actor. Let s_ij denote the amount actor i uses to sanction actor j. We assume here that an actor can only choose whether or not to sanction, but not the magnitude of the sanction: s_ij is either a fixed amount f > 0 or zero. When the amount is used for punishment, a multiple k of f is subtracted from the payoff actor j obtained in the PGG. The same amount is added to the payoff of actor j when s_ij is used for reward. Thus, in addition to the payoff from the standard PGG, every actor j loses a total amount k·Σ_{i≠j} s_ij of received punishment from all other actors i or gains this amount of received rewards. Moreover, every actor i forfeits Σ_{j≠i} s_ij by assigning sanctions to other actors j. This captures the essential features of how sanctions are executed in the PGG, denoted here as an IDR.3
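
The two-stage payoff under an IDR can be sketched as follows. The function name and the example values of f and k are our own assumptions for illustration; this section does not report the values used in the experiment.

```python
def idr_payoffs(contributions, sanctions, w, m, f, k, reward=False):
    """Payoffs in the PGG with an individual decision rule (IDR).

    contributions: list with the contribution g_i of every actor.
    sanctions: n x n nested list of booleans; sanctions[i][j] is True
               when actor i pays f to sanction actor j (i != j).
    Every sanction costs the sender f and changes the recipient's
    payoff by k * f (subtracted for punishment, added for reward).
    """
    n = len(contributions)
    share = m * sum(contributions) / n
    sign = 1 if reward else -1
    payoffs = []
    for j in range(n):
        received = sum(1 for i in range(n) if i != j and sanctions[i][j])
        assigned = sum(1 for t in range(n) if t != j and sanctions[j][t])
        payoffs.append(w - contributions[j] + share
                       + sign * k * f * received - f * assigned)
    return payoffs


# Hypothetical example: actors 0 and 1 punish free-rider 3 (f=1, k=3).
s = [[False] * 4 for _ in range(4)]
s[0][3] = s[1][3] = True
example = idr_payoffs([20, 20, 20, 0], s, w=20, m=1.6, f=1, k=3)
```

In this example the free-rider still earns more than the contributors, which illustrates why the size of k·f relative to the gain from free-riding matters for deterrence.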

In sanctioning systems with a CDR, all actors i likewise decide whether to pay an amount to sanction others. Sanctioning under a CDR is different from an IDR in the sense that a sanction is only implemented when at least a proportion p of all group members, save the prospective recipient, sanctions the same actor. Because we fixed the sanctioning amount s_ij to 0 or f, this implies that an executed sanction under a CDR, which is backed by several actors, is more severe than a sanction under an IDR assigned by a smaller number of actors. Thus, every actor j loses an amount k·Σ_{i≠j} s_ij in a punishment system with a CDR if the proportion q_j of actors i for whom s_ij = f is larger than or equal to p. The same applies to the amount gained under rewards. If q_j < p no sanction is executed, that is, actor j does not gain or lose money due to received sanctions. Moreover, the actors who proposed to sanction actor j do not pay the cost of sanctioning if q_j < p. Thus, every actor i who sanctions j loses an amount s_ij = f if q_j ≥ p, and nothing otherwise.
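
A minimal sketch of the collective decision rule follows. It assumes, in line with the description above, that an executed sanction has size k·f times the number of actors who proposed it and that proposers pay f only when the threshold p is met. The function name and the values of f and k are hypothetical.

```python
from fractions import Fraction


def cdr_payoffs(contributions, proposals, w, m, f, k, p, reward=False):
    """Payoffs in the PGG with a collective decision rule (CDR).

    proposals[i][j] is True when actor i proposes to sanction actor j.
    A sanction on j is executed only if the proportion q_j of the n - 1
    other actors proposing it is at least p. Proposers pay f only for
    executed sanctions.
    """
    n = len(contributions)
    share = m * sum(contributions) / n
    sign = 1 if reward else -1
    payoffs = [w - g + share for g in contributions]
    for j in range(n):
        proposers = [i for i in range(n) if i != j and proposals[i][j]]
        q_j = Fraction(len(proposers), n - 1)  # exact, avoids float ties
        if proposers and q_j >= p:
            payoffs[j] += sign * k * f * len(proposers)
            for i in proposers:
                payoffs[i] -= f
    return payoffs


# Hypothetical example: two of three others propose to punish actor 3.
prop = [[False] * 4 for _ in range(4)]
prop[0][3] = prop[1][3] = True
majority = cdr_payoffs([20, 20, 20, 0], prop, 20, 1.6, 1, 3, Fraction(2, 3))
unanimity = cdr_payoffs([20, 20, 20, 0], prop, 20, 1.6, 1, 3, Fraction(1))
```

With the same proposals, the punishment is executed under the two-thirds threshold but not under unanimity, in which case nobody pays or loses anything.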

We assume one-shot interactions.4 Thus, actors cannot benefit from group members who increase their contribution in subsequent games after being sanctioned. This implies that long-term incentives for sanctioning, which differ between IDRs and CDRs, are ruled out. Under these assumptions, rational selfish actors do not use costly sanctions regardless of the sanctioning of others. The Nash equilibrium of the one-shot PGG with sanctions under standard assumptions of rationality and selfishness is no sanctioning and no contributions. Although repeated interactions with sanction opportunities might be more realistic for many applications, we restrict attention to one-shot interactions, also because repeated interactions offer actors alternative sanctioning mechanisms. For example, actors can reciprocate others’ low contribution decisions by own low contribution decisions in future interactions. This could confound the effects of the exogenous sanctioning mechanisms we want to study with those of the endogenous sanctioning opportunities that arise from repeated interactions (cf. Fehr and Gächter 2002).

Given that not all actors are rational and selfish, some might sanction despite the prediction that follows from the assumptions of selfish rationality. Non-executed sanctions are costless, and group members are not informed about sanctions that were proposed by others but were not executed. Thus, non-executed sanctions cannot influence behavior of other actors than the ones who proposed the sanction. Therefore, actors have no incentive to take the probability that the sanction is executed into account when deciding whether or not to sanction under a CDR.5 Given these characteristics of the interaction situation, there is no reason to assume that actors make different sanctioning decisions under IDRs and CDRs.

We proceed with a review of empirical evidence and a theoretical account of contributing and sanctioning behavior in the PGG with an IDR. This reveals which actors allocate sanctions, and which behaviors are more likely to be sanctioned. Multiple individual sanctions for a given behavior imply a high consensus. Thus, given that the decision rule will not directly influence sanctioning decisions, those behaviors that are sanctioned individually by many are more likely to be sanctioned when a CDR is used. Behavior in the PGG with an IDR then allows us to predict the likelihood that sanctions will be implemented under a CDR.

2.4 Behavior in the PGG with sanctions under an IDR

Despite the equilibrium prediction, empirical evidence shows that actors frequently use punishment under an IDR in one-shot settings. It is consistently found that punishment is assigned in accordance with enforcing cooperation. That is, actors receive more punishment the less they contribute (e.g. Carpenter and Matthews 2009; Casari and Luini 2009), and the less they contribute compared to the average contribution of the group (e.g. Fehr and Gächter 2000, 2002; Ones and Putterman 2007; Sefton et al. 2007; Carpenter and Matthews 2009; Ertan et al. 2009). This punishment is mostly executed by high contributors (e.g. Fehr and Gächter 2002; Sefton et al. 2007). It is also observed, however, that low contributors occasionally punish above-average contributors. This ‘perverse’ punishment is usually carried out by a small number of actors (Casari and Luini 2009). The extent to which it occurs varies greatly between subject pools, up to 50% of total punishment expenditure (Herrmann et al. 2008), but is typically estimated between 5% and 25% (Ostrom et al. 1992; Cinyabuguma et al. 2006; Ones and Putterman 2007; Casari and Luini 2009; Ertan et al. 2009). The effect of cooperation-enforcing and perverse punishment differs. Below-average contributors increase their contribution in the subsequent round after being punished (e.g. Fehr and Gächter 2002), but for above-average contributors empirical evidence is mixed. Some studies show that above-average contributors decrease their contribution after being sanctioned (Masclet et al. 2003; Bochet et al. 2006; Ones and Putterman 2007); others find no effect of perverse punishment on contribution (Denant-Boemont et al. 2007; see also Ellingsen et al. 2012).

Like punishments, rewards are typically used to enforce cooperation in one-shot settings. High contributors tend to reward other high contributors (Walker and Halloran 2004; Sefton et al. 2007; Sutter et al. 2010; Ellingsen et al. 2012; Choi and Ahn 2013). However, while rewards are mainly allocated to above-average contributors, it is often less clear than for punishment that the amount of rewards received increases with the (positive) deviation from the average group contribution (Walker and Halloran 2004; Sefton et al. 2007; Nosenzo and Sefton 2012; Choi and Ahn 2013; but see Ellingsen et al. 2012). Also, in repeated PGGs where actors can identify each other it is found that rewards are frequently used in every successive interaction (Rand et al. 2009; Milinski and Rockenbach 2011; Ellingsen et al. 2012), while the use of rewards declines over time in fixed groups when actors cannot infer who rewarded them (Sefton et al. 2007; Choi and Ahn 2013). As with punishments, the effect of rewards differs with the recipients’ contribution. Above-average contributors are found to contribute more in the subsequent interaction the more rewards they receive, while below-average contributors decrease their contribution the more they are rewarded (Ellingsen et al. 2012).

In repeated interactions in fixed groups, contributions under reward are sometimes found to be lower than those under punishment (Sutter et al. 2010 low leverage; Milinski and Rockenbach 2011; Wiedemann et al. 2011; Drouvelis and Jamison 2012; Nosenzo and Sefton 2012) although others did not find a difference, at least until the final periods (Sefton et al. 2007; Rand et al. 2009; Sutter et al. 2010 high leverage; also see Balliet et al. 2011; Choi and Ahn 2013). However, in repeated one-shot settings, which are most similar to our experiment, it is found that contributions are lower under rewards than under punishment (Choi and Ahn 2013).

2.5 Non-selfish utility in the PGG with sanctions

Rational selfish free-riders never sanction when this is costly. However, the anticipation of being sanctioned will induce them to contribute, provided that the loss due to received punishment or gain from rewards offsets the payoff advantage of free-riding (Fehr and Fischbacher 2004). Non-selfish actors could derive utility from sanctioning defectors even in one-shot interactions (Diekmann and Voss 2003). These cooperation-enforcing punishers are sometimes classified as a separate type of actor, which partly but not completely overlaps with conditional cooperators in the PGG without punishment (e.g. Ostrom 2000; Ones and Putterman 2007).

Empirical evidence is indeed consistent with the assumption that people derive utility from punishing and rewarding in one-shot settings. Fehr and Gächter (2002) already noted that subjects experience anger when they observe free-riding in a hypothetical situation. This anger increases the more the free-rider deviates from the average contribution of others. Casari and Luini (2009) show that punishment decisions are not influenced by information that others already punished the recipient. Thus, subjects do not care so much about actors being punished, but derive utility from the act of punishing. Fudenberg and Phatak (2010) show that subjects punish even when the recipient is not informed of the punishment, implying that punishment cannot influence future cooperation. In a neurobiological experiment, De Quervain et al. (2004) show that the human reward system is activated in the brain of an actor punishing a defector. Utility from rewarding is addressed by Dawes et al. (2007), who conduct an experiment in which subjects can decide on a costly increase or decrease of a random amount of tokens other subjects had received. They find that subjects who afterwards indicate more anger and annoyance towards those with a high amount also spend more to increase low and reduce high amounts received by others. Yet, despite utility derived from sanctioning, it is found that actors sanction less the higher the costs of sanctioning are (Anderson and Putterman 2006; Carpenter 2007; Nikiforakis and Normann 2008; Vyrastekova and Van Soest 2008; Sutter et al. 2010). Thus, actors take their own payoff into account in sanctioning decisions (Fehr and Fischbacher 2004).

As mentioned above, some actors use sanctions perversely. Although they are relatively rare, perverse sanctioners constitute a separate type of actor. These actors free-ride in the PGG, and subsequently punish high contributors (e.g. Cinyabuguma et al. 2006; Herrmann et al. 2008; Gächter and Herrmann 2009; Chaudhuri 2011). A motive for perverse punishment might be revenge for previous punishment received from high contributors (Ostrom et al. 1992; Fehr and Gächter 2000; Denant-Boemont et al. 2007; Nikiforakis 2008), a desire to increase the relative payoff advantage of free-riding (Fehr and Gächter 2000), or a dislike of do-gooders or norm violators (Monin 2007; Ones and Putterman 2007; Gächter and Herrmann 2009). Alternatively, it could be that actors occasionally punish high contributors by mistake (Fehr and Gächter 2000). Rand et al. (2010) and Rand and Nowak (2011) show that punishment of cooperators can be evolutionarily stable, thus providing a potential explanation for the fact that perverse punishment can drive out cooperation. Perverse rewards, i.e. rewards targeted at free-riders, just as perverse punishments, increase the payoff discrepancy between high and low contributors. Hence, they are potentially equally detrimental for cooperation (Ellingsen et al. 2012).

Punishment and reward are used in different ways. The possibility of being punished might be enough to deter free-riding, such that there is no need to actually allocate punishment. However, when an actor makes a high contribution, rewards actually have to be carried out sufficiently often to induce free-riders to contribute (Dari-Mattiacci and De Geest 2010). Thus, when contributions in a population increase due to the existence of a sanctioning system, more rewards than punishments have to be allocated. In one-shot settings, actors cannot establish a norm of direct mutual rewarding. They are therefore unsure whether the costs of allocating rewards will be offset by reciprocation (Rand et al. 2009). This makes rewards more expensive than punishments in the one-shot PGG. As stated above, more expensive sanctioning implies that fewer sanctions are assigned. This explains why, without opportunities for directly reciprocating received rewards, actors initially attempt to reward but eventually give up when others do not continue to reward as well.

2.6 Micro-level hypotheses

Before turning to differences in contribution levels between IDRs and CDRs, we capture the framework developed for micro-motives, that is, contributing and sanctioning behavior of individual actors, in a number of hypotheses. These hypotheses are based on empirical regularities observed in previous experiments. The hypotheses will be used as a micro-level framework summarizing which actors are likely to sanction, and how actors react to receiving cooperation-enforcing or perverse sanctions. When theorizing about the effect of sanctioning decision rules on contributions, we assume that actors behave as summarized in this framework.

We first derive hypotheses on sanctioning behavior. Although perverse punishment is sometimes observed, punishment is usually allocated by cooperation-enforcing high contributors. Accordingly, we hypothesize that actors are more likely to punish others the more they contributed themselves.

Hypothesis 1: The more an actor contributes, the higher this actor’s likelihood to assign punishment.

Punishment by high contributors is more often targeted at free-riders than punishment by low contributors, who might punish perversely. Thus, the more an actor contributed the more likely he is to punish a free-rider. This implies that we expect an interaction between the contribution of the actor allocating punishment and the contribution of the recipient on the likelihood to sanction. We argue that actors perceive free-riding both in the sense of the recipient contributing a low amount and in the sense of contributing less than the other group members. This means that low as well as below-average contributors are likely to be punished by high contributors.

Hypothesis 2a: The more an actor contributes, the more this actor’s likelihood of assigning punishment decreases with the contribution of the recipient.

Hypothesis 2b: The more an actor contributes, the more this actor’s likelihood of assigning punishment increases with the negative deviation of the recipient from the group average contribution.

Reward is also predominantly allocated by high contributors.

Hypothesis 3: The more an actor contributes, the higher this actor’s likelihood to assign reward.

High contributors are more likely to reward other high contributors. This applies both in an absolute sense, and compared to the average of other group members. Again, we hypothesize an interaction between the contribution of the rewarding actor and the contribution of the recipient.

Hypothesis 4a: The more an actor contributes, the more this actor’s likelihood of assigning reward increases with the contribution of the recipient.

Hypothesis 4b: The more an actor contributes, the more this actor’s likelihood of assigning reward increases with the positive deviation of the recipient from the group average contribution.

Unlike punishments, in order to enforce cooperation rewards have to be allocated repeatedly to high contributors. They are therefore costly to maintain when direct reciprocation is impossible. Accordingly, the likelihood of rewarding decreases over rounds.

Hypothesis 5: The more rounds have already been played, the lower the likelihood that rewards are allocated.

We now turn to the effect of sanctions on contribution. Receiving punishment leads to conformity with the behavior of other actors, in order to avoid receiving punishment in future interactions. Free-riders thus increase and high contributors decrease contribution the more they are punished. Consequently, their contribution is more in line with others’ average.

Hypothesis 6: The more an actor contributing below the average is punished, the more this actor contributes in the subsequent interaction.

Hypothesis 7: The more an actor contributing above the average is punished, the less this actor contributes in the subsequent interaction.

Rewards strengthen current deviations from average behavior. Above-average contributors will thus contribute more and below-average contributors less the more they are rewarded, provided they did not already contribute the full endowment or free-ride completely, respectively.

Hypothesis 8: The more an actor contributing above the average is rewarded, the more this actor contributes in the subsequent interaction.

Hypothesis 9: The more an actor contributing below the average is rewarded, the less this actor contributes in the subsequent interaction.

2.7 Macro-level effects of CDRs

Only the sanctions on which required consensus is reached are executed under a CDR. Given sanctioning behavior as predicted in the micro-level hypotheses, it is likely that there will be more consensus on some sanctions than on others. This gives rise to different contribution levels under IDRs versus CDRs. Macro-level hypotheses differ for punishment and reward.

Under an IDR, all allocated punishments are carried out. This implies that high contributors will frequently punish free-riders. Free-riders will receive more punishment the less they contribute in an absolute sense and compared to the others. Also, perverse punishers have the opportunity to punish high contributors.

The situation is different when only those sanctions are implemented to which a majority of actors consents. A large proportion of actors derives utility from sanctioning. It is therefore likely that majority consent is often reached on punishment of free-riders. The more a free-rider deviates from the average, the higher the chance that consent is reached. Conversely, when perverse punishment is relatively rare, as is typically found, it will be unlikely that a majority of actors agrees on punishing a high contributor. Thus, a majority sanctioning system will mitigate perverse punishment while at the same time cooperation-enforcing punishment is likely to be implemented. We therefore expect a majority decision rule to lead to higher contribution levels than an IDR.

Hypothesis 10a: Contribution is higher under a majority than under an individual punishment decision rule.

Some previous studies indeed found that majority consent is sufficient to rule out perverse punishment, but that cooperation-enforcing punishment could still be implemented. Casari and Luini (2009) found that punishment was more effective when two out of four actors had to agree on sanctioning a fifth. Perverse punishment was to a large extent ruled out under this decision rule. Likewise, Ertan et al. (2009) let subjects choose whether or not to enable punishment of high contributors. While this was sometimes favored by a number of free-riders, it was never implemented because a majority opposed the possibility.

Under a unanimity decision rule punishment is only executed when all remaining group members decide to punish an actor. Perverse punishment is therefore even less likely than under a majority decision rule. However, for cooperation-enforcing punishment, too, a unanimity decision rule requires a very high proportion of actors willing to punish. Therefore, it will be difficult to implement any punishment at all. Conversely, under an IDR there could be perverse punishment, although the vast majority of punishment should be targeted at below-average contributors. It is therefore likely that contribution levels under a unanimity punishment decision rule are lower than under an individual rule.

Hypothesis 10b: Contribution is higher under an individual than under a unanimity punishment decision rule.
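
The reasoning behind Hypotheses 10a and 10b can be made concrete with a small probability calculation. The sketch below assumes that each of the other group members independently proposes a given punishment with the same probability q; independence, the function name, and the illustrative values of q are our assumptions and are not part of the argument in the text. Under an IDR any single sanction is carried out, so the comparable quantity there is the probability that at least one actor sanctions.

```python
from math import comb


def prob_sanction_executed(n_others, required, q):
    """Probability that at least `required` of `n_others` actors sanction.

    Assumes (for illustration) that actors decide independently and that
    each sanctions the target with probability q.
    """
    return sum(comb(n_others, x) * q**x * (1 - q)**(n_others - x)
               for x in range(required, n_others + 1))


# Hypothetical propensities: punishing a free-rider is common (q = 0.6),
# perverse punishment of a high contributor is rare (q = 0.1).
for label, q in [("cooperation-enforcing", 0.6), ("perverse", 0.1)]:
    individual = prob_sanction_executed(3, 1, q)  # any one actor suffices
    majority = prob_sanction_executed(3, 2, q)    # two of three others
    unanimity = prob_sanction_executed(3, 3, q)   # all three others
    print(label, individual, majority, unanimity)
```

Raising the required number of consenting actors lowers the execution probability for every q, but it does so proportionally far more for rare (perverse) sanctions than for common (cooperation-enforcing) ones; under unanimity even common sanctions are seldom executed.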

As explained above, the continuous need for rewarding makes reciprocating through rewards more expensive than through punishment, which causes the use of rewards to decline (Dari-Mattiacci and De Geest 2010). Thus, more punishment than reward will be executed under every decision rule, making sanctioning through punishment more effective. Therefore, we expect that for every decision rule contribution is higher under punishment than under reward.

Hypothesis 11: For every decision rule, contribution is higher under punishment than under reward.

The more actors are required for a reward to be executed, the more likely it is that too many actors give up on using rewards. Thus, the more actors are required the more likely it is that consensus cannot be reached anymore. Also, perverse rewards have to be carried out when an actor free-rides in anticipation of being rewarded. Perverse rewards are thus likewise costly to maintain. Therefore, while perverse rewards might be occasionally allocated it is unlikely that they are persistently problematic for enforcing cooperation. Thus, rewards under an IDR are not thwarted by perverse sanctions as much as punishment, while it is difficult to raise enough actors to agree on rewards under a CDR. The more actors are required to agree, the more problematic enforcing cooperation becomes. Accordingly, we hypothesize that the more actors are required to agree on rewards, the fewer rewards will be carried out and the lower contribution levels are. Thus, the macro-level hypotheses on rewards are partly different from those on punishment.

Hypothesis 12a: Contribution is higher under an individual than under a majority rewarding decision rule.

Hypothesis 12b: Contribution is higher under a majority than under a unanimity rewarding decision rule.

3. Experimental design

In the experiment, subjects participated in interaction situations based on the PGG as described above with group size n=4; endowment w=20, and multiplier m=1.6. The outcome of the game represented points that subjects earned. After the experiment, subjects received 1 eurocent for every 60 points earned.
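To make the parameters concrete, the following minimal sketch computes a subject's points in one round. It assumes the standard linear PGG payoff (points kept plus an equal share of the multiplied group contribution); the function name `pgg_payoff` is only illustrative and not part of the experimental software.

```python
def pgg_payoff(own, others, endowment=20, multiplier=1.6):
    """Points earned by one subject in a single round of a linear PGG.

    Assumption: the subject keeps what is not contributed and receives an
    equal share of the multiplied sum of all contributions in the group.
    """
    group = [own] + list(others)
    return endowment - own + multiplier * sum(group) / len(group)


# Parameters from the experiment: n = 4, w = 20, m = 1.6.
full_cooperation = pgg_payoff(20, [20, 20, 20])
free_riding = pgg_payoff(0, [20, 20, 20])
no_cooperation = pgg_payoff(0, [0, 0, 0])
print(full_cooperation, free_riding, no_cooperation)
```

Under these assumptions every point contributed returns 0.4 points to each group member, so contributing is individually costly while full contribution maximizes group earnings.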

The experiment comprised three parts. In the first part, preferences for conditional cooperation were assessed using a measure designed by Fischbacher et al. (2001). First, subjects decided on an unconditional contribution, i.e. how much to contribute in the PGG in a group with three other subjects. Second, subjects made this same decision conditional on others’ average contribution. Thus, they decided how much they would contribute for every possible average of the three other group members (strategy method, Selten 1967). The more conditionally cooperative a subject is, the more contribution should increase with others’ average. Subjects were randomly matched in groups of four. For three randomly chosen group members, payoff was calculated based on the unconditional contribution. For the fourth group member the conditional contribution corresponding to the average unconditional contribution of the three others was used. This makes both decisions incentive-compatible. Note that conditionally cooperative preferences were always assessed at the beginning of a session, prior to playing the actual PGGs. Fischbacher and Gächter (2010) measured conditional cooperation using a similar design, administered either at the start or end of the experiment. They did not find a sequence effect, suggesting that measuring preferences does not significantly influence subsequent behavior.

In the second part of the experiment, the standard PGG as described above was played for 10 rounds. Between the rounds, subjects were randomly rematched into different groups. They could not infer their group members’ previous decisions. After every round, subjects were informed about the contribution of the others in their group and their own payoff. Numerous previous experiments have administered baseline games before the experimental treatments (cf. Sefton et al. 2007; Casari and Luini 2009). No order effects were found in experiments where the order of baseline and punishment treatments was randomized (e.g. Fehr and Gächter 2002; Herrmann et al. 2008).

In the third part, the PGG with sanctions was employed. In every session, 10 rounds were played with only punishment and 10 rounds with only reward; the order varied between sessions. Both reward and punishment took place in one of three experimental conditions: individual, majority, or unanimity. In all three conditions, subjects first decided upon a contribution. Subsequently, they were informed about contributions of their group members and decided for all three others separately whether to sanction this person. If executed, a sanction added or subtracted six points from the earnings of the recipient at a cost of two points. This cost ratio of 1:3 is often used in PGG experiments (cf. Fehr and Gächter 2002). The effect and cost of the sanction were chosen to ensure that receiving a sanction has a severe impact on payoffs. Because the amount by which actors could sanction was fixed, the severity of the sanction is determined by the number of actors sanctioning.6

In the individual condition, all assigned rewards and punishments were implemented. Subjects who received multiple sanctions were sanctioned by the cumulative amount while all subjects allocating the sanction paid the cost of two points. The procedure in the majority condition was exactly the same, except that the sanction was only executed when at least two group members wanted to sanction the same recipient. Thus, an actor sanctioned by two others lost 12 points, while both sanctioning actors lost 2 points. In the unanimity condition, the sanction was only executed when it was requested by all three remaining group members. When the number of subjects who wanted to sanction was insufficient in the majority or unanimity condition, the sanction was not executed and no costs had to be paid. Note that the labels “majority” and “unanimity” imply that a subject is not involved in the decision of sanctioning him- or herself. Thus, only the three other subjects determine whether the fourth subject is going to be sanctioned. After each round, subjects were informed about all sanctions that had been executed in their group but could not infer who allocated them. No information was provided about sanctions that were not executed. Again, subjects were randomly rematched between the rounds.
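The three decision rules can be summarized as a threshold on the number of group members who propose to sanction the same recipient. The sketch below is an illustration under that reading, not the z-Tree code used in the experiment; the names `THRESHOLD` and `execute_sanctions` and the subject labels are hypothetical.

```python
# Minimum number of proposers needed before a sanction is executed.
THRESHOLD = {"individual": 1, "majority": 2, "unanimity": 3}
EFFECT = 6  # points added (reward) or subtracted (punishment) per sanctioner
COST = 2    # points paid by each sanctioner, only if the sanction is executed


def execute_sanctions(proposals, rule):
    """Apply a decision rule to the sanction proposals in one group.

    proposals maps each subject to the set of group members he or she
    wants to sanction. Returns two dicts: the size of the sanction each
    subject receives and the cost each subject pays.
    """
    members = list(proposals)
    received = {m: 0 for m in members}
    paid = {m: 0 for m in members}
    for recipient in members:
        # A subject is never involved in the decision about him- or herself.
        supporters = [s for s in members
                      if s != recipient and recipient in proposals[s]]
        if len(supporters) >= THRESHOLD[rule]:
            received[recipient] = EFFECT * len(supporters)
            for s in supporters:
                paid[s] += COST
    return received, paid


# A and B both want to sanction D; D wants to sanction A; C abstains.
proposals = {"A": {"D"}, "B": {"D"}, "C": set(), "D": {"A"}}
for rule in ("individual", "majority", "unanimity"):
    print(rule, execute_sanctions(proposals, rule))
```

In this example D receives a sanction of 12 points under both the individual and the majority rule, the single proposal against A is executed only under the individual rule, and nothing is executed or paid under unanimity.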

The experiment was programmed using z-Tree (Fischbacher 2007) and conducted at the ELSE laboratory of Utrecht University. Subjects were recruited using the online recruiting system ORSEE (Greiner 2004). Twelve sessions were held, four in each experimental condition, of which two were run with reward first and two with punishment first. Instructions were provided on paper. It was made clear that the instructions were always truthful and identical for all subjects in a session. In the first set of instructions, the standard PGG and the first two parts of the experiment were explained. It was announced that there would be further tasks, but not what these tasks entailed. These instructions included a number of control questions, which appeared on the computer screen. When a subject answered a question incorrectly, the correct answer was explained on the screen. Additional instructions, adapted for each experimental condition, were provided for the reward as well as for the punishment part. The options in the PGG were labeled in a neutral way: punishment and reward were called ‘subtracting’ and ‘adding’ points, respectively.

A total of 184 student subjects participated in the experiment (32% male; 34% economics major). The majority and unanimity sessions each comprised 64 subjects in total, while the sessions in the individual condition comprised 56 subjects. Payoffs averaged €12.50, with a minimum of €8.50 and a maximum of €15.

4. Method and results

4.1 Descriptive results

All subjects participated first in the baseline, and subsequently in reward as well as punishment of one of the conditions. A Mann-Whitney test revealed no significant effect of the order in which punishment and reward treatments were administered on average contribution in either the reward (z=1.601; p=0.11) or punishment (z=1.441; p=0.15) games.7 However, since these p-values are relatively low we check the robustness of our parametric analyses, in which we combine the two sanctioning treatments, against analyses in which only the first sanctioning treatment is included.
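As a reading aid, the reported p-values can be related to the z-statistics with a short calculation. The sketch assumes that the reported z-values come from the usual normal approximation and that the tests are two-sided; the helper name `two_sided_p` is illustrative.

```python
import math


def two_sided_p(z):
    """Two-sided p-value of a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))


# z-values reported for the order effect in reward and punishment games.
for z in (1.601, 1.441):
    print(z, round(two_sided_p(z), 2))
```

Under these assumptions the calculation reproduces the two-digit p-values reported above.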

Figure 1 shows the average contributions in the PGGs over the rounds in the baseline and in each experimental condition. Contributions are initially around 50% of the endowment. This is in line with previous findings (Ledyard 1995). After the first round, Figure 1 shows strong differences in contribution levels between the conditions. Contributions in the baseline decline to almost zero. Conversely, individual and majority punishment are the only conditions under which contributions increase over time. A Wilcoxon signed rank test confirms that average contribution is higher in the reward than the baseline (z=2.432; p=0.02) and in the punishment than the reward conditions (z=3.059; p<0.01). For both reward and punishment the individual and majority conditions lead to higher contributions than unanimity, although only the difference between individual and unanimity punishment is significant in a Mann-Whitney test (z=2.309; p=0.02).

Overall average profits are higher in the reward than in both the punishment and baseline treatments (Wilcoxon signed rank test: z=2.589; p=0.01 for baseline vs. reward; z=2.981; p<0.01 for punishment vs. reward; z=1.098; p=0.27 for baseline vs. punishment). However, this is related to our reward technology, which enables earnings to be higher in the reward than the other treatments. Highest possible group earnings are achieved with full contribution in baseline and punishment, and with full contribution and mutual rewarding in the reward treatments. When we consider average earnings as a proportion of the highest possible, this proportion is higher in both punishment (z=3.059; p<0.01) and baseline (z=3.059; p<0.01) than in the reward treatments.
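The difference in attainable earnings between treatments can be illustrated with a small calculation. It assumes the standard linear PGG payoff and, for reward, that every group member rewards all three others so that all rewards are executed under any decision rule; the function name is hypothetical.

```python
ENDOWMENT, MULTIPLIER, GROUP_SIZE = 20, 1.6, 4
EFFECT, COST = 6, 2  # points per executed sanction: effect and cost


def max_points_per_round(treatment):
    """Highest attainable points per subject and round in a treatment.

    Assumption: full contribution by all four members; in the reward
    treatment every member also rewards the three others.
    """
    points = MULTIPLIER * ENDOWMENT * GROUP_SIZE / GROUP_SIZE
    if treatment == "reward":
        others = GROUP_SIZE - 1
        points += others * EFFECT - others * COST
    return points


for treatment in ("baseline", "punishment", "reward"):
    print(treatment, max_points_per_round(treatment))
```

Dividing observed average earnings by these treatment-specific maxima gives the proportions compared in the last sentence above.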

When a subject was punished in the majority condition, in 58% of the cases this was by one person only and therefore the punishment was not carried out. Likewise, in 81% of the cases in which a subject was punished in the unanimity condition the required number of three sanctioning subjects was not reached. For reward, in 72% of the cases in which someone was rewarded in the majority condition and in 97% of the cases under unanimity the reward was not implemented. In line with previous research, 25% of punishments were targeted at subjects contributing the average of other group members or more. Of these, 91% and 98% were not implemented in majority and unanimity, respectively. 33% of rewards were targeted at below-average contributors, of which 89% and 100% were not implemented under majority and unanimity.

Figure 2 shows the average number of sanctions allocated and average number of sanctions carried out for different deviations of the recipient from the average contribution of the other group members. Note that between one and three other group members can propose to sanction. Figure 2 shows a clear trend of more punishment proposed on average the more the recipient negatively deviates from the average contribution of others. Also, more rewards are proposed for above-average contributors, but it is not so clear that more rewards are proposed the further the deviation.

4.2 Contribution – methods

The first dependent variable, contribution, is measured as the contribution decisions of subjects in the PGG. First, we test macro-level hypotheses by comparing dummies for the experimental conditions individual, majority, and unanimity punishment and reward. These are less conservative tests for the differences between conditions than the comparisons in the previous subsection, because the interdependencies between the observations are modeled in more detail. Still, the results mainly reconfirm the differences that resulted from the non-parametric tests. Second, we test the micro-level hypotheses explaining differences between experimental conditions based on individual decision patterns. Punishment and reward conditions are analyzed separately.

In the micro-level models, sanctions received are measured as the number of others who had sanctioned the subject in the previous round. Only executed sanctions are included. Furthermore, three dichotomous variables indicate whether in the previous round a subject had contributed more than 4 points below the average of other group members, more than 4 points above the average, or did not deviate from the average by more than 4 points. These three dummies for previous deviation are interacted with the number of sanctions received to test whether the effect of being sanctioned is different for above- and below-average contributors.
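The construction of the three deviation dummies and their interactions with received sanctions can be sketched as follows. This is an illustration of the coding described above, not the original analysis script; the function and variable names are hypothetical.

```python
def deviation_category(own, others, band=4):
    """Classify a previous-round contribution relative to others' average."""
    diff = own - sum(others) / len(others)
    if diff < -band:
        return "below"   # more than 4 points below the average
    if diff > band:
        return "above"   # more than 4 points above the average
    return "around"      # within 4 points of the average


def interaction_terms(own, others, sanctions_received, band=4):
    """Number of executed sanctions received, split by deviation category."""
    category = deviation_category(own, others, band)
    return {
        "sanctions_x_" + name: sanctions_received if name == category else 0
        for name in ("below", "around", "above")
    }


# A subject who contributed 2 while the others averaged 12 and who
# received two executed sanctions in the previous round.
print(interaction_terms(2, [10, 12, 14], 2))
```

Because exactly one dummy equals one for every observation, each interaction term captures the effect of sanctions for one group of contributors.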

Previous deviation was measured using dummies for more than 4 points higher/lower rather than a continuous variable indicating the precise extent of the deviation. This is because a continuous variable interacted with received reward tests if subjects increase (decrease) their contribution more, the higher (lower) the contribution for which they were rewarded. This is unrealistic, since contribution is limited between 0 and 20. The boundaries of 4 points from the average are chosen such that the deviation is substantial enough for subjects to perceive sanctions as clearly norm-enforcing or perverse. Accordingly, log-likelihoods of models with different boundaries are equal to or lower than those of the models presented here. We control for the subjects’ contribution in the previous round, round number, treatment order, and experimental condition. Furthermore, preference for conditional cooperation is included, measured as the slope of the conditional contribution assessed in the first part of the experiment. The steeper the slope, the more a subject indicated to contribute more when others do so as well.8
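One way to obtain such a slope is a least-squares fit of the conditional contribution schedule on the others' average. The sketch below assumes an ordinary least-squares slope over the 21 possible averages from 0 to 20; the exact estimation procedure is not specified here, so this is only one plausible operationalization.

```python
def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


others_average = list(range(21))          # possible averages 0, 1, ..., 20
perfect_conditional = list(range(21))     # always matches the others' average
free_rider = [0] * 21                     # contributes nothing regardless

print(ols_slope(others_average, perfect_conditional))
print(ols_slope(others_average, free_rider))
```

A subject who always matches the others' average has a slope of one, whereas an unconditional free-rider has a slope of zero.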

We use Tobit regression to take into account that contribution has a limited range, between 0 and 20, of which both extremes are often chosen. The units of analysis are decisions in the PGGs. Random effects at the subject level are included to model that decisions are nested in subjects, since every subject makes multiple contribution decisions. Also, within a session subjects often encounter others with whom they or their group members have interacted previously. Thus, subjects are interdependent within sessions. It is not possible to include both the subject and session level in a three-level Tobit model. Therefore, all models were replicated using multilevel linear regression, in which both subject and session level random effects are included but where contribution is treated as if its range is unlimited. Also, we estimated the models using Tobit regression with random effects at the session level to test if disregarding this level in the models presented below influenced the results, and we ran a Tobit model with robust standard errors adjusted for clustering within sessions. The latter model provides the most conservative way of correcting for the clustering of observations and, therefore, might underestimate the significance of some effects. Given the limited effect of the session level in, e.g. the three-level linear regression model, we have considerable confidence in the estimations of the two-level Tobit models with random effects for subjects reported in the tables. Finally, we examined the possible effects of punishment and reward treatment order in more detail by rerunning all models with only the first treatment that subjects participated in included. Effects of treatment order and robustness of the results in alternative analyses are discussed for every model separately below.

4.3 Contribution – results

Table 1 shows differences in contribution decisions between the experimental conditions. The baseline condition, in which every subject participated, serves as a reference. Contributions in all experimental conditions except unanimity reward were higher than in the baseline, although the effect of majority reward is insignificant when we adjust for clustering within sessions. Contrary to Hypothesis 10a, contribution under punishment is higher in the individual than the majority condition (χ2(1)=29.51; p<0.01). The other macro-level hypotheses are confirmed. Contribution under punishment is higher in the individual than the unanimity condition (χ2(1)=136.58; p<0.01), confirming Hypothesis 10b. As predicted in Hypothesis 11, contribution is higher under punishment than reward in the individual (χ2(1)=228.83; p<0.01), majority (χ2(1)=246.01; p<0.01), and unanimity (χ2(1)=122.01; p<0.01) condition. Finally, contribution under reward is higher in the individual than the majority condition (χ2(1)=23.76; p<0.01) and higher in the majority than the unanimity condition (χ2(1)=12.79; p<0.01). This confirms Hypotheses 12a and 12b. All differences between decision rules are insignificant in the conservative model that accounts for clustering in sessions, but remain highly significant in other model specifications. The differences between punishment and reward remain significant in every alternative specification.

To rule out that the support for the hypotheses is confounded by subjects playing a punishment and a reward treatment in succession, we also consider effects of the ordering of treatments. Contributions in the punishment conditions are lower when punishment was the first compared to when it was the second treatment. In the reward conditions, contributions are higher when it was the first treatment. Still, when we only consider the first treatments subjects participated in, contributions are higher in individual than in majority conditions, although this difference becomes insignificant for reward (χ2(1)=0.93; p=0.34). Also, contributions are higher in majority than in unanimity conditions. Finally, contributions are higher in the individual and unanimity punishment conditions than in the related reward conditions. Only in the majority condition does this difference disappear (χ2(1)=0.16; p=0.69). Hence, the confirmation of this part of Hypothesis 11 should be interpreted with caution.

The micro-level model for the punishment conditions is presented in Table 2. Only main effects are included in Model 2. Several control variables are significant. Contribution is lower in the unanimity compared to the individual condition and when punishment was administered first, and higher the more a subject contributed in the previous round. The difference between the individual and majority condition is not significant in this model. Subjects who contributed more than 4 points below the average increase and subjects who contributed more than 4 points above the average decrease their contribution compared to around-average contributors. Also, contribution is higher the more punishment was received previously.

Interaction effects are included in Model 3. The main effect of punishment is excluded from this model, so the three interactions represent the effect of received punishment for the three groups of subjects belonging to specific deviations from the mean contribution. The model shows that subjects contributing below the average increase their contribution more, the more they are punished. Hypothesis 6 is thus confirmed. The insignificant main effect of negative deviation indicates that subjects who contributed below the average but were not punished do not significantly increase their contribution compared to around-average contributors. Subjects who contributed above the average decreased their contribution if they had not been punished, but did not decrease their contribution further after receiving punishment.

Thus, no support is found for Hypothesis 7. This might be due to the relatively limited amount of sanctioning against high contributors even in the individual condition. The effect remains insignificant in a separate analysis of the individual condition.

These findings in Models 2 and 3 are similar in a multilevel model, with random effects and clustering at session level, and in a model in which only the first treatments are considered. All hypothesis-related effects are robust.

Model 4 in Table 3 shows the determinants of contribution decisions in the reward conditions. In this model the differences between experimental conditions and treatment order are not significant. The other control variables are significant; contribution is higher the more conditionally cooperative a subject is and the more a subject contributed previously, and decreases over rounds. Subjects who previously contributed above the average decrease and those who contributed below the average increase their contribution compared to around-average contributors. Finally, the more rewards a subject had previously received, the higher the contribution.

In Model 5, the interaction effects are included. Again, the three interactions represent the separate main effects. This shows that subjects who had contributed above the average significantly decrease their contribution. However, the decrease was significantly weaker the more they were rewarded. This confirms Hypothesis 8. Very few subjects received rewards after a below-average contribution, and virtually all were ruled out by majority and unanimity. Hence, we find no significant effect of being rewarded for around-average or below-average contributors. The effect remains insignificant in a separate analysis of the individual condition. Hypothesis 9 is not confirmed. Again, findings are similar in a multilevel model, with random effects and clustering at session level, and in a model in which only the first treatments are considered. All hypothesis-related effects are robust.

4.4 Sanctioning – methods

The second dependent variable in the analysis of the micro-level framework is the decision whether or not to sanction. This yields three observations for each subject in each period, one for every other group member.

The first independent variable is a subject’s own contribution. Furthermore, contribution of the recipient is included as a continuous variable. Deviation of the recipient from the average of others is measured as the contribution of the recipient minus the average of the other group members. The variable positive deviation includes all positive values of this measure; negative values are set to zero. Absolute negative deviation represents the extent of the deviation for all negative values and is zero for positive deviations. For punishment, the contribution and absolute negative deviation of the recipient are interacted with the subject’s own contribution to test whether high contributors are more likely to punish the less the recipient contributes, and the further he or she deviates from the average. For reward, contribution and positive deviation of the recipient are interacted with the subject’s own contribution. We control for experimental condition, treatment order, slope of the conditional contribution, and for sanctions assigned and received by the subject in the previous round.
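The two deviation variables can be constructed as in the following sketch. It assumes that "the other group members" refers to the three group members other than the recipient; the function name `split_deviation` is illustrative.

```python
def split_deviation(recipient, others):
    """Return (positive deviation, absolute negative deviation).

    Assumption: others holds the contributions of the three group members
    other than the recipient.
    """
    deviation = recipient - sum(others) / len(others)
    positive = deviation if deviation > 0 else 0.0
    negative = -deviation if deviation < 0 else 0.0
    return positive, negative


print(split_deviation(20, [10, 10, 10]))  # above-average recipient
print(split_deviation(0, [10, 10, 10]))   # below-average recipient
```

Splitting the deviation in this way allows the effect of falling short of the average to differ from the effect of exceeding it.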

We use logistic regression to analyze the dichotomous sanctioning decisions. Every subject makes three sanctioning decisions, one for every other group member, in all ten periods. Decisions are thus nested within periods and subjects. A multilevel intercept-only model with decisions nested in periods and subjects revealed that variance at the period level is negligible for both punishment and reward decisions. We therefore use multilevel models with decisions nested only in subjects. All models were repeated using only the first treatment subjects participated in. We discuss the treatment effects of all models below.

4.5 Sanctioning – results

Models on punishment decisions are displayed in Table 4. Model 6 shows that there are no differences between the experimental conditions in the likelihood that a subject decides to punish another. We do find that subjects who have received or have allocated punishment in the previous round are more likely to punish. The likelihood of punishing increases with contribution, confirming Hypothesis 1. Also, the more a recipient negatively deviates from others’ contribution, the higher the likelihood that punishment is allocated while no effect is found for positive deviation. Finally, the more a group member contributes, the less likely subjects are to punish this person.

Model 7 shows a significant interaction effect of contribution with the contribution of the recipient, confirming Hypothesis 2a. A significant interaction with negative deviation of the recipient confirms Hypothesis 2b. High contributors are thus more likely to punish the less a recipient contributes, both in an absolute sense and relative to the average of others.

The effect that high contributors punish especially others who contribute less than average (Hypothesis 2b) is not found if we only consider the first treatment for Model 7. This is probably due to the lower number of observations when only one treatment is included, which makes it more difficult to disentangle the different reasons why high contributors punish others.

Table 5 shows the models on reward. Main effects included in Model 8 show that subjects in the unanimity condition are more likely than in the individual condition to allocate rewards. Furthermore, subjects are more likely to reward the more rewards they had allocated in the previous period. The negative effect of period is significant, confirming Hypothesis 5: rewards are used less in later periods. Also, subjects are more likely to reward the more the recipient contributes, but not the higher the positive deviation from the average. We do find that rewarding is less likely the more the recipient negatively deviates. Hypothesis 3 is supported: subjects who made a higher contribution are more likely to reward.

Model 9 shows the interaction of a subject’s own contribution with the contribution and positive deviation of the recipient. The significant effects indicate that high contributors are more likely to reward the higher and the further above the average someone contributes, confirming Hypotheses 4a and 4b.

The main effect of contribution (Hypothesis 3) and the effect that high contributors reward especially others who contribute much (Hypothesis 4a) are not found if we only consider the first treatment for Model 9. Again, this is probably due to the lower number of observations when only one treatment is included, which makes it more difficult to disentangle the different reasons why high contributors reward others.

5. Conclusion and discussion

We compared the effect of individual, majority, and unanimity decision rules for implementing punishment and reward on actors’ ability to enforce cooperation in a Public Goods Game (PGG). For punishment, we conjectured that contributions are higher under a majority than an individual decision rule (Hypothesis 10a). However, we find higher contributions under the individual decision rule instead. As expected, we do find that contribution is lower under a unanimity than an individual punishment decision rule (Hypothesis 10b). For reward, the hypotheses concerning the effects of decision rules on contribution are all confirmed. We find that contribution is higher under an individual than a majority decision rule (Hypothesis 12a) and higher under a majority than a unanimity decision rule (Hypothesis 12b). In sum, for both punishment and reward contributions are lower, the more actors are required to agree on sanctioning. Also, as hypothesized, contribution is higher under punishment than reward for every decision rule (Hypothesis 11), although no difference is found in the majority condition when only the first treatment with sanctions is considered.

Findings on individual behavior, as captured in micro-level hypotheses, offer an explanation for the observed differences in contribution between decision rules. The emerging pattern is very similar for reward and punishment. Hypotheses on the use of cooperation-enforcing sanctions are all confirmed. High contributors are more likely to punish (Hypothesis 1) and to reward (Hypothesis 3) than low contributors. These high contributors enforce the norm that others should contribute as well. That is, they are more likely to punish the less a recipient contributes (Hypothesis 2a) and the lower the contribution of the recipient is compared to the other group members (Hypothesis 2b). Likewise, high contributors reward group members who also make a high contribution (Hypothesis 4a) and who contribute more compared to the others (Hypothesis 4b). In other words, there is more consensus on sanctions among high contributors, the more an actor violates or adheres to their cooperative norm. Still, many punishments and rewards under the majority and unanimity decision rules were not executed. This implies that reaching the required number of actors was difficult despite the high consensus on whom to target.

When low contributors are punished, they contribute more in the subsequent interaction (Hypothesis 6). Similarly, actors who are rewarded for contributing more than other group members increase their contribution compared to others who are rewarded less (Hypothesis 8). Thus, we find strong evidence that cooperation-enforcing sanctions have a positive effect on contributions. Conversely, perverse sanctioning occurred too infrequently to affect contribution levels. We cannot confirm that high contributors decrease their contribution after being punished perversely (Hypothesis 7). Likewise, contrary to our expectations, free-riders who are rewarded perversely do not decrease their contribution further (Hypothesis 9). We did find that almost all perverse sanctions were ruled out under majority and unanimity.

In sum, we find strong evidence for cooperation-enforcing sanctions, and their positive effects on contribution. At the same time, perverse sanctions occur too infrequently to affect cooperation. This makes an individual decision rule (IDR) unproblematic: punishment is mostly targeted at free-riders regardless of the possibility for individual actors to sanction perversely. Because more cooperation-enforcing sanctions are obstructed the more actors are required under collective decision rules (CDRs), we observe lower contribution levels the more actors are required to agree. The observed micro-level behavior thus explains the macro-level finding of lower contribution levels under unanimity than majority, and lower contributions in the majority than in the individual condition.

The use of rewards decreases over time (Hypothesis 5). This provides an additional impediment for CDRs, because it implies that the more actors are required to agree, the sooner consensus cannot be reached anymore. Rewards are therefore even more problematic to enforce than punishment, hence contributions are higher under punishment than reward.

Casari and Luini (2009) find, in groups of five, that punishments on which two out of four actors agree are much more effective than sanctions with an IDR. We use stricter CDRs of two and three actors in groups of four, and find that contributions are highest under an IDR. However, contribution levels in our majority punishment condition (Figure 1) and the CDR of Casari and Luini (2009, Figure 1) are very similar. The difference between their findings and ours is that contribution in their individual punishment condition is much lower. Herrmann et al. (2008) find such differences in contributions under individual decision rules between subject pools. They attribute this to different levels of perverse punishment. Indeed, Casari and Luini (2009) find that contributions in their individual punishment condition are diminished due to perverse punishments. We find that perverse punishments do not affect contributions even under an IDR.

We started this paper with the observation that actors engaged in real-life public good problems often use CDRs to successfully enforce cooperation. One possible reason why we find that an IDR is more effective might be that interactions in our experiment are one-shot and anonymous rather than repeated. In many real-life public good problems, especially in small communities or between nations, participants interact repeatedly. Moreover, actors can often communicate before deciding whether or not to sanction. Repeated interaction and communication both imply that actors can coordinate on raising the required proportion of agreeing actors.

Furthermore, in real life it is often possible to identify which actors neglected to agree on sanctioning. Therefore, when the required consensus is not reached, the actors who did not sanction can be held accountable, for example through second-order punishment (Cinyabuguma et al. 2006; Denant-Boemont et al. 2007; Nikiforakis 2008). Previous studies found that second-order punishment is not always effective because it is used by defectors to punish first-order punishers. This issue should be alleviated when CDRs are used, because responsibility for punishment is shared by multiple others. Also, when a CDR is used for second-order punishment as well, agreement on punishment of punishers might not be reached.

Finally, in our experiment actors had complete information about others’ contributions. In reality, some of the actors might make an inaccurate observation of the contributions of some of the others. An IDR might lead to inaccurate sanctioning decisions in such an environment (Grechenig et al. 2010; Ambrus and Greiner 2012). However, under a CDR mistaken sanctions caused by a wrong observation of an actor’s contribution by one of the others will be ruled out.

Repeated interactions, communication on whom to sanction, public announcement of sanctioning decisions, use of counter-punishment, and noise can be implemented in future experiments to enhance resemblance with actual public good problems. As indicated above, these adaptations might favor CDRs, because coordination of sanctions in CDRs can become easier and mistakes in sanctions can be prevented. Still, the disentangling of sanctions through reciprocal contributions and sanctions through exogenous institutions will remain a challenge in these set-ups. In addition, there might also be some more realistic specifications of the interaction situation which favor IDRs. Most importantly, we assumed that non-implemented sanctions are costless. In reality, it might be more plausible that people have to invest in sanctioning before knowing whether others will agree. This would make implementation of sanctions under a CDR even more problematic. Future research should further specify conditions under which either CDRs or IDRs are more successful in enforcing cooperation.