Bin Xu is the holder of the grant from the Social Science Experimental Center of Zhejiang University that funded this project. All authors contributed equally to the study.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
We examine the effectiveness of the individual-punishment mechanism in larger groups, comparing groups of four to groups of 40 participants. We find that the individual punishment mechanism is remarkably robust when the marginal per capita return (MPCR),
The voluntary contribution mechanism (VCM) has been an important topic of research in experimental economics. Among the many issues addressed by laboratory experiments is the relationship between group size and the level of contributions. Isaac, Walker and Williams [
In a separate line of research, Fehr and Gächter [
A number of studies have examined the robustness of Fehr and Gächter’s results with respect to punishment effectiveness and cost (Egas and Riedl [
The effectiveness of the individual punishment mechanism in laboratory groups of four, five, or ten provides a persuasive explanation of how free-riding can be mitigated in relatively small groups that need to mobilize contributions of money or effort toward a common public good. However, it is uncertain whether such a mechanism would remain effective in the much larger groups that must often cooperate for the common good in the real world. Carpenter [
As group size increases, two potential problems arise with the individual punishment mechanism. First, it may become more difficult to identify free riders. For example, if four people share an office and are jointly obliged to keep the shared facilities clean, it may not be too difficult to identify and punish the one person who neglects to clean the microwave. However, if 40 people share an office, it may be harder to identify all ten people who fail to do their share. Since the responsible individuals must be identified before they can be punished, this problem could reduce the effectiveness of the punishment mechanism. Second, even when all free riders can be identified, those willing to punish them have no way to coordinate, so some free riders may be punished more than necessary while others escape punishment.
Our study focuses primarily on the latter problem. In particular, we examine the robustness of the individual-punishment mechanism at a constant fine-to-fee ratio in the context of the punishment coordination problems that may arise in larger groups even when all free riders can be identified. Following Isaac, Walker and Williams [
It is less clear how an increase in group size, with a constant MPCR, would influence the effectiveness of the individual punishment mechanism. On the one hand, the increase in MGR might be expected to encourage the punishment of low contributors by those who care about the larger potential social surplus. On the other hand, the coordination problem described above may cause free-riding to take hold if some low contributors are not initially punished.
Our specific experimental design adopted key elements from Fehr and Gächter’s two important studies [
Each subject was endowed with 20 tokens for each round. As in Fehr and Gächter [
The exchange rate was set at 21 tokens = 1 RMB for groups of four, and at 39.23 (150) tokens = 1 RMB for groups of 40 with MPCR = 0.04 (0.4). These exchange rates were chosen so that the following quantity was equal in RMB across these treatments: the mean of the free-riding payoff and the full-contribution payoff, plus the 25 (500) tokens in the four-person (40-person) P condition. Lastly, each subject also received a 10 RMB show-up fee.
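The exchange rates above can be checked with simple arithmetic. For each setting, the sketch below computes the mean of the all-free-ride and all-contribute payoffs, adds the extra P-condition tokens, and converts the total to RMB. It rests on three assumptions, which are our reading of the description rather than details stated in the text:

- the standard linear VCM payoff (tokens kept plus MPCR times the group's total contribution);
- a benchmark taken over all 20 rounds (ten N and ten P);
- the 25 (500) extra tokens added once to each subject's total.

The show-up fee is excluded.

```python
ENDOWMENT = 20   # tokens per subject per round (stated in the text)
ROUNDS = 20      # assumption: ten non-punishment plus ten punishment rounds


def round_payoff(own_contribution: float, group_total: float, mpcr: float) -> float:
    """Linear VCM payoff (assumed form): tokens kept plus MPCR times group contributions."""
    return ENDOWMENT - own_contribution + mpcr * group_total


def benchmark_rmb(n: int, mpcr: float, extra_tokens: float, tokens_per_rmb: float) -> float:
    """Mean of the all-free-ride and all-contribute totals, plus extra tokens, in RMB."""
    free_ride = round_payoff(0, 0, mpcr)                  # nobody contributes
    full = round_payoff(ENDOWMENT, n * ENDOWMENT, mpcr)   # everybody contributes all 20 tokens
    mean_total = ROUNDS * (free_ride + full) / 2
    return (mean_total + extra_tokens) / tokens_per_rmb


settings = {
    "n=4, MPCR=0.4": benchmark_rmb(4, 0.4, 25, 21),
    "n=40, MPCR=0.4": benchmark_rmb(40, 0.4, 500, 150),
    "n=40, MPCR=0.04": benchmark_rmb(40, 0.04, 500, 39.23),
}
for label, rmb in settings.items():
    print(f"{label}: {rmb:.2f} RMB")
```

Under these assumptions each setting comes out at about 26 RMB. This is consistent with the stated aim of equalizing expected earnings across treatments.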
In summary, there are three independent variables: group size (small/large, namely four
Small group (4), PN, High MPCR (0.4), Low MGR (1.6), ten groups
Large group (40), PN, High MPCR (0.4), High MGR (16), three groups
Large group (40), PN, Low MPCR (0.04), Low MGR (1.6), three groups
Small group (4), NP, High MPCR (0.4), Low MGR (1.6), ten groups
Large group (40), NP, High MPCR (0.4), High MGR (16), three groups
Large group (40), NP, Low MPCR (0.04), Low MGR (1.6), three groups
The two small-group treatments always have a high MPCR and a low MGR. In what follows, we will refer to them simply as small-group treatments. In contrast, it is necessary to distinguish between the large-group treatments with a high MPCR (and high MGR) and those with a low MPCR (and low MGR). The six treatments are displayed in
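The MGR values in the treatment list are consistent with MGR being group size times MPCR. Assuming MGR denotes the marginal group return (the total return to the whole group from one additional token contributed), the three parameter combinations work out as follows:

```latex
\mathrm{MGR} = n \times \mathrm{MPCR}: \qquad
4 \times 0.4 = 1.6, \qquad
40 \times 0.4 = 16, \qquad
40 \times 0.04 = 1.6
```

This is why the high-MPCR large-group treatments are the only ones with a high MGR. It is also why the low-MPCR large-group treatments match the small-group MGR despite a tenfold lower MPCR.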
Subjects were randomly recruited via online advertisements at Zhejiang University in Hangzhou, China. All subjects were full-time undergraduate students in diverse majors across the Sciences, Social Sciences, and Humanities. A total of 560 subjects participated in the study. All sessions were run at the Zhejiang University Experimental Social Science Laboratory.
All sessions were computerized.
Data summary. (Note that NP Order means that the ten non-punishment rounds preceded the ten rounds with punishment, while PN Order means that the ten rounds with punishment preceded the non-punishment rounds.)
| Treatment | Sample size (groups) | P − N difference per capita | P contribution per capita | N contribution per capita | Punishments per capita |
|---|---|---|---|---|---|
| 1: Small (n = 4), high MPCR = 0.4, NP order | 10 | 69.83 | 124.40 | 54.58 | 10.60 |
| 2: Large (n = 40), high MPCR = 0.4, NP order | 3 | 65.15 | 175.83 | 110.68 | 32.89 |
| 3: Large (n = 40), low MPCR = 0.04, NP order | 3 | 32.61 | 63.60 | 30.99 | 8.47 |
| 4: Small (n = 4), high MPCR = 0.4, PN order | 10 | 38.08 | 125.18 | 87.10 | 6.08 |
| 5: Large (n = 40), high MPCR = 0.4, PN order | 3 | 43.77 | 126.54 | 82.78 | 26.30 |
| 6: Large (n = 40), low MPCR = 0.04, PN order | 3 | 36.41 | 54.10 | 17.70 | 9.68 |
In
Regression results on ten-round per capita differences in contributions between the punishment and no-punishment conditions (p-values in parentheses) [Treatment numbers in square brackets as defined in
Estimation: $\mathit{DV} = \beta_0 + \beta_1\,\text{Large\_High} + \beta_2\,\text{Large\_Low} + \beta_3\,\text{Order} + \beta_4\,(\text{Large\_High} \times \text{Order}) + \beta_5\,(\text{Large\_Low} \times \text{Order})$. Observations: 560. Number of groups: 32. Overall $R^2$: 0.109.

| Coefficient or combination | Estimate (p-value) | Treatment comparison |
|---|---|---|
| $\beta_0$ | 69.83 (0.000) | Trmt. 1 |
| $\beta_1$ | −4.68 (0.741) | Trmt. 2 − 1 |
| $\beta_2$ | −37.22 (0.009) | Trmt. 3 − 1 |
| $\beta_3$ | −31.75 (0.007) | Trmt. 4 − 1 |
| $\beta_4$ | 10.37 (0.604) | Trmt. (5 − 2) − (4 − 1) |
| $\beta_5$ | 35.55 (0.076) | Trmt. (6 − 3) − (4 − 1) |
| $\beta_0 + \beta_1$ | 65.15 (0.000) | Trmt. 2 |
| $\beta_0 + \beta_2$ | 32.61 (0.000) | Trmt. 3 |
| $\beta_0 + \beta_3$ | 38.08 (0.000) | Trmt. 4 |
| $\beta_0 + \beta_1 + \beta_3 + \beta_4$ | 43.77 (0.000) | Trmt. 5 |
| $\beta_0 + \beta_2 + \beta_3 + \beta_5$ | 36.41 (0.000) | Trmt. 6 |
| $\beta_1 - \beta_2$ | 32.54 (0.044) | Trmt. 2 − 3 |
| $\beta_1 + \beta_4$ | 5.69 (0.688) | Trmt. 5 − 4 |
| $\beta_2 + \beta_5$ | −1.67 (0.906) | Trmt. 6 − 4 |
| $(\beta_1 + \beta_4) - (\beta_2 + \beta_5)$ | 7.36 (0.648) | Trmt. 5 − 6 |
| $\beta_3 + \beta_4$ | −21.38 (0.185) | Trmt. 5 − 2 |
| $\beta_3 + \beta_5$ | 3.80 (0.814) | Trmt. 6 − 3 |
| $\beta_5 - \beta_4$ | 25.18 (0.269) | Trmt. (6 − 3) − (5 − 2) |
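The specification includes a full set of treatment dummies and order interactions, so its point estimates can be reproduced directly from the treatment means in the data summary table. The p-values cannot be reproduced this way, because they depend on the error structure. The Python sketch below recovers β0 through β5 and one reported combination. It assumes the per capita means in that table are the cell means of the individual observations; group sizes are equal within each treatment, so this should hold.

```python
# Ten-round per capita P - N differences by treatment (data summary table).
means = {1: 69.83, 2: 65.15, 3: 32.61, 4: 38.08, 5: 43.77, 6: 36.41}


def coefficients_from_means(m: dict) -> dict:
    """Map treatment cell means to beta_0..beta_5 of the saturated dummy model."""
    b0 = m[1]                            # baseline: small group, NP order
    b1 = m[2] - m[1]                     # large high-MPCR vs small, NP order
    b2 = m[3] - m[1]                     # large low-MPCR vs small, NP order
    b3 = m[4] - m[1]                     # order effect within small groups
    b4 = (m[5] - m[2]) - (m[4] - m[1])   # order effect: large high-MPCR minus small
    b5 = (m[6] - m[3]) - (m[4] - m[1])   # order effect: large low-MPCR minus small
    return {"b0": b0, "b1": b1, "b2": b2, "b3": b3, "b4": b4, "b5": b5}


b = coefficients_from_means(means)
for name, value in b.items():
    print(f"{name} = {value:.2f}")
# A reported linear combination, Trmt. 6 - 4:
print(f"b2 + b5 = {b['b2'] + b['b5']:.2f}")
```

The same mapping applies to the last-round and punishment-expenditure regressions, which use the same dummy structure.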
The first thing to notice is that, for all six treatments, the difference in ten-round per capita contributions between punishment and non-punishment rounds is significant with a
Because participants may need time to adjust to a change of condition, we also examine the analogous results for the last round under each condition.
Regression results on last-round per capita differences in contributions between the punishment and no-punishment conditions (p-values in parentheses) [Treatment numbers in square brackets as defined in
Estimation: $\mathit{DV} = \beta_0 + \beta_1\,\text{Large\_High} + \beta_2\,\text{Large\_Low} + \beta_3\,\text{Order} + \beta_4\,(\text{Large\_High} \times \text{Order}) + \beta_5\,(\text{Large\_Low} \times \text{Order})$. Observations: 560. Number of groups: 32. Overall $R^2$: 0.164.

| Coefficient or combination | Estimate (p-value) | Treatment comparison |
|---|---|---|
| $\beta_0$ | 12.28 (0.000) | Trmt. 1 |
| $\beta_1$ | −2.18 (0.490) | Trmt. 2 − 1 |
| $\beta_2$ | −8.83 (0.005) | Trmt. 3 − 1 |
| $\beta_3$ | −3.28 (0.191) | Trmt. 4 − 1 |
| $\beta_4$ | 1.55 (0.728) | Trmt. (5 − 2) − (4 − 1) |
| $\beta_5$ | 2.73 (0.076) | Trmt. (6 − 3) − (4 − 1) |
| $\beta_0 + \beta_1$ | 10.10 (0.000) | Trmt. 2 |
| $\beta_0 + \beta_2$ | 3.44 (0.186) | Trmt. 3 |
| $\beta_0 + \beta_3$ | 9.00 (0.000) | Trmt. 4 |
| $\beta_0 + \beta_1 + \beta_3 + \beta_4$ | 8.38 (0.001) | Trmt. 5 |
| $\beta_0 + \beta_2 + \beta_3 + \beta_5$ | 2.89 (0.266) | Trmt. 6 |
| $\beta_1 - \beta_2$ | 6.66 (0.070) | Trmt. 2 − 3 |
| $\beta_1 + \beta_4$ | −0.63 (0.843) | Trmt. 5 − 4 |
| $\beta_2 + \beta_5$ | −6.11 (0.052) | Trmt. 6 − 4 |
| $(\beta_1 + \beta_4) - (\beta_2 + \beta_5)$ | 5.48 (0.136) | Trmt. 5 − 6 |
| $\beta_3 + \beta_4$ | −1.73 (0.639) | Trmt. 5 − 2 |
| $\beta_3 + \beta_5$ | −0.55 (0.881) | Trmt. 6 − 3 |
| $\beta_5 - \beta_4$ | 1.18 (0.821) | Trmt. (6 − 3) − (5 − 2) |
The difference in last-round per capita contributions between punishment and non-punishment rounds is significant for both the small-group (
In contrast to the ten-round average data, none of the order effects or interactions involving order effects is individually significant for the last-round data. Moreover, a joint test that the coefficient on the main order effect and those on its interactions with the two other treatment dummies all equal zero yields a chi-square statistic of 1.95 with three degrees of freedom (
Regression results on last-round per capita differences in contributions between the punishment and no-punishment conditions, dropping insignificant order effects (p-values in parentheses) [Treatment numbers in square brackets as defined in
Estimation: $\mathit{DV} = \beta_0 + \beta_1\,\text{Large\_High} + \beta_2\,\text{Large\_Low}$. Observations: 560. Number of groups: 32. Overall $R^2$: 0.154.

| Coefficient or combination | Estimate (p-value) | Treatment comparison |
|---|---|---|
| $\beta_0$ | 10.64 (0.000) | Trmt. 1, 4 |
| $\beta_1$ | −1.40 (0.522) | Trmt. (2, 5) − (1, 4) |
| $\beta_2$ | −7.47 (0.001) | Trmt. (3, 6) − (1, 4) |
| $\beta_0 + \beta_1$ | 9.24 (0.000) | Trmt. 2, 5 |
| $\beta_0 + \beta_2$ | 3.17 (0.079) | Trmt. 3, 6 |
| $\beta_1 - \beta_2$ | 6.07 (0.017) | Trmt. (2, 5) − (3, 6) |
The difference in last-round per capita contributions between punishment and non-punishment rounds continues to be significant for the small-group and high-MPCR large-group cases (
Is the punishment condition less effective in the low-MPCR large-group treatment simply because fewer punishments are purchased when the potential gains from further contributions are relatively small? The last column of
To investigate this issue further, we regress each group's per capita expenditure on punishment, aggregated over all ten punishment rounds, on the same treatment dummy variables used above. There are 32 observations, one for each group. The estimated coefficients and related hypothesis tests are presented in
Regression results on ten-round per capita expenditures on punishment (p-values in parentheses) [Treatment numbers in square brackets as defined in
Estimation: $\mathit{DV} = \beta_0 + \beta_1\,\text{Large\_High} + \beta_2\,\text{Large\_Low} + \beta_3\,\text{Order} + \beta_4\,(\text{Large\_High} \times \text{Order}) + \beta_5\,(\text{Large\_Low} \times \text{Order})$. Observations: 32. Adjusted $R^2$: 0.475.

| Coefficient or combination | Estimate (p-value) | Treatment comparison |
|---|---|---|
| $\beta_0$ | 10.60 (0.000) | Trmt. 1 |
| $\beta_1$ | 22.29 (0.000) | Trmt. 2 − 1 |
| $\beta_2$ | −2.13 (0.703) | Trmt. 3 − 1 |
| $\beta_3$ | −4.53 (0.240) | Trmt. 4 − 1 |
| $\beta_4$ | −2.07 (0.794) | Trmt. (5 − 2) − (4 − 1) |
| $\beta_5$ | 5.73 (0.471) | Trmt. (6 − 3) − (4 − 1) |
| $\beta_0 + \beta_1$ | 32.89 (0.000) | Trmt. 2 |
| $\beta_0 + \beta_2$ | 8.47 (0.093) | Trmt. 3 |
| $\beta_0 + \beta_3$ | 6.08 (0.031) | Trmt. 4 |
| $\beta_0 + \beta_1 + \beta_3 + \beta_4$ | 26.30 (0.000) | Trmt. 5 |
| $\beta_0 + \beta_2 + \beta_3 + \beta_5$ | 9.68 (0.057) | Trmt. 6 |
| $\beta_1 - \beta_2$ | 24.43 (0.001) | Trmt. 2 − 3 |
| $\beta_1 + \beta_4$ | 20.23 (0.001) | Trmt. 5 − 4 |
| $\beta_2 + \beta_5$ | 3.60 (0.521) | Trmt. 6 − 4 |
| $(\beta_1 + \beta_4) - (\beta_2 + \beta_5)$ | 16.63 (0.023) | Trmt. 5 − 6 |
| $\beta_3 + \beta_4$ | −6.59 (0.346) | Trmt. 5 − 2 |
| $\beta_3 + \beta_5$ | 1.21 (0.862) | Trmt. 6 − 3 |
| $\beta_5 - \beta_4$ | 7.80 (0.429) | Trmt. (6 − 3) − (5 − 2) |
Per capita expenditures on punishment are significantly higher in the high-MPCR large-group treatment, the only high-MGR treatment, than in the two low-MGR treatments. Yet the effectiveness of the punishment condition at increasing contributions is significantly higher in both the high-MPCR large-group treatment and the small-group treatment than in the low-MPCR large-group treatment. Thus, statistically indistinguishable levels of per capita spending on punishment are significantly more effective at increasing contributions in the small-group treatment than in the low-MPCR large-group treatment. Moreover, the significantly higher per capita spending on punishment in the high-MPCR large-group treatment produces increases in contributions that are statistically indistinguishable from those in the small-group treatment. We hypothesize that this reflects a coordination problem that afflicts the decentralized punishment mechanism in large groups, making per capita expenditures on punishment less effective at increasing contributions in such groups.
Suppose, for example, that 25% of participants are low contributors. In a group of four, there is then just one low contributor and three higher contributors who might decide to punish him or her. Suppose that each high contributor purchases one punishment point. The low contributor will then receive three punishment points, perhaps an inducement to contribute more in the next round. In an analogous group of 40, there would be ten low contributors and thirty higher contributors who might decide to punish one or more of the ten low contributors. If each high contributor purchases one punishment point, the ten low contributors will together receive thirty punishment points, an average of three per person. These thirty points might be divided equally among the ten low contributors, in which case each would receive three punishment points, just as in the four-person group. However, there is no mechanism to coordinate the distribution of punishment points among the low contributors, so an equal distribution is unlikely. Instead, some low contributors will probably receive more punishment points than necessary to motivate higher contributions, while others will receive fewer or none at all.
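The coordination argument can be made concrete with a simple model. Purely for illustration, the sketch below assumes that each of the 30 higher contributors assigns a single punishment point to one of the 10 low contributors, chosen uniformly at random and independently of the others. The number of points a given low contributor receives is then Binomial(30, 1/10), with a mean of three, just as in the four-person example.

The code computes the exact share of low contributors who receive no points and the share who receive fewer than three, then checks the latter by simulation. The random-targeting rule is a hypothetical stand-in, not a model estimated from the experimental data.

```python
import math
import random

PUNISHERS = 30   # higher contributors, one punishment point each (from the example)
TARGETS = 10     # low contributors
THRESHOLD = 3    # points the lone low contributor receives in the four-person case


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical targeting rule: each punisher picks one low contributor uniformly at random.
p = 1 / TARGETS
p_unpunished = binom_pmf(0, PUNISHERS, p)
p_below = sum(binom_pmf(k, PUNISHERS, p) for k in range(THRESHOLD))


def simulate(trials: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo share of low contributors who receive fewer than THRESHOLD points."""
    rng = random.Random(seed)
    below = 0
    for _ in range(trials):
        counts = [0] * TARGETS
        for _ in range(PUNISHERS):
            counts[rng.randrange(TARGETS)] += 1
        below += sum(c < THRESHOLD for c in counts)
    return below / (trials * TARGETS)


print(f"share receiving no points:    {p_unpunished:.3f}")
print(f"share receiving fewer than 3: {p_below:.3f}")
print(f"simulated share fewer than 3: {simulate():.3f}")
```

Under this assumption roughly 4% of low contributors go entirely unpunished. About 41% receive fewer than the three points a lone low contributor would get in the four-person group, even though the average is still three.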
Proportion of low contributors punished averaged across sessions by treatment for two definitions of low contributor.
| Treatment | Sample size (groups) | 25th percentile or lower, and not highest in round | Ten or lower, and not highest in round |
|---|---|---|---|
| 1: Small (n = 4), high MPCR = 0.4, NP order | 10 | 0.609 | 0.606 |
| 2: Large (n = 40), high MPCR = 0.4, NP order | 3 | 0.761 | 0.741 |
| 3: Large (n = 40), low MPCR = 0.04, NP order | 3 | 0.347 | 0.188 |
| 4: Small (n = 4), high MPCR = 0.4, PN order | 10 | 0.674 | 0.538 |
| 5: Large (n = 40), high MPCR = 0.4, PN order | 3 | 0.689 | 0.486 |
| 6: Large (n = 40), low MPCR = 0.04, PN order | 3 | 0.476 | 0.238 |
To determine whether the likelihood of a low contributor being punished differs significantly between the low-MPCR large-group treatment and the other two treatments, we estimated a negative binomial regression for each definition of a low contributor. For each group of participants, the dependent variable is the number of times a low contributor received at least one punishment, aggregated across all rounds. We also calculate the number of times a low contribution occurred, aggregated across all rounds. This count serves as the exposure variable and enters the model through its log.
Negative binomial regression results for the proportion of times people in the lowest contribution quartile who were not among the highest contributors in a round were punished (p-values in parentheses) [Treatment numbers in square brackets as defined in
Estimation: $\mathit{DV} = \beta_0 + \beta_1\,\text{Large\_High} + \beta_2\,\text{Large\_Low} + \beta_3\,\text{Order} + \beta_4\,(\text{Large\_High} \times \text{Order}) + \beta_5\,(\text{Large\_Low} \times \text{Order})$. Observations: 32. Pseudo $R^2$: 0.061.

| Coefficient or combination | Estimate (p-value) | Treatment comparison |
|---|---|---|
| $\beta_0$ | −0.54 (0.000) | Trmt. 1 |
| $\beta_1$ | 0.24 (0.202) | Trmt. 2 − 1 |
| $\beta_2$ | −0.52 (0.007) | Trmt. 3 − 1 |
| $\beta_3$ | 0.14 (0.462) | Trmt. 4 − 1 |
| $\beta_4$ | −0.23 (0.397) | Trmt. (5 − 2) − (4 − 1) |
| $\beta_5$ | 0.17 (0.522) | Trmt. (6 − 3) − (4 − 1) |
| $\beta_1 - \beta_2$ | 0.77 (0.000) | Trmt. 2 − 3 |
| $\beta_1 + \beta_4$ | 0.02 (0.929) | Trmt. 5 − 4 |
| $\beta_2 + \beta_5$ | −0.35 (0.066) | Trmt. 6 − 4 |
| $(\beta_1 + \beta_4) - (\beta_2 + \beta_5)$ | 0.37 (0.053) | Trmt. 5 − 6 |
| $\beta_3 + \beta_4$ | −0.09 (0.646) | Trmt. 5 − 2 |
| $\beta_3 + \beta_5$ | 0.31 (0.103) | Trmt. 6 − 3 |
| $\beta_5 - \beta_4$ | 0.40 (0.137) | Trmt. (6 − 3) − (5 − 2) |

A joint test of the null hypothesis that the order effect and its interactions with the treatment variables all equal zero yields a chi-square statistic of 3.41 with three degrees of freedom (
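The log of the number of low contributions enters as an offset. Exponentiating the linear index therefore gives an implied rate of punishment per low contribution for each treatment, and exponentiating a single coefficient gives a rate ratio. The Python sketch below applies this to the reported coefficients for the lowest-quartile definition. The treatment coding (Order = 1 for the PN order) is inferred from the treatment labels attached to each coefficient.

```python
import math

# Reported coefficients, lowest-quartile definition of a low contributor.
beta = {"b0": -0.54, "b1": 0.24, "b2": -0.52, "b3": 0.14, "b4": -0.23, "b5": 0.17}

# (Large_High, Large_Low, Order) per treatment; Order = 1 for the PN order (inferred).
design = {1: (0, 0, 0), 2: (1, 0, 0), 3: (0, 1, 0),
          4: (0, 0, 1), 5: (1, 0, 1), 6: (0, 1, 1)}


def linear_index(large_high: int, large_low: int, order: int, b: dict) -> float:
    """Linear predictor excluding the log-exposure offset."""
    return (b["b0"] + b["b1"] * large_high + b["b2"] * large_low + b["b3"] * order
            + b["b4"] * large_high * order + b["b5"] * large_low * order)


# E[count] = exposure * exp(index), so exp(index) is the expected number of
# punished low contributions per low contribution.
rates = {t: math.exp(linear_index(*x, beta)) for t, x in design.items()}

for t, rate in rates.items():
    print(f"Treatment {t}: implied punishment rate {rate:.3f}")
print(f"Rate ratio, Trmt. 3 vs Trmt. 1: {math.exp(beta['b2']):.2f}")
```

The implied rates are close to, but not identical to, the proportions in the table of low contributors punished. Those proportions average group-level ratios, whereas the regression pools counts within each treatment. For example, exp(β2) ≈ 0.59 indicates that, in the NP order, a low contribution in the low-MPCR large-group treatment was punished at roughly 59% of the small-group rate.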
Negative binomial regression results for the proportion of times people who contributed ten or less and were not among the highest contributors in a round were punished (p-values in parentheses) [Treatment numbers in square brackets as defined in
Estimation: $\mathit{DV} = \beta_0 + \beta_1\,\text{Large\_High} + \beta_2\,\text{Large\_Low} + \beta_3\,\text{Order} + \beta_4\,(\text{Large\_High} \times \text{Order}) + \beta_5\,(\text{Large\_Low} \times \text{Order})$. Observations: 32. Pseudo $R^2$: 0.104.

| Coefficient or combination | Estimate (p-value) | Treatment comparison |
|---|---|---|
| $\beta_0$ | −0.61 (0.000) | Trmt. 1 |
| $\beta_1$ | 0.29 (0.228) | Trmt. 2 − 1 |
| $\beta_2$ | −1.06 (0.000) | Trmt. 3 − 1 |
| $\beta_3$ | 0.08 (0.720) | Trmt. 4 − 1 |
| $\beta_4$ | −0.51 (0.119) | Trmt. (5 − 2) − (4 − 1) |
| $\beta_5$ | 0.16 (0.608) | Trmt. (6 − 3) − (4 − 1) |
| $\beta_1 - \beta_2$ | 1.35 (0.000) | Trmt. 2 − 3 |
| $\beta_1 + \beta_4$ | −0.22 (0.319) | Trmt. 5 − 4 |
| $\beta_2 + \beta_5$ | −0.90 (0.000) | Trmt. 6 − 4 |
| $(\beta_1 + \beta_4) - (\beta_2 + \beta_5)$ | 0.68 (0.003) | Trmt. 5 − 6 |
| $\beta_3 + \beta_4$ | −0.43 (0.081) | Trmt. 5 − 2 |
| $\beta_3 + \beta_5$ | 0.23 (0.304) | Trmt. 6 − 3 |
| $\beta_5 - \beta_4$ | 0.66 (0.048) | Trmt. (6 − 3) − (5 − 2) |

Together, these results corroborate the coordination hypothesis, supporting the idea that a given per capita expenditure on decentralized individual punishment is more effective at increasing contributions in smaller than in larger groups. In small groups, for a given level of per capita expenditure, a higher proportion of low contributors receive at least one punishment than in large groups. This explains why statistically indistinguishable amounts of expenditure on punishment are significantly more effective at increasing contributions in the small-group treatment than in the low-MPCR large-group treatment. It also explains why the significantly higher expenditures on punishment observed in the high-MPCR large-group treatment are needed to produce increases in contributions that are statistically indistinguishable from those in the small-group treatment.
Finally, it is interesting to examine whether the punishment mechanism is more or less efficient than the stand-alone VCM in each treatment when both the benefits of contributions and the costs of punishment are taken into account. For each treatment,
Per capita income difference (punishment minus non-punishment rounds) in experimental tokens.
| Round | NP order: Small, high MPCR | NP order: Large, high MPCR | NP order: Large, low MPCR | PN order: Small, high MPCR | PN order: Large, high MPCR | PN order: Large, low MPCR |
|---|---|---|---|---|---|---|
| 1 | −5.88 | 27.00 | −5.10 | −4.20 | −22.50 | −5.10 |
| 2 | −4.83 | 21.00 | −2.35 | −3.36 | 10.50 | −2.75 |
| 3 | −2.52 | 27.00 | −1.57 | −1.05 | 22.50 | −1.18 |
| 4 | −1.26 | 57.00 | −1.18 | −0.63 | 57.00 | −1.57 |
| 5 | −0.21 | 93.00 | −0.78 | −0.42 | 46.50 | −0.39 |
| 6 | 1.26 | 124.50 | −0.39 | −0.42 | 51.00 | −1.96 |
| 7 | −0.21 | 123.00 | 0.00 | 1.47 | 79.50 | 0.00 |
| 8 | 5.88 | 111.00 | 0.00 | 2.10 | 88.50 | 0.78 |
| 9 | 5.46 | 123.00 | 0.00 | 2.73 | 105.00 | 0.39 |
| 10 | 2.31 | 141.00 | 0.78 | 2.10 | 115.50 | −0.78 |
| Sum of 10 rounds | −0.04 | 846.15 | −10.36 | −1.45 | 551.70 | −12.48 |
The effectiveness of the individual punishment mechanism at increasing contributions to a public good depends critically on what happens to the MPCR of the public good as the potential community of contributors grows. For a pure public good with non-rivalry in consumption, the MPCR stays constant and the MGR increases in proportion to the size of the community. In this paper, we have demonstrated that the higher MGR produces a significant increase in per capita expenditures on punishment in 40-person relative to four-person groups. At the same time, the larger group creates a coordination problem for the decentralized punishment mechanism, making each dollar spent on punishment less effective at increasing contributions. Some punishment expenditure is inevitably wasted on low contributors who are already being punished enough by other purchasers of punishment points to raise their contributions, while other low contributors escape punishment. In this experimental study, the increase in punishment expenditures was sufficient to offset the reduction in the effectiveness of each punishment dollar. Thus, for a constant MPCR, the individual punishment mechanism proved remarkably robust despite the coordination problems inherent in an institution that relies on decentralized individual punishment decisions in a larger group. Indeed, despite the higher cost resulting from the increase in punishment expenditures, the higher MGR on each contribution in the high-MPCR large-group treatment made the punishment mechanism significantly more efficient than in the other two treatments. The high-MPCR large-group treatment was the only treatment in which aggregate earnings over all ten punishment rounds exceeded aggregate earnings over all ten non-punishment rounds.
However, if the MGR stays constant, so that the MPCR shrinks with group size, per capita expenditures on punishment do not increase. In this case, the coordination problem associated with the 40-person group is not offset by increases in punishment expenditures. As a result, the individual punishment mechanism is significantly less effective at increasing contributions for a 40-person than for a four-person community with the same MGR. Examining institutional modifications to mitigate the coordination problem associated with the decentralized individual punishment mechanism is an important issue deserving further study.
Related literatures examine rewards versus punishments (e.g., Rand
We thank an anonymous referee for suggesting this example, which we have slightly modified and expanded upon.
See the experimental instructions associated with each of Fehr and Gächter [
University ethics board requirements made it essential to ensure that no participant left the experiment with less money than when s/he arrived.
Zhijian Wang and Bin Xu jointly designed, tested and implemented the computer program used in this experiment.
Such a regression is completely collinear and thus cannot be run.
As a robustness check, we also employed two alternative estimation techniques: robust standard errors clustered by group, and a random effect for each group combined with robust standard errors clustered by group. These techniques yield identical coefficients but slightly different standard errors. There are no qualitative differences in inferences regarding treatment effects within either the NP or the PN order. To save space, these results are not reported here, but they are available from the authors upon request.
See footnote 7.
See footnote 7.
To conserve space, these results are not reported in detail here. They are available from the authors upon request.
The exposure variable adjusts for the differing numbers of low contributions in each group. The proportions for each treatment presented in Table 6 are averages across such proportions, calculated for each group in a treatment. The numerator of each such group proportion is the count of the number of times a low contributor received at least one punishment, while the denominator is the number of times a low contribution occurred aggregated across all rounds.
The detailed results are not reported in order to conserve space. They are available from the authors upon request.
These results are not reported here to save space, but are available from the authors upon request.
We thank Ananish Chaudhuri and Jeffrey Carpenter for helpful comments on an earlier draft of this manuscript. We also acknowledge Qiqi Cheng, Lu Liu, Chao Wang, Tongyu Wu, and Xinchao Zhang for their excellent research assistance, and Zhiwei Fang and Yanmin Qian for their support.