Article

The Signaling Value of Punishing Norm-Breakers and Rewarding Norm-Followers

by Fabrizio Adriani 1 and Silvia Sonderegger 2,*
1 Department of Economics, University of Leicester, University Road, Leicester LE1 7RH, UK
2 School of Economics, University of Nottingham, CEDEX, University Park, Nottingham NG7 2RD, UK
* Author to whom correspondence should be addressed.
Games 2018, 9(4), 102; https://doi.org/10.3390/g9040102
Submission received: 31 August 2018 / Revised: 31 October 2018 / Accepted: 15 November 2018 / Published: 13 December 2018
(This article belongs to the Special Issue Social Norms and Games)

Abstract

We formally explore the idea that punishment of norm-breakers may be a vehicle for the older generation to teach youngsters about social norms. We show that this signaling role provides sufficient incentives to sustain costly punishing behavior. People punish norm-breakers to pass information about past history to the younger generation. This creates a link between past, present, and future punishment. Information about the past is important for youngsters, because the past shapes the future. Reward-based mechanisms may also work and are welfare superior to punishment-based ones. However, reward-based mechanisms are fragile, since punishment is a more compelling signaling device (in a sense that we make precise).
JEL Classification:
D82; D83; C72

1. Introduction

It is well known that people are willing to punish norm-breakers, even in situations where punishment is costly and does not provide any material benefit to the punisher (see [1] and the literature thereafter). This raises the question: Why should people be willing to incur personal costs in order to punish norm deviators? This paper focuses on an information-based rationale for punishment. The existence of an informational content of punishment is the basis of the so-called denunciation (or expressive) theory in legal philosophy. 2 This theory emphasizes the role of punishment in teaching people what the social norms are and what is not acceptable. Much anecdotal evidence, such as the widespread tradition of bringing children to see public executions, also points to a potential informational role of punishment. 3 Our analysis builds on this idea. We focus on information transmission from parents to children, although it should be quite clear that the analysis applies more broadly to environments in which an “experienced” individual may convey information to a “naive” individual, towards whom he feels altruistic; examples include teacher and pupil, senior worker and junior worker, etc. We argue that the desire to transmit the “correct” information may underpin punishment or ostracism of norm breakers. For instance, a Muslim mother who wears a veil and wishes to raise her daughter along similar lines may be reluctant to engage in friendly relations with unveiled women. Similarly, to signal to his child that pre-marital sex is reprehensible, a parent might demand the removal of an unmarried pregnant teacher working in the child’s school. 4
We build a model of overlapping generations where parental behavior informs children about norms in society. Parents are paternalistic: they evaluate the child’s cost of being punished in the future using the lens of their own (higher) discount factor. This generates a conflict of interests between parents and children that justifies the use of costly signaling for information transmission. We concentrate on norms that are individually costly. Our running example focuses on unproductive behaviors, such as dress codes, lengthy rituals, or elaborate etiquette, although the analysis can equally apply to behaviors (such as public good provision) that generate externalities. 5 Norm-following is thus purely sustained by fear of punishment. By contrast, the metanorm of sanctioning norm violators is sustained by signaling motives.
The game has an equilibrium where nobody punishes, and nobody follows the norm. However, it also has a punishment equilibrium where norm-following and punishment occur. A central role is played by the notion that parents have better information than children about the history of play. Children know that society may be in one of different regimes, which differ in the extent to which punishment and norm-following are carried out, but they do not know which one. They use their parents’ behavior to update their beliefs about the prevailing regime—and, thus, about the likelihood of being met with punishment if they break the norm. Parents punish past norm-breakers since by doing so they send the correct signal to their child.
An important contribution of the analysis is to highlight the role of history in sustaining norms. In equilibrium, people punish when they observe punishment featuring in recent shared history. Importantly, people (correctly) expect that recent history will shape individual behavior in the future. We show that this effect alone can provide sufficient incentives to sustain an otherwise dominated behavior such as costly punishment.
More precisely, adults active at $t$ know that other $t$ adults will punish past norm breakers, since they have been exposed to the same shared history as they have. At $t+1$, recent history (capturing events at $t$) will thus contain punishment. In turn, this will induce people in $t+1$ to punish as well. The connection between past, present and future punishment means that information about the past is valuable for youngsters, since the past molds the future. It is worth highlighting that in our setup history matters not only because it anchors beliefs but also, more concretely, because the total cost of punishing past norm breakers depends on how many people broke the norm in the previous period (since this determines how many people are to be punished).
Our results underscore the role of continuity, namely the link between past, present and future, to sustain norms (and the metanorm of punishment). The analysis suggests that, if continuity breaks down (as e.g., in the case of an external shock such as a war) this would trigger a process of unraveling that would affect behavior. This regime shift could happen very quickly—in contrast with the predictions of preference-based theories of punishment, where a shock to preferences would translate only very slowly onto behavior (since the older generations would have to die off). 6
We show that the analysis produces non-obvious comparative statics. For instance, we find that the norm being sustained cannot be too cheap. Intuitively, if it was, the conflict of interests between parents and children would vanish, and this would eliminate the need for costly signaling.
While our signaling equilibrium generates interesting insights, any explanation for behavior X that relies on the existence of an equilibrium where X is a signaling device should be taken with caution. Critics might argue that almost any behavior might be rationalized by constructing a setup where failure to adopt that behavior would be interpreted in a sufficiently negative light by the receiver. A convincing account should therefore provide a more compelling rationale why X (rather than some other behavior) should act as a signaling device. To this aim, we construct a setup where signaling operates through an alternative mechanism—namely, costly rewards. 7 Sure enough, we show that the same mechanism at work in a punishment equilibrium may also sustain a reward-based equilibrium. However, a reward equilibrium may emerge only under some conditions. We find that a scenario where the (marginal) cost of rewarding equals the (marginal) benefit from being rewarded, as in the case of monetary transfers, would not work. This is consistent with the observation that, in social interactions, cash rewards are seldom used—in fact, there is almost a taboo against them.
The next step in the analysis is to compare punishments and rewards by constructing a measure of robustness: would a punishment equilibrium be immune to people switching to costly rewards for signaling, and would a reward equilibrium be immune to people starting to use punishment instead of reward? We show that a reward-based equilibrium can generally be “invaded” by punishment, but not vice-versa. Although the reward equilibrium generates more welfare than the punishment equilibrium, it is also more fragile, since punishment is a more compelling signaling method than reward. Intuitively, this is because the cost of punishing decreases in the share of norm compliers, since there are fewer people to punish. Hence, by using punishment instead of, say, reward, a parent can credibly signal that the norm is sufficiently widespread, while at the same time incurring low signaling costs. By contrast, the cost of rewarding increases in the share of compliers, since there are more people to reward. This makes rewarding norm compliers a less compelling signaling device of the widespread norm than punishing norm breakers.

Related Literature

One possible explanation for costly punishment that has been offered by the literature is that people may have direct in-built preferences for punishing norm-deviators, which may outweigh the material costs involved in punishing. 8 This explanation is especially plausible when the norms in question are the result of a joint process of evolution and acculturation, such as may for instance be the case for norms against incest. When this does not apply, however, the direct preference approach leaves some important questions unanswered, since it does not explain why people who exhibit a preference for engaging in costly punishment (and who thus suffer a material disadvantage compared to those who do not exhibit this preference) are not wiped out by evolutionary forces.
Another possible rationale that has been offered is that those who do not punish norm-breakers may themselves be seen as deviators and thus be punished—such as for instance in Akerlof (1976) [8]. 9 Using Axelrod’s (1986) [10] terminology, a norm for X typically requires the existence of a metanorm to sanction people who fail to do X, one to sanction people who fail to sanction those who fail to do X, and so on. The applicability of this explanation depends on the environment. Although norm breaking can often be observed directly (e.g., because an individual fails to dress/behave/speak in a manner that complies with the norm), the failure to punish norm-breakers—and to punish those who fail to punish norm breakers, etc.—are often much harder to identify. 10 It is thus plausible that these metanorms may be sustainable in small communities, where monitoring is easy, but not in larger environments, where it is harder.
The central theme of our analysis is that norms (or, more precisely, metanorms) may be the object of signaling. The notion that signaling-based explanations may help clarify apparent “behavioral puzzles” has been presented elsewhere, such as in Glazer and Konrad (1996), Ellingsen and Johannesson (2008) and Hopkins (2014). Sliwka (2006) and Gneezy and Rustichini (2000) [12,13,14,15,16] explicitly consider norm-signaling. Differently from those works, in our setup parental behavior informs children about the behavior they are likely to encounter in the future. This shares similarities with Adriani and Sonderegger (2018), Adriani et al. (2018) and Kotsidis (2018) [17,18,19]. However, this paper studies the mechanics and the signaling value of punishment versus reward, and is thus very different in focus.
Our work also adds to the literature on intergenerational transmission of information and (self-)signaling. Bénabou and Tirole (2011) [20] consider an anticipatory-utility setup where punishment of deviant behavior helps the individual shield himself from negative anticipatory feelings. Other relevant works include Bénabou and Tirole (2002, 2004 and 2006) [21,22,23] and Dessí (2009) [24]. These papers emphasize how anticipatory feelings or altruism towards future generations may generate selective memory. Bisin and Verdier (2000) [25], Corneo and Jeanne (2009, 2010) [26,27], Cervellati and Vanin (2013) [28], Carvalho (2013) [29] and Verdier and Zenou (2018) [30] are also related to the present paper. However, these works focus on intergenerational transmission of values or cultural traits (rather than information). Also related is Van der Weele (2012) [31], who studies the signaling role of sanctions when the authorities have private information about the fraction of egoists in society. An important difference with our analysis is that, in our setup, strategic complementarities among parents play a central role, while in his model these complementarities are entirely absent since punishment is fully centralized.
The effectiveness of “stick” versus “carrot” as incentive devices and their interaction are clearly important for understanding norm-following, and have been extensively studied by the experimental literature. 11 However, the theoretical literature has largely remained silent on the subject. The exception is Herold (2012) [33], who proposes a model of evolution of preferences that explicitly compares preferences for punishing with preferences for rewarding. 12 The rationale why preferences for punishing eventually crowd out preferences for rewarding is that, when cooperation is sufficiently widespread, punishing non-cooperators becomes cheaper than rewarding cooperators. Our analysis supplements this intuition with a new, information-based effect. We show that, to successfully “invade”, a signal (punish/reward) should not only be cheap, but it should also be credible. Punishment is a credible signal that the norm is sufficiently widespread because it naturally satisfies incentive compatibility: It is cheaper when the norm is widespread than when it is rare. By contrast, reward follows the opposite path. 13
Finally, our work is related to the literature that studies the interplay between history and norms; the most relevant recent examples for our purposes include Acemoglu and Jackson (2014), Rohner et al. (2013) and Bidner and Francois (2013) [34,35,36]. 14 A central theme of this literature is that the history of past play leaves a lasting legacy, by affecting behavior for a long time afterwards. This is also the case in our model. However, rather than it being a case of past history revealing information about the players’ types, in our model the history of play itself is the object of signaling. By perpetuating the past, people can send information about past history (and, thus, indirectly, about the future, since in equilibrium the past shapes the future).
Similar to us, Acemoglu and Jackson (2014) [34] also assume that players cannot fully observe the history of past play. However, their paper has a rather different focus, since it does not study punishment.
The remainder of the paper is organized as follows. Section 2 introduces our key idea within a very simple setup. Section 3 describes the model, while Section 4 characterizes the punishment equilibrium. Section 5 considers the case where individuals use costly rewards to signal the dominant norm to their child. Section 6 investigates the robustness of the punishment equilibrium to the use of rewards, and the robustness of the reward equilibrium to punishment. Finally, Section 7 provides discussions and offers some final thoughts.

2. A Very Simple Example

Consider the following sequential game between a parent and his child. The parent moves first, and has two actions: Punish past norm-breakers, and Not Punish. Punishing norm-breakers is costly for the parent, while not punishing is not. The child moves second, after having observed the parent’s move. He also has two actions: Follow (some unspecified costly social norm) and Not Follow. The parent possesses private information about the consequences that the child will face if he does not follow the norm. For a start, we take these consequences as being exogenously determined. There are two states of the world: (i) punishment state: if the child breaks the norm, he is punished by a third party and incurs a punishment cost; (ii) no punishment state: if the child breaks the norm, he incurs no punishment. The cost for the child from following the norm is positive but smaller than the cost he incurs if he is punished. Hence, if the child knows that norm-breaking will be punished by the third party, he finds it optimal to follow the norm. However, the child does not know the state of the world, and looks at the parent’s behavior to gain information about it. The parent is altruistic towards the child. We now argue that the following may emerge in equilibrium:
  • The child follows the norm if he observes the parent punishing, breaks the norm otherwise.
  • The parent punishes in state (i) (when the third party punishes), does not punish in state (ii) (when the third party does not punish).
It is clear that, given the parent’s strategy, the child’s strategy prescribes optimal behavior. Consider now the parent. Taking the child’s strategy as given, if the parent does not punish in state (i), then the child will break the norm and be punished. By contrast, if the parent punishes past norm breakers the child will infer that the state is (i) and will therefore follow the norm (and thus avoid being punished). For suitable parameter values, it is clear that the parent may prefer the latter to the former, and would thus find it optimal to punish in state (i), even though punishing is a costly endeavor. 15
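The equilibrium check described above can be illustrated numerically. The following is a minimal sketch, not the paper's formal model: the payoff numbers and the names `K_PUNISH`, `C_FOLLOW` and `P_PUNISHED` are hypothetical, and the parent is assumed to weigh the child's payoff one-for-one with his own punishing cost.

```python
# Hypothetical parameter values (assumptions for illustration, not from the paper).
K_PUNISH = 0.3    # parent's cost of punishing past norm-breakers
C_FOLLOW = 0.5    # child's cost of following the norm
P_PUNISHED = 1.0  # child's cost of being punished by the third party (exceeds C_FOLLOW)


def child_payoff(follow, third_party_punishes):
    """Child pays the norm cost if he follows; otherwise he is punished only in state (i)."""
    if follow:
        return -C_FOLLOW
    return -P_PUNISHED if third_party_punishes else 0.0


def child_strategy(parent_punishes):
    """Candidate equilibrium strategy: follow the norm iff the parent punished."""
    return parent_punishes


def parent_payoff(parent_punishes, third_party_punishes):
    """Altruistic parent (assumed weight of one on the child): own cost plus child's payoff."""
    own_cost = -K_PUNISH if parent_punishes else 0.0
    follow = child_strategy(parent_punishes)
    return own_cost + child_payoff(follow, third_party_punishes)


def parent_prefers_punishing(third_party_punishes):
    """True if punishing gives the parent a strictly higher payoff in the given state."""
    return (parent_payoff(True, third_party_punishes)
            > parent_payoff(False, third_party_punishes))


def is_separating_equilibrium():
    # Parent: punish in state (i), do not punish in state (ii).
    parent_ok = parent_prefers_punishing(True) and not parent_prefers_punishing(False)
    # Child: the parent's action reveals the state, so compare payoffs state by state.
    child_ok = (child_payoff(True, True) > child_payoff(False, True)
                and child_payoff(False, False) > child_payoff(True, False))
    return parent_ok and child_ok


print(is_separating_equilibrium())
```

With these numbers the parent punishes in state (i) because the punishing cost plus the child's norm cost (0.8) is below the child's punishment cost (1.0); raising `K_PUNISH` above 0.5 would break the equilibrium.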
In the equilibrium described above, the parent punishes past norm-breakers to signal to his child that, if he breaks the norm, he will be punished. Hence, punishment is motivated by signaling concerns. A limiting feature of this example is that the consequences the child will face if he breaks the norm are exogenous, rather than being derived as an equilibrium feature. However, it is possible to augment the model to allow for that. Suppose that, in each period $t$, a different parent-child pair plays the game described above. If the child breaks the norm, he may or may not be punished in period $t+1$ (by the parent of the $t+1$ parent-child pair). Payoffs are the same as before, except that the action of the “third party” is no longer exogenous. The third party is in fact the $t+1$ parent, who faces a problem analogous to that of the $t$-period parent. Following the reasoning above, under some parameter restrictions the $t$-period parent will then choose to punish if he believes that the $t+1$ parent is sufficiently likely to punish. Similarly, the $t+1$ parent will choose to punish if he believes that the $t+2$ parent is sufficiently likely to punish, and so on.
This simple example describes in a very crude way the type of story we have in mind. However, there are several issues that the example does not address.
(1) What is the exact nature of the information asymmetry between parents and children?
(2) Why should the parent use punishment as a way to communicate to his child? Would norm-following by the parent not be enough? Even more fundamentally, why can the parent not just “tell” the child what to do?
Our full-blown model addresses these issues. As we will see, a specific feature of the setup will allow us to address concern (2), namely, the presence of a paternalistic element in the parent’s motives. Furthermore, it is quite clear that the logic we have highlighted would equally work if instead of focusing on punishment, we were to consider a reward-based system. The question then is: are there any differences between these two mechanisms? Our full-blown setup will allow us to meaningfully compare the signaling value of punishments and rewards for information-transmission across generations.

3. Model

We start off by constructing a model aimed at formalizing the intuition sketched in Section 2. As we will argue in Section 5, this setup can also be employed to study the case where signaling operates through rewards, rather than punishment.
Overlapping Generations. We consider an environment populated by overlapping generations. Each period $t$ is characterized by a continuum of active parent-child pairs. Period $t$ parents choose whether to follow the norm, and whether to punish those individuals who broke the norm in the previous period. Period $t$ children observe their parent’s behavior and decide whether to follow the norm. At $t+1$, $t$ parents and $t$ children become purely passive and may be punished if they broke the norm in the previous period. Just before $t+2$, $t$ parents die. The $t$ children observe the state of the world and recent history (more on this below) and become adults, i.e., the $t+2$ parents, with one offspring each.
The following table summarizes our notation for the players’ actions.
$n = 0$: Youngster breaks the norm; $n = 1$: Youngster follows the norm.
$N = 0$: Adult breaks the norm; $N = 1$: Adult follows the norm.
$M = 0$: Adult fails to punish; $M = 1$: Adult punishes,
where the letter “M” is a mnemonic for “metanorm”.
Timing. Timing is illustrated in the diagram below (Figure 1).
Note that if an individual breaks the norm at $t$ he faces punishment at time $t+1$. Intuitively, this reflects the notion that punishment often takes the form of being confronted with ostracism or bias in future interactions. For instance, someone who gets drunk and behaves obnoxiously at a party will probably find his party invitations greatly reduced in the future. Similarly, someone who fails to respect religious practices may find himself socially shunned in future exchanges with members of his religious community, and so on. This lag in punishment implies that period $t$ adults punish people who broke the norm at time $t-1$. In turn, the children who followed/violated the norm at $t-1$ (and who are punished at $t$) become adults (and potential punishers) at $t+1$.
Cost of punishing. An individual who punishes norm-breakers incurs a cost. A natural benchmark is the case where punishment takes the form of ostracism, in which case the cost incurred arises from the missed opportunity of forming an economically beneficial link. Moreover, as argued by Elster (1989) [38], expressing disapproval is always costly: at the very least, it requires energy and attention that could have been used for other purposes. One might also alienate or provoke the target individual, at some cost or risk to oneself. Letting the mass of norm-breakers be denoted as $b \geq 0$, the total cost of punishing is given by $b\theta + \varepsilon$, where $\theta > 0$. Punishment is more expensive the greater the mass of people who must be punished, but also involves a fixed cost, $\varepsilon$, which we assume is strictly positive and possibly very small. One way to think about this small fixed cost is as a shortcut to reflect the presence of a small share of “behavioral” agents who always violate the norm. 16 As a result, a parent who chooses to punish will always incur a positive cost, even if $b = 0$. This makes it clear that our results do not rely on the cost of punishing being exactly zero in any state of the world.
Following the norm is costly and involves a positive cost. In what follows, when making welfare statements, we will focus on the benchmark case in which the behavior prescribed by the norm generates no externalities, although we will discuss how the presence of positive/negative externalities may affect our results when appropriate.
Availability of the punishment option. We allow for the possibility that, for some parents, punishment may actually be unfeasible. Accordingly, we assume that there are two states of the world.
  • With probability $\gamma$, the state of the world is $l$: the punishment option is available for all parents.
  • With probability $1 - \gamma$, the state of the world is $h$: the punishment option is unavailable to a share $q \in (0, 1)$ of parents and is available for the remaining $1 - q$ share.
For simplicity we assume that the state of the world is the same in all periods. Parents observe the state of the world, while children do not. A natural benchmark is the case where $\gamma$ is arbitrarily close (but not equal) to 1, implying that state $l$ is “almost certain”.
Adults. We introduce the following additional notation:
$m_t$ is the mass of punishers in any given period $t$, and $\Theta$ is the (marginal) cost of being punished.
The material payoff of a $t$-period adult is summarized below.

| | Follow the norm ($N_i = 1$) | Do not follow the norm ($N_i = 0$) |
|---|---|---|
| Punish ($M_i = 1$) | $-c - \theta b_{t-1} - \varepsilon$ | $-\theta b_{t-1} - \Theta m_{t+1} - \varepsilon$ |
| Do not punish ($M_i = 0$) | $-c$ | $-\Theta m_{t+1}$ |
The discount factor is implicitly set equal to one. Throughout the analysis, we assume that $\Theta - c > \varepsilon$. Note that $\Theta > c$ is actually a necessary requirement for punishment to be an effective incentive device for norm-following. The adult’s direct material payoff can equivalently be expressed as
$$u_i(b_{t-1}, m_{t+1}, N_i, M_i) = -c N_i - \left(\theta b_{t-1} + \varepsilon\right) M_i - \Theta m_{t+1} (1 - N_i). \tag{1}$$
Note that the direct payoff (1) provides an incomplete description of individual welfare, since it omits the utility that an individual derives from altruistic concerns towards his child. A full description of individual welfare is provided below, in (3).
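As a quick consistency check, the direct payoff (1) can be coded and compared against the four cells of the payoff table. This is only an illustrative sketch: the function name `adult_direct_payoff` and the numerical parameter values are assumptions chosen for the example (they satisfy the restriction that Θ minus c exceeds ε), not values taken from the paper.

```python
import math


def adult_direct_payoff(b_prev, m_next, N, M, c, theta, eps, Theta):
    """Direct material payoff (1) of an adult.

    b_prev: mass of norm-breakers in the previous period
    m_next: mass of punishers in the next period
    N, M:   1 if the adult follows the norm / punishes, 0 otherwise
    """
    return -c * N - (theta * b_prev + eps) * M - Theta * m_next * (1 - N)


# Hypothetical parameter values (assumptions for illustration only).
params = dict(c=0.5, theta=0.25, eps=0.125, Theta=1.0)
b_prev, m_next = 0.5, 0.75

# The four cells of the payoff table, written out directly.
table = {
    (1, 1): -params["c"] - params["theta"] * b_prev - params["eps"],
    (0, 1): -params["theta"] * b_prev - params["Theta"] * m_next - params["eps"],
    (1, 0): -params["c"],
    (0, 0): -params["Theta"] * m_next,
}

for (N, M), expected in table.items():
    value = adult_direct_payoff(b_prev, m_next, N, M, **params)
    assert math.isclose(value, expected), (N, M, value, expected)
```

The loop raises an assertion error if the closed-form expression and the table ever disagree, which makes it a convenient guard when experimenting with other parameter values.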
Information. Before selecting whether to follow the norm and punish, adults observe $m_{t-1}$ and $b_{t-1}$, namely the mass of adults (from the previous generation) who punished norm deviators in the previous period, and the mass of norm-breakers in the previous period. They also observe the state of the world ($l$ or $h$) and, in state $h$, they observe whether, for them, the punishing option is available or unavailable. The realizations of $m$ and $b$ in periods before the last cannot be observed.
Children. Each adult $i$ is endowed with one child, denoted by $\bar{i}$. Children do not observe $m_{t-1}$, $b_{t-1}$, the state of the world ($l$ or $h$), or whether the punishment option is/is not available to their parents. However, each youngster perfectly observes his parent’s actions $(N_i, M_i)$. After observing his parent’s actions, he chooses $n_{\bar{i}} \in \{0, 1\}$ (violate the norm/follow the norm). The cost of following the norm for children is $c$, the same as an adult’s.
If a youngster does not follow the norm, he becomes the object of social punishment. 17 We assume that the stigma associated with norm-breaking does not carry through adulthood. Hence, as adults, all individuals start off with a clean slate, independently of how they behaved as youngsters. 18
The payoff of a $t$-period child is as follows:
Follow the norm ($n_{\bar{i}} = 1$): $-c$
Do not follow the norm ($n_{\bar{i}} = 0$): $-\delta \Theta m_{t+1}$
where $\delta < 1$ indicates the child’s discount factor. This can equivalently be expressed as
$$\bar{u}_{\bar{i}}(m_{t+1}, n_{\bar{i}}) = -n_{\bar{i}}\, c - \delta \Theta m_{t+1} (1 - n_{\bar{i}}). \tag{2}$$
Note that we assume that children discount the future more heavily than adults. 19 This reflects the idea that age affects preferences and choice. Robson and Samuelson (2007, 2009) [40,41] study the evolution of discount rates and find that, under some natural conditions, these should fall with age.
Adults’ Utility. Adults are altruistic towards their children, but this altruism is “impure.” More specifically, we assume that parents evaluate the child’s payoff using their own (higher) discount factor, although they are aware that this differs from the child’s discount factor. This generates what Doepke and Zilibotti (2014) [42] call a paternalistic element in the parents’ motives, which is standard in models of parent-child interactions. 20 The total utility of an adult $i$ is thus given by
$$U_i(b_{t-1}, m_{t+1}, N_i, M_i, n_{\bar{i}}) \equiv u_i + \bar{u}_{\bar{i}} - \Theta m_{t+1} (1 - n_{\bar{i}})(1 - \delta). \tag{3}$$
As will become clear below, parental paternalism introduces a conflict of interests between parents and children, which motivates the use of costly signaling for information transmission. Intuitively, if the interests of parents and children were perfectly aligned then parents could just “tell” children what to do (and children would always find it optimal to follow what their parents tell them). The existence of a conflict of interests eliminates this possibility, since parents have an incentive to misrepresent their information. The use of costly signaling is then necessary to address this credibility problem.
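The source of the conflict can be read off the payoffs directly. The short derivation below is a sketch built only on the child's payoff (2) and the parent's utility (3); the symbol $\hat{m}$, standing for the expected value of $m_{t+1}$, is shorthand introduced here and is not the paper's notation.

Collecting the terms of (3) that depend on the child's action gives the parent's evaluation of the child's choice:

$$\bar{u}_{\bar{i}} - \Theta \hat{m} (1 - n_{\bar{i}})(1 - \delta) = -n_{\bar{i}}\, c - \delta \Theta \hat{m} (1 - n_{\bar{i}}) - (1 - \delta) \Theta \hat{m} (1 - n_{\bar{i}}) = -n_{\bar{i}}\, c - \Theta \hat{m} (1 - n_{\bar{i}}).$$

Hence the parent strictly prefers $n_{\bar{i}} = 1$ whenever $\Theta \hat{m} > c$, whereas by (2) the child strictly prefers $n_{\bar{i}} = 1$ only when $\delta \Theta \hat{m} > c$. Since $\delta < 1$, the two rankings strictly disagree for beliefs in the interval

$$\frac{c}{\Theta} < \hat{m} < \frac{c}{\delta \Theta},$$

where the parent would like the child to follow the norm but the child, left to his own discounting, would break it. The width of this interval is $c(1 - \delta)/(\delta \Theta)$, which shrinks as $c$ falls; this appears to be in line with the remark in the Introduction that the conflict of interests vanishes if the norm is too cheap.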

4. Punishment Equilibrium

We concentrate on Perfect Bayesian Equilibria. The equilibrium concept in our setup thus satisfies the following requirements.
(1) Children update their beliefs from their parent’s actions using Bayes’ rule whenever possible.
(2) At each information set, each player’s strategy specifies optimal actions, given his beliefs and the strategies of the other players.
In what follows, we concentrate on pure strategy symmetric equilibria. We also restrict attention to equilibria that are stationary, in the sense that all individuals follow strategies that are independent of calendar time. Finally, to simplify the analysis, we will restrict attention to equilibria in which:
Refinement 1.
Children who observe their parent violating the norm never follow the norm.
Imposing this refinement allows us to rule out counterintuitive scenarios at the outset. Note however that the equilibrium we characterize would continue to hold even if Refinement 1 was lifted. Moreover, its key property, namely that those who punish do so to signal that norm-breakers are likely to be punished, would arise more generally also in other possible equilibria. However, a full characterization would be lengthy and, we believe, not very illuminating.
The crucial feature of our setup is that past history of play cannot be perfectly observed. Agents active at $t$ have a common prior over history up to that period. We impose a “grain of truth” restriction on the prior. In particular, we assume that prior beliefs at $t$ assign a strictly positive probability to the actual (real) realization of history up to that period. Adults also obtain some direct evidence of past history, since they observe $m_{t-1}$ and $b_{t-1}$, namely how much punishing and how much norm breaking there was in the previous period. A strategy for parent $i$ maps $m_{t-1}$, $b_{t-1}$ and the state of the world ($l$ or $h$) into $(N_i, M_i)$. In contrast, children have no direct information about past history, except through their parents’ actions. A strategy for child $\bar{i}$ maps $(N_i, M_i)$ into $n_{\bar{i}}$. From (2), the child’s optimal action at $t$ depends on his expectation of $m_{t+1}$, namely the share of punishers at $t+1$. The child will decide to follow the norm if and only if 21
$$\delta \Theta \, E\left[ m_{t+1} \mid N_i, M_i \right] > c.$$
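The child's decision rule above amounts to a threshold on his posterior expectation of the share of future punishers. A minimal sketch follows; the function names `child_follows` and `belief_threshold` and the numbers in the usage lines are hypothetical, and the strict inequality mirrors the condition as stated in the text.

```python
def child_follows(expected_m_next, c, delta, Theta):
    """Child follows the norm iff the discounted expected punishment exceeds the norm's cost.

    expected_m_next: the child's posterior expectation of m_{t+1} given (N_i, M_i)
    """
    return delta * Theta * expected_m_next > c


def belief_threshold(c, delta, Theta):
    """Smallest expected share of punishers above which the child follows the norm."""
    return c / (delta * Theta)


# Hypothetical values (assumptions for illustration only).
c, delta, Theta = 0.5, 0.8, 1.0

print(belief_threshold(c, delta, Theta))
print(child_follows(0.7, c, delta, Theta))
print(child_follows(0.6, c, delta, Theta))
```

In this illustration an expected share of 0.6 is not enough to make the child comply, although a decision-maker who did not discount the punishment (as the parent does not) would comply at that belief, since 0.6 exceeds `c / Theta`.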
Our first lemma establishes a benchmark result.
Lemma 1.
The game has a history-independent equilibrium where nobody follows the norm, and nobody punishes.
Proof. 
Clearly enough, if nobody punishes norm-breakers, then norm-following and punishing are dominated actions. □
We now consider a more interesting scenario, where norm-following and punishing emerge in equilibrium. By Refinement 1, we are restricting attention to environments where the posterior beliefs about $m_{t+1}$ of a child who has observed his parent breaking the norm induce him to break the norm too. It remains to address the child’s posterior beliefs when he observes his parent following the norm.
Lemma 2.
In an equilibrium with punishment, the following must hold. A child’s expectation of $m_{t+1}$ when he has observed his parent both following the norm and punishing at $t$ must exceed that of a child who has observed his parent following the norm but not punishing.
Proof. 
For punishment to emerge it must be that, by observing his parent punishing, the child gains useful information about the likelihood of being met with future punishment if he breaks the norm. However, note that punishment conveys information about $m_{t-1}$ and $b_{t-1}$, the parent’s private information. This raises the question: how can information about the past be useful to predict the future? For this to occur, there must be a link between the past and the future, so that, by gaining information about the past from the actions of their parents, children are able to make inferences about the future. In other words, past history must have a bearing in shaping the future. □
The next result says that the information conveyed by parental punishment must actually be determinant for the child’s choice.
Lemma 3.
(Strategy A.) In an equilibrium with punishment, the following must hold. A child who has observed his parent both following the norm and punishing will follow the norm. A child who has observed his parent following the norm but not punishing will break the norm.
Proof. 
Intuitively, if a parent could induce his child to follow the norm by simply following the norm himself, he would never choose to punish, as this involves unnecessary punishing costs. Since all parents would follow this reasoning, punishment would disappear. This rules out that, in a punishment equilibrium, children may decide to follow the norm whenever they observe their parent following the norm, with no concern for punishing behavior.
Suppose that Refinement 1 holds, and that children follow strategy A (described in Lemma 3). We now compute the payoffs that a parent active at time $t$ may obtain from each of the available action pairs. First, the payoff from following the norm and punishing, namely $(N_i, M_i) = (1, 1)$, is given by
$$U_t(1, 1 \mid m_{t-1}, b_{t-1}) = -2c - \theta b_{t-1} - \varepsilon. \tag{5}$$
This follows since a parent selecting $(1, 1)$ will induce his child to follow the norm. Note that history has a direct bearing on payoffs, and is thus more than a simple correlation device. This is because the cost of punishing is higher the higher the share of past norm breakers.
Second, the payoff from $(N_i, M_i) = (1, 0)$, namely following the norm and failing to punish, is
$$U_t(1, 0 \mid m_{t-1}, b_{t-1}) = -c - \Theta \, m_{t+1}(m_{t-1}, b_{t-1}) \tag{6}$$
since in that case the child will not follow the norm and will thus be punished by all those adults at $t+1$ who choose to punish. The expression $m_{t+1}(m_{t-1}, b_{t-1})$ gives the value of $m_{t+1}$ conditional on the information at the parents’ disposal, namely $m_{t-1}$ and $b_{t-1}$. 22
Third, the payoff from $(0, 0)$, namely breaking the norm and failing to punish, is
$$U_t(0, 0 \mid m_{t-1}, b_{t-1}) = -2 \Theta \, m_{t+1}(m_{t-1}, b_{t-1}) \tag{7}$$
since in that case both the parent and the child will be punished at $t+1$.
Finally, the payoff from $(N_i, M_i) = (0, 1)$, namely breaking the norm and punishing, is
$$U_t(0, 1 \mid m_{t-1}, b_{t-1}) = -\theta b_{t-1} - \varepsilon - 2 \Theta \, m_{t+1}(m_{t-1}, b_{t-1}). \tag{8}$$
Children. We are now able to describe more precisely how children process the information conveyed by parental actions. Children start with a prior about the history of past play. Upon observing their parent’s behavior $(N_i, M_i)$, they combine their prior information with the information conveyed by parental behavior to form posterior beliefs about the pair $(m_{t-1}, b_{t-1})$. Since parental strategies depend on recent history, this posterior induces a probability distribution over $(m_t, b_t)$, which in turn determines a probability distribution over $m_{t+1}$. This is the variable of interest to children, since it determines the payoff from norm-breaking. Intuitively, children look at parental behavior to gain a clue about recent history. This matters, since recent history affects adult behavior in the following periods. □

Steady States

We concentrate on equilibria where the economy is in a steady state: the optimal reply by parents to $m_{t-1} = m$, $b_{t-1} = b$ generates $m_t = m$, $b_t = b$. To characterize the possible steady states, we need to check for incentive compatibility. For instance, for $m = 1$, $b = 0$ to be a possible steady state, we need (5) to exceed (6)–(8) whenever we impose the steady state conditions $b_{t-1} = b = 0$ and $m_{t+1} = m = 1$. The following lemma characterizes the possible steady states that may emerge.
Lemma 4.
Suppose that children follow strategy A described in Lemma 3. The possible steady states are:
(i) 
High-punishment: All parents follow the norm and (if they can) punish. As a result, in state l: $m = 1$, $b = 0$, while in state h: $m = 1 - q$, $b = q$.
(ii) 
No-punishment: $m = 0$, $b = 2$. Nobody punishes, nobody follows the norm.
Proof. 
Lemma 4 shows that there are two possible steady states: a no-punishment steady state, essentially equivalent to the history-independent equilibrium, and a high-punishment steady state, where all parents follow the norm and punish norm-breakers (if they can). This multiplicity arises because parent-child signaling generates strong complementarities among parents: people punish norm-breakers because this is what others in society do, and vice versa. [Further discussion of the mechanism that underpins punishment is provided below, when discussing Proposition 1.]
It is important to note that, as stated in the lemma, the actual shares of punishers and norm-breakers in steady state may take three different values. More specifically:
If the steady state is high-punishment and the state of the world is l: $m = 1$, $b = 0$. We denote this scenario as regime H.
If the steady state is high-punishment and the state of the world is h: $m = 1 - q$, $b = q$. We denote this scenario as regime I (for intermediate).
If the steady state is no-punishment: $m = 0$, $b = 2$. We denote this scenario as regime L.
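As an illustrative check (not part of the formal analysis), the following Python sketch evaluates the parental payoffs (5)–(8) at each candidate steady state and reports the payoff-maximizing action pair. The function names `parent_payoffs` and `best_reply` and all parameter values are hypothetical, chosen only for illustration, and the sketch assumes that children follow strategy A.

```python
def parent_payoffs(c, theta, eps, Theta, b_prev, m_next):
    """Parental payoffs (5)-(8) for each action pair (N_i, M_i).

    Assumes children follow strategy A: the child follows the norm
    only after observing (1, 1).
    """
    punish_cost = theta * b_prev + eps  # cost of punishing past norm-breakers
    return {
        (1, 1): -2 * c - punish_cost,               # parent and child both follow
        (1, 0): -c - Theta * m_next,                # child breaks and is punished
        (0, 0): -2 * Theta * m_next,                # parent and child are punished
        (0, 1): -punish_cost - 2 * Theta * m_next,  # punish, yet both are punished
    }


def best_reply(payoffs, can_punish=True):
    """Return the payoff-maximizing action pair.

    Parents who cannot punish are restricted to M_i = 0.
    """
    feasible = {a: u for a, u in payoffs.items() if can_punish or a[1] == 0}
    return max(feasible, key=feasible.get)


# Hypothetical parameter values, for illustration only.
c, theta, eps, Theta, q = 1.0, 0.5, 0.1, 3.0, 0.2

# Candidate steady states as (m, b) pairs.
regimes = {"H": (1.0, 0.0), "I": (1.0 - q, q), "L": (0.0, 2.0)}

for name, (m, b) in regimes.items():
    u = parent_payoffs(c, theta, eps, Theta, b_prev=b, m_next=m)
    print(name, best_reply(u), best_reply(u, can_punish=False))
```

With these illustrative values, parents who can punish choose $(1, 1)$ in regimes H and I, parents who cannot punish choose $(1, 0)$ there, and everybody chooses $(0, 0)$ in regime L, which is what each steady state requires.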
We now turn to children’s expectations. As a first approximation, it is useful to assume that children have rational expectations, and thus know that the economy is in a steady state (although they do not know which one). Prior beliefs thus assign positive probability to at most three possible regimes. We let the children’s prior assign probability $p_H$ to regime H, probability $p_I$ to regime I, and probability $p_L = 1 - p_I - p_H$ to regime L. We assume that both $p_H$ and $p_I$ are strictly greater than zero. The children’s posterior beliefs are then well defined and can be computed from Bayesian updating:
$$\pi_H^{1,1} = \frac{p_H}{p_H + (1-q)\,p_I}; \quad \pi_I^{1,1} = \frac{(1-q)\,p_I}{p_H + (1-q)\,p_I}; \quad \pi_I^{1,0} = 1; \quad \pi_H^{1,0} = 0.$$
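The updating step can be sketched numerically as follows. This is a minimal Python illustration, assuming the likelihood of observing $(1, 1)$ is one in regime H, $1 - q$ in regime I, and zero in regime L; the function names and the prior values are hypothetical and not taken from the paper.

```python
def posterior_after_punishing(p_H, p_I, q):
    """Posterior probabilities of regimes H and I after observing (1, 1)."""
    # Likelihood of (1, 1): 1 in regime H, 1 - q in regime I, 0 in regime L.
    total = p_H + (1 - q) * p_I
    return p_H / total, (1 - q) * p_I / total


def expected_punishers(p_H, p_I, q):
    """Child's expectation of m_{t+1} after (1, 1): m = 1 in H, m = 1 - q in I."""
    pi_H, pi_I = posterior_after_punishing(p_H, p_I, q)
    return pi_H + pi_I * (1 - q)


# Hypothetical prior and share of constrained parents, for illustration only.
p_H, p_I, q = 0.5, 0.25, 0.2
pi_H, pi_I = posterior_after_punishing(p_H, p_I, q)
print(pi_H, pi_I, expected_punishers(p_H, p_I, q))
```

The expectation returned by `expected_punishers` equals $\frac{p_H + (1-q)^2 p_I}{p_H + (1-q)\,p_I}$, the ratio that multiplies $\delta\Theta$ in the condition of Proposition 1.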
To refine the child’s out-of-equilibrium beliefs, we apply the Intuitive Criterion of Cho and Kreps (1987) [44]: following an out-of-equilibrium move by the parent, a child assigns zero probability to the deviation emanating from a regime in which the deviation is equilibrium dominated. □
Proposition 1.
Suppose that both $p_H$ and $p_I$ are strictly greater than zero and that
$$\min\left\{ \Theta(1-q) - \varepsilon - \theta q,\; \frac{p_H + (1-q)^2 p_I}{p_H + (1-q)\,p_I}\,\delta\Theta \right\} > c \geq \delta\Theta(1-q). \tag{a}$$
Then punishment may emerge in equilibrium. In the refined steady state equilibrium with punishment, the following holds.
(i) 
Children follow the norm if and only if they observe their parent punishing (Strategy A).
(ii) 
Adults follow the norm and punish (if they can).
Proof. 
The remainder of the proof can be found in Appendix A.
In what follows, we will use the term “punishment equilibrium” to indicate the steady state equilibrium that features punishment. Proposition 1 highlights the role of history in sustaining norms and punishment meta-norms. People look at recent shared history to form their expectations of the future. Adults active at t who have been exposed to a recent history containing punishment have an incentive to punish because they know that other t adults will punish, since they have observed the same shared history as they have. The behavior of other t adults matters because, in the aggregate, it shapes history at t + 1 . If punishment occurs at t, recent history in t + 1 will feature punishment. In turn, this will induce people in t + 1 to punish as well (since they correctly expect that their shared past will be projected into the future). In sum, people punish norm-breakers to pass information about the past to the younger generation. This creates a link between the past and the future which ensures that information about the past is important (since the past shapes the future).
In our analysis, the link between past, present and future is crucial to sustain norms. If this link were to break (for example, in the case of a major disruption such as a war), this might start a process of unraveling that could proceed very quickly. This remark is also consistent with the observation that, sometimes, seemingly irrelevant events may have rapid and lasting consequences for norms (Bicchieri and Muldoon 2014 [6]). Note that this role of history and continuity is entirely absent from other accounts, such as preference-based theories of punishment. Indeed, in their survey article, Bicchieri and Muldoon [6] conclude that 23
Studies as disparate as the analysis of Prohibition support, racial integration [and] the sexual revolution in the 1960s (...) all lend credibility to a model of norms grounded on individuals’ (...) expectations of what others will do (...).
Finally, note that, although we do not explore this explicitly, norm-following and punishment may survive only in sufficiently homogeneous environments, where parents share a similar understanding of the prevailing norm and metanorm. Intuitively, that is required to ensure that parental behavior is informative about the behavior of other people in society. This shares similarities with Adriani and Sonderegger (2009) [45] and suggests a possible rationale for favoring homogeneous societies, namely that they may make punishment-sustained pro-social norms easier to maintain. 24
In the benchmark case where $\gamma$, the probability that the state of the world is l, is arbitrarily close to (although strictly smaller than) one, $p_I$ is arbitrarily small and, thus:
$$\pi_H^{1,1} \to 1 \quad \text{and} \quad \pi_I^{1,1} \to 0.$$
As a result, condition (a) simplifies to
$$\min\left\{ \Theta(1-q) - \varepsilon - \theta q,\; \delta\Theta \right\} > c \geq \delta\Theta(1-q).$$
This has the advantage of being independent of prior beliefs (which are very difficult to measure). Note that, although children know that state h is extremely unlikely, upon observing $(1, 0)$ they will necessarily conclude that society finds itself in regime I, since this is the only scenario where $(1, 0)$ may emerge as an equilibrium move.
We now turn to welfare. From Lemma 1, the game supports a history-independent equilibrium where nobody follows the norm and nobody punishes in any period, and where all players obtain a payoff of zero. By contrast, the punishment equilibrium supports payoffs that are below that amount. In fact, if we consider the benchmark case where norm-following generates no direct externalities (and is thus purely wasteful), we can show that, □
Proposition 2.
In any punishment equilibrium material welfare is lower than in the history-independent equilibrium.
Proof. 
Intuitively, this follows because punishment is a very wasteful activity, since it imposes costs on both the punisher and the individual being punished. Clearly enough, however, this observation may be reversed if we were to consider norms that generate positive externalities (such as norms of public good provision). Proposition 2 highlights that positive externalities are a necessary (although not sufficient) condition for a regime featuring punishment to generate higher welfare than the history independent equilibrium. As we will see below in Proposition 4, this result stands in contrast with what happens when reward rather than punishment is used as a signaling device.
We now turn to the conditions that underpin the punishment equilibrium. First, the condition $\frac{p_H + (1-q)^2 p_I}{p_H + (1-q)\,p_I}\,\delta\Theta > c$ implies that, upon observing his parent following the norm and punishing, a child believes that regime H is sufficiently likely to induce him to follow the norm. Although children fear punishment less than parents, they nonetheless find it optimal to follow the norm if they believe that punishment is sufficiently widespread in society. Second, consider a child who observes his parent following the norm but not punishing, in which case the child will infer that society is in regime I. The condition $c \geq \delta\Theta(1-q)$ ensures that, in that case, the child will not find it worthwhile to follow the norm. Third, the condition $\Theta(1-q) - \varepsilon - \theta q > c$ ensures that parents who can punish actually choose to punish in regime I (as well as regime H), to induce their children to follow the norm. This creates a wedge between the child’s and the parents’ views. Intuitively, $\Theta(1-q) - \varepsilon - \theta q > c$ implies $\Theta(1-q) > c$. Hence, in the I regime, parents would like their children to follow the norm, since the (parental perception of the) benefit this involves in terms of spared punishment outweighs the cost of norm-following. However, as we have seen, if the child knew that the regime is I he would not follow the norm. The following corollary summarizes the nature of the conflict of interests between parents and children in the punishment equilibrium. □
Corollary 1.
In the punishment equilibrium, the conflict of interests between parents and children takes the following form: Parents would like their children to follow the norm in both the I and the H regimes. By contrast, children (if they had perfect information) would like to follow the norm only in the H regime.
Note that for the conflict of interests to arise, the child’s discount factor (i.e., $\delta$) should be sufficiently smaller than the parent’s discount factor (i.e., one). If $\delta = 1$, condition (a) fails and punishment does not emerge.
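The three inequalities in condition (a), and the way they fail when $\delta = 1$, can be illustrated with a short Python sketch. The function name `condition_a`, the dictionary keys, and the parameter values are hypothetical and serve only as a numerical illustration.

```python
def condition_a(c, theta, eps, Theta, delta, q, p_H, p_I):
    """Evaluate the three inequalities that make up condition (a)."""
    # Child's expectation of m_{t+1} after observing (1, 1).
    exp_m = (p_H + (1 - q) ** 2 * p_I) / (p_H + (1 - q) * p_I)
    checks = {
        # Parents who can punish prefer (1, 1) to (1, 0) in regime I.
        "parent_punishes": Theta * (1 - q) - eps - theta * q > c,
        # A child who observes (1, 1) follows the norm.
        "child_follows": exp_m * delta * Theta > c,
        # A child who observes (1, 0), and so infers regime I, breaks the norm.
        "child_breaks": c >= delta * Theta * (1 - q),
    }
    checks["condition_a"] = all(checks.values())
    return checks


# Hypothetical parameter values, for illustration only.
params = dict(c=1.0, theta=0.5, eps=0.1, Theta=3.0, q=0.2, p_H=0.5, p_I=0.25)

print(condition_a(delta=0.4, **params))  # a child who discounts the future heavily
print(condition_a(delta=1.0, **params))  # a child as patient as the parent
```

With $\delta = 1$ the first inequality requires $\Theta(1-q) > c$ while the last requires $c \geq \Theta(1-q)$, so the two cannot hold together, whatever the remaining parameters.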
Robustness to direct communication. Corollary 1 helps us address one of the questions we presented in Section 2, namely, why can the parent not just “tell” the child what to do? Suppose that, instead of punishing norm deviators, a parent may just send a message (involving a positive but arbitrarily small cost) to his child, urging him to follow the norm. 25 Formally, let $M \in \{0, 1, 2\}$, where $M = 0$ corresponds to “not punish and not send the message”, $M = 1$ corresponds to “not punish but send the message”, and $M = 2$ corresponds to “punish and do not send the message”. Starting from an equilibrium where communication is not used (i.e., nobody selects $M = 1$), would a parent have an incentive to deviate and use direct communication instead of punishment? 26 In the L regime, sending the message is equilibrium-dominated for the parent. Hence, if we refine out-of-equilibrium beliefs using a standard refinement such as Cho and Kreps’s Intuitive Criterion, the child should rule this regime out. By the same token, it is clear that the parent might have an incentive to send the message both in the H and the I regimes, if he thought the message sufficiently likely to succeed. Note however that the parent’s net return from the deviation in regime I exceeds that in regime H. 27 This follows since the gross payoff from the deviation is the same in both states, while the parent’s equilibrium payoff is actually higher in regime H, implying that the deviation is less appealing. There are thus three cases that may arise. The deviation is either (a) equilibrium-dominated in both regimes, or (b) profitable in both regimes, or (c) profitable in the I regime and equilibrium-dominated in the H regime. In case (a), the Intuitive Criterion does not pin down the child’s posterior beliefs following the deviation. In case (b), the Intuitive Criterion establishes that, following the deviation, the child may assign positive probability to I and/or H. In case (c), the Intuitive Criterion establishes that, following the deviation, the child should assign probability one to the I regime. Overall, we conclude that, if the child believes that the out-of-equilibrium move originated from regime I with probability one, this is consistent with the Intuitive Criterion. The child’s best reply is then $n^{1,2} = 0$, which of course makes the deviation suboptimal for the parent. The punishment equilibrium is thus robust to direct communication. Note that this argument can be seen as a special case of the analysis provided in Section 6.2 (this is formalized in the Proof of Proposition 6).
Comparative statics. Proposition 1 highlights interesting comparative statics. The second inequality in condition (a) requires
$$c \geq \delta\Theta(1-q). \tag{12}$$
In other words, the cost of following the norm cannot be too low. This may at first appear strange. If a norm is cheaper than another, how could it ever be harder to sustain? The intuition is subtle. If the cost of following the norm is very low, this would eliminate the conflict of interest between adults and youngsters and would thus render signaling redundant. More specifically, if the cost of norm following is very low, then the child would find it optimal to follow the norm even if society is in the I regime. By simply following the norm, a parent could then signal that the regime is I, and thus induce the child to follow the norm with no need for additional signaling. This would trigger a process of unraveling which would destroy the use of punishment (and norm-following) in equilibrium.
Note that the RHS of (12) is an increasing function of $\Theta$, the cost of being punished. This implies that a higher cost of being punished raises the lower bound on $c$ that needs to be met for punishment to emerge in equilibrium. The reason is again linked with the necessity of a conflict of interests between parents and children.
These comparative statics set our model apart from other possible explanations for punishment, such as the existence of direct preferences for punishing (which actually delivers no comparative statics at all), or the idea that people may punish to avoid being punished themselves.

5. Reward Equilibrium

In this section, we construct a setup where signaling operates through an alternative mechanism that has been presented by the literature as relevant to sustain norms (e.g., Herold 2012 [33]), namely rewards. At first glance, a full-blown analysis of this case may appear redundant. After all, one may see the failure to punish someone as a “reward”, and the failure to reward someone as a “punishment”. This intuition suggests that there may be a one-to-one correspondence between the two cases. However, this analogy is misleading. To see this, suppose that rewards are costly to give (if they were not, their use to incentivize norm-following would be trivially sustained). In this case, the act of punishing someone by failing to reward them would actually make the punisher better off, since it would save him the cost of rewarding. 28 This shows that the analogy between punishing and rewarding fails if we are interested in comparing the use of costly rewards and costly punishments as possible incentive devices for norm-following.

5.1. Setup

The setup mirrors that of Section 3 in most respects. Let $M_i \in \{0, 1\}$ indicate whether an individual follows the metanorm of rewarding ($M_i = 1$) or failing to reward ($M_i = 0$) norm-followers. Similar to existing literature (Herold 2012), we model the act of rewarding as zero-one. The payoff from $(N_i, M_i)$ for a $t$ parent is equal to
$$u_i(b_{t-1}, m_{t+1}, N_i, M_i) = \left( R\, m_{t+1} - c \right) N_i - r M_i \left( 2 - b_{t-1} \right)$$
where $2 - b_{t-1}$ is the mass of $t-1$ parents and children who followed the norm at $t-1$ (and who are being rewarded at $t$) and $m_{t+1}$ is the mass of $t+1$ parents who reward norm followers. Note that if an individual follows the norm at $t$, he is rewarded at time $t+1$. Intuitively, the type of rewards we have in mind take the form of receiving costly favors at some point in the future. Examples include preferential treatment when looking for a job or asking for a loan, or more generally receiving help when the need arises. This lag implies that time $t$ adults reward people who followed the norm at time $t-1$. In turn, the children who followed/violated the norm at $t-1$ (and who are rewarded at $t$) become adults (and potential rewarders) at $t+1$. The marginal benefit from being rewarded for following the norm is $R$. We assume that $R > c$ to ensure that rewards can be used as an effective incentive for norm-following. The parameter $r > 0$ captures the marginal cost of rewarding an additional individual.
Availability of the rewarding option. Similar to the case of punishing, we allow for the possibility that, for some parents, rewarding may be unfeasible or prohibitively costly. There are two states of the world.
With probability $\gamma$, the state of the world is l: the rewarding option is available for all parents.
With probability $1 - \gamma$, the state of the world is h: the rewarding option is unavailable to a share $q \in (0, 1)$ of parents.
Children. For children, the payoff from $n_{\bar{i}}$ is given by
$$\bar{u}_{\bar{i}}(m_{t+1}, n_{\bar{i}}) = \left( \delta R\, m_{t+1} - c \right) n_{\bar{i}}$$
where $\delta < 1$ is the child’s discount factor (as in Section 3). We thus maintain the assumption that children discount the future more heavily than parents. Parents are altruistic but exhibit imperfect empathy. The total utility of an adult $i$ is thus given by
$$U_i(b_{t-1}, m_{t+1}, N_i, M_i, n_{\bar{i}}) \equiv u_i + \bar{u}_{\bar{i}} + n_{\bar{i}}\, m_{t+1} R \left( 1 - \delta \right).$$
Timing. The timing of events is as follows (Figure 2).

5.2. Signaling

Similar to Section 4, we restrict attention to environments where Refinement 1 holds. Similar to the case of punishment, it is easy to show that, in steady state, there are three possible regimes:
H regime: all parents follow the norm and reward norm-followers.
I regime: all parents follow the norm but only a share $1 - q$ rewards, since the remaining share $q$ faces prohibitive costs of rewarding.
L regime: nobody rewards and nobody follows the norm.
The following proposition identifies the conditions for reward to feature in a refined steady state equilibrium.
Proposition 3.
Suppose that $p_H$ and $p_I$ are strictly greater than zero and
$$\min\left\{ R - 2r,\; R(1-q) - r(2-q),\; \frac{p_H + (1-q)^2 p_I}{p_H + (1-q)\,p_I}\,\delta R \right\} > c > \delta R(1-q). \tag{b}$$
Then rewarding behavior may emerge in equilibrium. In the refined steady state equilibrium with reward, the following holds.
(i) 
Children follow the norm if and only if they observe their parent rewarding.
(ii) 
Adults follow the norm and reward (if they can). If condition (b) fails, then the unique refined steady state equilibrium is one where nobody follows the norm and nobody rewards.
Proof. 
In what follows, we will use the term “reward equilibrium” to indicate the steady state equilibrium that features reward. 29
Condition (b) shares many similarities with condition (a). For instance, it shows that, for a reward equilibrium to emerge, the cost of following the norm cannot be too low, similar to what we found in the case of punishment. Another noteworthy feature of condition (b) is the requirement $R - 2r > c$. Since $c > 0$, this requirement implies $R > r$, i.e., the value of the reward for the person who receives it should exceed the cost for the rewarder. This arises from the parent’s incentive compatibility constraint in the H regime. If it did not hold, then the parent would always be better off by selecting (1,0), i.e., follow the norm but not reward, instead of (1,1), i.e., follow the norm and reward. Rewarding would therefore be a dominated strategy. 30 Intuitively, this is because in that case the costs of rewarding norm-followers would be too high to be worth incurring.
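The parental incentive constraints behind $R - 2r > c$ and $R(1-q) - r(2-q) > c$ can be reproduced with a small Python sketch built on the payoff functions of Section 5.1. It assumes that the child follows the norm only after observing his parent rewarding, so that choosing $(1, 0)$ leaves the child with a payoff of zero. The function names and parameter values are hypothetical.

```python
def parent_total_utility(N, M, n_child, R, r, c, b_prev, m_next):
    """Parent's total utility in the reward setup.

    The parent values the child's reward at his own discount factor (one),
    so the child's term is (R * m_next - c) * n_child.
    """
    own = (R * m_next - c) * N - r * M * (2 - b_prev)
    child = (R * m_next - c) * n_child
    return own + child


def gain_from_rewarding(R, r, c, b_prev, m_next):
    """Utility of (1, 1) with a norm-following child minus
    utility of (1, 0) with a norm-breaking child."""
    reward = parent_total_utility(1, 1, 1, R, r, c, b_prev, m_next)
    no_reward = parent_total_utility(1, 0, 0, R, r, c, b_prev, m_next)
    return reward - no_reward


# Hypothetical parameter values, for illustration only.
R, c, q = 4.0, 1.0, 0.2

for r in (1.0, 4.0):  # r = R mimics a pure monetary transfer
    gain_H = gain_from_rewarding(R, r, c, b_prev=0.0, m_next=1.0)
    gain_I = gain_from_rewarding(R, r, c, b_prev=q, m_next=1.0 - q)
    print(r, gain_H, gain_I)
```

The gain in regime H equals $R - 2r - c$ and the gain in regime I equals $R(1-q) - r(2-q) - c$, so rewarding is worthwhile for the parent exactly when the first two terms inside the minimum in condition (b) exceed $c$. With $r = R$ both gains are negative.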
This has implications for the type of reward that may emerge. An arrangement where people reward by donating resources they have been abundantly endowed with and are rewarded by receiving goods they have been scarcely endowed with may fit this description. 31 By contrast, rewards that are costly for those bestowing them but have little value for the receivers would not work. The same would be true of an environment where the cost of rewarding equals the benefit from being rewarded, so that $r = R$, as for instance may be the case if rewards take the form of monetary transfers. This may help explain why, in social exchanges, people tend to reward others in kind rather than through money. 32 In contrast to the punishment equilibrium, in the case of rewards the following holds. □
Lemma 5.
In any reward equilibrium, the adult’s material payoff (i.e., calculated by ignoring the child’s welfare) is always larger than zero.
Proof. 
The direct implication is that, even in the benchmark case where norm-following generates no direct externality (and is thus purely wasteful from a welfare viewpoint), the following holds.
Proposition 4.
In any reward equilibrium aggregate material welfare is higher than in the history-independent equilibrium.
Proof. 
Hence, in stark contrast with the punishment case, here any regime that features norm-following is necessarily welfare-superior to the regime where the norm is ignored (and no rewarding takes place). This happens despite the fact that the norm is actually socially wasteful. The intuition is that, as discussed, the net benefits created by the act of rewarding compensate for the welfare loss generated by costly norm-following. Clearly enough, Proposition 4 continues to hold if we allow for direct positive externalities of norm-following, but may cease to hold if we consider a norm which generates sufficiently strong negative direct externalities. □

6. Robustness

This section studies the robustness of the punishment and reward equilibria characterized in Propositions 1 and 3. More specifically, we wish to test whether punishment equilibria are robust to people starting to use rewards instead, and whether reward equilibria are robust to people starting to use punishment. We also wish to establish whether there are asymmetries, i.e., whether one signaling mechanism is somehow more robust than the other.

6.1. Robustness of Reward Equilibrium to Punishment

We start off by considering the robustness of the reward equilibrium to deviations involving punishment. Suppose that, instead of rewarding norm followers, a parent may select to punish norm deviators. Formally, let $M \in \{0, 1, 2\}$, where $M = 0$ corresponds to “not reward, not punish”, $M = 1$ corresponds to “reward, not punish”, and $M = 2$ corresponds to “not reward, punish”. We wish to ask the following question. Starting from a reward equilibrium (i.e., where no parent selects $M = 2$), and keeping the behavior of all other parents fixed, when would an individual find it optimal to (unilaterally) use punishment rather than reward as a signaling device? As in the previous sections, we apply the Intuitive Criterion. To evaluate whether a deviation may possibly be profitable for a parent, we consider the standard case of unilateral deviations. As in Section 3, we let the cost incurred by punishing be given by $\theta b + \varepsilon$.
The first observation is that, in the L reward regime, deviating cannot possibly bring any benefits, since the parent does not actually want his child to follow the norm. By contrast, deviating may possibly be beneficial in the H and I reward regimes, since in those cases the parent does want the child to follow the norm (and ideally would like to achieve that by incurring the smallest possible signaling cost). Consider now a unilateral deviation that consists in following the norm and punishing norm-breakers. Suppose that the deviation induces the child to follow the norm. The key remark is that this deviation would deliver a higher payoff in the H reward regime than in the I reward regime (since, in the latter, more people break the norm and should therefore be punished). Moreover, the difference between the two payoffs is increasing in $\theta$, the marginal cost incurred by punishing. Compare now the payoff obtained from deviating with equilibrium payoffs. Clearly enough, if $\varepsilon$ is small, a successful deviation would always be profitable in the H reward regime. Moreover, if $\theta$ is sufficiently high, we can ensure that the deviation in the I reward regime is equilibrium-dominated. In other words, we can always find suitable parameter values that would induce parents to deviate only in the high-reward state. The deviation is thus a credible signal that the regime is high-reward. Figure 3 illustrates the logic of the argument.
Proposition 5.
For each $r > 0$ there exist values $\bar{\theta}$ and $\bar{\varepsilon} > 0$ such that, if $\theta > \bar{\theta}$ and $\varepsilon < \bar{\varepsilon}$, the reward equilibrium of the augmented game fails the Intuitive Criterion.
Proof. 
See Appendix A. □
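The informal argument preceding Proposition 5 can be illustrated numerically. The Python sketch below is one reading of that argument, not the proof in Appendix A. It assumes that the deviating parent is one who rewards in equilibrium, that the deviation succeeds (the child follows the norm), and that the punishing cost is $\theta b + \varepsilon$ with $b = 0$ in the H reward regime and $b = q$ in the I reward regime. The function name and parameter values are hypothetical.

```python
def punish_deviation_gain(R, r, c, theta, eps, m, b):
    """Net gain for a rewarding parent from switching to (follow, punish),
    assuming the child follows the norm after the deviation."""
    equilibrium = 2 * (R * m - c) - r * (2 - b)      # follow and reward
    deviation = 2 * (R * m - c) - (theta * b + eps)  # follow and punish
    return deviation - equilibrium


# Hypothetical parameter values, for illustration only.
R, r, c, eps, q = 4.0, 1.0, 1.0, 0.1, 0.2

for theta in (0.5, 10.0):
    gain_H = punish_deviation_gain(R, r, c, theta, eps, m=1.0, b=0.0)
    gain_I = punish_deviation_gain(R, r, c, theta, eps, m=1.0 - q, b=q)
    print(theta, gain_H, gain_I)
```

Under these assumptions the gain is $2r - \varepsilon$ in the H reward regime and $r(2-q) - \theta q - \varepsilon$ in the I reward regime. For small $\varepsilon$ the first is positive, and for large enough $\theta$ the second is negative, so only a parent in the H reward regime would want to deviate.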

6.2. Robustness of Punishment Equilibrium to Costly Rewards

We now consider the robustness of the punishment equilibrium to deviations involving costly rewards. Let $M \in \{0, 1, 2\}$, where $M = 0$ corresponds to “not punish, not reward”, $M = 1$ corresponds to “punish, not reward”, and $M = 2$ corresponds to “not punish, reward norm followers”. As in Section 5, we let the marginal cost incurred by rewarding norm followers be given by $r$.
Clearly enough, in the no-punishment regime, deviating is always dominated. By contrast, deviating may possibly be beneficial in the H and I punishment regimes (since in those regimes the parent actually wants the child to follow the norm). The second observation is that the payoff a parent may obtain by deviating and using rewards is always lower in the H punishment regime than in the I punishment regime. Intuitively, this is because in the H regime more people follow the norm and must therefore be rewarded. 33 Consider now the parent’s equilibrium payoff. It is straightforward to show that the equilibrium payoff in the H punishment regime is always higher than in the I punishment regime (since in the latter there are more people who violate the norm and must therefore be punished). This implies that, if a parent deviates in the H punishment regime, he will also find it optimal to deviate in the I punishment regime. Figure 4 illustrates this logic graphically.
The Intuitive Criterion is thus consistent with the belief that the deviation originated in the I punishment regime. As a result, the child will find it optimal to break the norm upon observing the deviation, thus rendering the deviation suboptimal. The punishment equilibrium thus satisfies the Intuitive Criterion. This is summarized in the next proposition.
Proposition 6.
In the augmented game, the punishment equilibrium satisfies the Intuitive Criterion for all $r > 0$.
Proof. 
See Appendix A. □
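The ranking of deviation incentives behind Proposition 6 can also be illustrated numerically. The Python sketch below is one reading of the informal argument, not the proof in Appendix A. It assumes that the deviating parent is one who punishes in equilibrium, that the deviation succeeds (the child follows the norm), and that rewarding costs $r(2 - b)$, with $b = 0$ in the H punishment regime and $b = q$ in the I punishment regime. The function name and parameter values are hypothetical.

```python
def reward_deviation_gain(c, theta, eps, r, b):
    """Net gain for a punishing parent from switching to (follow, reward),
    assuming the child follows the norm after the deviation."""
    equilibrium = -2 * c - (theta * b + eps)  # follow and punish
    deviation = -2 * c - r * (2 - b)          # follow and reward
    return deviation - equilibrium


# Hypothetical parameter values, for illustration only.
c, theta, eps, q = 1.0, 0.5, 0.1, 0.2

for r in (0.01, 0.05, 1.0):
    gain_H = reward_deviation_gain(c, theta, eps, r, b=0.0)
    gain_I = reward_deviation_gain(c, theta, eps, r, b=q)
    print(r, gain_H, gain_I, gain_I > gain_H)
```

Under these assumptions the gain in regime I exceeds the gain in regime H by $q(\theta + r) > 0$ for every $r > 0$, so a deviation that is profitable in regime H is also profitable in regime I and cannot credibly signal regime H.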

6.3. Discussion

The intuition for the results in this section can be summarized as follows. Reward does not destabilize the punishment equilibrium because the payoff obtained when using reward out-of-equilibrium (i.e., when everyone else uses punishment) is decreasing in the share of norm-followers. Out-of-equilibrium reward is thus ill-suited to effectively signal that norm compliance is high. By contrast, the payoff obtained when using punishment is always increasing in the share of norm-followers. Punishment is thus a compelling tool for signaling high compliance.
It is important to highlight that the punishment equilibrium is robust to reward no matter how low reward costs may be. To see this, consider an extreme scenario, where reward costs are arbitrarily close to zero. In that case, a successful deviation (i.e., that induces the child to follow the norm) involving rewards always yields a higher payoff than the equilibrium move. However, the parent has an incentive to deviate both in the H- and the I-punishment regimes. In other words, a deviation involving rewards cannot credibly signal that the regime is high punishment. This is the rationale for the punishment equilibrium’s robustness. It is straightforward to see that a similar rationale also guarantees robustness in the case where the parent may deviate by selecting an alternative signal whose cost is independent of the share of norm-followers (e.g., because it involves a fixed cost $\phi$, for any $\phi \geq 0$).
Consider now a deviation from the reward equilibrium that involves punishment. If the fixed cost of punishing ($\varepsilon$) is sufficiently small, the deviation payoff exceeds the equilibrium payoff in the H-reward regime. However, as we have argued, this is only part of the argument. The additional crucial ingredient is credibility. The payoff from out-of-equilibrium punishment is higher in the H-reward regime than in the I-reward regime. For $\theta$ high enough, this payoff increases so steeply in the share of norm-followers that it cuts the equilibrium payoff from below, as in Figure 3, thus credibly signaling that the regime is H. In turn, this ensures that, upon observing this deviation, the child finds it optimal to follow the norm, thus rendering the deviation optimal. 34

7. Further Discussion and Concluding Remarks

In this section, we briefly provide further discussions of our results and their implications.

7.1. Which Norms?

Our mechanism continues to work even if we consider norms that do not impose a net individual cost to all, at least as far as adults are concerned. Consider for instance “good lifestyle” behaviors, in matters such as drinking, personal hygiene, sexual promiscuity, eating habits and so on. These behaviors often involve a trade-off, since they require effort, but they also generate benefits, both in the present and in the future (for instance, in the form of improved health). If youngsters discount the future more heavily than adults, they will value the delayed benefits less. Alternatively, adopting the good behavior may require greater effort for youngsters than for adults. It is then possible that, while for parents the net cost of following the good lifestyle norm may be negative (i.e., the benefits more than offset the costs), the opposite may hold for youngsters. Suppose then that we modify our setup as follows. The net cost for parents of following the norm is $c_{\text{parents}} < 0$, while the net cost for youngsters is $c_{\text{children}} > 0$. 35 Clearly enough, our results still apply. In this modified setup parents always want the child to follow the norm, even in the no punishment regime (since, in their eyes, the health benefits from following the norm more than justify its costs). The conflict of interest between parents and children is thus even stronger.

7.2. Infinitely Repeated Games

Many “folk-theorem related” models consider environments where deviations trigger a punishment phase. In the most commonly used framework, punishment consists in a permanent reversion to the static Nash equilibrium (see e.g., Friedman 1971 [49]). Clearly enough, in these models, the problem of sustaining punishment does not emerge (since the punishment phase constitutes a Nash equilibrium). Our work concentrates on environments where the existence of punishment cannot be justified on these grounds. Moreover, in models that use Nash reversion as punishment, equilibrium payoffs are always weakly higher than those of the static Nash equilibrium. This stands in contrast with our results where, as we have seen, in the punishment equilibrium people may end up worse off than they would be in the history-independent equilibrium. The history-independent equilibrium is the unique equilibrium that would emerge if the game were finite (and adults knew their position with respect to the last period). 36 The literature on repeated games has, to be sure, identified scenarios where the payoffs obtained in the repeated game may fall below those of the static Nash equilibrium. However, these typically involve so-called “carrot and stick” finite punishment cycles, where the punishment phase is followed by a reward phase (see, e.g., Abreu 1986 [50]). Even leaving all other differences aside, it is clear that our story is fundamentally different.

7.3. More General Parental Utility

In our setup, the parent’s utility is given by the sum of his own payoff and the child’s payoff (evaluated using the parent’s discount factor). This is, however, a special case of a more general setup where the child’s payoff is assigned a weight $\beta \in \mathbb{R}_{>0}$ in the parent’s utility, so that (7) becomes
$$U_i(b_{t-1}, m_{t+1}, N_i, M_i, n_{\bar{i}}) \equiv u_i + \beta \bar{u}_{\bar{i}} - \beta \Theta m_{t+1} (1 - n_{\bar{i}})(1 - \delta).$$
It is easy to verify that this would not affect our findings. In particular, a punishment equilibrium continues to exist, but the necessary and sufficient condition for this is slightly more involved. Condition (a) becomes
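$$\min\left\{ \Theta - \frac{\varepsilon}{\beta},\ \frac{\beta \Theta (1-q) - \varepsilon - \theta q}{1 + \beta},\ \frac{p_H + (1-q)^2 p_I}{p_H + (1-q) p_I}\, \delta \Theta \right\} > c \geq \delta \Theta (1-q),$$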
which of course collapses to (a) when $\beta = 1$. 37 A similar observation applies to the reward equilibrium. Condition (b) becomes
$$\min\left\{ R - \frac{2r}{\beta},\ R(1-q) - \frac{r(2-q)}{\beta},\ \frac{p_H + (1-q)^2 p_I}{p_H + (1-q) p_I}\, \delta R \right\} > c > \delta R (1-q).$$
In fact, all our results (including those on robustness outlined in Section 6) would carry through for all values of $\beta \in \mathbb{R}_{>0}$, as well as in the limiting case $\beta \to \infty$, in which the parent’s concern for the child’s payoff is infinitely higher than his concern for his own payoff. 38
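As a rough numerical illustration of how the weight $\beta$ enters the reward condition, the sketch below evaluates the generalized condition (b) as stated above. The function names and all parameter values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def posterior_reward_share(p_H, p_I, q):
    """Expected share of rewarders after observing (1, 1), by Bayes' rule."""
    weight_H = p_H            # in regime H every parent rewards
    weight_I = (1 - q) * p_I  # in regime I only a share 1 - q can reward
    return (weight_H + (1 - q) * weight_I) / (weight_H + weight_I)


def condition_b(c, r, R, q, delta, p_H, p_I, beta=1.0):
    """Generalized condition (b) with weight beta on the child's payoff."""
    upper = min(
        R - 2 * r / beta,
        R * (1 - q) - r * (2 - q) / beta,
        posterior_reward_share(p_H, p_I, q) * delta * R,
    )
    lower = delta * R * (1 - q)
    return upper > c > lower


# Hypothetical parameter values, chosen only for illustration.
params = dict(c=1.0, r=1.0, R=5.0, q=0.25, delta=0.25, p_H=0.5, p_I=0.5)
holds_beta_one = condition_b(**params, beta=1.0)
holds_beta_half = condition_b(**params, beta=0.5)
```

With these illustrative numbers the condition holds for $\beta = 1$ but fails for $\beta = 0.5$: a lower weight on the child's payoff raises the effective cost of rewarding, $r/\beta$, and tightens the first two terms of the minimum.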

7.4. Concluding Remarks

Recently, a literature on endogenously derived preferences has sprouted within economics. 39 This literature derives preferences from first principles or ultimate causes, and thus complements the behavioral literature, which focuses on proximate causes. 40 We believe this is an important agenda. One of the advantages of building models of ultimate causes of behavior is that it generates sharper predictions and comparative statics. We hope that this paper may contribute to this literature by showing that, in an environment with asymmetric information, costly punishment of norm-breakers may emerge optimally as a means to transmit information to the next generation.
Finally, while the model assumes that the marginal cost of punishing is fixed, it is possible to think of environments where this does not hold. For instance, if punishment takes the form of social boycott, then the opportunity cost incurred (in terms of lost networking opportunities or sources of information) when socially excluding a norm-breaker may be lower if others also exclude him. By contrast, the opportunity cost incurred by refusing to, say, offer employment to a past norm-breaker will be higher if others practice the same policy (since in that case he would have lower outside options and would therefore be willing to work for less). This suggests that some forms of punishment (e.g., social boycott) may be more likely to be effective signaling devices than others (e.g., refusing employment). Future work might be devoted to refining this intuition.

Author Contributions

Formal analysis, F.A., S.S.; Writing—Original draft preparation, F.A., S.S.; Writing—Review and editing, F.A., S.S.

Funding

This research received no external funding.

Acknowledgments

We thank two anonymous referees as well as Doug Bernheim, Eddie Dekel, Eugenio Proto, Simon Gächter, Florian Herold, Nick Netzer, Massimo Morelli, Pietro Ortoleva, Luis Rayo, Larry Samuelson, Daniel Seidmann, Joel Sobel, Jean Tirole and seminar audiences at various institutions for comments and discussions. All errors are our own.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Proof of Lemma 1.
Let $n_{N,M} \in \{0,1\}$ denote the child’s equilibrium action upon observing $(N,M)$. We now prove that the following strategies constitute an equilibrium: (i) Children select $n_{0,0} = 0$, and $n_{N,M} \in \{0,1\}$ for $(1,0)$, $(0,1)$ and $(1,1)$. [Refinement 1 imposes $n_{0,1} = 0$ but, since this is not a necessary condition for the equilibrium to hold, we provide a more general proof here.] (ii) Parents always select $(0,0)$. To see why this represents an equilibrium, consider first parents. In the proposed equilibrium, parents expect that $m_{t+1} = 0$. The expected payoffs from the different action-pairs are: $U_t(0,0) = 0$, $U_t(1,0) = -c(1 + n_{1,0})$, $U_t(0,1) = -c\,n_{0,1} - \theta b_{t-1} - \varepsilon$, $U_t(1,1) = -c(1 + n_{1,1}) - \theta b_{t-1} - \varepsilon$. Clearly, $(1,0)$, $(0,1)$ and $(1,1)$ are strictly dominated by $(0,0)$, and this is true independently of the precise values taken by $b_{t-1}$, $n_{1,0}$, $n_{0,1}$ or $n_{1,1}$. Consider now children. Given the parents’ strategy, the children’s best reply along the equilibrium path is clearly to select $n_{0,0} = 0$. The action-pairs $(1,0)$, $(0,1)$ and $(1,1)$ are out-of-equilibrium moves and, thus, the child’s beliefs upon observing them are not well defined. However, as we have seen, the precise values taken by $n_{1,0}$, $n_{0,1}$ or $n_{1,1}$ are unimportant for the equilibrium to hold. □
Proof of Lemma 2.
Let $n_{N,M} \in \{0,1\}$ be the child’s action upon observing $(N,M)$. In Lemma 3, below, we prove that, in any equilibrium where punishment occurs, $n(1,1) = 1$ and $n(1,0) = 0$. From (2), it is clear that the child will follow the norm iff $c < \delta \Theta\, E[m_{t+1} \mid N, M]$, and will break the norm otherwise. Hence, $E[m_{t+1} \mid 1,1] > c/(\delta \Theta) \geq E[m_{t+1} \mid 1,0]$. □
Proof of Lemma 3.
We divide the proof in parts. Part (i): In any equilibrium with punishment, youngsters whose parent follows the norm but does not punish break the norm. Let $n_{N,M} \in \{0,1\}$ be the child’s action upon observing $(N,M)$. Consider an equilibrium where $n_{1,0} = 1$. The parent’s expected payoff from $(1,0)$ is then $-2c$. Consider now the parent’s payoff from $(1,1)$. This is
$$-c(1 + n_{1,1}) - \theta b_{t-1} - \varepsilon - \Theta (1 - n_{1,1})\, m_{t+1}. \tag{A1}$$
It is straightforward to see that if $n_{1,1} = 1$ then (A1) $< -2c$. If $n_{1,1} = 0$ then (A1) becomes
$$-c - \theta b_{t-1} - \varepsilon - \Theta m_{t+1}. \tag{A2}$$
Expression (A2) weakly exceeds $-2c$, namely the payoff from $(1,0)$, iff
$$c \geq \theta b_{t-1} + \Theta m_{t+1} + \varepsilon. \tag{A3}$$
If (A3) does not hold, then $(1,1)$ is dominated by $(1,0)$. Consider now the parent’s payoff from $(0,0)$. From Refinement 1, $n(0,0) = 0$ and, hence, the payoff from $(0,0)$ is
$$-2 \Theta m_{t+1}. \tag{A4}$$
It is straightforward to see that if (A3) holds, then (A4) is strictly higher than (A2). To recap, when $n_{1,0} = 1$ we have two possibilities. If $n_{1,1} = 1$ then $(1,1)$ is surely dominated by $(1,0)$. If $n_{1,1} = 0$, then $(1,1)$ is either dominated by $(1,0)$ or it is dominated by $(0,0)$. Hence, $(1,1)$ cannot be selected in equilibrium. Consider now $(0,1)$. By Refinement 1, $n_{0,1} = 0$. This implies that $(0,1)$ is strictly dominated by $(0,0)$. This proves that, if $n_{1,0} = 1$, then punishment cannot emerge in equilibrium.
Part (ii): In any equilibrium with punishment, youngsters whose parent both follows the norm and punishes follow the norm. The necessary condition for punishment to emerge is that $(1,1)$ should not be a dominated action. From (i), we know that in any equilibrium with punishment it is necessary that $n_{1,0} = 0$. The parent’s expected payoff from $(1,0)$ is then $-c - \Theta m_{t+1}$. Consider now the parent’s payoff from $(1,1)$, namely (A1). It is straightforward to see that, if $n_{1,1} = 0$, then $(1,1)$ is strictly dominated by $(1,0)$. Hence, in any equilibrium with punishment, we need $n_{1,1} = 1$. □
Proof of Lemma 4.
We first establish some general properties. Inspection of (5)–(7) reveals a number of things. First, as already stated, $(0,1)$ is clearly dominated and thus cannot be an equilibrium strategy. In what follows, we will thus ignore the existence of this strategy. Second, we cannot have $U_t(1,1) = U_t(1,0) = U_t(0,0)$. Third, for punishment to emerge at $t$, it is necessary that $U_t(1,1) \geq U_t(1,0)$, or else punishing would always be dominated. In turn, it is straightforward to see that this requires that $U_t(1,0) > U_t(0,0)$ must also hold. Hence, if punishment occurs at $t$, then $(0,0)$ is dominated by $(1,0)$, which implies that all adults must choose to follow the norm. Conversely, for $(0,0)$ not to be dominated, we require $U_t(0,0) \geq U_t(1,0)$. In turn, this requires that $U_t(1,0) > U_t(1,1)$ must hold, i.e., $(1,1)$ is dominated by $(1,0)$.
We now explore the implications of these properties for steady state. Above we have concluded that, for $(0,0)$ not to be dominated at $t$, $(1,1)$ must be dominated by $(1,0)$ and, thus, $m_t = 0$ (recall that $(0,1)$ is always dominated). In steady state, this then implies $m^* = 0$, i.e., nobody punishes. Since $m^* = 0$, it is clear that following the norm cannot possibly be optimal. This proves that $(0,0)$ may never coexist with either $(1,1)$ or $(1,0)$. Hence, $(0,0)$ may emerge in a steady-state equilibrium only if all parents select this action pair (and, consequently, all children break the norm), i.e., $m^* = 0$ and $b^* = 2$. From Lemma 1, we know that, in that case, the optimal action-pair is $(0,0)$. Hence, $m^* = 0$ and $b^* = 2$ is a possible steady state regime. The other possible steady states are (i) all adults who can punish select $(1,1)$, while those who cannot punish select $(1,0)$; (ii) all adults select $(1,0)$ (so that by Lemma 3 all children break the norm).
In case (i) we have two possibilities: (1) the state of the world is $l$. In that case, $m^* = 1$ and $b^* = 0$. Payoffs are: $U(1,1) = -2c - \varepsilon$, $U(1,0) = -c - \Theta$, $U(0,0) = -2\Theta$. It is straightforward to show that, provided that
$$\Theta - \varepsilon > c \tag{A5}$$
holds, $(1,0)$ and $(0,0)$ are dominated by $(1,1)$. Hence, $m^* = 1$ and $b^* = 0$ is a possible steady state regime. The second possibility for case (i) is: (2) the state of the world is $h$. In that case, $m^* = 1 - q$ and $b^* = q$. Payoffs are: $U(1,1) = -2c - \varepsilon - \theta q$, $U(1,0) = -c - \Theta(1-q)$, $U(0,0) = -2\Theta(1-q)$. Consider first parents who cannot punish. For them, $(1,1)$ is obviously not an option. Selecting $(1,0)$ yields a higher payoff than $(0,0)$ if
$$\Theta(1-q) > c. \tag{A6}$$
Consider now those parents who can punish. The necessary and sufficient condition for option $(1,1)$ to dominate both $(1,0)$ and $(0,0)$ is
$$\Theta(1-q) - \varepsilon - \theta q > c. \tag{A7}$$
Note that, if (A7) holds, (A6) and (A5) automatically hold too (but not vice-versa).
In case (ii), $m^* = 0$ and $b^* = 1$. Payoffs are: $U(1,1) = -2c - \theta - \varepsilon$, $U(1,0) = -c$, $U(0,0) = 0$. Hence, $(1,0)$ is strictly dominated by $(0,0)$. This implies that case (ii) cannot be a steady-state equilibrium. □
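The dominance comparisons in this proof can be checked numerically. The following is a minimal sketch, assuming that children follow the norm only after observing (1,1) (as in Lemma 3); the helper names and the parameter values are hypothetical and serve only as an illustration.

```python
def punishment_payoffs(c, eps, theta, Theta, m_next, b_prev):
    """Parent payoffs when the child follows the norm only after (1, 1)."""
    return {
        (1, 1): -2 * c - eps - theta * b_prev,  # follow and punish; child follows
        (1, 0): -c - Theta * m_next,            # follow only; child breaks, is punished
        (0, 0): -2 * Theta * m_next,            # parent and child both break the norm
    }


def best_action(payoffs, can_punish=True):
    """Payoff-maximizing action pair; (1, 1) is unavailable to non-punishers."""
    feasible = {a: u for a, u in payoffs.items() if can_punish or a[1] == 0}
    return max(feasible, key=feasible.get)


# Hypothetical parameters satisfying (A7): Theta*(1-q) - eps - theta*q > c.
c, eps, theta, Theta, q = 1.0, 0.5, 1.0, 4.0, 0.25
state_l = punishment_payoffs(c, eps, theta, Theta, m_next=1.0, b_prev=0.0)
state_h = punishment_payoffs(c, eps, theta, Theta, m_next=1 - q, b_prev=q)
case_ii = punishment_payoffs(c, eps, theta, Theta, m_next=0.0, b_prev=1.0)
```

Under these assumed values, `best_action` selects (1,1) in both states for parents who can punish and (1,0) for those who cannot, while in case (ii) it selects (0,0), in line with the argument above.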
Proof of Proposition 1.
(i) We prove the necessity of condition (a). Suppose that parents follow the strategy described in the proposition, and let prior beliefs be $p_H$ and $p_I$, both strictly positive. From Lemma 3, we know that in any equilibrium with punishment, $n(1,0) = 0$ and $n(1,1) = 1$ must hold. Upon observing $(1,1)$, the child’s posterior expectation of $m_{t+1}$ is equal to $\pi_H(1,1) + (1-q)\,\pi_I(1,1)$. To obtain $n(1,1) = 1$ in equilibrium, we thus require $[\pi_H(1,1) + (1-q)\,\pi_I(1,1)]\,\delta \Theta > c$. Substituting for $\pi_H(1,1) = p_H / [p_H + (1-q) p_I]$ and $\pi_I(1,1) = (1-q) p_I / [p_H + (1-q) p_I]$ and rearranging, we obtain
$$\frac{p_H + (1-q)^2 p_I}{p_H + (1-q) p_I}\, \delta \Theta > c. \tag{A8}$$
Moreover, we know from the proof of Lemma 4 above that, for parents who can punish, the necessary and sufficient condition for option $(1,1)$ to dominate $(1,0)$ and $(0,0)$ in both regimes H and I is that
$$\Theta(1-q) - \varepsilon - \theta q > c. \tag{A9}$$
This explains the LHS of condition (a). Consider now the RHS of (a). Upon observing $(1,0)$, the child’s posterior expectation of $m_{t+1}$ is $(1-q)$. To obtain $n(1,0) = 0$ in equilibrium, we thus require $(1-q)\,\delta \Theta \leq c$. Rearranging, this gives the RHS of (a). This proves that (a) is necessary for a punishment equilibrium.
(ii) We prove the necessity of the requirement $p_H > 0$ and $p_I > 0$. First, note that, if $p_H = 0$, then an equilibrium with punishment cannot exist since in that case $n(1,1) = 1$ and $n(1,0) = 0$ cannot hold simultaneously. Suppose now that $p_I = 0$, so that $(1,0)$ is an out-of-equilibrium move by the parent. Let $n_{1,0} \in \{0,1\}$ be the child’s action following this out-of-equilibrium move. The parent’s payoff from selecting $(1,0)$ in the H regime is
$$-c(1 + n_{1,0}) - \Theta(1 - n_{1,0}). \tag{A10}$$
The parent’s payoff from selecting $(1,0)$ in the L regime is
$$-c(1 + n_{1,0}). \tag{A11}$$
It is straightforward to see that $(1,0)$ is necessarily equilibrium-dominated in the L regime (since (A11) $< 0$). However, this is not the case in the H regime: If $n_{1,0} = 1$, then (A10) becomes $-2c$ and is therefore higher than the equilibrium payoff in the H regime, namely $-2c - \varepsilon$. Under the Intuitive Criterion, upon observing the out-of-equilibrium move $(1,0)$ the child should therefore conclude that the regime is H. The child’s optimal reply is thus $n_{1,0} = 1$. In turn, this makes punishing dominated for parents (which in turn renders norm-following suboptimal as well). This implies that, if $p_I = 0$, the H regime would unravel. The only possible refined steady state equilibrium would then be one where nobody follows the norm and nobody punishes. We conclude that $p_H > 0$ and $p_I > 0$ are necessary for punishment to emerge in a refined steady state equilibrium.
(iii) We show that the punishment equilibrium cannot be upset by out-of-equilibrium moves. Since both $p_H$ and $p_I$ are strictly positive, the only possible cases of zero-probability events that may arise are $(N_i, M_i) = (0,1)$, or $(N_i, M_i) = (0,0)$ and $p_L = 0$. It is straightforward to show that selecting $(0,1)$ is strictly dominated by $(1,1)$ in both the H and the I regimes. Hence, the parent could never gain from selecting this out-of-equilibrium move in either the H or the I regime. Suppose now that $p_L = 0$ and consider a parent who selects the move $(0,0)$. In the H regime, this is strictly dominated by $(1,1)$ independently of the precise value of $n_{0,0}$. Hence, under the Intuitive Criterion, upon observing $(0,0)$ the child must rule the H regime out. The child’s optimal reply is then $n(0,0) = 0$. Given this, it is straightforward to see that a parent could also never gain from selecting the out-of-equilibrium move $(0,0)$ in the I regime. [Note that by Refinement 1 we could have imposed $n(0,0) = 0$ at the outset. However, as we have shown, this is actually not necessary for the argument.]
(iv) Finally, we argue that, given that $p_H$ and $p_I$ are strictly positive, (a) is a sufficient condition for the punishment equilibrium to exist. To see this, note that, under (a), the child’s best replies are $n(1,0) = 0$ and $n(1,1) = 1$. Given (A9), the parent’s best reply is $(1,1)$ if he can punish and $(1,0)$ if he cannot. □
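The belief updating behind (A8) and the two sides of condition (a) can be reproduced in a few lines. This is only an illustrative sketch: the function names are hypothetical, the parameter values are assumptions, and condition (a) is taken to be the minimum of (A8) and (A9) on the left with $\delta\Theta(1-q)$ on the right, as described in the proof.

```python
def posterior_punish_share(p_H, p_I, q):
    """E[m_{t+1} | (1, 1)] obtained by Bayes' rule over regimes H and I."""
    weight_H = p_H            # in H every parent punishes, so (1, 1) is certain
    weight_I = (1 - q) * p_I  # in I only a share 1 - q of parents can punish
    pi_H = weight_H / (weight_H + weight_I)
    pi_I = weight_I / (weight_H + weight_I)
    return pi_H * 1.0 + pi_I * (1 - q)


def condition_a(c, eps, theta, Theta, delta, q, p_H, p_I):
    """Check min{(A9), (A8)} > c >= delta * Theta * (1 - q)."""
    parent_side = Theta * (1 - q) - eps - theta * q                    # (A9)
    child_side = posterior_punish_share(p_H, p_I, q) * delta * Theta  # (A8)
    return min(parent_side, child_side) > c >= delta * Theta * (1 - q)


# Hypothetical parameter values, chosen only for illustration.
base = dict(c=1.0, eps=0.5, theta=1.0, Theta=4.0, q=0.25, p_H=0.5, p_I=0.5)
holds_low_delta = condition_a(delta=0.30, **base)
holds_high_delta = condition_a(delta=0.35, **base)
```

In this example the condition holds for $\delta = 0.30$ but fails for $\delta = 0.35$: with the larger discount factor, a child who observes (1,0) would already find it optimal to follow the norm, so observing punishment no longer changes behavior.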
Proof of Proposition 2.
Total material welfare in the history-independent equilibrium is zero. Consider now the H regime. In any given period $t$, the material payoff obtained by each adult is $-c - \varepsilon$, while the material payoff obtained by each child is $-c$. Since each of these constituents is negative, it is clear that material welfare here is strictly negative. Consider now the I regime. In any given period $t$, the material payoff obtained by a share $1-q$ of adults is $-c - \theta q - \varepsilon$, while that of the remaining $q$ share is $-c$. The material payoff obtained by a share $1-q$ of children is $-c$, while that obtained by the remaining $q$ share is $-\Theta(1-q)$. 41 Since each of these constituents is negative, it is clear that material welfare in the I regime is also strictly negative. □
Proof of Lemma 5.
In the H regime, the parent’s direct payoff is $-c - 2r + R$ (since he rewards a mass 2 of people, namely all adults and all children active in the previous generation, and he is rewarded by a mass one of people, namely all adults active next period). By condition (b), this is greater than zero. In the I regime, the parent’s direct payoff is $-c - r(2-q) + R(1-q)$ if he can reward, and $-c + R(1-q)$ otherwise. Under condition (b), both are strictly greater than 0. □
Proof of Proposition 4.
In Lemma 5 we have shown that in any reward equilibrium the material payoff of parents is strictly positive. Consider now children. By breaking the norm, a child knows that he can obtain 0. Hence, in any equilibrium with reward, children must obtain at least 0. [Note that this argument holds independently of whether we use the child’s or the parent’s discount factor to evaluate the child’s material welfare.] Summing up, this implies that in any reward equilibrium welfare is higher than in the history-independent equilibrium. □
Proof of Proposition 5.
We formalize the graphical intuition provided in the text. Recall that equilibrium payoffs for the parent in the different reward regimes are as follows. H reward regime:
$$-2c - 2r + 2R. \tag{A12}$$
I reward regime (if he can reward):
$$-2c - r(2-q) + 2R(1-q), \tag{A13}$$
and $-c + R(1-q)$ if he cannot reward.
Consider the out-of-equilibrium move $(1,2)$, in which the parent follows the norm and punishes norm-breakers (rather than rewarding norm-followers). Suppose that $n_{1,2} = 1$. In the H reward regime, the payoff from the deviation is
$$-2c + 2R - \varepsilon, \tag{A14}$$
in the I reward regime it is
$$-2c + 2R(1-q) - \theta q - \varepsilon, \tag{A15}$$
while in the L reward regime it is $-2c - 2\theta - \varepsilon$. Clearly, the deviation is equilibrium-dominated in the L regime, since there the parent obtains 0 in equilibrium. If
$$\frac{\theta q + \varepsilon}{2 - q} > r > \frac{\varepsilon}{2} \tag{A16}$$
then the deviation payoff in the I regime, (A15), is smaller than the equilibrium payoff (A13), while the deviation payoff in the H regime, (A14), exceeds the equilibrium payoff (A12). When (A16) holds, the Intuitive Criterion requires that, upon observing $(1,2)$, the child must rule the L and I reward regimes out, and must assign probability 1 to the H reward regime. The child’s optimal reply is thus $n_{1,2} = 1$, which renders the deviation $(1,2)$ optimal in the H reward regime. The sufficient conditions for (A16) are: (i) $\theta > r(2-q)/q$ and (ii) $\varepsilon < 2r$. We conclude that, for each $r > 0$, there exist values $\theta^{*}$ and $\varepsilon^{*} > 0$ such that, if $\theta > \theta^{*}$ and $\varepsilon < \varepsilon^{*}$, the reward equilibrium of the augmented game fails the Intuitive Criterion. □
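The Intuitive Criterion step in this proof amounts to comparing, regime by regime, the deviation payoff with the equilibrium payoff. A minimal sketch of that comparison follows; the function names and the numerical values are hypothetical and only meant to illustrate condition (A16).

```python
def reward_equilibrium_payoffs(c, r, R, q):
    """Equilibrium payoffs of a parent who can reward, by regime."""
    return {
        "H": -2 * c - 2 * r + 2 * R,                  # (A12)
        "I": -2 * c - r * (2 - q) + 2 * R * (1 - q),  # (A13)
        "L": 0.0,
    }


def punish_deviation_payoffs(c, R, q, theta, eps):
    """Payoffs from deviating to punishment, assuming the child then follows."""
    return {
        "H": -2 * c + 2 * R - eps,                      # (A14)
        "I": -2 * c + 2 * R * (1 - q) - theta * q - eps,  # (A15)
        "L": -2 * c - 2 * theta - eps,
    }


def regimes_not_ruled_out(equilibrium, deviation):
    """Regimes in which the deviation is not equilibrium-dominated."""
    return [k for k in equilibrium if deviation[k] > equilibrium[k]]


# Hypothetical parameter values, chosen only for illustration.
c, r, R, q, eps = 1.0, 1.0, 5.0, 0.25, 1.0
eq = reward_equilibrium_payoffs(c, r, R, q)
steep = regimes_not_ruled_out(eq, punish_deviation_payoffs(c, R, q, 10.0, eps))
flat = regimes_not_ruled_out(eq, punish_deviation_payoffs(c, R, q, 1.0, eps))
```

With $\theta = 10$, condition (A16) holds and only the H regime survives, so the deviation credibly signals H. With $\theta = 1$, the deviation is profitable in both H and I and no longer separates the two regimes.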
Proof of Proposition 6.
Recall the equilibrium payoffs for the parent in the different punishment regimes. H punishment regime:
$$-2c - \varepsilon. \tag{A17}$$
I punishment regime:
$$-2c - \theta q - \varepsilon \tag{A18}$$
if the parent can punish, and $-c - \Theta(1-q)$ if he cannot.
(i) Consider first the out-of-equilibrium move $(1,2)$, in which the parent follows the norm and rewards norm-followers. It is clear that if $n_{1,2} = 0$ the deviation cannot be profitable. Suppose now that $n_{1,2} = 1$. In the H punishment regime, the payoff from the deviation is 42
$$-2c - 2r, \tag{A19}$$
in the I punishment regime it is
$$-2c - r(2-q), \tag{A20}$$
while in the L punishment regime it is $-2c$. Clearly, the deviation is equilibrium-dominated in the L regime, since there the parent obtains 0 in equilibrium. Consider now the H punishment regime. For the deviation to be profitable, we require (A19) to exceed (A17), the equilibrium payoff. This happens if
$$\varepsilon > 2r. \tag{A21}$$
Note that if (A21) holds then (A20) necessarily exceeds the equilibrium payoff in the I punishment regime when the parent can punish. 43 This implies that, under the Intuitive Criterion, the child’s posterior beliefs following $(1,2)$ cannot simultaneously rule out the I punishment regime and assign positive probability to the H punishment regime. The deviation is either (a) equilibrium-dominated in both regimes, or (b) profitable in both regimes, or (c) profitable in the I punishment regime and equilibrium-dominated in the H punishment regime. In case (a), the Intuitive Criterion does not pin down the child’s posterior beliefs following the deviation; in case (b), the Intuitive Criterion establishes that following the deviation the child should rule out regime L (but may assign positive probability to I and/or H); in case (c), the Intuitive Criterion establishes that following the deviation the child should assign probability 1 to the I punishment regime. All three cases (a)–(c) are consistent with the child’s reply following the out-of-equilibrium move being $n_{1,2} = 0$. If we set $n_{1,2} = 0$ (which we can do since, as we have just argued, this does not contradict the Intuitive Criterion), then the out-of-equilibrium move $(1,2)$ is unprofitable in all regimes. (ii) Consider now the out-of-equilibrium move $(0,2)$, in which the parent breaks the norm and rewards norm-followers. It is clear that if $n_{0,2} = 0$ the deviation cannot be profitable. Suppose then that $n_{0,2} = 1$. In the H punishment regime, the payoff from the deviation is
$$-c - 2r - \Theta, \tag{A22}$$
in the I punishment regime it is
$$-c - r(2-q) - \Theta(1-q), \tag{A23}$$
while in the L punishment regime it is $-c$. Clearly, the deviation is equilibrium-dominated in the L punishment regime. Consider now the H punishment regime. For the deviation to be profitable, we require (A22) to exceed the equilibrium payoff, namely (A17). This happens if
$$\varepsilon > 2r - c + \Theta. \tag{A24}$$
Note that if (A24) holds then (A23) necessarily exceeds the equilibrium payoff in the I punishment regime when the parent can punish. 44 This implies that, under the Intuitive Criterion, the child’s posterior beliefs following $(0,2)$ cannot simultaneously rule out the I punishment regime and assign positive probability to the H punishment regime. The remainder of the argument is then the same as in part (i).
Robustness to direct communication. Consider the out-of-equilibrium move in which the parent follows the norm and sends the message (rather than punish). Let $t \geq 0$ be the cost of sending the message. If upon receiving the message the child follows the norm, the payoff from deviating for the parent is $-2c - t$ both in the H and the I punishment regimes. Since the equilibrium payoff in the H punishment regime exceeds that in the I regime, it follows that, under the Intuitive Criterion, the child’s posterior beliefs following the deviation cannot simultaneously rule out the I punishment regime and assign positive probability to the H punishment regime. The case where the parent’s out-of-equilibrium move consists of breaking the norm and sending the message is analogous. □
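The key step in part (i) is that the gain from deviating to reward is always larger in the I regime than in the H regime, so the deviation can never single out H. The sketch below illustrates this with payoffs (A17)–(A20); the function name and the parameter values are hypothetical.

```python
def reward_deviation_gain(c, eps, theta, q, r):
    """Deviation payoff minus equilibrium payoff, for a parent who can punish."""
    equilibrium = {"H": -2 * c - eps,               # (A17)
                   "I": -2 * c - theta * q - eps}   # (A18)
    deviation = {"H": -2 * c - 2 * r,               # (A19)
                 "I": -2 * c - r * (2 - q)}         # (A20)
    return {k: deviation[k] - equilibrium[k] for k in equilibrium}


# Hypothetical parameter values; r ranges from very cheap to costly rewards.
c, eps, theta, q = 1.0, 0.5, 1.0, 0.25
gains = [reward_deviation_gain(c, eps, theta, q, r) for r in (0.01, 0.1, 0.25, 1.0)]
gain_is_larger_in_I = all(g["I"] > g["H"] for g in gains)
profitable_in_H_implies_I = all(g["I"] > 0 for g in gains if g["H"] > 0)
```

Algebraically, the gain in I exceeds the gain in H by $\theta q + r q$, which is positive whenever $q > 0$. This holds for every reward cost $r$, including values arbitrarily close to zero.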

Appendix B. Reward Equilibrium

To characterize the reward equilibrium, we retrace the steps we took in Section 4.
Lemma A1.
(Strategy B.) In an equilibrium with reward, the following must hold. A child who has observed his parent both following the norm and rewarding will follow the norm. A child who has observed his parent following the norm but not rewarding will break the norm.
Proof. 
The proof of Lemma A1 is a straightforward adaptation of the proof of Lemma 3 and is therefore omitted.
Suppose then that Refinement 1 holds, and that children follow strategy B. Consider a parent active at time $t$. We now derive his payoff in each of the available action-pairs. First, the payoff from following the norm and rewarding, namely $(N_i, M_i) = (1,1)$, is given by
$$U_t(1,1) = -2c - r(2 - b_{t-1}) + 2R\, m_{t+1}. \tag{A25}$$
Second, the payoff from $(N_i, M_i) = (1,0)$, namely following the norm and failing to reward, is
$$U_t(1,0) = -c + R\, m_{t+1}. \tag{A26}$$
Third, the payoff from $(0,0)$, namely breaking the norm and failing to reward, is
$$U_t(0,0) = 0. \tag{A27}$$
Finally, the payoff from $(N_i, M_i) = (0,1)$, namely breaking the norm and rewarding, is
$$U_t(0,1) = -r(2 - b_{t-1}). \tag{A28}$$
Several remarks are in order. First, $(0,1)$ is clearly a dominated strategy. Second, we cannot have $U_t(1,1) = U_t(1,0) = U_t(0,0)$. Third, for reward to emerge at $t$, it is necessary that $U_t(1,1) \geq U_t(1,0)$. In turn, it is straightforward to see that this requires that $U_t(1,0) > U_t(0,0)$ must also hold. Hence, if reward occurs at $t$, then $(0,0)$ is dominated, so that all adults must choose to follow the norm. Conversely, for $(0,0)$ not to be dominated, we require $U_t(0,0) \geq U_t(1,0)$. In turn, this requires that $U_t(1,0) > U_t(1,1)$ must hold, i.e., $(1,1)$ is dominated. Having established these general properties, we can now characterize possible steady states. □
Lemma A2.
Suppose that (1) children follow strategy B and (2) refinement 1 holds. The possible steady states are:
(i) High-reward: All parents follow the norm and (if they can) reward. As a result, in state $l$: $m^* = 1$, $b^* = 0$, while in state $h$: $m^* = 1 - q$, $b^* = q$.
(ii) No-reward: $m^* = 0$, $b^* = 2$. Nobody rewards, nobody follows the norm.
Proof. 
First, the action-pair $(0,1)$ is always strictly dominated and therefore cannot be an equilibrium move. In what follows, we will therefore ignore strategy $(0,1)$. Second, the existence of a no-reward steady state follows from Lemma 1. Moreover, from the argument provided in the text above, we know that $(0,0)$ may never coexist with either $(1,1)$ or $(1,0)$. Hence, the possible steady state regimes are: (i) All adults who can reward select $(1,1)$, while those with prohibitively high rewarding costs (if any) select $(1,0)$; (ii) All adults select $(1,0)$.
In case (i), we have two possibilities: (1) The state of the world is $l$. In that case, $m^* = 1$, $b^* = 0$. Payoffs are $U(1,1) = -2c - 2r + 2R$, $U(1,0) = -c + R$ and $U(0,0) = 0$. Provided that
$$R - 2r > c, \tag{A29}$$
$(1,0)$ and $(0,0)$ are dominated by $(1,1)$. This proves that $m^* = 1$, $b^* = 0$ is a possible steady state. The second possibility that may arise in case (i) is: (2) the state of the world is $h$. In that case, $m^* = 1 - q$, $b^* = q$. Payoffs are: $U(1,1) = -2c - r(2-q) + 2R(1-q)$, $U(1,0) = -c + R(1-q)$ and $U(0,0) = 0$. Consider first parents who can reward. Provided that
$$R(1-q) - r(2-q) > c, \tag{A30}$$
$(1,0)$ and $(0,0)$ are dominated by $(1,1)$. Note that when (A30) holds, (A29) automatically holds, but not vice-versa. Consider now parents who cannot reward. It is straightforward to show that, under (A30), they strictly prefer $(1,0)$ to $(0,0)$. In case (ii), $m^* = 0$, $b^* = 1$. As a result, $U(1,1) = -2c - r$, $U(1,0) = -c$ and $U(0,0) = 0$. Clearly, $(1,0)$ is strictly dominated by $(0,0)$. This implies that $m^* = 0$ and $b^* = 1$ is not a possible steady state. □
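The steady-state comparisons for the reward case can be checked in the same way as for punishment. The sketch below evaluates the four payoffs (A25)–(A28), writing $r$ for the cost of rewarding, and picks the best action; the helper names and the parameter values are hypothetical and only illustrative.

```python
def reward_payoffs(c, r, R, m_next, b_prev):
    """Parent payoffs (A25)-(A28) when children follow strategy B."""
    return {
        (1, 1): -2 * c - r * (2 - b_prev) + 2 * R * m_next,  # (A25)
        (1, 0): -c + R * m_next,                             # (A26)
        (0, 0): 0.0,                                         # (A27)
        (0, 1): -r * (2 - b_prev),                           # (A28)
    }


def best_action(payoffs, can_reward=True):
    """Payoff-maximizing action pair; rewarding is unavailable to some parents."""
    feasible = {a: u for a, u in payoffs.items() if can_reward or a[1] == 0}
    return max(feasible, key=feasible.get)


# Hypothetical parameters satisfying (A30): R*(1-q) - r*(2-q) > c.
c, r, R, q = 1.0, 1.0, 5.0, 0.25
state_l = reward_payoffs(c, r, R, m_next=1.0, b_prev=0.0)
state_h = reward_payoffs(c, r, R, m_next=1 - q, b_prev=q)
case_ii = reward_payoffs(c, r, R, m_next=0.0, b_prev=1.0)
```

Under these assumed values, parents who can reward choose (1,1) in both states, parents who cannot reward choose (1,0), and in case (ii) the best action is (0,0), so that case cannot be a steady state.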
We are now able to prove Proposition 3.
Proof of Proposition 3
(i) We first prove that condition (b) is necessary. The necessity of (A30) is proved above in the proof of Lemma A2. We now prove the necessity of the condition
$$\frac{p_H + (1-q)^2 p_I}{p_H + (1-q) p_I}\, \delta R > c. \tag{A31}$$
Suppose that parents follow the strategy described in the proposition, and let prior beliefs be $p_H$ and $p_I$, both strictly positive. From Lemma A1, we know that in any equilibrium with reward, $n(1,0) = 0$ and $n(1,1) = 1$ must hold. Upon observing $(1,1)$, the child’s posterior expectation of $m_{t+1}$ is equal to $\pi_H(1,1) + (1-q)\,\pi_I(1,1)$. To obtain $n(1,1) = 1$ in equilibrium, we thus require $-c + [\pi_H(1,1) + (1-q)\,\pi_I(1,1)]\,\delta R > 0$. Substituting for $\pi_H(1,1) = p_H / [p_H + (1-q) p_I]$ and $\pi_I(1,1) = (1-q) p_I / [p_H + (1-q) p_I]$ and rearranging, we obtain (A31). We now prove the necessity of the RHS of condition (b). Upon observing $(1,0)$, the child’s posterior expectation of $m_{t+1}$ is $1 - q$. To obtain $n(1,0) = 0$ in equilibrium, we thus require $-c + (1-q)\,\delta R \leq 0$. Rearranging, this gives the RHS of (b). This proves that (b) is necessary for a reward equilibrium.
(ii) We prove that $p_H > 0$ and $p_I > 0$ are necessary. First, note that, if $p_H = 0$, the requirements $n(1,1) = 1$ and $n(1,0) = 0$ cannot hold simultaneously. Suppose now that $p_I = 0$, so that $(1,0)$ is an out-of-equilibrium move. Let $n_{1,0} \in \{0,1\}$ be the child’s action following this out-of-equilibrium move by the parent. The parent’s payoff in the H regime is
$$(R - c)(1 + n_{1,0}). \tag{A32}$$
The parent’s payoff in the L regime is $-c$. It is straightforward to see that $(1,0)$ is necessarily equilibrium-dominated in the L regime (since $-c < 0$). However, this is not the case in the H regime: If $n_{1,0} = 1$, then (A32) becomes $2(R - c)$ and is therefore higher than the equilibrium payoff in that regime, namely $2(R - c - r)$. Under the Intuitive Criterion, upon observing the out-of-equilibrium move $(1,0)$ the child should therefore conclude that the regime is H. The child’s optimal reply is thus $n_{1,0} = 1$. In turn, this makes rewarding (and, thus, norm-following) dominated for parents. This implies that, if $p_I = 0$, the H regime would unravel. The only possible refined steady state equilibrium would then be one where nobody follows the norm and nobody rewards. We conclude that $p_H > 0$ and $p_I > 0$ are necessary for reward to emerge in a refined equilibrium.
(iii) We show that the reward equilibrium cannot be upset by out-of-equilibrium moves. It is straightforward to show that selecting $(0,1)$ is strictly dominated by $(1,1)$ in both the H and the I regimes. Hence, the parent could never gain from selecting this out-of-equilibrium move in either the H or the I regime. Consider now $(0,0)$. In the H regime, this is strictly dominated by $(1,1)$, independently of the precise value of $n_{0,0}$. Hence, under the Intuitive Criterion, upon observing $(0,0)$ the child must rule the H regime out. The child’s optimal reply is then $n(0,0) = 0$. Given this, it is straightforward to see that $(0,0)$ is equilibrium-dominated also in the I regime. The parent can therefore not gain from selecting this out-of-equilibrium move.
(iv) We prove that, given that $p_H$ and $p_I$ are strictly positive, condition (b) is sufficient for a reward equilibrium to exist. Under (b), the child’s best replies are $n(1,0) = 0$ and $n(1,1) = 1$. Given (A30), the parent’s best reply is $(1,1)$ if he can reward and $(1,0)$ otherwise. □

References

  1. Fehr, E.; Gächter, S. Cooperation and punishment in public goods experiments. Am. Econ. Rev. 2000, 90, 980–994.
  2. Feinberg, J. Doing and Deserving: Essays in the Theory of Responsibility; Princeton University Press: Princeton, NJ, USA, 1970.
  3. Kahan, D.M. Social meaning and the economic analysis of crime. J. Legal Stud. 1998, 27, 661–672.
  4. Cooter, R.D. Expressive law and economics. J. Legal Stud. 1998, 27, 585–608.
  5. Sunstein, C.R. On the expressive function of law. Univ. Pa. Law Rev. 1996, 144, 2021–2031.
  6. Bicchieri, C.; Muldoon, R. Social Norms. In The Stanford Encyclopedia of Philosophy; Zalta, E.N., Ed.; Spring: Palo Alto, CA, USA, 2014.
  7. Rotemberg, J.J. Minimally acceptable altruism and the ultimatum game. J. Econ. Behav. Organ. 2008, 66, 457–476.
  8. Akerlof, G. The Economics of Caste and of the Rat Race and Other Woeful Tales. Q. J. Econ. 1976, 90, 599–617.
  9. Peski, M.; Szentes, B. Spontaneous discrimination. Am. Econ. Rev. 2013, 103, 2412–2436.
  10. Axelrod, R. An evolutionary approach to norms. Am. Political Sci. Rev. 1986, 80, 1095–1111.
  11. Ghosh, P.; Ray, D. Cooperation in community interaction without information flows. Rev. Econ. Stud. 1996, 63, 491–519.
  12. Glazer, A.; Konrad, K. A signaling explanation for private charity. Am. Econ. Rev. 1996, 86, 1019–1028.
  13. Ellingsen, T.; Johannesson, M. Pride and prejudice: The human side of incentive theory. Am. Econ. Rev. 2008, 98, 990–1008.
  14. Hopkins, E. Competitive altruism, mentalizing and signalling. Am. Econ. J. 2014, 6, 272–292.
  15. Sliwka, D. Trust as a signal of a social norm and the hidden costs of incentive schemes. Am. Econ. Rev. 2007, 97, 999–1012.
  16. Gneezy, U.; Rustichini, A. A fine is a price. J. Legal Stud. 2000, 29, 1–18.
  17. Adriani, F.; Sonderegger, S. Signaling About Norms: Socialization Under Strategic Uncertainty. Scand. J. Econ. 2018, 120, 685–716.
  18. Adriani, F.; Matheson, J.; Sonderegger, S. Teaching by Example and Induced Beliefs in a Model of Cultural Transmission. J. Econ. Behav. Organ. 2018, 145, 511–529.
  19. Kotsidis, V. Call to Action: Intrinsic Motives and Material Interest; Mimeo: New York, NY, USA, 2018.
  20. Bénabou, R.; Tirole, J. Identity, morals and taboos: Beliefs as assets. Q. J. Econ. 2011, 126, 805–855.
  21. Bénabou, R.; Tirole, J. Self-confidence and personal motivation. Q. J. Econ. 2002, 117, 871–915.
  22. Bénabou, R.; Tirole, J. Willpower and personal rules. J. Political Econ. 2004, 112, 848–887.
  23. Bénabou, R.; Tirole, J. Belief in a just world and redistributive politics. Q. J. Econ. 2006, 121, 699–746.
  24. Dessí, R. Collective memory, cultural transmission, and investments. Am. Econ. Rev. 2008, 98, 534–560.
  25. Bisin, A.; Verdier, T. Beyond the Melting Pot: Cultural Transmission, Marriage, and the Evolution of Ethnic and Religious Traits. Q. J. Econ. 2000, 115, 955–988.
  26. Corneo, G.; Jeanne, O. A theory of tolerance. J. Public Econ. 2009, 93, 691–702.
  27. Corneo, G.; Jeanne, O. Symbolic values, occupational choice and economic development. Eur. Econ. Rev. 2010, 54, 241–255.
  28. Cervellati, M.; Vanin, P. Thou shalt not covet: Prohibitions, temptation and moral values. J. Public Econ. 2013, 103, 15–28.
  29. Carvalho, J.-P. Veiling. Q. J. Econ. 2013, 128, 337–370.
  30. Verdier, T.; Zenou, Y. Cultural leaders and the dynamics of assimilation. J. Econ. Theory 2018, 175, 374–414.
  31. Van der Weele, J. The signaling power of sanctions in social dilemmas. J. Law Econ. Organ. 2012, 28, 103–126.
  32. Andreoni, J.; Harbaugh, W.; Vesterlund, L. The carrot or the stick: Rewards, punishments, and cooperation. Am. Econ. Rev. 2003, 93, 893–902.
  33. Herold, F. Carrot or Stick: The evolution of reciprocal preferences in a haystack model. Am. Econ. Rev. 2012, 102, 914–940.
  34. Acemoglu, D.; Jackson, M.O. History, expectations, and leadership in the evolution of social norms. Rev. Econ. Stud. 2014, 82, 423–456.
  35. Rohner, D.; Thoenig, M.; Zilibotti, F. War signals: A theory of trade, trust, and conflict. Rev. Econ. Stud. 2013, 80, 1114–1147.
  36. Bidner, C.; Francois, P. The emergence of political accountability. Q. J. Econ. 2013, 128, 1397–1448.
  37. Tirole, J. A theory of collective reputations (with applications to the persistence of corruption and to firm quality). Rev. Econ. Stud. 1996, 63, 1–22.
  38. Elster, J. Social norms and economic theory. J. Econ. Perspect. 1989, 3, 99–117.
  39. Bidner, C.; Eswaran, M. A Gender-Based Theory On the Origin of the Caste System in India. Unpublished manuscript. 2014.
  40. Robson, A.J.; Samuelson, L. The evolution of intertemporal preferences. Am. Econ. Rev. 2007, 97, 496–500.
  41. Robson, A.J.; Samuelson, L. The evolution of time preference with aggregate uncertainty. Am. Econ. Rev. 2009, 99, 1925–1953.
  42. Doepke, M.; Zilibotti, F. Parenting with Style: Altruism and Paternalism in Intergenerational Preference Transmission. Unpublished manuscript. 2014.
  43. Bisin, A.; Verdier, T. The economics of cultural transmission and the dynamics of preferences. J. Econ. Theory 2001, 97, 298–319.
  44. Cho, I.-K.; Kreps, D.M. Signaling games and stable equilibria. Q. J. Econ. 1987, 102, 177–222.
  45. Adriani, F.; Sonderegger, S. Why do parents socialize their children to behave pro-socially? An information-based theory. J. Public Econ. 2009, 93, 1119–1124.
  46. Grout, P.; Mitraille, S.; Sonderegger, S. The Costs and Benefits of Coordinating with a Different Group. J. Econ. Theory 2015, 160, 536–556.
  47. Kets, W.; Sandroni, A. A Theory of Strategic Uncertainty and Cultural Diversity; Mimeo: New York, NY, USA, 2018.
  48. Ellingsen, T.; Johannesson, M. Conspicuous generosity. J. Public Econ. 2011, 95, 1131–1143.
  49. Friedman, J. A noncooperative equilibrium for supergames. Rev. Econ. Stud. 1971, 38, 1–12.
  50. Abreu, D. Extremal equilibria of oligopolistic supergames. J. Econ. Theory 1986, 39, 191–225.
  51. Güth, W.; Yaari, M. An evolutionary approach to explain reciprocal behavior in a simple strategic game. In Explaining Process and Change: Approaches in Evolutionary Economics; Witt, U., Ed.; The University of Michigan Press: Ann Arbor, MI, USA, 1992; pp. 23–34.
  52. Robson, A.J. The biological basis of economic behavior. J. Econ. Lit. 2001, 39, 11–33.
  53. Samuelson, L. Information-based relative consumption effects. Econometrica 2004, 72, 93–118.
  54. Samuelson, L.; Swinkels, J. Information, evolution and utility. Theory Econ. 2006, 1, 119–142.
  55. Rayo, L.; Becker, G. Evolutionary efficiency and happiness. J. Political Econ. 2007, 115, 302–337.
  56. Netzer, N. Evolution of time preferences and attitudes toward risk. Am. Econ. Rev. 2009, 99, 937–955.
  57. Adriani, F.; Sonderegger, S. Trust, Trustworthiness and the Consensus Effect: An Evolutionary Approach. Eur. Econ. Rev. 2015, 77, 102–116.
  58. Robson, A.J.; Samuelson, L. The evolutionary foundations of preferences. In Handbook of Social Economics; Benhabib, J., Bisin, A., Jackson, M.O., Eds.; Elsevier: Amsterdam, The Netherlands, 2010; pp. 221–310.
  59. Binmore, K.G. Natural Justice; Oxford University Press: Oxford, UK, 2005.
1
See e.g., Fehr and Gächter (2000) [1].
2
Advocates of this theory include Feinberg (1970), Kahan (1998), Cooter (1998) and Sunstein (1996) [2,3,4,5].
3
In 18th century Rome, for instance, fathers would bring their sons to see executions and would slap them at the moment the executioner struck his blow, to “keep that vision vivid in their memory.” This tradition was documented, for instance, in the sonnet “Er Ricordo” (1830) by the Roman poet G. G. Belli. Similar practices also existed in other countries, such as Great Britain.
4
This is presumably the rationale why a school may end up dismissing a female employee who becomes pregnant while unwed. Recent high-profile (and controversial) instances of this practice can be found at http://www.jewishpress.com/blogs/muqata/school-fires-unmarried-pregnant-teacher-the-whole-story/2013/03/05/ and at http://www.huffingtonpost.com/2013/03/01/teri-james-pregnant-woman-fired-premarital-sex-christian-school_n_2790085.html.
5
Further discussion of the types of norms that fit our analysis can be found in Section 7.1.
6
In their entry on social norms in the Stanford Encyclopedia of Philosophy, Bicchieri and Muldoon (2014) [6] report that one of the defining features of norms is that they can change rather abruptly, with a sudden and unexpected demise of old patterns of behavior.
7
Clearly enough, the absence of punishment may be seen as a reward, and, thus, the punishment equilibrium could be rephrased in those terms. However, in that case the activity of rewarding involves no cost (it is the absence of rewarding that is costly). By contrast, in Section 5 we consider rewards that are costly to those who implement them.
8
See e.g., Rotemberg (2008) [7] for a formal model.
9
See also Peski and Szentes (2013) [9].
10
The notion that, in some instances, information flows may be too restricted for reputation-based mechanisms to work effectively is well recognized, see e.g., Ghosh and Ray (1996) [11].
11
See e.g., Andreoni et al. (2003) [32] and literature thereafter. Interestingly, one of the findings of Andreoni et al. is that people tend to reward less when punishment is also available (but not vice-versa). This is consistent with our results in Section 6, where we show that punishment typically crowds out reward (but not vice-versa).
12
This paper belongs to a wider literature on endogenously derived preferences, discussed in Section 7.
13
It may be worth speculating about the possible implications of our analysis from an evolutionary viewpoint. We have shown that punishing behavior may emerge as the equilibrium of a game involving players with “standard” preferences. This may be interpreted as providing a possible rationale why direct preferences for punishing behavior may emerge—since they generate optimal behavior. In fact, our story predicts the possible coevolution of two sets of preferences: Preferences for punishing behavior (in parents), and preferences for following the diktat of parents, provided that it is backed up by parental behavior that is consistent with it (in children). These preferences, it may be argued, are “mutual best responses” under some conditions.
14
See also Tirole (1996) [37] for a classic account of how past history may affect current incentives.
15
It is straightforward to see that punishing is always dominated in state (ii).
16
We could have modelled this explicitly, by letting the mass of norm violators be $b + \epsilon$, for some $\epsilon > 0$. This would not affect the results in any meaningful way, but would make the algebra less transparent. This, we feel, would detract attention from the main messages we wish to convey.
17
Another possibility, mentioned for instance in Bidner and Eswaran (2014) [39], is that the parents of violators may be punished, rather than the violators themselves. We do not explore this here.
18
This ensures stationarity of the problem at hand, since the problem an individual faces as an adult is independent of the action he took as a youngster.
19
This assumption is made for concreteness, but our key results would still go through if we assumed that the parent’s discount factor when evaluating his own payoff equals the child’s, namely δ . Rather, what is really crucial is the presence of paternalistic motives (see below), namely that, when evaluating the child’s payoff, the parent should discount the future less heavily than the child.
20
This paternalistic element shares similarity with the concept of imperfect empathy introduced by Bisin and Verdier (2001) [43].
21
We adopt the convention that, if indifferent, the child breaks the norm and, if indifferent, the parent does not punish. This is immaterial for the results.
22
This could in principle be an expected value, although as we will see that will not be the case in equilibrium.
23
References to the studies in question can be found in Bicchieri and Muldoon (2014) [6].
24
See Grout et al. (2015) [46] and Kets and Sandroni (2018) [47] for possible norm-related accounts of why mixed societies may instead be preferable.
25
If the message’s cost is exactly zero then the argument is strengthened, since the child cannot rule out the message emanating from the L regime.
26
Note that the parent has no incentive to deviate by both punishing and sending the message since this would involve unnecessary costs (recall that in the equilibrium we are considering the act of punishing is sufficient to induce the child to follow the norm).
27
This stands in contrast with the net returns from punishing, which are actually greater in regime H: the net gain from choosing (1,1) over (1,0) in regime H is $\Theta - \varepsilon - c$, while in regime I it is $\Theta(1-q) - \theta q - \varepsilon - c < \Theta - \varepsilon - c$. Another difference is that, while the child’s beliefs following the deviation are refined through the Intuitive Criterion, his beliefs following the equilibrium move (1,1) are given by standard Bayesian updating, and assign probability $\pi_I(1,1) < 1$ to regime I, as shown in (9).
28
In an equilibrium of that nature, the act of rewarding would actually be part of the norm itself. An example may be a norm which consists of “sharing resources with all those who also share their resources”, and where punishment for failing to do so takes the form of being excluded from sharing. In this case, punishment is not a costly act to perform. As a result, its use to sustain norm-following is not surprising from a theoretical viewpoint.
29
Similar to the case of punishment, it is instructive to look at the benchmark case where $\gamma$, the probability that the state of the world is $l$, is arbitrarily close to (although strictly smaller than) one (so $p_I$ is arbitrarily small). In that case, condition (b) simplifies to
$$\min\{R(1-q) - r(2-q),\ \delta R\} > c > \delta R(1-q),$$
independent of prior beliefs.
30
In the H regime, the payoff from $(1,1)$ is $-2c - 2r + 2R$ while that from $(1,0)$ is $-c + R$. If $R < c + 2r$, then the latter clearly exceeds the former.
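The comparison in this footnote can be verified in one line. This is only a short check, reading the two payoffs as $2R - 2c - 2r$ and $R - c$ (an assumption about the signs wherever the source text leaves them ambiguous):

$$
(R - c) - (2R - 2c - 2r) = c + 2r - R > 0 \iff R < c + 2r .
$$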
31
This argument assumes that utility from consumption is concave, and that there are non-negligible transaction costs for converting goods to cash.
32
In a different setup, Ellingsen and Johannesson (2011) [48] argue that a signaling explanation may lie at the core of the taboo against rewarding people using cash, which we observe on many social occasions. Our story can be seen as complementary to theirs.
33
Note that this stands in contrast with the equilibrium payoffs obtained in the reward equilibrium—these are higher in the H reward regime than the I reward regime. Intuitively, this is because the equilibrium payoff includes the rewards people enjoy when they follow the norm.
34
Note that this argument does not necessarily imply that rewards will never be observed. In particular, one can envisage that, starting from an environment where the norm is rare, norm-following may actually spread thanks to the use of rewards. What we argue, however, is that, if it is successful (in the sense of inducing generalized norm-following), the reward mechanism is liable to be supplanted by punishment.
35
Note that imperfect empathy would then take a dual form: (i) Parents evaluate the child’s cost of following the norm using the lens of their own cost; and (ii) parents evaluate the child’s penalty from being punished in case of norm-breaking using their own discount factor—this is the same as in Section 3.
36
Note however that, if we were to consider the setup discussed in Section 7.1, then this would not be the case. More precisely, in that setup, punishment could emerge even if the game were finite (provided that, in the last period, children are unaware that the game is in its last period).
37
The requirement $\Theta - \varepsilon > c$ is not explicitly included in condition (a) since it is assumed at the outset, in the model section.
38
However, if the parent’s utility assigned a weight of exactly zero to his own payoff, then Proposition 5 would no longer hold, since the parent would be unconcerned with choosing a deviation that allows him to send the signal more cheaply.
39
The literature actually dates to Güth and Yaari (1992) [51], but has recently experienced renewed impetus—examples include Robson (2001) [52], Samuelson (2004) [53], Samuelson and Swinkels (2006) [54], Rayo and Becker (2007) [55], Netzer (2009) [56], Herold (2012) [33], Adriani and Sonderegger (2015) [57]. Robson and Samuelson (2010) [58] provide a comprehensive survey.
40
See e.g., Binmore (2005) [59] for a discussion of ultimate causes and proximate psychological mechanisms.
41
We are using the parent’s discount factor, namely 1, to evaluate the child’s material welfare. This is immaterial.
42
We restrict attention to τ = r since, clearly, the deviation is feasible only if the parent faces non-prohibitive rewarding costs.
43
This follows since $r(2-q) - \theta q < 2r$.
44
This follows from $r(2-q) - \theta q + \Theta(1-q) < 2r + \Theta$.
Figure 1. Timing 1.
Figure 2. Timing 2.
Figure 3. Robustness 1.
Figure 4. Robustness 2.

Adriani, F.; Sonderegger, S. The Signaling Value of Punishing Norm-Breakers and Rewarding Norm-Followers. Games 2018, 9, 102. https://doi.org/10.3390/g9040102
