The Signaling Value of Punishing Norm-Breakers and Rewarding Norm-Followers

We formally explore the idea that punishment of norm-breakers may be a vehicle for the older generation to teach youngsters about social norms. We show that this signaling role provides sufficient incentives to sustain costly punishing behavior. People punish norm-breakers to pass information about past history to the younger generation. This creates a link between past, present, and future punishment. Information about the past is important for youngsters, because the past shapes the future. Reward-based mechanisms may also work and are welfare superior to punishment-based ones. However, reward-based mechanisms are fragile, since punishment is a more compelling signaling device (in a sense that we make precise).


Introduction
It is well known that people are willing to punish norm-breakers, even in situations where punishment is costly and does not provide any material benefit to the punisher (see [1] and the literature thereafter). This raises the question: Why should people be willing to incur personal costs in order to punish norm deviators? This paper focuses on an information-based rationale for punishment. The existence of an informational content of punishment is the basis of the so-called denunciation (or expressive) theory in legal philosophy.² This theory emphasizes the role of punishment in teaching people what the social norms are and what is not acceptable. Much anecdotal evidence, such as the widespread tradition of bringing children to see public executions, also points to a potential informational role of punishment.³ Our analysis builds on this idea. We focus on information transmission from parents to children, although it should be quite clear that the analysis applies more broadly to environments in which an "experienced" individual may convey information to a "naive" individual towards whom he feels altruistic; examples include teacher and pupil, senior worker and junior worker, etc. We argue that the desire to transmit the "correct" information may underpin punishment or ostracism of norm breakers. For instance, a Muslim mother who wears a veil and wishes to raise her daughter along similar lines may be reluctant to engage in friendly relations with unveiled women. Similarly, to signal to his child that pre-marital sex is reprehensible, a parent might demand the removal of an unmarried pregnant teacher working in the child's school.⁴
We build a model of overlapping generations where parental behavior informs children about norms in society. Parents are paternalistic: they evaluate the child's cost of being punished in the future using the lens of their own (higher) discount factor. This generates a conflict of interests between parents and children that justifies the use of costly signaling for information transmission. We concentrate on norms that are individually costly. Our running example focuses on unproductive behaviors, such as dress codes, lengthy rituals, or elaborate etiquette, although the analysis can equally apply to behaviors (such as public good provision) that generate externalities.⁵ Norm-following is thus purely sustained by fear of punishment. By contrast, the metanorm of sanctioning norm violators is sustained by signaling motives.
The game has an equilibrium where nobody punishes, and nobody follows the norm. However, it also has a punishment equilibrium where norm-following and punishment occur. A central role is played by the notion that parents have better information than children about the history of play. Children know that society may be in one of different regimes, which differ in the extent to which punishment and norm-following are carried out, but they do not know which one. They use their parents' behavior to update their beliefs about the prevailing regime and, thus, about the likelihood of being met with punishment if they break the norm. Parents punish past norm-breakers since by doing so they send the correct signal to their child.
An important contribution of the analysis is to highlight the role of history in sustaining norms. In equilibrium, people punish when they observe punishment featuring in recent shared history. Importantly, people (correctly) expect that recent history will shape individual behavior in the future. We show that this effect alone can provide sufficient incentives to sustain an otherwise dominated behavior such as costly punishment.
More precisely, adults active at t know that other t adults will punish past norm breakers, since they have been exposed to the same shared history as they have. At t + 1, recent history (capturing events at t) will thus contain punishment. In turn, this will induce people in t + 1 to punish as well. The connection between past, present and future punishment means that information about the past is valuable for youngsters, since the past molds the future. It is worth highlighting that in our setup history matters not only because it anchors beliefs but also, more concretely, because the total cost of punishing past norm breakers depends on how many people broke the norm in the previous period (since this determines how many people are to be punished).
Our results underscore the role of continuity, namely the link between past, present and future, to sustain norms (and the metanorm of punishment). The analysis suggests that, if continuity breaks down (as, e.g., in the case of an external shock such as a war), this would trigger a process of unraveling that would affect behavior. This regime shift could happen very quickly, in contrast with the predictions of preference-based theories of punishment, where a shock to preferences would translate only very slowly onto behavior (since the older generations would have to die off).⁶ We show that the analysis produces non-obvious comparative statics. For instance, we find that the norm being sustained cannot be too cheap. Intuitively, if it were, the conflict of interests between parents and children would vanish, and this would eliminate the need for costly signaling.
⁴ This is presumably the rationale why a school may end up dismissing a female employee who becomes pregnant while unwed. Recent high-profile (and controversial) instances of this practice can be found at http://www.jewishpress.com/blogs/muqata/school-fires-unmarried-pregnant-teacher-the-whole-story/2013/03/05/ and at http://www.huffingtonpost.com/2013/03/01/teri-james-pregnant-woman-fired-premarital-sex-christian-school_n_2790085.html.
⁵ Further discussion of the types of norms that fit our analysis can be found in Section 7.1.
⁶ In their entry on social norms in the Stanford Encyclopedia of Philosophy, Bicchieri and Muldoon (2014) [6] report that one of the defining features of norms is that they can change rather abruptly, with a sudden and unexpected demise of old patterns of behavior.
While our signaling equilibrium generates interesting insights, any explanation for behavior X that relies on the existence of an equilibrium where X is a signaling device should be taken with caution. Critics might argue that almost any behavior might be rationalized by constructing a setup where failure to adopt that behavior would be interpreted in a sufficiently negative light by the receiver. A convincing account should therefore provide a more compelling rationale why X (rather than some other behavior) should act as a signaling device. To this aim, we construct a setup where signaling operates through an alternative mechanism, namely costly rewards.⁷ Sure enough, we show that the same mechanism at work in a punishment equilibrium may also sustain a reward-based equilibrium. However, a reward equilibrium may emerge only under some conditions. We find that a scenario where the (marginal) cost of rewarding equals the (marginal) benefit from being rewarded, as in the case of monetary transfers, would not work. This is consistent with the observation that, in social interactions, cash rewards are seldom used; in fact, there is almost a taboo against them.
The next step in the analysis is to compare punishments and rewards by constructing a measure of robustness: would a punishment equilibrium be immune to people switching to costly rewards for signaling, and would a reward equilibrium be immune to people starting to use punishment instead of reward? We show that a reward-based equilibrium can generally be "invaded" by punishment, but not vice versa. Although the reward equilibrium generates more welfare than the punishment equilibrium, it is also more fragile, since punishment is a more compelling signaling method than reward. Intuitively, this is because the cost of punishing decreases in the share of norm compliers, since there are fewer people to punish. Hence, by using punishment instead of, say, reward, a parent can credibly signal that the norm is sufficiently widespread, while at the same time incurring low signaling costs. By contrast, the cost of rewarding increases in the share of compliers, since there are more people to reward. This makes rewarding norm compliers a less compelling signaling device of the widespread norm than punishing norm breakers.

Related Literature
One possible explanation for costly punishment that has been offered by the literature is that people may have direct in-built preferences for punishing norm-deviators, which may outweigh the material costs involved in punishing.⁸ This explanation is especially plausible when the norms in question are the result of a joint process of evolution and acculturation, such as may for instance be the case for norms against incest. When this does not apply, however, the direct preference approach leaves some important questions unanswered, since it does not explain why people who exhibit a preference for engaging in costly punishment (and who thus suffer a material disadvantage compared to those who do not exhibit this preference) are not wiped out by evolutionary forces.
Another possible rationale that has been offered is that those who do not punish norm-breakers may themselves be seen as deviators and thus be punished, as for instance in Akerlof (1976) [8].⁹ Using Axelrod's (1986) [10] terminology, a norm for X typically requires the existence of a metanorm to sanction people who fail to do X, one to sanction people who fail to sanction those who fail to do X, and so on. The applicability of this explanation depends on the environment. Although norm breaking can often be observed directly (e.g., because an individual fails to dress/behave/speak in a manner that complies with the norm), the failure to punish norm-breakers (and to punish those who fail to punish norm breakers, etc.) is often much harder to identify.¹⁰ It is thus plausible that these metanorms may be sustainable in small communities, where monitoring is easy, but not in larger environments, where it is harder.
⁷ Clearly enough, the absence of punishment may be seen as a reward, and, thus, the punishment equilibrium could be rephrased in those terms. However, in that case the activity of rewarding involves no cost (it is the absence of rewarding that is costly). By contrast, in Section 5 we consider rewards that are costly to those who implement them.
The central theme of our analysis is that norms (or, more precisely, metanorms) may be the object of signaling. The notion that signaling-based explanations may help clarify apparent "behavioral puzzles" has been presented elsewhere, such as in Glazer and Konrad (1996), Ellingsen and Johannesson (2008) and Hopkins (2014). Sliwka (2006) and Gneezy and Rustichini (2000) [12-16] explicitly consider norm-signaling. Differently from those works, in our setup parental behavior informs children about the behavior they are likely to encounter in the future. This shares similarities with Adriani and Sonderegger (2018), Adriani et al. (2018) and Kotsidis (2018) [17-19]. However, this paper studies the mechanics and the signaling value of punishment versus reward, and is thus very different in focus.
Our work also adds to the literature on intergenerational transmission of information and (self-)signaling. Bénabou and Tirole (2011) [20] consider an anticipatory-utility setup where punishment of deviant behavior helps the individual shield himself from negative anticipatory feelings. Other relevant works include Bénabou and Tirole (2002, 2004 and 2006) [21-23] and Dessí (2009) [24]. These papers emphasize how anticipatory feelings or altruism towards future generations may generate selective memory. Bisin and Verdier (2000) [25], Corneo and Jeanne (2009, 2010) [26,27], Cervellati and Vanin (2013) [28], Carvalho (2013) [29] and Verdier and Zenou (2018) [30] are also related to the present paper. However, these works focus on intergenerational transmission of values or cultural traits (rather than information). Also related is Van der Weele (2012) [31], who studies the signaling role of sanctions when the authorities have private information about the fraction of egoists in society. An important difference with our analysis is that, in our setup, strategic complementarities among parents play a central role, while in his model these complementarities are entirely absent since punishment is fully centralized.
The effectiveness of "stick" versus "carrot" as incentive devices and their interaction are clearly important for understanding norm-following, and have been extensively studied by the experimental literature.¹¹ However, the theoretical literature has largely remained silent on the subject. The exception is Herold (2012) [33], who proposes a model of evolution of preferences that explicitly compares preferences for punishing with preferences for rewarding.¹² The rationale why preferences for punishing eventually crowd out preferences for rewarding is that, when cooperation is sufficiently widespread, punishing non-cooperators becomes cheaper than rewarding cooperators. Our analysis supplements this intuition with a new, information-based effect. We show that, to successfully "invade", a signal (punish/reward) should not only be cheap, but it should also be credible. Punishment is a credible signal that the norm is sufficiently widespread because it naturally satisfies incentive compatibility: it is cheaper when the norm is widespread than when it is rare. By contrast, reward follows the opposite path.¹³
Finally, our work is related to the literature that studies the interplay between history and norms; the most relevant recent examples for our purposes include Acemoglu and Jackson (2014), Rohner et al. (2013) and Bidner and Francois (2013) [34-36].¹⁴ A central theme of this literature is that the history of past play leaves a lasting legacy, by affecting behavior for a long time afterwards. This is also the case in our model. However, rather than it being a case of past history revealing information about the players' types, in our model the history of play itself is the object of signaling. By perpetuating the past, people can send information about past history (and, thus, indirectly, about the future, since in equilibrium the past shapes the future).
Similar to us, Acemoglu and Jackson (2014) [34] also assume that players cannot fully observe the history of past play. However, their paper has a rather different focus, since it does not study punishment.
The remainder of the paper is organized as follows. Section 2 introduces our key idea within a very simple setup. Section 3 describes the model, while Section 4 characterizes the punishment equilibrium. Section 5 considers the case where individuals use costly rewards to signal the dominant norm to their child. Section 6 investigates the robustness of the punishment equilibrium to the use of rewards, and the robustness of the reward equilibrium to punishment. Finally, Section 7 provides discussions and offers some final thoughts.

A Very Simple Example
Consider the following sequential game between a parent and his child. The parent moves first, and has two actions: Punish past norm-breakers, and Not Punish. Punishing norm-breakers is costly for the parent, while not punishing is not. The child moves second, after having observed the parent's move. He also has two actions: Follow (some unspecified costly social norm) and Not Follow. The parent possesses private information about the consequences that the child will face if he does not follow the norm. For a start, we take these consequences as being exogenously determined. There are two states of the world: (i) punishment state: if the child breaks the norm, he is punished by a third party and incurs a punishment cost; (ii) no punishment state: if the child breaks the norm, he incurs no punishment. The cost for the child from following the norm is positive but smaller than the cost he incurs if he is punished. Hence, if the child knows that norm-breaking will be punished by the third party, he finds it optimal to follow the norm. However, the child does not know the state of the world, and looks at the parent's behavior to gain information about it. The parent is altruistic towards the child. We now argue that the following may emerge in equilibrium:

• The child follows the norm if he observes the parent punishing, and breaks the norm otherwise.
• The parent punishes in state (i) (when the third party punishes), and does not punish in state (ii) (when the third party does not punish).
It is clear that, given the parent's strategy, the child's strategy prescribes optimal behavior. Consider now the parent. Taking the child's strategy as given, if the parent does not punish in state (i), then the child will break the norm and be punished. By contrast, if the parent punishes past norm breakers, the child will infer that the state is (i) and will therefore follow the norm (and thus avoid being punished). For suitable parameter values, it is clear that the parent may prefer the latter to the former, and would thus find it optimal to punish in state (i), even though punishing is a costly endeavor.¹⁵
In the equilibrium described above, the parent punishes past norm-breakers to signal to his child that, if he breaks the norm, he will be punished. Hence, punishment is motivated by signaling concerns. A limiting feature of this example is that the consequences the child will face if he breaks the norm are exogenous, rather than being derived as an equilibrium feature. However, it is possible to augment the model to allow for that. Suppose that, in each period t, a different parent-child pair plays the game described above. If the child breaks the norm, he may or may not be punished in period t + 1 (by the parent of the t + 1 parent-child pair). Payoffs are the same as before, except that the action of the "third party" is no longer exogenous. The third party is in fact the t + 1 parent, who faces a problem analogous to that of the t-period parent. Following the reasoning above, under some parameter restrictions the t-period parent will then choose to punish if he believes that the t + 1 parent is sufficiently likely to punish. Similarly, the t + 1 parent will choose to punish if he believes that the t + 2 parent is sufficiently likely to punish, and so on.
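The parent's incentives in the one-shot example can be sketched as a pair of inequalities. The symbols below are illustrative assumptions rather than the paper's notation: k > 0 is the parent's cost of punishing, c > 0 is the child's cost of following the norm, P > c is the child's cost of being punished by the third party, and the parent is assumed to weigh the child's payoff one-for-one with his own.

```latex
% Hypothetical notation (not from the source):
%   k = parent's cost of punishing, c = child's cost of following the norm,
%   P = child's cost of being punished by the third party, with P > c > 0.
% Assumption: the parent values the child's payoff one-for-one.
\begin{align*}
\text{State (i):}\quad
  & \underbrace{-k - c}_{\text{punish; child follows}}
    \;\ge\;
    \underbrace{-P}_{\text{do not punish; child breaks and is punished}}
    \iff k \le P - c, \\[4pt]
\text{State (ii):}\quad
  & \underbrace{0}_{\text{do not punish; child breaks, no punishment}}
    \;\ge\;
    \underbrace{-k - c}_{\text{punish; child follows}}
    \quad\text{(always holds).}
\end{align*}
```

Under these assumptions, the candidate equilibrium survives whenever punishing costs the parent no more than P − c, the loss the child avoids by following the norm in state (i).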
This simple example describes in a very crude way the type of story we have in mind. However, there are several issues that the example does not address. (1) What is the exact nature of the information asymmetry between parents and children? (2) Why should the parent use punishment as a way to communicate to his child? Would norm-following by the parent not be enough? Even more fundamentally, why can the parent not just "tell" the child what to do?
Our full-blown model addresses these issues. As we will see, a specific feature of the setup will allow us to address concern (2), namely, the presence of a paternalistic element in the parent's motives. Furthermore, it is quite clear that the logic we have highlighted would equally work if, instead of focusing on punishment, we were to consider a reward-based system. The question then is: are there any differences between these two mechanisms? Our full-blown setup will allow us to meaningfully compare the signaling value of punishments and rewards for information transmission across generations.

Model
We start off by constructing a model aimed at formalizing the intuition sketched in Section 2. As we will argue in Section 5, this setup can also be employed to study the case where signaling operates through rewards, rather than punishment.
Overlapping Generations. We consider an environment populated by overlapping generations. Each period t is characterized by a continuum of active parent-child pairs. Period t parents choose whether to follow the norm, and whether to punish those individuals who broke the norm in the previous period. Period t children observe their parent's behavior and decide whether to follow the norm. At t + 1, t parents and t children become purely passive and may be punished if they broke the norm in the previous period. Just before t + 2, t parents die. The t children observe the state of the world and recent history (more on this below) and become adults, i.e., the t + 2 parents, with one offspring each.
The following table summarizes our notation for the players' actions.

Action   Value 0                      Value 1
n        Youngster breaks the norm    Youngster follows the norm
N        Adult breaks the norm        Adult follows the norm
M        Adult fails to punish        Adult punishes

The letter "M" is a mnemonic for "metanorm".
Timing. Timing is illustrated in the diagram below (Figure 1). Note that if an individual breaks the norm at t, he faces punishment at time t + 1. Intuitively, this reflects the notion that punishment often takes the form of being confronted with ostracism or bias in future interactions. For instance, someone who gets drunk and behaves obnoxiously at a party will probably find his party invitations greatly reduced in the future. Similarly, someone who fails to respect religious practices may find himself socially shunned in future exchanges with members of his religious community, and so on. This lag in punishment implies that period t adults punish people who broke the norm at time t − 1. In turn, the children who followed/violated the norm at t − 1 (and who are punished at t) become adults (and potential punishers) at t + 1.
Cost of punishing. An individual who punishes norm-breakers incurs a cost. A natural benchmark is the case where punishment takes the form of ostracism, in which case the cost incurred arises from the missed opportunity of forming an economically beneficial link. Moreover, as argued by Elster (1989) [38], expressing disapproval is always costly: at the very least, it requires energy and attention that could have been used for other purposes. One might also alienate or provoke the target individual, at some cost or risk to oneself. Letting the mass of norm-breakers be denoted as b ≥ 0, the total cost of punishing is given by bθ + ε, where θ > 0. Punishment is more expensive the greater the mass of people who must be punished, but also involves a fixed cost ε, which we assume is strictly positive and possibly very small. One way to think about this small fixed cost is as a shortcut to reflect the presence of a small share of "behavioral" agents who always violate the norm.¹⁶ As a result, a parent who chooses to punish will always incur a positive cost, even if b = 0. This makes it clear that our results do not rely on the cost of punishing being exactly zero in any state of the world.
Following the norm is costly: it involves a cost c > 0. In what follows, when making welfare statements, we will focus on the benchmark case in which the behavior prescribed by the norm generates no externalities, although we will discuss how the presence of positive/negative externalities may affect our results when appropriate.

Availability of the punishment option
We allow for the possibility that, for some parents, punishment may actually be unfeasible. Accordingly, we assume that there are two states of the world.
- With probability γ, the state of the world is l: the punishment option is available for all parents.
- With probability 1 − γ, the state of the world is h: the punishment option is unavailable to a share q ∈ (0, 1) of parents and is available for the remaining 1 − q share.
For simplicity, we assume that the state of the world is the same in all periods. Parents observe the state of the world, while children do not. A natural benchmark is the case where γ is arbitrarily close to (but not equal to) 1, implying that state l is "almost certain".
Adults. We introduce the following additional notation: m_t is the mass of punishers in any given period t, and Θ is the (marginal) cost of being punished.
The material payoff of a t-period adult is summarized below.
[Table: material payoff of a t-period adult; columns: Follow the norm (N_i = 1), Do not follow the norm]
The discount factor is implicitly set equal to one. Throughout the analysis, we assume that Θ − c > ε. Note that Θ > c is actually a necessary requirement for punishment to be an effective incentive device for norm-following. The adult's direct material payoff can equivalently be expressed as [Equation (1)]. Note that the direct payoff (1) provides an incomplete description of individual welfare, since it omits the utility that an individual derives from altruistic concerns towards his child. A full description of individual welfare is provided below, in (3).
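One way to write the adult's direct material payoff that is consistent with the verbal description (a cost c of following the norm, a punishment Θ m_{t+1} suffered at t + 1 by a norm-breaker, a punishing cost θ b_{t−1} + ε, and no discounting) is sketched below. This is a reconstruction under those assumptions; the symbol u_i^A is introduced here purely for illustration and may differ from the paper's own notation.

```latex
% Sketch of the adult's direct material payoff.
% u_i^A is hypothetical notation; N_i, M_i, c, \Theta, \theta, \varepsilon,
% m_{t+1} and b_{t-1} are as defined in the text.
\[
u_i^{A}
  \;=\;
  -\,N_i\,c
  \;-\;(1 - N_i)\,\Theta\,m_{t+1}
  \;-\;M_i\,\bigl(\theta\,b_{t-1} + \varepsilon\bigr).
\]
```

For instance, with b_{t−1} = 0 and m_{t+1} = 1, an adult who follows the norm and punishes would obtain −c − ε under this sketch, while one who breaks the norm and does not punish would obtain −Θ.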
Information. Before selecting whether to follow the norm and punish, adults observe m_{t−1} and b_{t−1}, namely the mass of adults (from the previous generation) who punished norm deviators in the previous period, and the mass of norm-breakers in the previous period. They also observe the state of the world (l or h) and, in state h, they observe whether, for them, the punishing option is available or unavailable. The realizations of m and b in periods before the last cannot be observed.
Children. Each adult i is endowed with one child, also denoted by i. Children do not observe (m_{t−1}, b_{t−1}), the state of the world (l or h), or whether the punishment option is available to their parents. However, each youngster perfectly observes his parent's actions (N_i, M_i). After observing his parent's actions, he chooses n_i ∈ {0, 1} (violate the norm/follow the norm). The cost of following the norm for children is c, the same as an adult's.
If a youngster does not follow the norm, he becomes the object of social punishment.¹⁷ We assume that the stigma associated with norm-breaking does not carry through adulthood. Hence, as adults, all individuals start off with a clean slate, independently of how they behaved as youngsters.¹⁸ The payoff of a t-period child is as follows: [payoff table: t-period child], where δ < 1 indicates the child's discount factor. This can equivalently be expressed as [Equation (2)]. Note that we assume that children discount the future more heavily than adults.¹⁹ This reflects the idea that age affects preferences and choice. Robson and Samuelson (2007, 2009) [40,41] study the evolution of discount rates and find that, under some natural conditions, these should fall with age.
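A form of the child's payoff that matches the description above (cost c of following the norm, punishment Θ m_{t+1} at t + 1 if he breaks it, discounted at δ < 1) is sketched below, together with the decision rule it implies. The symbol u_i^C and the use of a weak inequality (the child follows the norm when indifferent) are assumptions made here for illustration.

```latex
% Sketch of the child's payoff and the implied decision rule.
% u_i^C is hypothetical notation; tie-breaking in favour of following is assumed.
\[
u_i^{C}
  \;=\;
  -\,n_i\,c \;-\; (1 - n_i)\,\delta\,\Theta\,m_{t+1},
\qquad
n_i = 1
  \iff
  \delta\,\Theta\,\mathbb{E}\bigl[m_{t+1} \mid N_i, M_i\bigr] \;\ge\; c .
\]
```

Read this way, the child follows the norm only if the mass of punishers he expects at t + 1, given his parent's observed actions, is at least c/(δΘ).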
Adults' Utility. Adults are altruistic towards their children, but this altruism is "impure." More specifically, we assume that parents evaluate the child's payoff using their own (higher) discount factor, although they are aware that this differs from the child's discount factor. This generates what Doepke and Zilibotti (2014) [42] call a paternalistic element in the parents' motives, which is standard in models of parent-child interactions.²⁰ The total utility of an adult i is thus given by [Equation (3)]. As will become clear below, parental paternalism introduces a conflict of interests between parents and children, which motivates the use of costly signaling for information transmission. Intuitively, if the interests of parents and children were perfectly aligned, then parents could just "tell" children what to do (and children would always find it optimal to follow what their parents tell them). The existence of a conflict of interests eliminates this possibility, since parents have an incentive to misrepresent their information. The use of costly signaling is then necessary to address this credibility problem.
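The paternalistic utility and the resulting conflict of interests can be sketched as follows. The sketch assumes that the parent adds the child's payoff to his own with weight one and evaluates it at the parent's discount factor (equal to one); the weight and the symbols U_i and \bar{m} are assumptions for illustration, not the paper's stated specification.

```latex
% Sketch of the adult's total utility under paternalistic altruism.
% Assumptions: weight one on the child's payoff, parent's discount factor = 1.
% U_i and \bar{m} are hypothetical notation.
\begin{align*}
U_i
  &= \underbrace{-\,N_i\,c - (1 - N_i)\,\Theta\,m_{t+1}
       - M_i\,(\theta\,b_{t-1} + \varepsilon)}_{\text{own material payoff}}
   \;+\;
     \underbrace{-\,n_i\,c - (1 - n_i)\,\Theta\,m_{t+1}}_{\text{child's payoff, undiscounted}} . \\[6pt]
\intertext{Writing $\bar{m}$ for the expected mass of punishers at $t+1$, the parent wants the
child to follow the norm when $\Theta\,\bar{m} \ge c$, whereas a child with discount factor
$\delta < 1$ wants to follow only when $\delta\,\Theta\,\bar{m} \ge c$. The two disagree when}
\frac{c}{\Theta} \;&\le\; \bar{m} \;<\; \frac{c}{\delta\,\Theta}.
\end{align*}
```

In this interval the parent would like the child to comply but the child, if he knew \bar{m}, would not; this is the region in which a parent has an incentive to overstate the likelihood of punishment, so cheap talk loses credibility. The interval shrinks as δ approaches one.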

Punishment Equilibrium
We concentrate on Perfect Bayesian Equilibria. The equilibrium concept in our setup thus satisfies the following requirements. (1) Children update their beliefs from their parent's actions using Bayes' rule whenever possible. (2) At each information set, each player's strategy specifies optimal actions, given his beliefs and the strategies of the other players. In what follows, we concentrate on pure-strategy symmetric equilibria. We also restrict attention to equilibria that are stationary, in the sense that all individuals follow strategies that are independent of calendar time. Finally, to simplify the analysis, we will restrict attention to equilibria in which the following holds.
Refinement 1. Children who observe their parent violating the norm never follow the norm.
Imposing this refinement allows us to rule out counterintuitive scenarios at the outset. Note, however, that the equilibrium we characterize would continue to hold even if Refinement 1 were lifted. Moreover, its key property, namely that those who punish do so to signal that norm-breakers are likely to be punished, would arise more generally also in other possible equilibria. However, a full characterization would be lengthy and, we believe, not very illuminating.
The crucial feature of our setup is that past history of play cannot be perfectly observed. Agents active at t have a common prior over history up to that period. We impose a "grain of truth" restriction on the prior. In particular, we assume that prior beliefs at t assign a strictly positive probability to the actual (real) realization of history up to that period. Adults also obtain some direct evidence of past history, since they observe (m_{t−1}, b_{t−1}), namely how much norm breaking and how much punishing there was in the previous period. A strategy for parent i maps (m_{t−1}, b_{t−1}) and the state of the world (l or h) into (N_i, M_i). In contrast, children have no direct information about past history, except through their parents' actions. A strategy for child i maps (N_i, M_i) into n_i. From (2), the child's optimal action at t depends on his expectation of m_{t+1}, namely the share of punishers at t + 1. The child will decide to follow the norm if and only if [condition on the child's expectation of m_{t+1}].²¹
Our first lemma establishes a benchmark result.
Lemma 1. The game has a history-independent equilibrium where nobody follows the norm, and nobody punishes.
Proof. In Appendix A.
Clearly enough, if nobody punishes norm-breakers, then norm-following and punishing are dominated actions. We now consider a more interesting scenario, where norm-following and punishing emerge in equilibrium. By Refinement 1, we are restricting attention to environments where the posterior beliefs about m_{t+1} of a child who has observed his parent breaking the norm induce him to break the norm too. It remains to address the child's posterior beliefs when he observes his parent following the norm.
Lemma 2. In an equilibrium with punishment, the following must hold. A child's expectation of m_{t+1} when he has observed his parent both following the norm and punishing at t must exceed that of a child who has observed his parent following the norm but not punishing.

Proof. In Appendix A.
For punishment to emerge it must be that, by observing his parent punishing, the child gains useful information about the likelihood of being met with future punishment if he breaks the norm. However, note that punishment conveys information about (m_{t−1}, b_{t−1}), the parent's private information. This raises the question: how can information about the past be useful to predict the future? For this to occur, there must be a link between the past and the future, so that, by gaining information about the past from the actions of their parents, children are able to make inferences about the future. In other words, past history must have a bearing in shaping the future.
The next result says that the information conveyed by parental punishment must actually be determinant for the child's choice.
Lemma 3. (Strategy A.) In an equilibrium with punishment, the following must hold. A child who has observed his parent both following the norm and punishing will follow the norm. A child who has observed his parent following the norm but not punishing will break the norm.

Proof. In Appendix B.
Intuitively, if a parent could induce his child to follow the norm by simply following the norm himself, he would never choose to punish, as this involves unnecessary punishing costs. Since all parents would follow this reasoning, punishment would disappear. This rules out that, in a punishment equilibrium, children may decide to follow the norm whenever they observe their parent following the norm, with no concern for punishing behavior.
Suppose that Refinement 1 holds, and that children follow Strategy A (described in Lemma 3). We now compute the payoffs that a parent active at time $t$ may obtain from each of the available action pairs. First, the payoff from following the norm and punishing, namely $(N_i, M_i) = (1, 1)$, is given by

$$-2c - \theta b_{t-1} - \varepsilon. \qquad (5)$$

This follows since a parent selecting $(1, 1)$ will induce his child to follow the norm. Note that history has a direct bearing on payoffs, and is thus more than a simple correlation device. This is because the cost of punishing is higher the higher the share of past norm-breakers.
Second, the payoff from $(N_i, M_i) = (1, 0)$, namely following the norm and failing to punish, is

$$-c - \Theta\, m_{t+1}(m_{t-1}, b_{t-1}), \qquad (6)$$

since in that case the child will not follow the norm and will thus be punished by all those adults at $t+1$ who choose to punish. The expression $m_{t+1}(m_{t-1}, b_{t-1})$ gives the value of $m_{t+1}$ conditional on the information at the parents' disposal, namely $m_{t-1}$ and $b_{t-1}$. 22
Third, the payoff from $(0, 0)$, namely breaking the norm and failing to punish, is

$$-2\Theta\, m_{t+1}(m_{t-1}, b_{t-1}), \qquad (7)$$

since in that case both the parent and the child will be punished at $t+1$.
Finally, the payoff from $(N_i, M_i) = (0, 1)$, namely breaking the norm and punishing, is

$$-2\Theta\, m_{t+1}(m_{t-1}, b_{t-1}) - \theta b_{t-1} - \varepsilon. \qquad (8)$$

Children
We are now able to describe more precisely how children process the information conveyed by parental actions. Children start with a prior about the history of past play. Upon observing their parent's behavior $(N_i, M_i)$, they combine their prior information with the information conveyed by parental behavior to form posterior beliefs about the pair $(m_{t-1}, b_{t-1})$. Since parental strategies depend on recent history, this posterior induces a probability distribution over $(m_t, b_t)$, which in turn determines a probability distribution over $m_{t+1}$, which is the variable of interest to children, since it determines the payoff from norm-breaking. Intuitively, children look at parental behavior to gain a clue about recent history. This matters, since recent history affects adult behavior in the following periods.
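The four parental payoffs discussed above can be collected in a short Python sketch. It is a reconstruction from the surrounding text, not the authors' exact formulas: it assumes that a parent's total payoff is the sum of his own and his child's payoff (with a parental discount factor of one), that punishing costs $\theta b_{t-1} + \varepsilon$, that being punished costs $\Theta m_{t+1}$, and that the child follows the norm only after observing $(1, 1)$ (Strategy A together with Refinement 1). The names `Params` and `parent_payoff` and the numerical values are purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    c: float      # cost of following the norm
    Theta: float  # cost of being punished, per unit mass of punishers
    theta: float  # marginal cost of punishing one more norm-breaker
    eps: float    # fixed cost of punishing


def parent_payoff(N: int, M: int, m_next: float, b_prev: float, p: Params) -> float:
    """Total payoff of a parent choosing (N, M): own payoff plus child's payoff.

    Assumption (illustrative): the child follows the norm only after
    observing (1, 1), and the parent does not discount the future.
    """
    child_follows = (N == 1 and M == 1)
    # Own payoff: pay c when following, otherwise be punished at t + 1.
    own = -p.c if N == 1 else -p.Theta * m_next
    # Punishing costs theta * b_{t-1} + eps.
    own -= M * (p.theta * b_prev + p.eps)
    # Child's payoff, evaluated with the parent's discount factor (one).
    child = -p.c if child_follows else -p.Theta * m_next
    return own + child


# Regime I with q = 0.2: a share 1 - q punishes, a mass q breaks the norm.
q = 0.2
p = Params(c=1.0, Theta=3.0, theta=1.0, eps=0.1)
payoffs = {(N, M): parent_payoff(N, M, m_next=1 - q, b_prev=q, p=p)
           for N in (0, 1) for M in (0, 1)}
best = max(payoffs, key=payoffs.get)
print(best, payoffs)
```

With these illustrative numbers, $\Theta(1-q) - \varepsilon - \theta q = 2.1$ exceeds $c = 1$, and following the norm and punishing is the parent's best action pair in regime I.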

Steady States
We concentrate on equilibria where the economy is in a steady state $(m^*, b^*)$, in which the shares of punishers and norm-breakers are constant over time. To characterize the possible steady states, we need to check for incentive compatibility. For instance, for $m^* = 1$, $b^* = 0$ to be a possible steady state, we need (5) to exceed (6)-(8) whenever we impose the steady state conditions $b_{t-1} = b^* = 0$ and $m_{t+1} = m^* = 1$. The following lemma characterizes the possible steady states that may emerge.
Lemma 4. Suppose that children follow Strategy A described in Lemma 3. The possible steady states are:
- High-punishment: All parents follow the norm and (if they can) punish. As a result, in state $l$: $m^* = 1$, $b^* = 0$; in state $h$: $m^* = 1 - q$, $b^* = q$.
- No-punishment: Nobody punishes, nobody follows the norm.
Proof. In Appendix A.
Lemma 4 shows that there are two possible steady states: a no-punishment steady state, essentially equivalent to the history-independent equilibrium, and a high-punishment steady state, where all parents follow the norm and punish norm-breakers (if they can). This multiplicity arises because parent-child signaling generates strong complementarities among parents: people punish norm-breakers because this is what others in society do, and vice versa. [Further discussion of the mechanism that underpins punishment is provided below, when discussing Proposition 1.] It is important to note that, as mentioned in the lemma, the actual share of punishers and norm-breakers in steady state may take three different values. More specifically:
- If the steady state is high-punishment and the state of the world is $l$: $m^* = 1$, $b^* = 0$. We denote this scenario as regime H.
- If the steady state is high-punishment and the state of the world is $h$: $m^* = 1 - q$, $b^* = q$. We denote this scenario as regime I (for intermediate).
- If the steady state is no-punishment: $m^* = 0$, $b^* = 2$. We denote this scenario as regime L.
We now turn to children's expectations. As a first approximation, it is useful to assume that children have rational expectations, and thus know that the economy is in a steady state (although they do not know which one). Prior beliefs thus assign positive probability to at most three possible regimes. We let the children's prior assign probability $p_H$ to regime H, probability $p_I$ to regime I, and probability $p_L = 1 - p_I - p_H$ to regime L. We assume that both $p_H$ and $p_I$ are strictly greater than zero. The children's posterior beliefs are then well defined and can be computed from Bayesian updating. To refine the child's out-of-equilibrium beliefs, we apply Cho and Kreps's (1987) [44] Intuitive Criterion: following an out-of-equilibrium move by the parent, a child assigns zero probability to the deviation emanating from a regime in which the deviation is equilibrium-dominated.
Proposition 1. Suppose that both $p_H$ and $p_I$ are strictly greater than zero and that

$$\frac{p_H + (1-q)^2 p_I}{p_H + (1-q)p_I}\,\delta\Theta > c \geq \delta\Theta(1-q) \quad \text{and} \quad \Theta(1-q) - \varepsilon - \theta q > c. \qquad \text{(a)}$$

Then punishment may emerge in equilibrium. In the refined steady state equilibrium with punishment, the following holds.

(i) Children follow the norm if and only if they observe their parent punishing (Strategy A).
(ii) Adults follow the norm and punish (if they can).
Proof. The remainder of the proof can be found in Appendix A.
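The Bayesian updating behind children's beliefs can be sketched numerically. The sketch below is an illustration under stated assumptions, not the authors' own derivation: it takes the likelihood of observing $(1, 1)$ to be one in regime H, $1 - q$ in regime I and zero in regime L, and it assumes that the child follows the norm whenever the discounted expected punishment $\delta\Theta E[m_{t+1}]$ exceeds $c$. All function names and parameter values are hypothetical.

```python
def posterior_after_punishing(p_H, p_I, q):
    """Posterior probabilities of regimes H and I after the child observes (1, 1).

    Assumed likelihood of (1, 1): 1 in regime H, 1 - q in regime I, 0 in regime L.
    """
    denom = p_H + (1 - q) * p_I
    return p_H / denom, (1 - q) * p_I / denom


def expected_punishers_after_punishing(p_H, p_I, q):
    """Child's expectation of m_{t+1} after observing (1, 1)."""
    pi_H, pi_I = posterior_after_punishing(p_H, p_I, q)
    # m* = 1 in regime H and m* = 1 - q in regime I.
    return pi_H * 1.0 + pi_I * (1 - q)


def child_follows(expected_m, delta, Theta, c):
    """Assumed rule: follow iff discounted expected punishment exceeds c."""
    return delta * Theta * expected_m > c


p_H, p_I, q = 0.5, 0.25, 0.2
m_hat = expected_punishers_after_punishing(p_H, p_I, q)
# After (1, 1): the child weighs regimes H and I.
print(m_hat, child_follows(m_hat, delta=0.5, Theta=3.0, c=1.3))
# After (1, 0): only regime I is possible, so the child expects m = 1 - q.
print(child_follows(1 - q, delta=0.5, Theta=3.0, c=1.3))
```

In this example the child follows the norm after observing punishment but not after observing norm-following alone, which is the pattern described by Strategy A.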
In what follows, we will use the term "punishment equilibrium" to indicate the steady state equilibrium that features punishment. Proposition 1 highlights the role of history in sustaining norms and punishment meta-norms. People look at recent shared history to form their expectations of the future. Adults active at $t$ who have been exposed to a recent history containing punishment have an incentive to punish because they know that other $t$ adults will punish, since they have observed the same shared history. The behavior of other $t$ adults matters because, in the aggregate, it shapes history at $t+1$. If punishment occurs at $t$, recent history in $t+1$ will feature punishment. In turn, this will induce people in $t+1$ to punish as well (since they correctly expect that their shared past will be projected into the future). In sum, people punish norm-breakers to pass information about the past to the younger generation. This creates a link between the past and the future which ensures that information about the past is important (since the past shapes the future).
In our analysis, the link between past, present and future is crucial to sustain norms. If this link were to break (as, e.g., in the case of a major disruption such as a war), this might start a process of unraveling that may work very quickly. This remark is also consistent with the observation that, sometimes, seemingly irrelevant events may have rapid and lasting consequences for norms (Bicchieri and Muldoon 2014 [6]). Note that this role of history and continuity is entirely absent from other accounts, such as preference-based theories of punishment. Indeed, in their survey article, Bicchieri and Muldoon [6] conclude that: 23 "Studies as disparate as the analysis of Prohibition support, racial integration [and] the sexual revolution in the 1960s (...) all lend credibility to a model of norms grounded on individuals' (...) expectations of what others will do (...)."
Finally, note that, although we do not explore this explicitly, norm-following and punishment may survive only in sufficiently homogeneous environments, where parents share a similar understanding of the prevailing norm and metanorm. Intuitively, that is required to ensure that parental behavior is informative about the behavior of other people in society. This shares similarities with Adriani and Sonderegger (2009) [45] and suggests a possible rationale for favoring homogeneous societies, namely that they may make punishment-sustained pro-social norms easier to maintain. 24
In the benchmark case where $\gamma$, the probability that the state of the world is $l$, is arbitrarily close to (although strictly smaller than) one, $p_I$ is arbitrarily small and, thus, a child who observes $(1, 1)$ expects $m_{t+1}$ to be arbitrarily close to one. As a result, condition (a) simplifies to

$$\delta\Theta > c \geq \delta\Theta(1-q) \quad \text{and} \quad \Theta(1-q) - \varepsilon - \theta q > c.$$

This has the advantage of being independent of prior beliefs (which are very difficult to measure). Note that, although children know that state $h$ is extremely unlikely, upon observing $(1, 0)$ they will necessarily conclude that society finds itself in regime I, since this is the only scenario where $(1, 0)$ may emerge as an equilibrium move.
We now turn to welfare. From Lemma 1, the game supports a history-independent equilibrium where nobody follows the norm and nobody punishes in any period, and where all players obtain a payoff of zero. By contrast, the punishment equilibrium supports payoffs that are below that amount. In fact, if we consider the benchmark case where norm-following generates no direct externalities (and is thus purely wasteful), we can show the following.
Proposition 2. In any punishment equilibrium, material welfare is lower than in the history-independent equilibrium.

Proof. In Appendix A.
Intuitively, this follows because punishment is a very wasteful activity, since it imposes costs on both the punisher and the individual being punished. Clearly enough, however, this observation may be reversed if we were to consider norms that generate positive externalities (such as norms of public good provision). Proposition 2 highlights that positive externalities are a necessary (although not sufficient) condition for a regime featuring punishment to generate higher welfare than the history-independent equilibrium. As we will see below in Proposition 4, this result stands in contrast with what happens when reward rather than punishment is used as a signaling device.
We now turn to the conditions that underpin the punishment equilibrium. First, the condition $\frac{p_H + (1-q)^2 p_I}{p_H + (1-q) p_I}\,\delta\Theta > c$ implies that, upon observing his parent following the norm and punishing, a child believes that regime H is sufficiently likely to induce him to follow the norm. Although children fear punishment less than parents, they nonetheless find it optimal to follow the norm if they believe that punishment is sufficiently widespread in society. Second, consider a child who observes his parent following the norm but not punishing, in which case the child will infer that society is in regime I. Condition $c \geq \delta\Theta(1-q)$ ensures that, in that case, the child will not find it worthwhile to follow the norm. Third, the condition $\Theta(1-q) - \varepsilon - \theta q > c$ ensures that parents who can punish actually choose to punish in regime I (as well as regime H), to induce their children to follow the norm. This creates a wedge between the child's and the parents' views. Intuitively, $\Theta(1-q) - \varepsilon - \theta q > c$ implies $\Theta(1-q) > c$. Hence, in the I regime, parents would like their children to follow the norm, since the (parental perception of the) benefit this involves in terms of spared punishment outweighs the cost of norm-following. However, as we have seen, if the child knew that the regime is I he would not follow the norm. The following corollary summarizes the nature of the conflict of interests between parents and children in the punishment equilibrium.
Corollary 1. In the punishment equilibrium, the conflict of interests between parents and children takes the following form: Parents would like their children to follow the norm in both the I and the H regimes. By contrast, children (if they had perfect information) would like to follow the norm only in the H regime.
Note that for the conflict of interests to arise, the child's discount factor (i.e., $\delta$) should be sufficiently smaller than the parent's discount factor (i.e., one). If $\delta = 1$, condition (a) fails and punishment does not emerge.
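The three requirements discussed above can be checked mechanically for any parameter vector. The following Python sketch is only an illustration: the function name, the dictionary keys and the parameter values are hypothetical, and the inequalities are transcribed from the discussion of the conditions underpinning the punishment equilibrium.

```python
def punishment_equilibrium_conditions(p_H, p_I, q, delta, Theta, theta, eps, c):
    """Evaluate the three inequalities discussed for the punishment equilibrium."""
    # Child's expected share of punishers after observing (1, 1).
    expected_m = (p_H + (1 - q) ** 2 * p_I) / (p_H + (1 - q) * p_I)
    return {
        # Child follows the norm after seeing his parent punish.
        "child_follows_after_punishment": expected_m * delta * Theta > c,
        # Child would break the norm if he knew the regime were I.
        "child_breaks_in_regime_I": c >= delta * Theta * (1 - q),
        # Parents who can punish prefer (1, 1) to (1, 0) in regime I.
        "parent_punishes_in_regime_I": Theta * (1 - q) - eps - theta * q > c,
    }


base = dict(p_H=0.6, p_I=0.2, q=0.5, Theta=4.0, theta=0.5, eps=0.1, c=1.2)

with_conflict = punishment_equilibrium_conditions(delta=0.5, **base)
no_conflict = punishment_equilibrium_conditions(delta=1.0, **base)

print(all(with_conflict.values()), with_conflict)
print(all(no_conflict.values()), no_conflict)
```

With $\delta = 0.5$ all three inequalities hold in this example. With $\delta = 1$ the second one fails: it would require $c \geq \Theta(1-q)$, which is incompatible with the third inequality, since the latter implies $\Theta(1-q) > c$.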
Robustness to direct communication
Corollary 1 helps us address one of the questions we presented in Section 2, namely, why can the parent not just "tell" the child what to do? Suppose that, instead of punishing norm deviators, a parent may just send a message (involving a positive but arbitrarily small cost) to his child, urging him to follow the norm. 25 Formally, let $M \in \{0, 1, 2\}$, where $M = 0$ corresponds to "not punish and not send the message", $M = 1$ corresponds to "not punish but send the message", and $M = 2$ corresponds to "punish and do not send the message". Starting from an equilibrium where communication is not used (i.e., nobody selects $M = 1$), would a parent have an incentive to deviate and use direct communication instead of punishment? 26 In the L regime, sending the message is equilibrium-dominated for the parent. Hence, if we refine out-of-equilibrium beliefs using a standard refinement such as Cho and Kreps's Intuitive Criterion, the child should rule this regime out. By the same token, it is clear that the parent might have an incentive to send the message both in the H and the I regimes, if he thought the message sufficiently likely to succeed. Note however that the parent's net return from the deviation in regime I exceeds that in regime H. 27 This follows since the gross payoff from the deviation is the same in both states, while the parent's equilibrium payoff is actually higher in regime H, implying that the deviation is less appealing. There are thus three cases that may arise. The deviation is either (a) equilibrium-dominated in both regimes, or (b) profitable in both regimes, or (c) profitable in the I regime and equilibrium-dominated in the H regime. In case (a), the Intuitive Criterion does not pin down the child's posterior beliefs following the deviation. In case (b), the Intuitive Criterion establishes that, following the deviation, the child may assign positive probability to I and/or H. In case (c), the Intuitive Criterion establishes that, following the deviation, the child should assign probability one to the I regime. Overall, we conclude that, if the child believes that the out-of-equilibrium move originated from regime I with probability one, this is consistent with the Intuitive Criterion. The child's best reply to the message is then to break the norm ($n = 0$), which of course makes the deviation suboptimal for the parent. The punishment equilibrium is thus robust to direct communication. Note that this argument can be seen as a special case of the analysis provided in Section 6.1 (this is formalized in the Proof of Proposition 6).
25 If the message's cost is exactly zero then the argument is strengthened, since the child cannot rule out the message emanating from the L regime.
26 Note that the parent has no incentive to deviate by both punishing and sending the message since this would involve unnecessary costs (recall that in the equilibrium we are considering the act of punishing is sufficient to induce the child to follow the norm).
27 This stands in contrast with the net returns from punishing, which are actually greater in regime H: the net gain from choosing $(1, 1)$ over $(1, 0)$ is larger in regime H than in regime I. Another difference is that, while the child's beliefs following the deviation are refined through the Intuitive Criterion, his beliefs following the equilibrium move $(1, 1)$ are given by standard Bayesian updating, and assign probability $\pi_I(1, 1) < 1$ to regime I, as shown in (9).
Comparative statics
Proposition 1 highlights interesting comparative statics. The second inequality in condition (a) requires

$$c \geq \delta\Theta(1-q). \qquad (12)$$

In other words, the cost of following the norm cannot be too low. This may at first appear strange. If one norm is cheaper than another, how could it ever be harder to sustain? The intuition is subtle. If the cost of following the norm is very low, this would eliminate the conflict of interest between adults and youngsters and would thus render signaling redundant. More specifically, if the cost of norm-following is very low, then the child would find it optimal to follow the norm even if society is in the I regime. By simply following the norm, a parent could then signal that the regime is I, and thus induce the child to follow the norm with no need for additional signaling. This would trigger a process of unraveling which would destroy the use of punishment (and norm-following) in equilibrium.
Note that the RHS of (12) is an increasing function of $\Theta$, the cost of being punished. This implies that a higher cost of being punished raises the lower bound on $c$ that needs to be met for punishment to emerge in equilibrium. The reason is again linked with the necessity of a conflict of interests between parents and children.
These comparative statics set our model apart from other possible explanations for punishment, such as the existence of direct preferences for punishing (which actually delivers no comparative statics at all), or the idea that people may punish to avoid being punished themselves.

Reward Equilibrium
In this section, we construct a setup where signaling operates through an alternative mechanism that has been presented by the literature as relevant to sustain norms (e.g., Herold 2012 [33]), namely rewards. At first glance, a full-blown analysis of this case may appear redundant. After all, one may see the failure to punish someone as a "reward", and the failure to reward someone as a "punishment". This intuition suggests that there may be a one-to-one correspondence between the two cases. However, this analogy is misleading. To see this, suppose that rewards are costly to give (if they were not, their use to incentivize norm-following would be trivially sustained). In this case, the act of punishing someone by failing to reward them would actually make the punisher better off, since it would save him the cost of rewarding. 28 This shows that the analogy between punishing and rewarding fails if we are interested in comparing the use of costly rewards and costly punishments as possible incentive devices for norm-following.

Setup
The setup mirrors that of Section 3 in most respects. Let $M_i \in \{0, 1\}$ indicate whether an individual follows the metanorm of rewarding ($M_i = 1$) or failing to reward ($M_i = 0$) norm-followers. Similar to existing literature (Herold 2012), we model the act of rewarding as zero-one. The payoff from $(N_i, M_i)$ for a $t$ parent is equal to

$$N_i\,(R\, m_{t+1} - c) - M_i\, r\,(2 - b_{t-1}),$$

where $2 - b_{t-1}$ is the mass of $t-1$ parents and children who followed the norm at $t-1$ (and who are being rewarded at $t$) and $m_{t+1}$ is the mass of $t+1$ parents who reward norm-followers. Note that if an individual follows the norm at $t$, he is rewarded at time $t+1$. Intuitively, the type of rewards we have in mind take the form of receiving costly favors at some point in the future. Examples include preferential treatment when looking for a job or asking for a loan, or more generally receiving help when the need arises. This lag implies that time $t$ adults reward people who followed the norm at time $t-1$. In turn, the children who followed/violated the norm at $t-1$ (and who are rewarded at $t$) become adults (and potential rewarders) at $t+1$. The marginal benefit from being rewarded for following the norm is $R$. We assume that $R > c$ to ensure that rewards can be used as an effective incentive for norm-following. The parameter $r > 0$ captures the marginal cost of rewarding an additional individual.
Availability of the rewarding option
Similar to the case of punishing, we allow for the possibility that, for some parents, rewarding may be unfeasible or prohibitively costly. There are two states of the world.
- With probability $\gamma$, the state of the world is $l$: the rewarding option is available for all parents.
- With probability $1 - \gamma$, the state of the world is $h$: the rewarding option is unavailable to a share $q \in (0, 1)$ of parents.
Children For children, the payoff from n i is given by where δ < 1 is the child's discount factor (as in Section 3).We thus maintain the assumption that children discount the future more heavily than parents.Parents are altruistic but exhibit imperfect empathy.The total utility of an adult i is thus given by Timing The timing of events is as follows (Figure 2)

Signaling
Similar to Section 4, we restrict attention to environments where Refinement 1 holds. Similar to the case of punishment, it is easy to show that, in steady state, there are three possible regimes:
- H regime: all parents follow the norm and reward norm-followers.
- I regime: all parents follow the norm but only a share $1 - q$ rewards, since the remaining share $q$ faces prohibitive costs of rewarding.
- L regime: nobody rewards and nobody follows the norm.
The following proposition identifies the conditions for reward to feature in a refined steady state equilibrium.
Proposition 3. Suppose that $p_H$ and $p_I$ are strictly greater than zero and that condition (b) holds. Then rewarding behavior may emerge in equilibrium. In the refined steady state equilibrium with reward, the following holds.

(i) Children follow the norm if and only if they observe their parent rewarding.
(ii) Adults follow the norm and reward (if they can).
If condition (b) fails, then the unique refined steady state equilibrium is one where nobody follows the norm and nobody rewards.

Proof. See Appendix B.
In what follows, we will use the term "reward equilibrium" to indicate the steady state equilibrium that features reward. 29 Condition (b) shares many similarities with condition (a). For instance, it shows that, for a reward equilibrium to emerge, the cost of following the norm cannot be too low, similar to what we found in the case of punishment. Another noteworthy feature of condition (b) is the requirement $R - 2r > c$. Since $c > 0$, this requirement implies $R > r$, i.e., the value of the reward for the person who receives it should exceed the cost for the rewarder. This arises from the parent's incentive compatibility constraint in the H regime. If it did not hold, then the parent would always be better off by selecting $(1, 0)$, i.e., follow the norm but not reward, instead of $(1, 1)$, i.e., follow the norm and reward. Rewarding would therefore be a dominated strategy. 30 Intuitively, this is because in that case the costs of rewarding norm-followers would be too high to be worth incurring.
This has implications for the type of reward that may emerge. An arrangement where people reward by donating resources they have been abundantly endowed with and are rewarded by receiving goods they have been scarcely endowed with may fit this description. 31 By contrast, rewards that are costly for those bestowing them but have little value for the receivers would not work. The same would be true of an environment where the cost of rewarding equals the benefit from being rewarded, so that $r = R$, as for instance may be the case if rewards take the form of monetary transfers. This may help explain why, in social exchanges, people tend to reward others in kind rather than through money. 32
Differently from the punishment equilibrium, in the case of rewards the following holds.
Lemma 5. In any reward equilibrium, the adult's material payoff (i.e., calculated by ignoring the child's welfare) is always larger than zero.

Proof. In Appendix A.
The direct implication is that, even in the benchmark case where norm-following generates no direct externality (and is thus purely wasteful from a welfare viewpoint), the following holds.
Proposition 4. In any reward equilibrium, aggregate material welfare is higher than in the history-independent equilibrium.

Proof. In Appendix A.
Hence, in stark contrast with the punishment case, here any regime that features norm-following is necessarily welfare-superior to the regime where the norm is ignored (and no rewarding takes place). This happens despite the fact that the norm is actually socially wasteful. The intuition is that, as discussed, the net benefits created by the act of rewarding compensate for the welfare loss generated by costly norm-following. Clearly enough, Proposition 4 continues to hold if we allow for direct positive externalities of norm-following, but may cease to hold if we consider a norm which generates sufficiently strong negative direct externalities.

Robustness
This section studies the robustness of the punishment and reward equilibria characterized in Propositions 1 and 3. More specifically, we wish to test whether punishment equilibria are robust to people starting to use rewards instead, and whether reward equilibria are robust to people starting to use punishment. We also wish to establish whether there are asymmetries, i.e., whether one signaling mechanism is somehow more robust than the other.

Robustness of Reward Equilibrium to Punishment
We start off by considering the robustness of the reward equilibrium to deviations involving punishment. Suppose that, instead of rewarding norm-followers, a parent may choose to punish norm deviators. Formally, let $M \in \{0, 1, 2\}$, where $M = 0$ corresponds to "not reward, not punish", $M = 1$ corresponds to "reward, not punish", and $M = 2$ corresponds to "not reward, punish". We wish to ask the following question. Starting from a reward equilibrium (i.e., where no parent selects $M = 2$), and keeping the behavior of all other parents fixed, when would an individual find it optimal to (unilaterally) use punishment rather than reward as a signaling device? As in the previous sections, we apply the Intuitive Criterion. To evaluate whether a deviation may possibly be profitable for a parent, we consider the standard case of unilateral deviations. As in Section 3, we let the cost incurred by punishing be given by $\theta b + \varepsilon$.
The first observation is that, in the L reward regime, deviating cannot possibly bring any benefits, since the parent does not actually want his child to follow the norm. By contrast, deviating may possibly be beneficial in the H and I reward regimes, since in those cases the parent does want the child to follow the norm (and ideally would like to achieve that by incurring the smallest possible signaling cost). Consider now a unilateral deviation that consists in following the norm and punishing norm-breakers. Suppose that the deviation induces the child to follow the norm. The key remark is that this deviation would deliver a higher payoff in the H reward regime than in the I reward regime (since, in the latter, more people break the norm and should therefore be punished). Moreover, the difference between the two payoffs is increasing in $\theta$, the marginal cost incurred by punishing. Compare now the payoff obtained from deviating with equilibrium payoffs. Clearly enough, if $\varepsilon$ is small, a successful deviation would always be profitable in the H reward regime. Moreover, if $\theta$ is sufficiently high, we can ensure that the deviation in the I reward regime is equilibrium-dominated. In other words, we can always find suitable parameter values that would induce parents to deviate only in the high-reward state. The deviation is thus a credible signal that the regime is high-reward. Figure 3 illustrates this argument.
Proposition 5. For each $r > 0$ there exist values $\theta^*$ and $\varepsilon^* > 0$ such that, if $\theta > \theta^*$ and $\varepsilon < \varepsilon^*$, the reward equilibrium of the augmented game fails the Intuitive Criterion.
Proof. See Appendix A.

Robustness of Punishment Equilibrium to Costly Rewards
We now consider the robustness of the punishment equilibrium to deviations involving costly rewards. Let $M \in \{0, 1, 2\}$, where $M = 0$ corresponds to "not punish, not reward", $M = 1$ corresponds to "punish, not reward", and $M = 2$ corresponds to "not punish, reward norm-followers". As in Section 5, we let the marginal cost incurred by rewarding norm-followers be given by $r$.
Clearly enough, in the no-punishment regime, deviating is always dominated. By contrast, deviating may possibly be beneficial in the H and I punishment regimes (since in those regimes the parent actually wants the child to follow the norm). The second observation is that the payoff a parent may obtain by deviating and using rewards is always lower in the H punishment regime than in the I punishment regime. Intuitively, this is because in the H regime more people follow the norm and must therefore be rewarded. 33 Consider now the parent's equilibrium payoff. It is straightforward to show that the equilibrium payoff in the H punishment regime is always higher than in the I punishment regime (since in the latter there are more people who violate the norm and must therefore be punished). This implies that, if a parent deviates in the H punishment regime, he will also find it optimal to deviate in the I punishment regime. The Intuitive Criterion is thus consistent with the belief that the deviation originated in the I punishment regime. As a result, the child will find it optimal to break the norm upon observing the deviation, thus rendering the deviation suboptimal. The punishment equilibrium thus satisfies the Intuitive Criterion. This is summarized in the next proposition.
Proposition 6. In the augmented game, the punishment equilibrium satisfies the Intuitive Criterion for all $r > 0$.
Proof. See Appendix A.

Discussion
The intuition for the results in this section can be summarized as follows. Reward does not destabilize the punishment equilibrium because the payoff obtained when using reward out-of-equilibrium (i.e., when everyone else uses punishment) is decreasing in the share of norm-followers. Out-of-equilibrium reward is thus ill-suited to effectively signal that norm compliance is high. By contrast, the payoff obtained when using punishment is always increasing in the share of norm-followers. Punishment is thus a compelling tool for signaling high compliance.
It is important to highlight that the punishment equilibrium is robust to reward no matter how low reward costs may be. To see this, consider an extreme scenario, where reward costs are arbitrarily close to zero. In that case, a successful deviation (i.e., one that induces the child to follow the norm) involving rewards always yields a higher payoff than the equilibrium move. However, the parent has an incentive to deviate both in the H- and the I-punishment regimes. In other words, a deviation involving rewards cannot credibly signal that the regime is high-punishment. This is the rationale for the punishment equilibrium's robustness. It is straightforward to see that a similar rationale also guarantees robustness in the case where the parent may deviate by selecting an alternative signal whose cost is independent of the share of norm-followers, e.g., because it involves a fixed cost $\varphi$, for any $\varphi \geq 0$.
Consider now a deviation from the reward equilibrium that involves punishment. If the fixed cost of punishing ($\varepsilon$) is sufficiently small, the deviation payoff exceeds the equilibrium payoff in the H-reward regime. However, as we have argued, this is only part of the argument. The additional crucial ingredient is credibility. The payoff from out-of-equilibrium punishment is higher in the H-reward regime than in the I-reward regime. For $\theta$ high enough, this payoff increases so steeply in the share of norm-followers that it cuts the equilibrium payoff from below, as in Figure 3, thus credibly signaling that the regime is H. In turn, this ensures that, upon observing this deviation, the child finds it optimal to follow the norm, thus rendering the deviation optimal. 34
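The asymmetry between the two signaling devices can be illustrated with a small Python sketch. It rests on assumptions that are ours rather than the paper's: a successful deviation leaves the child's behavior unchanged, so its net gain is simply the difference in signaling costs; punishing costs $\theta b + \varepsilon$ and rewarding costs $r(2 - b)$, where $b$ is the mass of norm-breakers ($b = 0$ in regime H and $b = q$ in regime I). The function names and numbers are hypothetical.

```python
def deviation_gain(signal, b, r, theta, eps):
    """Net gain from a successful unilateral switch to `signal`.

    signal = "reward": a parent in a punishment equilibrium rewards instead.
    signal = "punish": a parent in a reward equilibrium punishes instead.
    b is the mass of norm-breakers, so 2 - b is the mass of norm-followers.
    """
    punishing_cost = theta * b + eps
    rewarding_cost = r * (2 - b)
    if signal == "reward":
        return punishing_cost - rewarding_cost
    if signal == "punish":
        return rewarding_cost - punishing_cost
    raise ValueError("signal must be 'reward' or 'punish'")


def credibly_signals_H(signal, q, r, theta, eps):
    """True if the deviation is profitable in regime H and equilibrium-dominated in regime I."""
    gain_H = deviation_gain(signal, 0.0, r, theta, eps)
    gain_I = deviation_gain(signal, q, r, theta, eps)
    return gain_H > 0 and gain_I < 0


q, r, theta, eps = 0.5, 0.5, 3.0, 0.1
print(credibly_signals_H("punish", q, r, theta, eps))
print(credibly_signals_H("reward", q, r, theta, eps))
```

Under these assumptions, the gain from deviating to rewards is larger in regime I than in regime H by $(\theta + r)q > 0$, so a reward deviation can never single out regime H, whereas a punishment deviation does so once $\varepsilon < 2r$ and $\theta$ is large enough.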

Further Discussion and Concluding Remarks
In this section, we briefly provide further discussions of our results and their implications.

Which Norms?
Our mechanism continues to work even if we consider norms that do not impose a net individual cost on all, at least as far as adults are concerned. Consider for instance "good lifestyle" behaviors, in matters such as drinking, personal hygiene, sexual promiscuity, eating habits and so on. These behaviors often involve a trade-off, since they require effort, but they also generate benefits, both in the present and in the future (for instance, in the form of improved health). If youngsters discount the future more heavily than adults, they will value the delayed benefits less. Alternatively, adopting the good behavior may require greater effort for youngsters than for adults. It is then possible that, while for parents the net cost of following the good lifestyle norm may be negative (i.e., the benefits more than offset the costs), the opposite may hold for youngsters. Suppose then that we modify our setup as follows. The net cost for parents of following the norm is $c_{\text{parents}} < 0$, while the net cost for youngsters is $c_{\text{children}} > 0$. 35 Clearly enough, our results still apply. In this modified setup parents always want the child to follow the norm, even in the no-punishment regime (since, in their eyes, the health benefits from following the norm more than justify its costs). The conflict of interests between parents and children is thus even stronger.

Infinitely Repeated Games
Many "folk-theorem related" models consider environments where deviations trigger a punishment phase. In the most commonly used framework, punishment consists in a permanent reversion to the static Nash equilibrium (see e.g., Friedman 1971 [49]). Clearly enough, in these models, the problem of sustaining punishment does not emerge (since the punishment phase constitutes a Nash equilibrium). Our work concentrates on environments where the existence of punishment cannot be justified on these grounds. Moreover, in models that use Nash reversion as punishment, equilibrium payoffs are always weakly higher than those of the static Nash equilibrium. This stands in contrast with our results where, as we have seen, in the punishment equilibrium people may end up worse off than they would be in the history-independent equilibrium. The history-independent equilibrium is the unique equilibrium that would emerge if the game were finite (and adults knew their position with respect to the last period). 36 The literature on repeated games has, to be sure, identified scenarios where the payoffs obtained in the repeated game may fall below those of the static Nash equilibrium. However, these typically involve so-called "carrot and stick" finite punishment cycles, where the punishment phase is followed by a reward phase (see, e.g., Abreu 1986 [50]). Even leaving all other differences aside, it is clear that our story is fundamentally different.
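For readers less familiar with Nash reversion, the following minimal sketch uses a textbook prisoner's dilemma stage game, which is not the game studied in this paper. The per-period payoffs `coop` (mutual cooperation), `tempt` (one-shot deviation) and `nash` (static Nash equilibrium), as well as the function name, are hypothetical. Cooperation is sustained by the threat of permanent reversion to static Nash play whenever the discounted value of cooperating forever is at least the value of deviating once and being punished thereafter.

```python
def grim_trigger_sustainable(coop: float, tempt: float, nash: float, delta: float) -> bool:
    """True if cooperating forever is at least as good as deviating once
    and then receiving the static Nash payoff in every later period."""
    value_cooperate = coop / (1 - delta)
    value_deviate = tempt + delta * nash / (1 - delta)
    return value_cooperate >= value_deviate


# Hypothetical stage-game payoffs with tempt > coop > nash.
coop, tempt, nash = 3.0, 5.0, 1.0

# Rearranging the inequality gives the usual threshold on the discount factor.
threshold = (tempt - coop) / (tempt - nash)

for delta in (0.3, 0.8):
    print(delta, grim_trigger_sustainable(coop, tempt, nash, delta))
```

In this sketch the punishment phase is itself a Nash equilibrium, so carrying it out needs no further justification, and the sustained payoff `coop` lies above `nash`. Both features are absent from the punishment equilibrium analyzed in this paper.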

More General Parental Utility
In our setup, the parent's utility is given by the sum of his own payoff and the child's payoff (evaluated using the parent's discount factor). This is however a special case of a more general setup where the child's payoff is assigned a weight β ∈ R_{>0} in the parent's utility, so that (7) becomes It is easy to verify that this would not affect our findings. In particular, a punishment equilibrium continues to exist, but the necessary and sufficient condition for this is slightly more involved. Condition (a) becomes which of course collapses to (a) when β = 1. 37 A similar observation applies to the reward equilibrium.
In fact, all our results (including those on robustness outlined in Section 6) would carry through for all values of β ∈ R_{>0}, as well as in the limiting case β → ∞, in which the parent's concern for the child's payoff is infinitely higher than his concern for his own payoff. 38
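The role of the weight β can be sketched as follows. The sketch does not reproduce the modified version of (7): it only assumes, as stated in the text, that the parent's utility is his own payoff plus β times the child's payoff, with the child's payoff already evaluated using the parent's discount factor. The function name `parent_utility` and the payoff values are hypothetical.

```python
def parent_utility(own_payoff: float, child_payoff: float, beta: float = 1.0) -> float:
    """Assumed form: own payoff plus beta times the child's payoff.
    child_payoff is taken as already discounted from the parent's viewpoint."""
    if beta <= 0:
        raise ValueError("beta must be strictly positive")
    return own_payoff + beta * child_payoff


# Hypothetical payoffs: the parent bears a cost, the child gains.
own, child = -1.0, 2.0

baseline = parent_utility(own, child)            # beta = 1: the simple sum used in the main setup
low_weight = parent_utility(own, child, 0.25)    # the parent cares little about the child
high_weight = parent_utility(own, child, 4.0)    # the parent cares a lot about the child

print(baseline, low_weight, high_weight)
```

With these numbers a costly action that benefits the child is worthwhile for the parent when β is 1 or 4, but not when β is 0.25. This is in line with the observation that the existence condition becomes more involved once β is allowed to differ from 1.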

Concluding Remarks
Recently, a literature on endogenously derived preferences has sprouted within economics. 39 This literature derives preferences from first principles or ultimate causes, and thus complements the behavioral literature, which focuses on proximate causes. 40 We believe this is an important agenda. One of the advantages of building models of ultimate causes of behavior is that it generates sharper predictions and comparative statics. We hope that this paper may contribute to this literature by showing that, in an environment with asymmetric information, costly punishment of norm-breakers may emerge optimally as a means to transmit information to the next generation.
Finally, while the model assumes that the marginal cost of punishing is fixed, it is possible to think of environments where this does not hold. For instance, if punishment takes the form of social boycott, then the opportunity cost incurred (in terms of lost networking opportunities or sources of information) when socially excluding a norm-breaker may be lower if others also exclude him. By contrast, the opportunity cost incurred by refusing to, say, offer employment to a past norm-breaker will be higher if others practice the same policy (since in that case he would have lower outside options and would therefore be willing to work for less). This suggests that some forms of punishment (e.g., social boycott) may be more likely to be effective signaling devices than others (e.g., refusing employment). Future work might be devoted to refining this intuition.

37 The requirement Θ − ε > c is not explicitly included in condition (a) since it is assumed at the outset, in the model section.
38 However, if the parent's utility assigned a weight of exactly zero to his own payoff, then Proposition 5 would no longer hold, since the parent would be unconcerned with choosing a deviation that allows him to send the signal more cheaply.
39 The literature actually dates to Güth and Yaari (1992) [51], but has recently experienced renewed impetus; examples include Robson (2001) [52], Samuelson (2004) [53], Samuelson and Swinkels (2006) [54], Rayo and Becker (2007) [55], Netzer (2009) [56], Herold (2012) [33], Adriani and Sonderegger (2015) [57]. Robson and Samuelson (2010) [58] provide a comprehensive survey.
40 See e.g., Binmore (2005) [59] for a discussion of ultimate causes and proximate psychological mechanisms.
in equilibrium. Consider now (0, 1). By refinement 1, n_{(0,1)} = 0. This implies that (0, 1) is strictly dominated by (0, 0). This proves that, if n_{(1,0)} = 1, then punishment cannot emerge in equilibrium.
Part (ii). In any equilibrium with punishment, youngsters whose parent both follows the norm and punishes follow the norm. The necessary condition for punishment to emerge is that (1, 1) should not be a dominated action. From (i), we know that in any equilibrium with punishment it is necessary that n_{(1,0)} = 0. The parent's expected payoff from (1, 0) is then −c − Θm_{t+1}. Consider now the parent's payoff from (1, 1), namely (A1). It is straightforward to see that, if n_{(1,1)} = 0, then (1, 1) is strictly dominated by (1, 0). Hence, in any equilibrium with punishment, we need n_{(1,1)} = 1.
Proof of Lemma 4. We first establish some general properties. Inspection of (5)-(7) reveals a number of things. First, as already stated, (0, 1) is clearly dominated, and, thus, cannot be an equilibrium strategy. In what follows, we will thus ignore the existence of this strategy. Second, we cannot have U_t(1, 1) = U_t(1, 0) = U_t(0, 0). Third, for punishment to emerge at t, it is necessary that U_t(1, 1) ≥ U_t(1, 0), or else punishing would always be dominated. In turn, it is straightforward to see that this requires that U_t(1, 0) > U_t(0, 0) must also hold. Hence, if punishment occurs at t, then (0, 0) is dominated by (1, 0), which implies that all adults must choose to follow the norm. Conversely, for (0, 0) not to be dominated, we require U_t(0, 0) ≥ U_t(1, 0). In turn, this requires that U_t(1, 0) > U_t(1, 1) must hold, i.e., (1, 1) is dominated by (1, 0).
We now explore the implications of these properties for steady state. Above we have concluded that, for (0, 0) not to be dominated at t, (1, 1) must be dominated by (1, 0) and, thus, m_t = 0 (recall that (0, 1) is always dominated). In steady state, this then implies m* = 0, i.e., nobody punishes. Since m* = 0, it is clear that following the norm cannot possibly be optimal. This proves that (0, 0) may never coexist with either (1, 1) or (1, 0). Hence, (0, 0) may emerge in a steady-state equilibrium only if all parents select this action pair (and, consequently, all children break the norm), i.e., m* = 0 and b* = 2. From Lemma 1, we know that, in that case, the optimal action-pair is (0, 0). Hence, m* = 0 and b* = 2 is a possible steady state regime. The other possible steady states are (i) all adults who can punish select (1, 1), while those who cannot punish select (1, 0); (ii) all adults select (1, 0) (so that by Lemma 3 all children break the norm).

Proof of Proposition 1. (i) We prove the necessity of condition (a). Suppose that parents follow the strategy described in the proposition, and let prior beliefs be p_H and p_I, both strictly positive. p_I > 0 are necessary. First, note that, if p_H = 0, the requirements n_{(1,1)} = 1 and n_{(1,0)} = 0 cannot hold simultaneously. Suppose now that p_I = 0, so that (1, 0) is an out-of-equilibrium move. Let n_{(1,0)} ∈ {0, 1} be the child's action following this out-of-equilibrium move by the parent. The parent's payoff in the H regime is (R − c)(1 + n_{(1,0)}). (A32) The parent's payoff in the L regime is −c. It is straightforward to see that (1, 0) is necessarily equilibrium-dominated in the L regime (since −c < 0). However, this is not the case in the H regime: If n_{(1,0)} = 1, then (A10) becomes 2(R − c) and is therefore higher than the equilibrium payoff in that regime, namely 2(R − c − τ). Under the Intuitive Criterion, upon observing the out-of-equilibrium move (1, 0) the child should therefore conclude that the regime is H. The child's optimal reply is thus n_{(1,0)} = 1. In turn, this makes rewarding (and, thus, norm-following) dominated for parents. This implies that, if p_I = 0, the H regime would unfold. The only possible refined steady state equilibrium would then be one where nobody follows the norm and nobody rewards. We conclude that p_H > 0 and p_I > 0 are necessary for reward to emerge in a refined equilibrium.
(iii) We show that the reward equilibrium cannot be upset by out-of-equilibrium moves. It is straightforward to show that selecting (0, 1) is strictly dominated by (1, 1) in both the H and the I regimes. Hence, the parent could never gain from selecting this out-of-equilibrium move in either the H or the I regime. Consider now (0, 0). In the H regime, this is strictly dominated by (1, 1), independently of the precise value of n_{(0,0)}. Hence, under the Intuitive Criterion, upon observing (0, 0) the child must rule the H regime out. The child's optimal reply is then n_{(0,0)} = 0.
Given this, it is straightforward to see that (0, 0) is equilibrium-dominated also in the I regime. The parent therefore cannot gain from selecting this out-of-equilibrium move.
(iv) We prove that, given p_H and p_I are strictly positive, condition (b) is sufficient for a reward equilibrium to exist. Under (b), the child's best replies are n_{(1,0)} = 0 and n_{(1,1)} = 1. Given (A30), the parent's best reply is (1, 1) if he can reward and (1, 0) otherwise.
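The Intuitive Criterion step used above for the out-of-equilibrium move (1, 0) can be checked mechanically. The sketch below is only illustrative: the helper `regimes_not_ruled_out` and the values of R, c and τ are hypothetical, and the equilibrium payoff of 0 in the L regime is an assumption inferred from the remark that (1, 0) is equilibrium-dominated there "since −c < 0". A regime survives if the deviation, under the child's most favourable reply, yields strictly more than the equilibrium payoff in that regime.

```python
def regimes_not_ruled_out(eq_payoff: dict, best_dev_payoff: dict) -> list:
    """Regimes in which the deviation is NOT equilibrium-dominated, i.e. where its
    payoff under the child's most favourable reply beats the equilibrium payoff."""
    return [r for r in eq_payoff if best_dev_payoff[r] > eq_payoff[r]]


# Hypothetical parameter values with R > c and tau > 0.
R, c, tau = 2.0, 1.0, 0.5

eq_payoff = {
    "H": 2 * (R - c - tau),  # equilibrium payoff in the H regime, as stated in the proof
    "L": 0.0,                # ASSUMPTION: inferred from "since -c < 0"
}
best_dev_payoff = {
    "H": (R - c) * (1 + 1),  # payoff from (1, 0) in the H regime when n_(1,0) = 1
    "L": -c,                 # payoff from (1, 0) in the L regime
}

survivors = regimes_not_ruled_out(eq_payoff, best_dev_payoff)
print(survivors)  # ['H']
```

Only H survives, so a child who observes (1, 0) should conclude that the regime is H, as argued in the proof.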