Indirect Reciprocity with Optional Interactions and Private Information

We consider indirect reciprocity with optional interactions and private information. A game is offered between two players and accepted unless it is known that the other person is a defector. Whenever a defector manages to exploit a cooperator, his or her reputation is revealed to others in the population with some probability. Therefore, people have different private information about the reputation of others, which is a setting that is difficult to analyze in the theory of indirect reciprocity. Since a defector loses a fraction of his social ties each time he exploits a cooperator, he is less efficient at exploiting cooperators in subsequent rounds. We analytically calculate the critical benefit-to-cost ratio above which cooperation is successful in various settings. We demonstrate quantitative agreement with simulation results of a corresponding Wright–Fisher process with optional interactions and private information. We also deduce a simple necessary condition for the critical benefit-to-cost ratio.


Introduction
The evolution of human cooperation is an intensely researched topic in the biological and economic sciences [1][2][3][4]. In the classic prisoner's dilemma game, the rational choice for either player is to defect regardless of the other player's choice of strategy. Even in the repeated prisoner's dilemma with exactly m rounds, defection on every round is the only strict Nash equilibrium strategy and the only evolutionarily stable strategy. Indirect reciprocity is an important mechanism for the evolution of cooperation [1]. The basic setting of indirect reciprocity is repeated interactions in a group of players [5][6][7]. My behavior toward you depends on what you have done to me and to others [8][9][10].
Indirect reciprocity is a generalization of direct reciprocity [1]. Direct reciprocity is the phenomenon in which a person behaves toward another individual based on prior personal experience with that individual. For example, if Individual A had a positive interaction with Individual B, then Individual A would be more inclined to cooperate with Individual B in subsequent encounters. Indirect reciprocity works via reputation [11][12][13]. For example, if Individual A had a positive interaction with Individual B, then Individual B would achieve a good reputation also in the eyes of an observer, Individual C. Individual C would remember Individual B's good score and be more likely to cooperate with Individual B in subsequent encounters. In this way, people help those who help others, and helpful people have a higher payoff in the end, as is shown in experimental studies [14][15][16][17][18][19][20][21][22]. One sees that indirect reciprocity can foster cooperation in the long run because cooperators are able to channel benefits primarily toward other cooperators.
Many theoretical studies have focused primarily on the case of public information [23][24][25]. In this setting, all individuals in the population have the same opinion about the reputation of any particular individual. Various subtopics have been researched, including the effects of image scoring and good standing strategies [23], the reputation dynamics that lead to evolution of indirect reciprocity [24,25], involuntary defection [26], games among more than two players [27,28], the ability of cheaters to disrupt stable strategies [29], costly information transfer [30], trinary (instead of binary) reputation models [31], mixing of social norms [32], and others. Public information is a nice modeling simplification, and it can serve as a powerful promoter for the evolution and maintenance of cooperation.
But the original and more general formulation of indirect reciprocity allows for private information, where players differ in their opinion concerning the reputation of others [33][34][35][36][37][38][39]. Some of the original studies of indirect reciprocity via image scoring have explored the evolutionary dynamics with private information [38,39]. More recent studies have discussed topics of indirect reciprocity with private information, such as the effects of private information on assessment rules [40,41], incomplete observation [42], and assessment errors [43], among others. Importantly, the distribution of reputations in the population, which specifies how individuals perceive each other, can be evolving over time. Basic analytical results with private information, and with an evolving distribution of reputations, are typically difficult to achieve.

Model
We investigate a simple model of indirect reciprocity with private reputation. The identity of a defector may be known to some people, but not to others. We consider a prisoner's dilemma with two strategies: cooperation, C, and defection, D. The payoff matrix is given by
$$\begin{array}{c|cc} & C & D \\ \hline C & b - c & -c \\ D & b & 0 \end{array}$$
where each entry is the payoff to the row player against the column player.
Consider a population of $N = N_C + N_D$ individuals, in which $N_C$ individuals are cooperators, and $N_D$ individuals are defectors. In a single round, two individuals are chosen randomly, and they are offered a game. A cooperator always plays with a cooperator, and each receives payoff b − c. If a cooperator and a defector are chosen in a round, then the cooperator agrees to play the game only if he or she is unaware of the defector's identity. If the cooperator is cognizant of the defector's identity, then the potential game is rejected. Thus, our model features optional interactions between players [44]. If a game is played between an unknowing cooperator and a defector, then the cooperator receives payoff −c, while the defector receives payoff b. If two defectors are chosen in a round, then they play, and each receives payoff 0. (Alternatively, one could consider that two defectors simply do not play a game together; since a game between two defectors has no effect on payoffs and results in no information transfer, such a distinction is inconsequential for our model.) Each time a defector plays a game with a cooperator, his or her identity is revealed to each of the $N_C$ cooperators in the population with probability p. We assume that this information transfer about the identity of the defector between cooperators is not costly [30].
A total of M rounds occur in a single generation. We use k to denote one of the N individuals in the population. The payoffs from all games of individual k over M rounds are added to obtain that individual's total payoff, $S_k$. Individual k's fitness is computed as $F_k = \exp(\beta S_k)$, where β represents the intensity of selection. After M rounds, Wright-Fisher updating is performed. For each individual in the next generation, a parent is chosen from the current generation proportional to fitness. Mutation occurs with probability µ and leads equiprobably either to a cooperator or to a defector. Then M rounds are played in the new generation. Payoff does not accumulate from one generation to the next, but is always reset to zero.
What is the critical value of b/c above which cooperators are more abundant than defectors when their numbers, $N_C$ and $N_D$, are averaged over many successive generations of the mutation-selection dynamics? What is the critical value of b/c above which, with no mutation, defection fixes in the population with probability less than 1/N when starting with a single defector? Can these questions be answered analytically?
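The dynamics described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the simulation code used for the figures: the function names `play_generation` and `wright_fisher_step` are hypothetical, and we read "revealed to each of the cooperators with probability p" as an independent draw for every cooperator, including the one that was just exploited.

```python
import math
import random


def play_generation(is_coop, M, b, c, p, rng):
    """Play M rounds and return each individual's total payoff S_k."""
    N = len(is_coop)
    payoff = [0.0] * N
    cooperators = [k for k in range(N) if is_coop[k]]
    # informed[d]: cooperators who know that individual d is a defector
    informed = {d: set() for d in range(N) if not is_coop[d]}
    for _ in range(M):
        i, j = rng.sample(range(N), 2)  # two distinct random individuals
        if is_coop[i] and is_coop[j]:
            payoff[i] += b - c
            payoff[j] += b - c
        elif is_coop[i] != is_coop[j]:
            coop, defector = (i, j) if is_coop[i] else (j, i)
            if coop in informed[defector]:
                continue  # optional interaction: the game is rejected
            payoff[coop] -= c
            payoff[defector] += b
            # Assumption: each cooperator independently learns the
            # defector's identity with probability p.
            for k in cooperators:
                if rng.random() < p:
                    informed[defector].add(k)
        # Two defectors: payoff 0 for both, no information transfer.
    return payoff


def wright_fisher_step(is_coop, payoff, beta, mu, rng):
    """Sample the next generation proportional to F_k = exp(beta * S_k)."""
    N = len(is_coop)
    fitness = [math.exp(beta * s) for s in payoff]
    parents = rng.choices(range(N), weights=fitness, k=N)
    # With probability mu the offspring mutates to C or D with equal chance.
    return [(rng.random() < 0.5) if rng.random() < mu else is_coop[k]
            for k in parents]
```

Alternating `play_generation` and `wright_fisher_step` and recording the number of cooperators after each update gives the time-averaged abundances referred to in the questions above.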

Single Defector
To make progress analytically, imagine the simpler case of a single defector in an infinitely large population of cooperators. In this simplified setting, the defector participates in m rounds and is therefore offered a game m times, each time with a random cooperator. (Note that in the dynamical model, M rounds are run in total between randomly selected pairs of individuals, but the number of rounds that a particular defector in the population participates in is a random variable. In the simplified static model, the sole defector participates in exactly m rounds.) After each exploitation of a cooperator, the exploited cooperator shares the identity of the solitary defector with each of the other uninformed cooperators with probability p = 1 − q. Thus, with probability q, each defector-cooperator link remains active after an exploitation of a cooperator by a defector. Informed cooperators always refuse a potential game with the defector. The fraction of cooperators in the population that are aware of the defector's identity increases with the number of games played. Since informed cooperators always reject a game with the defector, cooperators can achieve a higher total payoff than the defector if many rounds are offered. A sample sequence of rounds is shown in Figure 1. Changes in the fraction of active links to cooperators as games are played are shown schematically in Figure 2.
Figure 1. Schematic illustrating indirect reciprocity with optional interactions and private information in a population with eight cooperators (blue) and a single defector (red). The population size is N = 9. The defector is at the center of each image of the population. Each solid black line connecting the defector with a cooperator represents an active link. In the first round, the defector is offered a game with a random cooperator (light blue with a "?"), and the game is played. The defector gains payoff b. Each of the defector's links is removed with probability p = 0.5. After the first round, four of the defector's links with cooperators are eliminated (thin dotted lines). In the second round, the defector is offered a game with a cooperator that knows the defector's identity, so the potential game is rejected. In the third round, the defector again exploits an ignorant cooperator, and the defector again receives payoff b. Two of the remaining active links are eliminated. In the fourth round, the defector is offered a game with an informed cooperator, and the potential game is rejected. After this sequence of four rounds, the defector gains total payoff 2b.
Figure 2. Schematic showing how the fraction of active links between the defector and cooperators decreases as additional exploitations occur. At the start of a generation, all links to cooperators are active. On the first round, the defector plays a game, and a fraction q of the links remain active after the game. On each round thereafter, the defector plays another game with probability $q^i$, where i is the number of games that have been played to that point.
We denote the average number of games played in m rounds by a solitary defector in a population of infinitely many cooperators by $F_m(q)$. We must find an expression for $F_m(q)$. First, let us derive an expression for $x_{i,m}$, the probability that the defector plays i games in m rounds. Suppose that the defector plays i times in m rounds. There are two possible cases: (i) the defector plays i times in the first m − 1 rounds and is rejected for play in round m; and (ii) the defector plays i − 1 times in the first m − 1 rounds and is accepted for play in round m. The probability for the first case is $(1 - q^i)\,x_{i,m-1}$. The probability for the second case is $q^{i-1}\,x_{i-1,m-1}$. Therefore, the probability that the defector plays i times in m rounds is
$$x_{i,m} = (1 - q^i)\,x_{i,m-1} + q^{i-1}\,x_{i-1,m-1}. \tag{1}$$
This expression holds for any m ≥ i ≥ 2 with the convention that $x_{m,m-1} = 0$. For i = 1, the second case does not occur and we have the recurrence formula
$$x_{1,m} = (1 - q)\,x_{1,m-1}, \tag{2}$$
with $x_{1,1} = 1$, since the first game offered to the defector is always played.
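The two-case argument above translates directly into a dynamic-programming loop over rounds. The following sketch (the function name `games_distribution` is ours, not the paper's) builds the whole distribution of the number of games after m rounds, starting from the fact that the first offered game is always played.

```python
def games_distribution(m, q):
    """Return x, where x[i] is the probability of exactly i games in m rounds.

    Assumes m >= 1. Index 0 is kept for convenience and is always 0.0,
    because the first offered game is always accepted.
    """
    x = [0.0] * (m + 1)
    x[1] = 1.0  # after one round: exactly one game
    for rounds in range(2, m + 1):
        new = [0.0] * (m + 1)
        for i in range(1, rounds + 1):
            # case (i): i games already played, offer rejected (prob 1 - q**i)
            # case (ii): i - 1 games played, offer accepted (prob q**(i - 1))
            new[i] = (1 - q**i) * x[i] + q ** (i - 1) * x[i - 1]
        x = new
    return x
```

For example, with m = 2 and q = 0.5 the defector plays one game or two games with probability 0.5 each, because the second offer is accepted exactly when the relevant link survived the first exploitation.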

Probability of i Games in m Rounds
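To solve the recurrence, Equation (1), for $x_{i,m}$, it is helpful to first solve for its z-transform, which we write as
$$\hat{x}_i(z) = \sum_{m=i}^{\infty} x_{i,m}\,z^{-m}. \tag{3}$$
From Equations (3) and (2), the z-transform of $x_{1,m}$ is
$$\hat{x}_1(z) = \sum_{m=1}^{\infty} (1 - q)^{m-1} z^{-m}.$$
We can consider the summation as a geometric series (with region of convergence |z| > 1 − q), and we obtain
$$\hat{x}_1(z) = \frac{1}{z - (1 - q)}. \tag{4}$$
From Equations (3) and (1), the z-transform of $x_{i,m}$ with i ≥ 2 is
$$\hat{x}_i(z) = (1 - q^i)\,z^{-1}\,\hat{x}_i(z) + q^{i-1}\,z^{-1}\,\hat{x}_{i-1}(z).$$
We see that $\hat{x}_i(z)$ can be expressed recursively in terms of $\hat{x}_{i-1}(z)$:
$$\hat{x}_i(z) = \frac{q^{i-1}}{z - (1 - q^i)}\,\hat{x}_{i-1}(z). \tag{5}$$
Iterating Equation (5) and substituting Equation (4), we find
$$\hat{x}_i(z) = q^{i(i-1)/2} \prod_{n=1}^{i} \frac{1}{z - (1 - q^n)}. \tag{6}$$
Thus, the z-transformed quantity $\hat{x}_i(z)$ has a simple, closed-form expression in terms of q, i, and z. All that remains is to invert the z-transform to obtain $x_{i,m}$. Intuitively, Equation (6) has exactly i simple poles, and each simple pole corresponds to a separate term in the evaluation of $x_{i,m}$. We have
$$x_{i,m} = \frac{1}{2\pi I} \oint_C \hat{x}_i(z)\,z^{m-1}\,dz.$$
(I denotes the imaginary unit.) Here, C represents a counterclockwise closed contour encircling the origin and enclosing all poles of the integrand. For example, we can take a unit circle for C as all the zeros and poles have modulus less than 1. From the residue theorem, we have
$$x_{i,m} = \sum_{j=1}^{i} \operatorname*{Res}_{z = 1 - q^j} \left[ \hat{x}_i(z)\,z^{m-1} \right].$$
We evaluate this as
$$x_{i,m} = q^{i(i-1)/2} \sum_{j=1}^{i} (1 - q^j)^{m-1} \prod_{n=1,\,n \neq j}^{i} \frac{1}{q^n - q^j}. \tag{7}$$
Simplifying Equation (7), we obtain
$$x_{i,m} = q^{i(i-1)/2} \sum_{j=1}^{i} (1 - q^j)^{m-1} \prod_{n=1}^{j-1} \frac{1}{q^n (1 - q^{j-n})} \prod_{n=j+1}^{i} \frac{1}{-q^j (1 - q^{n-j})}.$$
This can be rearranged slightly as
$$x_{i,m} = \sum_{j=1}^{i} (-1)^{i-j}\,q^{i(i-1)/2 - j(j-1)/2 - j(i-j)}\,(1 - q^j)^{m-1} \prod_{n=1}^{j-1} \frac{1}{1 - q^n} \prod_{n=1}^{i-j} \frac{1}{1 - q^n}.$$
Collecting factors of q, we have
$$x_{i,m} = \sum_{j=1}^{i} (-1)^{i-j}\,q^{(i-j)(i-j-1)/2}\,(1 - q^j)^{m-1} \prod_{n=1}^{j-1} \frac{1}{1 - q^n} \prod_{n=1}^{i-j} \frac{1}{1 - q^n}. \tag{8}$$
We make the substitution i = j + k. We also define
$$(q)_n = \prod_{l=1}^{n} (1 - q^l), \qquad (q)_0 = 1.$$
Equation (8) can then be simplified as
$$x_{i,m} = \sum_{\substack{j + k = i \\ j \geq 1,\ k \geq 0}} \frac{(-1)^k\,q^{k(k-1)/2}\,(1 - q^j)^{m-1}}{(q)_{j-1}\,(q)_k}. \tag{9}$$
Equation (9) is proven using an alternative method in Appendix A.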
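A numerical check is a useful safeguard for an inversion of this kind. The sketch below compares the recurrence with a closed form that we obtain by summing the residues at the poles z = 1 − q^j; the finite-product shorthand `q_product` and both function names are our own notation and assumptions, so the code should be read as a consistency test rather than as the paper's implementation.

```python
def q_product(q, n):
    """Finite product prod_{l=1}^{n} (1 - q**l); equals 1 for n = 0."""
    result = 1.0
    for l in range(1, n + 1):
        result *= 1 - q**l
    return result


def x_closed_form(i, m, q):
    """Residue sum for x_{i,m}; requires 0 < q < 1 and 1 <= i <= m."""
    total = 0.0
    for j in range(1, i + 1):
        k = i - j
        numerator = (-1) ** k * q ** (k * (k - 1) // 2) * (1 - q**j) ** (m - 1)
        total += numerator / (q_product(q, j - 1) * q_product(q, k))
    return total


def x_recurrence(i, m, q):
    """Same probability computed by iterating the recurrence over rounds."""
    x = [0.0] * (m + 1)
    x[1] = 1.0  # the first offered game is always played
    for _ in range(2, m + 1):
        x = [0.0] + [(1 - q**n) * x[n] + q ** (n - 1) * x[n - 1]
                     for n in range(1, m + 1)]
    return x[i]
```

The alternating signs in the residue sum cause cancellation, so for large m or for q close to 1 the recurrence is the numerically safer of the two routes.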

Average Number of Games
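The average number of games played by the defector, $F_m(q)$, is given by
$$F_m(q) = \sum_{i=1}^{m} i\,x_{i,m}. \tag{10}$$
From Equations (9) and (10), and after rewriting the summations, we have
$$F_m(q) = \sum_{j=1}^{m} \sum_{k=0}^{m-j} (j + k)\,\frac{(-1)^k\,q^{k(k-1)/2}\,(1 - q^j)^{m-1}}{(q)_{j-1}\,(q)_k}, \tag{11}$$
where $(q)_n = \prod_{l=1}^{n} (1 - q^l)$ and $(q)_0 = 1$. In Figure 3, we plot Equation (11) in several ways.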
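The average number of games can also be obtained without the closed form, by iterating the recurrence and taking the mean, and it can be cross-checked by direct simulation of the link-loss process. The sketch below does both; `expected_games` and `simulate_games` are hypothetical names, and the simulation assumes the infinite-population picture in which an offer is accepted with probability q raised to the number of games already played.

```python
import random


def expected_games(m, q):
    """Mean number of games in m >= 1 rounds, from the recurrence."""
    x = [0.0] * (m + 1)
    x[1] = 1.0  # the first offered game is always played
    for _ in range(2, m + 1):
        x = [0.0] + [(1 - q**i) * x[i] + q ** (i - 1) * x[i - 1]
                     for i in range(1, m + 1)]
    return sum(i * x[i] for i in range(1, m + 1))


def simulate_games(m, q, trials, rng):
    """Monte Carlo estimate of the same mean."""
    total = 0
    for _ in range(trials):
        games = 0
        for _round in range(m):
            # a fraction q**games of the links is still active
            if rng.random() < q**games:
                games += 1
        total += games
    return total / trials
```

Two limits are easy to verify by hand: for q = 1 no link is ever lost and the defector plays all m games, while for q = 0 every link is lost after the first exploitation and exactly one game is played.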

Fixed Number of Rounds per Generation
More generally, consider that there are $N_C$ cooperators and $N_D$ defectors. Two players are selected randomly in each round and are offered a game. Each individual is selected, on average, m times in M total rounds. The total number of rounds, M, is a constant for each generation.

Average Number of Games
Define $f = N_C/N$ as the fraction of individuals in the population that are cooperators. A total of M rounds are run in each generation in a population of size N. Here, M can be expressed in terms of m (the average number of rounds that each individual engages in) as M = mN/2. The probability that a given defector engages in a particular round is 2/N. In the limit N → ∞, the binomial distribution for the number of rounds that a given defector engages in approaches a Poisson distribution with expected value (mN/2)(2/N) = m. Moreover, the number of times that a defector is selected for a possible game with a cooperator is Poisson-distributed with expected value fm. Thus, we can write the average number of games played by a defector with cooperators in a generation as
$$G(f, m; p) = \sum_{k=1}^{\infty} \frac{(fm)^k\,e^{-fm}}{k!}\,F_k(1 - p).$$
Here, p = 1 − q is the probability that a link between a cooperator and a defector is eliminated after the defector exploits a cooperator.
Figure 3. Average number of games that the defector plays. (a) Expected number of exploitations, $F_m(q)$, against the probability, q, to maintain each link when m = 5, 10, and 20 (bottom to top); (b) $F_m(q)$ against the number of rounds, m, when q = 0.2, 0.5, and 0.8 (bottom to top); (c) Expected number of exploitations normalized by the number of rounds, $F_m(q)/m$, against the probability, q, to maintain each link when m = 5, 10, and 20 (top to bottom); (d) $F_m(q)/m$ against the number of rounds, m, when q = 0.2, 0.5, and 0.8 (bottom to top).
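The Poisson average described above can be evaluated numerically by truncating the sum far in the upper tail. The sketch below is one way to do this; the function names and the truncation rule (mean plus ten standard deviations plus a margin) are our assumptions, not part of the original analysis.

```python
import math


def expected_games_table(n_max, q):
    """Return [F_0, ..., F_n_max], the mean number of games in n rounds."""
    q_pow = [q**i for i in range(n_max + 1)]
    x = [1.0] + [0.0] * n_max  # before any round: zero games with certainty
    table = [0.0]
    for _ in range(n_max):
        # one more round: an offer is accepted with prob q**(games so far)
        x = [(1 - q_pow[i]) * x[i] + (q_pow[i - 1] * x[i - 1] if i else 0.0)
             for i in range(n_max + 1)]
        table.append(sum(i * xi for i, xi in enumerate(x)))
    return table


def average_games_fixed(f, m, p):
    """Poisson-weighted mean number of games for a defector, G(f, m; p)."""
    lam = f * m  # expected number of offers involving a cooperator
    n_max = int(lam + 10 * math.sqrt(lam) + 30)  # assumed truncation point
    F = expected_games_table(n_max, 1 - p)
    weight = math.exp(-lam)  # Poisson probability of zero offers
    total = 0.0
    for k in range(1, n_max + 1):
        weight *= lam / k  # Poisson probability of exactly k offers
        total += weight * F[k]
    return total
```

As a sanity check, p = 0 gives the Poisson mean fm (every offer is accepted), and p = 1 gives 1 − exp(−fm), the probability that the defector is offered at least one game.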

Critical Benefit-to-Cost Ratio
What is the critical benefit-to-cost ratio, b/c, needed to ensure that cooperators receive a higher payoff, on average, than defectors? We reason as follows: Each cooperator plays an average of fm games with other cooperators in a generation, so each cooperator receives a contribution to its payoff in the amount fm(b − c). Each defector plays, on average, G(f, m; p) games in a generation, so each defector receives average payoff bG(f, m; p). Each exploitation by a defector corresponds to a cost incurred by a cooperator without any compensating benefit. The total number of games that are played by defectors, on average, is N(1 − f)G(f, m; p). Therefore, the average cost incurred by a cooperator due to exploitations by defectors is (−c)((N(1 − f))/(Nf))G(f, m; p). Putting these pieces together, we find that the average cooperator payoff in a generation exceeds the average defector payoff in a generation if
$$fm\,(b - c) - c\,\frac{1 - f}{f}\,G(f, m; p) > b\,G(f, m; p). \tag{12}$$
Rearranging Equation (12), we find that cooperators achieve a higher payoff than defectors, on average, if
$$\frac{b}{c} > \frac{fm + \frac{1 - f}{f}\,G(f, m; p)}{fm - G(f, m; p)} \equiv A(f, m; p). \tag{13}$$
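The payoff comparison above reduces to a ratio that is straightforward to evaluate once the average number of games per defector is known. The sketch below computes that threshold from the payoff bookkeeping in the text; the function names, the truncation of the Poisson sum, and the convention of returning infinity when defectors are never refused are assumptions made for illustration.

```python
import math


def expected_games_table(n_max, q):
    """Return [F_0, ..., F_n_max], the mean number of games in n rounds."""
    q_pow = [q**i for i in range(n_max + 1)]
    x = [1.0] + [0.0] * n_max
    table = [0.0]
    for _ in range(n_max):
        x = [(1 - q_pow[i]) * x[i] + (q_pow[i - 1] * x[i - 1] if i else 0.0)
             for i in range(n_max + 1)]
        table.append(sum(i * xi for i, xi in enumerate(x)))
    return table


def average_games_fixed(f, m, p):
    """Poisson-weighted mean number of games played by a defector."""
    lam = f * m
    n_max = int(lam + 10 * math.sqrt(lam) + 30)  # assumed truncation point
    F = expected_games_table(n_max, 1 - p)
    weight, total = math.exp(-lam), 0.0
    for k in range(1, n_max + 1):
        weight *= lam / k
        total += weight * F[k]
    return total


def critical_ratio_fixed(f, m, p):
    """Smallest b/c for which cooperators out-earn defectors on average."""
    G = average_games_fixed(f, m, p)
    denominator = f * m - G
    if denominator <= 0:
        return math.inf  # defectors are never refused: no finite threshold
    # cooperator: f*m*(b - c) - c*(1 - f)/f*G ; defector: b*G
    return (f * m + (1 - f) / f * G) / denominator
```

For example, `critical_ratio_fixed(0.5, 8, 0.5)` evaluates the threshold for equal numbers of cooperators and defectors with eight rounds per individual; the threshold falls as p grows, because faster information transfer reduces the number of exploitations.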

Simulation Results
We next perform simulations for which the total number of rounds per generation, M, is fixed. The success of cooperation depends on the rate, p, at which information about the defectors is transferred among cooperators, on the fraction, f, of cooperators in the population, and on the average number of rounds per generation, m, that an individual engages in. We have that m = 2M/N.
Simulation results for a fixed total number of rounds per generation are shown in Figure 4a,b. In Figure 4a, the total number of rounds per generation, M, is 400. Therefore, the average number of rounds per individual in a generation, m, is equal to 8. In Figure 4b, the total number of rounds per generation, M, is 800. Therefore, the average number of rounds per individual in a generation, m, is equal to 16.
We plot the critical value of b/c needed for cooperators to be more abundant than defectors with nonzero mutation rate, µ, and with weak intensity of selection, β ("+" symbols), for m = 8 and m = 16. We also plot the critical value of b/c above which cooperators achieve a higher payoff in M total rounds, on average, than do defectors if f = 1/2 (green circles) for m = 8 and m = 16. The two sets of data points agree well, as we would expect. In addition, we show the functions A(1/2, 8; p) and A(1/2, 16; p). The agreement between theory and simulation is excellent.
In another measurement, we start with a single defector and allow the dynamics to progress with no mutation (µ = 0) and with weak intensity of selection. We plot the critical value of b/c needed for the fixation probability at the all-defector state to be less than 1/N ("X" symbols) for m = 8 and m = 16 when starting with a single defector. We also plot the critical value of b/c above which cooperators achieve a higher payoff in M total rounds, on average, than do defectors if f = 2/3 (red squares) for m = 8 and m = 16. The two sets of data points agree well, as we would expect from the one-third law of evolutionary dynamics [2,47,48,49,50,51]. In addition, we show the functions A(2/3, 8; p) and A(2/3, 16; p). The agreement between theory and simulation is again excellent.
In yet another measurement, we plot the critical value of b/c above which a solitary defector in a population of cooperators achieves a lower payoff in M total rounds, on average, than does a random cooperator (black triangles). This is the condition for cooperation to be a strict Nash equilibrium. We also show the functions A(1, 8; p) and A(1, 16; p), again demonstrating outstanding agreement between theory and simulation.

Variable Number of Rounds per Generation
The only strict Nash equilibrium strategy and the only evolutionarily stable strategy in the classic prisoner's dilemma with exactly m repeated interactions is to defect in every round [2,12,52].
In realistic settings, an individual may not know at the outset how many rounds it will participate in with potential interaction partners. It is therefore worthwhile to also consider a variable number of rounds per generation. In a variation of our dynamical model, we postulate that another round is run between two randomly chosen individuals with probability W. The average total number of rounds that are run is $\overline{M} = 1/(1 - W)$. The average number of rounds that each defector engages in, averaged over generations, is m.

Average Number of Games
The probability that a total of M rounds are run is $(1 - W)W^{M-1}$. Analogously to G(f, m; p), we can write the average number of games played by a defector, H(f, m; p), as
$$H(f, m; p) = \sum_{M=1}^{\infty} (1 - W)\,W^{M-1}\,G(f, 2M/N; p).$$
Intuitively, we are just weighting G(f, 2M/N; p), the average number of games played by a defector when a total of M rounds are offered in a generation, by the probability that a total of M rounds are offered. We consider the limit of large population size, N → ∞, which is realistic and also admits simplified results. Denote the average number of rounds that an individual engages in by m. The value of W for which the average number of rounds engaged in per individual equals m is W = 1 − 2/(mN). We have
$$H(f, m; p) = \sum_{M=1}^{\infty} \frac{2}{mN} \left( 1 - \frac{2}{mN} \right)^{M-1} G(f, 2M/N; p).$$
We can write this as
$$H(f, m; p) = \sum_{M=1}^{\infty} \frac{2}{mN} \left[ \left( 1 - \frac{2}{mN} \right)^{mN/2} \right]^{2(M-1)/(mN)} G(f, 2M/N; p).$$
In the limit of large population size, N → ∞, the factor in square brackets becomes $e^{-1}$. Thus, we have
$$H(f, m; p) = \sum_{M=1}^{\infty} \frac{2}{mN}\,e^{-2M/(mN)}\,G(f, 2M/N; p).$$
This can be rewritten more suggestively as
$$H(f, m; p) = \frac{1}{m} \sum_{M=1}^{\infty} \frac{2\,\Delta M}{N}\,e^{-(2M/N)/m}\,G(f, 2M/N; p). \tag{14}$$
Here, note that ∆M = 1. (The summation increments M by 1 at each iteration.) Write m' = 2M/N for the number of rounds per individual in a generation with M total rounds. Since 2∆M/N = ∆m', we can consider Equation (14) as an integration over m'. Substituting the Poisson average for G, we have
$$H(f, m; p) = \frac{1}{m} \int_0^{\infty} dm'\,e^{-m'/m}\,G(f, m'; p) = \frac{1}{m} \sum_{k=1}^{\infty} \frac{f^k\,F_k(1 - p)}{k!} \int_0^{\infty} dm'\,(m')^k\,e^{-(f + 1/m)\,m'}.$$
Next, we define ζ = (f + 1/m)m'. We have
$$H(f, m; p) = \frac{1}{m} \sum_{k=1}^{\infty} \frac{f^k\,F_k(1 - p)}{k!\,(f + 1/m)^{k+1}} \int_0^{\infty} d\zeta\,\zeta^k\,e^{-\zeta}.$$
Notice that the integration over ζ is just Γ(k + 1) = k!. Thus, we have
$$H(f, m; p) = \sum_{k=1}^{\infty} \frac{(fm)^k}{(1 + fm)^{k+1}}\,F_k(1 - p).$$
Analogously to Equation (12), the average cooperator payoff in a generation exceeds the average defector payoff in a generation if
$$fm\,(b - c) - c\,\frac{1 - f}{f}\,H(f, m; p) > b\,H(f, m; p). \tag{15}$$
Rearranging Equation (15), we have
$$\frac{b}{c} > \frac{fm + \frac{1 - f}{f}\,H(f, m; p)}{fm - H(f, m; p)} \equiv B(f, m; p). \tag{16}$$
We next perform simulations for which the total number of rounds per generation, M, is a random variable. The success of cooperation depends on the rate, p, at which information about the defectors is transferred among cooperators, on the fraction, f, of cooperators in the population, and on the average number of rounds per generation, m, that an individual engages in. The number of rounds that an individual engages in, on average, during any single generation is itself a random variable, 2M/N; its mean over generations is m.
Simulation results for a variable total number of rounds per generation are shown in Figure 4c,d. In Figure 4c, the probability of another round, W, is 399/400. Therefore, the average number of rounds per individual in a generation, m, is equal to 8. We plot B(1/2, 8; p), B(2/3, 8; p), and B(1, 8; p). The agreement between theory and simulation is excellent. In Figure 4d, the probability of another round, W, is 799/800. Therefore, the average number of rounds per individual in a generation, m, is equal to 16. We plot B(1/2, 16; p), B(2/3, 16; p), and B(1, 16; p). The agreement between theory and simulation is again excellent.
Note the subtle distinction between the models investigated in Figure 4a,b and Figure 4c,d. In Figure 4a,b, the total number of rounds per generation is fixed and is equal to M, and the average number of rounds per individual in a generation is equal to m = 2M/N. In Figure 4c,d, the total number of rounds per generation is a random variable with mean 1/(1 − W).
We expect Equation (17) to yield a necessary condition on b/c for cooperation to be successful. Rearranging, with p = 1 − q, we arrive at the condition
$$\frac{b}{c} > 1 + \frac{1}{pm}. \tag{18}$$
If the total number of rounds, M, in a generation is a random variable, then we propose a variation of Equation (18), which we refer to as Equation (19). The simple condition, Equation (18), is plotted in Figure 4a,b (solid black curves). Notice that this form has the intuitively correct limiting behavior: The critical value of b/c increases as the rate of information transfer, p, decreases, and becomes infinite in the limit p → 0. The critical value of b/c decreases as the number of rounds, m, increases. As m → ∞, cooperators are able to accumulate an infinitely larger payoff than defectors in a single generation (provided that b − c > 0), and the critical value of b/c approaches 1. The simple condition, Equation (19), is also plotted in Figure 4c,d (solid black curves).
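Carrying out the large-N limit and the integration described above leaves a sum in which the number of offers to a defector follows a geometric distribution with mean fm, instead of the Poisson distribution of the fixed-round case. The sketch below evaluates that sum and the corresponding threshold; the function names and the truncation rule are assumptions, and the geometric weights are our reading of the derivation rather than a formula quoted from the text.

```python
import math


def expected_games_table(n_max, q):
    """Return [F_0, ..., F_n_max], the mean number of games in n rounds."""
    q_pow = [q**i for i in range(n_max + 1)]
    x = [1.0] + [0.0] * n_max
    table = [0.0]
    for _ in range(n_max):
        x = [(1 - q_pow[i]) * x[i] + (q_pow[i - 1] * x[i - 1] if i else 0.0)
             for i in range(n_max + 1)]
        table.append(sum(i * xi for i, xi in enumerate(x)))
    return table


def average_games_variable(f, m, p):
    """Geometric-weighted mean number of games played by a defector."""
    lam = f * m
    k_max = int(60 * (1 + lam))  # assumed cutoff: neglected tail ~ exp(-60)
    F = expected_games_table(k_max, 1 - p)
    ratio = lam / (1 + lam)
    weight = 1 / (1 + lam)  # geometric probability of zero offers
    total = 0.0
    for k in range(1, k_max + 1):
        weight *= ratio  # probability of exactly k offers
        total += weight * F[k]
    return total


def critical_ratio_variable(f, m, p):
    """Threshold on b/c when the number of rounds is random."""
    H = average_games_variable(f, m, p)
    denominator = f * m - H
    if denominator <= 0:
        return math.inf
    return (f * m + (1 - f) / f * H) / denominator
```

Two limits serve as checks: with p = 0 the mean number of games equals the mean number of offers, fm, and with p = 1 it equals fm/(1 + fm), the probability of receiving at least one offer.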

Discussion
We have studied a simple model for evolution of indirect reciprocity with private information. As information about the identity of defectors spreads among cooperators, defectors lose their fitness advantage. Interactions are optional; a cooperator can reject a possible game with a defector if it is cognizant of that defector's identity. Thus, cooperation can eventually prevail.
We have derived exact conditions on the benefit-to-cost ratio for cooperation to dominate. We have also derived exact conditions on the benefit-to-cost ratio for natural selection to oppose the fixation of defectors starting with a single defector. Agreement of our theoretical predictions for either a fixed or a variable total number of rounds (Equations (13) and (16), respectively) with simulation data is excellent. We have also deduced simple conditions on the benefit-to-cost ratio, Equations (18) and (19), which ensure the success of cooperation.
Our calculations and simulations lend insight into indirect reciprocity with an evolving distribution of reputations. Increasing the rate of information transfer among cooperators and increasing the number of games offered to each individual in a generation both facilitate the transition to a population of cooperators. Intriguingly, the distribution of the total number of rounds in a generation significantly affects the critical benefit-to-cost ratio needed for cooperators to be favored. This is evident in Figure 4, where one sees that having a constant probability of another round in a generation (as opposed to having an exactly fixed total number of rounds per generation) is more favorable to cooperation.
Variations of our model are also possible. For example, a defector's identity may be revealed in any round in which it is paired with a cooperator, even if the cooperator rejects the potential game. Notice that this modified model is significantly more conducive to the evolution of cooperation since information about defectors is transferred more rapidly. In this case, our critical benefit-to-cost ratios, Equations (13) and (16), would represent sufficient conditions for cooperators to succeed. We derive results for the modified model in Appendix B.
In another variation, there is a cost for entering a game. In this case, even defectors would want to avoid playing with other defectors. It is then plausible that defectors also reveal the identities of other defectors that they have met. This modification also allows for easier evolution of cooperation, because the identity of defectors is revealed faster.
There are also possibilities for implementation errors. For example, when a cooperator engages in a round with a defector, the cooperator could mistake the defector for being another cooperator. Such recognition errors may involve new information transfer regarding the defector's identity, which would reduce future exploitations of cooperators by that defector. Implementation errors could also result in erroneous information transfer if cooperators are mistakenly perceived as defectors. In this case, cooperators that are wrongly perceived as defectors might have their links with other cooperators severed for all subsequent rounds, which would make it more difficult for cooperators to thrive. However, a key point is that we endow cooperators with optional interactions. Therefore, on rare occasions, a cooperator might wrongfully view another cooperator as a defector and reject a potential game, but both individuals subsequently retain their strategies as cooperators. For this reason, we expect our derivations of critical benefit-to-cost ratios to be fairly robust provided that implementation errors are rare. Each of these modifications to our model is a potential topic for future study.

B. Modification to the Model: Loss of Links on Every Round
Here, we consider a modification to our model in which there is information transfer every time a cooperator and a defector are paired. A fraction p of the defector's remaining links with cooperators are broken on every round, even if the cooperator that is paired with the defector rejects the potential game. The formula analogous to Equation (1) for describing this modified model is
$$x_{i,m} = (1 - q^{m-1})\,x_{i,m-1} + q^{m-1}\,x_{i-1,m-1}. \tag{B1}$$
Here, $x_{i,m}$ is the probability of the solitary defector playing i games in m rounds when a fraction p of links are lost on every round. Equation (B1) holds for any m ≥ i ≥ 2 with the convention that $x_{m,m-1} = 0$. For i = 1, we have the recurrence formula
$$x_{1,m} = (1 - q^{m-1})\,x_{1,m-1}.$$
To solve Equation (B1), we introduce a z-transform on the index i:
$$\hat{x}_m(z) = \sum_{i=1}^{m} x_{i,m}\,z^{-i}. \tag{B2}$$
Notice that the only nonzero value of $x_{i,m}$ for m = 1 is $x_{1,1} = 1$. Using Equation (B2), we have
$$\hat{x}_1(z) = z^{-1}. \tag{B3}$$
From Equations (B2) and (B1), the z-transform of $x_{i,m}$ with m ≥ 2 is
$$\hat{x}_m(z) = (1 - q^{m-1})\,\hat{x}_{m-1}(z) + q^{m-1}\,z^{-1}\,\hat{x}_{m-1}(z).$$
We see that $\hat{x}_m(z)$ can be expressed recursively in terms of $\hat{x}_{m-1}(z)$:
$$\hat{x}_m(z) = \left[ (1 - q^{m-1}) + q^{m-1} z^{-1} \right] \hat{x}_{m-1}(z). \tag{B4}$$
Iterating Equation (B4) and substituting Equation (B3), we find
$$\hat{x}_m(z) = z^{-m} \prod_{n=2}^{m} \left[ (1 - q^{n-1})\,z + q^{n-1} \right].$$
From the above formula, we have the following by direct computation:
$$x_{i,m} = \frac{1}{(m - i)!} \left. \frac{d^{\,m-i}}{dz^{\,m-i}} \prod_{n=2}^{m} \left[ (1 - q^{n-1})\,z + q^{n-1} \right] \right|_{z=0}. \tag{B5}$$
Equation (B5) can be simplified to a closed form, Equation (B6). We can compute the average number of games played by the defector by substituting Equation (B6) into Equation (10). In Figure B1, $F_m(q)/m$ is plotted for the original model and for the modified model. Notice that $F_m(q)$ is larger for the original model than for the modified model for all m ≥ 3.
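The comparison between the two models can be reproduced numerically by iterating both recurrences side by side. In the sketch below (function names are ours), the only difference between the two loops is the exponent: the original model uses the number of games played so far, whereas the modified model uses the number of rounds elapsed.

```python
def expected_games_original(m, q):
    """Mean number of games when links are lost only after a played game."""
    x = [0.0] * (m + 1)
    x[1] = 1.0  # the first offered game is always played
    for _ in range(2, m + 1):
        x = [0.0] + [(1 - q**i) * x[i] + q ** (i - 1) * x[i - 1]
                     for i in range(1, m + 1)]
    return sum(i * x[i] for i in range(1, m + 1))


def expected_games_modified(m, q):
    """Mean number of games when links are lost on every round."""
    x = [0.0] * (m + 1)
    x[1] = 1.0
    for rounds in range(2, m + 1):
        keep = q ** (rounds - 1)  # fraction of links active in this round
        x = [0.0] + [(1 - keep) * x[i] + keep * x[i - 1]
                     for i in range(1, m + 1)]
    return sum(i * x[i] for i in range(1, m + 1))
```

In the modified model the offer in round n is accepted with probability q^(n−1) regardless of history, so the mean is the geometric sum 1 + q + ... + q^(m−1); this gives an independent check on the recurrence. The two models coincide for m = 1 and m = 2 and separate from m = 3 onward.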


Figure 4. The critical value of b/c for cooperation to be favored over defection in various settings. The green circles represent measured critical values of b/c for the average cooperator payoff to exceed the average defector payoff when f = 1/2. The "+" symbols represent measured critical values of b/c for cooperators to outnumber defectors with weak selection and with weak mutation when their numbers are averaged over many successive generations. The red squares represent measured critical values of b/c for the average cooperator payoff to exceed the average defector payoff when f = 2/3. The "×" symbols represent measured critical values of b/c for defectors to fix with probability less than 1/N when starting with a single defector with weak selection and with no mutation. The black triangles represent measured critical values of b/c for the average cooperator payoff to exceed the average defector payoff when f = 1 (i.e., a single defector in a large population of cooperators). Panels (a,b) are for a fixed total number of rounds per generation, M, while panels (c,d) are for a variable total number of rounds per generation, M, where the average of M over generations is $\overline{M} = 1/(1 - W)$. Error bars are roughly the size of the data points.
In Figure 4c,d, the probability of another round between two random individuals in a generation is equal to W. The total number of rounds per generation, M, for Figure 4c,d is therefore a random variable, with average given by $\overline{M} = 1/(1 - W)$, and the average number of rounds per individual in a generation is equal to $m = 2\overline{M}/N = 2/(N(1 - W))$. Notice that the critical values of b/c are lower in Figure 4c than in Figure 4a and are also lower in Figure 4d than in Figure 4b.

Simple Lower Bound on b/c
We now seek a simple condition on b/c for cooperation to be successful. Consider an individual in an infinitely large population of cooperators. The individual participates in m + 1 rounds with the infinite population of cooperators and has two choices of strategy: The focal individual may cooperate in all m + 1 rounds, receiving total payoff (b − c)(m + 1). Or the focal individual may defect on the first round but cooperate on all m subsequent rounds, receiving total payoff b + qm(b − c). The strategy for the focal individual to always cooperate delivers a higher payoff after m + 1 rounds if
$$(b - c)(m + 1) > b + qm\,(b - c). \tag{17}$$
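A short numerical check makes the comparison of the two strategies concrete. The sketch below evaluates both payoffs as stated in the text and the threshold 1 + 1/(pm), which follows from rearranging the inequality with p = 1 − q; the function names are hypothetical.

```python
def payoff_always_cooperate(b, c, m):
    """Total payoff from cooperating in all m + 1 rounds."""
    return (b - c) * (m + 1)


def payoff_defect_once(b, c, m, q):
    """Total payoff from defecting in round one, then cooperating.

    After the defection only a fraction q of partners still accept a game,
    so each of the m later rounds yields b - c with probability q.
    """
    return b + q * m * (b - c)


def simple_bound(p, m):
    """Value of b/c at which the two strategies earn the same payoff."""
    return 1 + 1 / (p * m)
```

With p = 0.5 and m = 8 the bound is 1.25: at c = 1 and b = 1.25 both strategies earn 2.25, larger b favors always cooperating, and smaller b favors the single defection.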

Figure B1. Average number of games that the defector plays in the original model (black curves) and the modified model (red curves). The expected number of exploitations normalized by the number of rounds, $F_m(q)/m$, is plotted against the probability, q, to maintain each link when (a) m = 3 and (b) m = 4. The inset in each panel shows the difference, ∆, between the black and red curves.