Evolution of Groupwise Cooperation: Generosity, Paradoxical Behavior, and Non-Linear Payoff Functions

Evolution of cooperation by reciprocity has been studied using two-player and n-player repeated prisoner’s dilemma games. An interesting feature specific to the n-player case is that players can vary in generosity, that is, in how many defections they tolerate in a given round of a repeated game. Less generous reciprocators are quicker to detect defectors and withdraw further cooperation, whereas more generous reciprocators are better at maintaining long-term cooperation in the presence of rare defectors. A previous analysis of a stochastic evolutionary model of the n-player repeated prisoner’s dilemma has shown that the fixation probability of a single reciprocator in a population of defectors can be maximized at a moderate level of generosity. However, that analysis is limited in that it considers only tit-for-tat-type reciprocators under the conventional linear payoff assumption. Here we extend the previous study by removing these limitations and show that, if the games are repeated sufficiently many times, considering non-tit-for-tat type strategies does not alter the previous results, while the introduction of non-linear payoffs sometimes does. In particular, under certain conditions, the fixation probability is maximized for a “paradoxical” strategy, which cooperates in the presence of fewer cooperating opponents than in other situations in which it defects.


Introduction
Reciprocity is a key factor in the study of evolution of cooperation [1,2]. Evolution of pairwise and groupwise cooperation by reciprocity has been studied by using the two-player and n-player repeated prisoner's dilemma (PD) games [2–12]. While a population of reciprocators can be stable against invasion by defectors in repeated PDs, so long as games are repeated sufficiently many times, various additional mechanisms have been proposed to explain the initial emergence of reciprocators in a population of defectors, such as invasion by a mass of reciprocators [13], formation of spatial clusters [14,15], stochastic aggregate payoffs [16], and random drift [17]. Of particular relevance to the present study is Nowak et al. (2004). They developed a model of stochastic evolutionary dynamics for the two-player repeated PD and derived the fixation probability with which a single reciprocator mutant appearing in a population of defectors eventually replaces the whole population. Nowak et al.'s (2004) framework has been extended to include the more general n-player repeated PD [18,19], which has made it possible to investigate the evolution of group-wise cooperation [20–22] while considering the effect of random drift. In this article, we report our further investigation of the stochastic evolutionary model of the n-player repeated PD.
An interesting feature of group-wise cooperation, or more specifically, the n-player repeated PD, is that we can conceive of multiple types of reciprocators that differ in their levels of generosity. Take tit-for-tat (TFT) for example. TFT is a reciprocal strategy in the two-player repeated PD, which cooperates in the first round of a repeated game and, from the second round onward, cooperates if and only if its opponent has cooperated in the previous round. In the n-player repeated PD, n different reciprocal strategies analogous to TFT, or TFT_a for a ∈ {0, 1, ..., n − 1}, are possible, where a represents the strategy's level of generosity [23]. Namely, TFT_a cooperates in the first round and after that cooperates if and only if a or more of its opponents have cooperated in the previous round. Thus, TFT_a with a smaller value of a is more generous toward the group members' defection.
Suppose that a single mutant of the strategy that always defects (ALLD) appears in a population of TFT_a. Obviously, a less generous TFT_a (i.e., one with larger a) is quicker to find ALLDs in the group and withdraw further cooperation and is thus more likely to be stable against their invasion. On the other hand, for TFT_a to be advantageous over ALLD, it is always necessary that each TFT_a individual tends to have a or more other TFT_a individuals in the same group; otherwise, TFT_a cooperates only in the first round of each game and is exploited by ALLDs. This means that a more generous TFT_a (i.e., one with smaller a) may compete better against ALLD, particularly when TFT_a is rare in the population; in other words, generosity may serve to facilitate the initial emergence of TFT_a.
This intuition has been confirmed by Kurokawa et al. (2010), who compared the fixation probabilities of TFT_a strategies with different values of a when each appears as a single mutant in a population of ALLD [19]. They found that a more generous TFT_a can sometimes have a greater fixation probability than less generous ones. They also specified the optimal level of generosity that maximizes the fixation probability, showing that there is a certain threshold level of generosity above which the largest fixation probability can never be attained. Specifically, it was shown that any TFT_a that tolerates defections by half or more of its opponents can never have the largest fixation probability.
Although Kurokawa et al. (2010) focused on the strategies classified as TFT_a, they are not the only ones that can potentially establish mutual cooperation in the n-player repeated PD. In fact, tolerating the presence of some defectors in a group may have a similar enhancing effect on the evolution of reciprocal strategies in general. However, this possibility has not been explored thus far. Here we extend Kurokawa et al.'s (2010) analysis to consider the set of all "reactive strategies" that (i) cooperate with probability d_ϕ ∈ {0, 1} in the first round and (ii) cooperate in the kth round (k ≥ 2) with probability d_j ∈ {0, 1}, which depends on the number of opponents, j, who have cooperated in the (k − 1)th round. There are 2^{n+1} such strategies in total, including the n TFT_a strategies and 2^n strategies that defect in the first move [24–28]. The rest of the strategies in the set are "paradoxical" in the sense that each of them has a certain situation in which it cooperates even though there are fewer cooperators in the group than in other situations in which it withdraws cooperation. It is of interest whether the strategies defecting in the first move or exhibiting paradoxical behavior can have the largest fixation probability among the set of all reactive strategies when invading ALLD.
Another limitation of Kurokawa et al.'s (2010) work is that they only considered the conventional linear payoff functions (i.e., linear public goods games). However, there are at least two reasons why non-linear payoff functions should also be considered. First, in the real world, payoffs are not always linear [29–39], and it has been demonstrated that the introduction of non-linearity can affect the outcome of evolutionary models (e.g., [40–47]). Second, if payoffs increase non-linearly with the number of cooperating members in the group, the efficiency of an act of cooperation varies depending on the number of cooperators. For example, if payoffs increase more rapidly when cooperators are fewer, an individual's act of cooperation is more efficient in the presence of fewer cooperators. In such a case, more generous strategies may be selectively favored because generosity promotes cooperation when it is efficient. To examine these possibilities, we also extend Kurokawa et al. (2010) in terms of payoff functions.
In addition, Kurokawa et al. (2010) derived their results under the assumption that selection is sufficiently weak. However, the intensity of selection can affect the outcomes of evolutionary models (e.g., [17,48–50]). Hence, we investigate to what extent our results based on the weak selection assumption may be affected by the selection intensity.
In this paper, we extend Kurokawa et al.'s (2010) analysis by removing these limitations. In particular, taking non-linear payoff functions and paradoxical (i.e., non-tit-for-tat type) strategies into consideration, we ask how non-linear payoffs may facilitate the evolution of excessive generosity and whether any paradoxical strategies can ever attain the highest fixation probability among all reactive strategies considered. Additionally, we examine how our results may depend on the intensity of natural selection. In what follows, we first introduce the n-player repeated PD and our model of stochastic evolutionary dynamics (Model). Then we derive, based on the weak selection approximation, the best reactive strategies, or the strategies maximizing the fixation probability, and provide detailed analyses for linear and non-linear payoff cases. We also examine to what extent our results may be affected if selection is not sufficiently weak (Results). Finally, we summarize the results and discuss some caveats and possibilities of further investigations (Discussion).

Model
We investigate stochastic evolutionary dynamics of a population of individuals whose fitness is determined by n-player repeated PD games. We consider a set of "reactive strategies," as defined below, and compare the fixation probabilities of different reactive strategies when introduced as a mutant into a population of the strategy that always defects, or ALLD. Our goal is to specify the "best reactive strategies," which maximize the fixation probability.

The n-Player Repeated Prisoner's Dilemma
In the n-player repeated PD, a group of n individuals play a game consisting of one or more rounds, in each of which each individual chooses to either cooperate or defect. After one round is finished, another round will be played with probability δ; otherwise, the group will be dismissed (0 < δ < 1). Thus, the expected number of rounds is 1/(1 − δ). Suppose that there are k cooperating and n − k defecting individuals in a given round. The payoffs to cooperating (C) and defecting (D) individuals in that round are denoted by V(C|k) and V(D|k), respectively (0 ≤ k ≤ n). Note that V(C|0) and V(D|n) are defined arbitrarily and used only for the sake of notational convenience. Following Boyd & Richerson (1988), the n-player PD requires four conditions, Inequalities (1)–(4), to be satisfied, where 0 ≤ k ≤ n − 1. Condition (1) states that an individual always gains more from defecting than cooperating. Conditions (2) and (3) mean that an individual always gains more when there are more cooperators in the group. Condition (4) indicates that the total payoff of the group is larger when there are more cooperators in the group.
We define a reactive strategy in the n-player repeated PD as follows: an individual with a reactive strategy cooperates with a certain probability in the first round, and from the second round on, cooperates with a certain probability determined by the number of opponents (i.e., group members other than the self) who have cooperated in the preceding round. Any reactive strategy is described by a vector d = (d_ϕ, d_0, d_1, ..., d_{n−1}), where d_ϕ denotes the probability with which an individual obeying this strategy cooperates in the first move and d_j denotes the probability with which the individual cooperates in a given round, provided that j of the n − 1 opponents have cooperated in the preceding round. We focus on the set, Ω, of all reactive strategies in which d_ϕ is either 0 or 1 and d_j is either 0 or 1, namely, Ω = {d : d_ϕ ∈ {0, 1} and d_j ∈ {0, 1} for 0 ≤ j ≤ n − 1}. Obviously, there are 2^{n+1} strategies in Ω.
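The four verbal conditions on the stage-game payoffs can be checked mechanically. The following is a minimal sketch, assuming that the conditions take the standard Boyd & Richerson form (a defector among k cooperators earns more than it would as one of k + 1 cooperators; both V(C|k) and V(D|k) increase with k; the group total increases with k). The function names `is_n_player_pd` and `linear_payoffs` are hypothetical, and the linear payoffs V(D|k) = bk/n and V(C|k) = bk/n − c are an assumed form of the conventional linear public goods game, not a transcription of the numbered inequalities and equations.

```python
from typing import Callable, Tuple


def is_n_player_pd(v_c: Callable[[int], float],
                   v_d: Callable[[int], float],
                   n: int) -> bool:
    """Check the four verbal PD conditions for payoffs v_c(k), v_d(k),
    where k is the number of cooperators in a group of size n.
    V(C|0) and V(D|n) are arbitrary, so they are never evaluated."""

    def group_total(k: int) -> float:
        # Total payoff of a group with k cooperators and n - k defectors.
        coop_part = k * v_c(k) if k > 0 else 0.0
        defect_part = (n - k) * v_d(k) if k < n else 0.0
        return coop_part + defect_part

    # Defecting always pays more than cooperating.
    if not all(v_d(k) > v_c(k + 1) for k in range(n)):
        return False
    # More cooperators means a higher payoff, for cooperators and defectors.
    if not all(v_c(k + 1) > v_c(k) for k in range(1, n)):
        return False
    if not all(v_d(k + 1) > v_d(k) for k in range(n - 1)):
        return False
    # More cooperators means a higher group total.
    return all(group_total(k + 1) > group_total(k) for k in range(n))


def linear_payoffs(b: float, c: float, n: int) -> Tuple[Callable[[int], float],
                                                        Callable[[int], float]]:
    """Assumed linear public goods payoffs: returns (V(C|.), V(D|.))."""
    return (lambda k: b * k / n - c), (lambda k: b * k / n)


if __name__ == "__main__":
    n = 10
    print(is_n_player_pd(*linear_payoffs(9.0, 1.0, n), n))   # 0 < c < b < nc
    print(is_n_player_pd(*linear_payoffs(11.0, 1.0, n), n))  # b > nc
```

Under these assumed forms, the check passes exactly when 0 < c < b < nc, which agrees with the parameter restriction stated for the linear payoff case.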
Table 1 shows the payoff matrix of the general two-strategy n-player game, where a_l denotes the payoff to an individual with strategy A whose opponents are n − l individuals with strategy A and l − 1 individuals with strategy B; likewise, b_l denotes the payoff to an individual with strategy B facing the same composition of opponents. Let A and B in Table 1 represent a strategy d in Ω and ALLD, respectively. To describe the game involving d and ALLD, we need to write down all a_l and b_l in Table 1 in terms of δ, V(C|k), V(D|k), and d. Consider a group consisting of k individuals adopting d and n − k individuals adopting ALLD. We denote by h_k the expected number of rounds in which a d individual cooperates before the group is dismissed. Since there are on average 1/(1 − δ) rounds, we have 0 ≤ h_k ≤ 1/(1 − δ). Further, since we do not consider any errors in behavior, d individuals perfectly coordinate their behavior, that is, either all or none of them cooperate in a given round. Hence, we obtain the payoffs given by Equations (5) and (6).
For any group composition, h_k is determined by d_ϕ, d_{k−1}, and d_0 in the following manner. First, suppose that d_ϕ = d_{k−1} = 1. In this case, d never defects and, hence, h_k = 1/(1 − δ). The other cases follow similarly. In sum, h_k is given by Equation (7). Therefore, an n-player repeated PD involving d and ALLD is specified using Equations (5)–(7).
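Because the d individuals act in unison and ALLD never cooperates, the course of a game is deterministic, so h_k can be obtained by simply stepping through the rounds. The sketch below does this and compares the result with closed forms derived here from the verbal description of the strategies; these closed forms are our own working and are only assumed to agree with Equation (7). The names `expected_cooperation_rounds` and `closed_form_h` are hypothetical.

```python
def expected_cooperation_rounds(d_phi, d, k, delta, tol=1e-15):
    """Expected number of rounds in which a d-player cooperates, in a group
    of k d-players and n - k ALLD players (k >= 1).
    d_phi: first-round action (0 or 1); d: tuple (d_0, ..., d_{n-1})."""
    total = 0.0
    reach_prob = 1.0          # probability that the current round is played
    cooperating = bool(d_phi)
    while reach_prob > tol:
        if cooperating:
            total += reach_prob
        # The k - 1 fellow d-players mirror the focal player; ALLD defects.
        cooperating_opponents = k - 1 if cooperating else 0
        cooperating = bool(d[cooperating_opponents])
        reach_prob *= delta   # another round follows with probability delta
    return total


def closed_form_h(d_phi, d_0, d_km1, delta):
    """Closed forms for h_k worked out from the verbal description
    (an assumption about what Equation (7) states)."""
    if d_phi and d_km1:
        return 1.0 / (1.0 - delta)              # cooperates in every round
    if d_phi:
        # Cooperates once, then either never again or in every other round.
        return 1.0 / (1.0 - delta ** 2) if d_0 else 1.0
    if not d_0:
        return 0.0                               # behaves exactly like ALLD
    # Starts cooperating in round 2, then always or in every other round.
    return delta / (1.0 - delta) if d_km1 else delta / (1.0 - delta ** 2)


if __name__ == "__main__":
    tft_2 = (0, 0, 1, 1)      # TFT_2 for n = 4: d_j = 1 iff j >= 2
    for k in range(1, 5):
        print(k, expected_cooperation_rounds(1, tft_2, k, 0.8))
```

In this sketch, any strategy with d_ϕ = d_0 = 0 yields h_k = 0 for every k and is therefore behaviorally indistinguishable from ALLD when paired only with ALLD.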
Table 1. The payoff matrix of the general n-player game.

Evolutionary Dynamics
We use a model of stochastic evolutionary dynamics developed for the general two-strategy n-player game [18]. A population consisting of i individuals adopting strategy A and N − i individuals adopting strategy B is considered, where the population size, N, is constant (0 ≤ i ≤ N). Groups of n individuals are formed by choosing individuals at random from the population, and a game is played within each of these groups (2 ≤ n ≤ N). The payoffs of the game are as shown in Table 1. Let F_i and G_i be the expected payoffs to an A individual and a B individual, respectively, when the number of A individuals in the population is i. The fitnesses of A and B individuals are given by f_i = 1 − w + wF_i and g_i = 1 − w + wG_i, respectively, where w is the selection intensity (0 < w < 1). The population dynamics are formulated as a Moran process with frequency-dependent fitness: at each time step, an individual is chosen for reproduction with probability proportional to its fitness, and then one identical offspring is produced to replace another individual randomly chosen for death with probability 1/N [17,51]. Denote by ρ_{A,B} the probability that a single individual with strategy A in a population of N − 1 individuals with strategy B will finally take over the whole population (i.e., the fixation probability). Throughout the paper, we assume that the interaction group is smaller than the population (N > n). Here, we have Equations (8)–(10), where "s choose t" represents a binomial coefficient when s ≥ t and is defined to be zero when s < t.
Games 2018, 9, 100
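The process described above can be turned into a short numerical routine. The sketch below assumes the standard closed form for the fixation probability in a frequency-dependent Moran process, ρ = 1 / (1 + Σ_{m=1}^{N−1} Π_{i=1}^{m} g_i / f_i), with opponents drawn without replacement (hypergeometric sampling); whether this coincides term by term with Equation (8) is an assumption. The payoff lists use a hypothetical indexing by the number of A opponents, which differs from the index l of Table 1, and the function names are illustrative.

```python
from math import comb


def hypergeom_pmf(j, population, successes, draws):
    """Probability of exactly j successes in `draws` draws without replacement."""
    # math.comb returns 0 when the lower index exceeds the upper one,
    # matching the convention for binomial coefficients stated in the text.
    return (comb(successes, j) * comb(population - successes, draws - j)
            / comb(population, draws))


def fixation_probability(payoff_a, payoff_b, pop_size, w):
    """Fixation probability of a single A mutant among B residents.

    payoff_a[j]: payoff to an A player with j A-opponents among its n - 1
    opponents; payoff_b[j]: the same for a B player (hypothetical indexing).
    Requires pop_size > n and positive fitness values."""
    n = len(payoff_a)                       # group size
    ratio_product = 1.0
    denominator = 1.0
    for i in range(1, pop_size):            # i = number of A individuals
        # Opponents are n - 1 draws from the other pop_size - 1 individuals.
        exp_a = sum(hypergeom_pmf(j, pop_size - 1, i - 1, n - 1) * payoff_a[j]
                    for j in range(n))
        exp_b = sum(hypergeom_pmf(j, pop_size - 1, i, n - 1) * payoff_b[j]
                    for j in range(n))
        fit_a = 1.0 - w + w * exp_a         # f_i
        fit_b = 1.0 - w + w * exp_b         # g_i
        ratio_product *= fit_b / fit_a
        denominator += ratio_product
    return 1.0 / denominator


if __name__ == "__main__":
    neutral = [1.0] * 5
    print(fixation_probability(neutral, neutral, pop_size=20, w=0.5))
```

A useful sanity check is the neutral case: when both strategies receive identical payoffs, the routine returns 1/N up to rounding, the fixation probability of a neutral mutant.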

General Case
We define the best reactive strategy as the strategy (or strategies) that maximizes the fixation probability when introduced as a single mutant in a population of ALLD. The best reactive strategy maximizes the right-hand side of Equation (10) in the context of the n-player repeated PD specified by Equations (5)–(7). Equation (10) is equivalent to Equation (11). Putting Equations (5) and (6) into Equation (11) and explicitly denoting the dependence on strategy d ∈ Ω, we obtain Equation (12), where f(k), given by Equation (13), is independent of strategy d. Note that h_k(d) is given by Equation (7) and is dependent on d as shown in Equation (14). The best reactive strategy d* is given by Equation (15). As shown in Appendix A, the solution of the maximization problem in Equation (15) is given as follows. On the one hand, when Inequality (16) holds, the solution of Equation (15) is given by Equation (17) for 1 ≤ k ≤ n. Note that d*_{k−1} can be either 0 or 1 in case f(k) = 0. On the other hand, when the inequality in Inequality (16) is reversed, the solution of Equation (15) is given by Equation (18), where d*_{k−1} can be either 0 or 1 for 2 ≤ k ≤ n. In sum, Conditions (16)–(18) specify the best reactive strategy, d*.
Our main result, Conditions (16)–(18), states that so long as the game is repeated sufficiently many times (i.e., δ is sufficiently large), each bit, d_{k−1}, of the best reactive strategy is solely determined by the sign of f(k), which is determined by the payoff structure of the game (and the group size and the population size). In Equation (13), the term V(C|k) − V(D|0) is the payoff to a cooperating player in a group where k players cooperate minus the payoff to a defecting player in a group where no one cooperates. Thus, this term represents the benefit of cooperation. The term V(C|k) − V(D|k) represents the payoff difference between a cooperating player and a defecting player in a group where k individuals cooperate. Thus, this negative term represents the cost of cooperation. A weighted sum of these two terms determines whether the best reactive strategy should cooperate if k − 1 players have cooperated in the previous round.
In the following analysis, we consider cases in which Inequality (16) holds unless otherwise stated.

Linear Payoff
Let us begin with a special case where the conventional linear payoff assumption, Equations (19) and (20), holds. From Inequalities (1)–(4), it has to be that 0 < c < b < nc. Putting Equations (19) and (20) into Equation (13), we obtain Equation (21). Since f(k) increases linearly with k in this case, there exists a critical value above which f(k) is positive (Figure 1a). Hence, from Equation (17), α is maximized when Condition (22) is met, where k* is given by Equation (23). From Equation (23), k* decreases with increasing N, and for an infinitely large population, k* is given by Equation (24). Since b/(nc) < 1 by assumption, k* > (n − 1)/2 holds. Therefore, any strategy d that cooperates when half or fewer of the opponents have cooperated in the previous round cannot be the best reactive strategy.
Figure 1. The function f(k) plotted against k under (a) the linear payoff assumption, Equations (19) and (20); (b) a non-linear payoff assumption, Equations (42) and (43); (c) a non-linear payoff assumption, Equations (20) and (52).

A similar, but slightly different, result has been obtained by Kurokawa et al. (2010) [19]. They examined the fixation probability of a strategy called TFT_a, being introduced as a mutant into a population of ALLD, under the linear payoff assumption as given by Equations (19) and (20). An individual adopting TFT_a cooperates in the first round, and from the second round on, cooperates if and only if a or more of the opponents have cooperated in the previous round, where a ∈ {0, 1, ..., n − 1} measures the level of generosity [23]. Thus, there are n strategies that are classified as TFT_a with different levels of generosity. From these, Kurokawa et al. (2010) specified a_opt, the optimal level of generosity, which maximizes the fixation probability. In the terminology of the present study, the n strategies constitute the subset of Ω for which d_ϕ = 1 and d_0 ≤ d_1 ≤ ... ≤ d_{n−1} = 1. In addition to these, we consider two kinds of non-TFT_a reactive strategies. One is those that defect in the first round (i.e., d_ϕ = 0). Our analysis shows that some of this kind can maximize the fixation probability if the expected number of rounds is sufficiently small, partially modifying Kurokawa et al.'s (2010) finding. Another kind is what we call "paradoxical" strategies (see Introduction), in which d_k > d_{k+1} holds for at least some k; for example, d = (1, 0, 1, 1, 0, ..., 0, 1). It turns out that Equations (22)–(24) are equivalent to the optimal level of generosity obtained by Kurokawa et al. (2010). Thus, their result is not changed by taking paradoxical strategies into consideration.
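The composition of Ω can be made concrete by enumerating it for a small group size. The sketch below is illustrative only: it labels each strategy as defecting first (d_ϕ = 0), as TFT_a (d_ϕ = 1 with d_0 ≤ d_1 ≤ ... ≤ d_{n−1} = 1), or as paradoxical (d_ϕ = 1 with d_k > d_{k+1} for some k). The label "other" for the single remaining vector, which cooperates only in the first round, and the function names `classify` and `census` are our own hypothetical choices.

```python
from itertools import product


def classify(d_phi, d):
    """Label a reactive strategy (d_phi, d_0, ..., d_{n-1}) with 0/1 entries."""
    if d_phi == 0:
        return "defects first"
    if any(d[j] > d[j + 1] for j in range(len(d) - 1)):
        return "paradoxical"      # cooperates at some j but defects at j + 1
    if d[-1] == 1:
        return "TFT_a"            # non-decreasing and ends in 1
    return "other"                # d_phi = 1 but never cooperates afterwards


def census(n):
    """Count the strategies of each kind in the set Omega for group size n."""
    counts = {}
    for bits in product((0, 1), repeat=n + 1):
        label = classify(bits[0], bits[1:])
        counts[label] = counts.get(label, 0) + 1
    return counts


if __name__ == "__main__":
    print(census(3))
```

For any n, the counts sum to 2^{n+1}, with 2^n strategies defecting first and n strategies of the TFT_a type.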

Non-Linear Payoff
The n-player PD can be further subcategorized according to characteristics of the fitness functions [39,46]. To illustrate, we define the cost, c(k), and benefit, b(k), associated with cooperation by Equations (25) and (26), respectively. From Inequalities (1) and (2), c(k) > 0 is required for all k. We can measure how the cost and benefit depend on the number of cooperators by considering the first-order differences ∆c(k) = c(k + 1) − c(k) and ∆b(k) = b(k + 1) − b(k). For example, when ∆c(k) = 0 holds for all k, the cost is constant. Let us also consider the second-order difference ∆²b(k) = ∆b(k + 1) − ∆b(k) for 0 ≤ k ≤ n − 2. It should be intuitively clear that the n-player PD requires ∆b(k) > 0, which is equivalent to (3), while ∆²b(k) can be either positive or negative (or zero). If, for instance, ∆²b(k) ≥ 0 holds for all k, the benefit increases in an accelerative manner with the number of cooperators (e.g., the weakest-link game; [56]), whereas if ∆²b(k) ≤ 0 always holds, the increase is decelerated (e.g., the volunteer's dilemma; [57]). The class of n-player PDs for which ∆²b(k) = 0 always holds represents the public goods games in which the amount of public goods increases linearly with the number of cooperators [47].
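The classification by the sign of the second-order difference can be sketched numerically. In the example below, `b` is any sequence of benefit values b(0), ..., b(n); how b(k) is built from V(C|k) and V(D|k) is left open here, since that definition is not reproduced in this sketch. The function names and the tolerance `tol` are hypothetical.

```python
def second_differences(b):
    """Second-order forward differences: b(k+2) - 2 b(k+1) + b(k)."""
    return [b[k + 2] - 2 * b[k + 1] + b[k] for k in range(len(b) - 2)]


def benefit_shape(b, tol=1e-12):
    """Classify a benefit sequence by the sign of its second differences."""
    d2 = second_differences(b)
    if all(abs(x) <= tol for x in d2):
        return "linear"           # second difference is zero everywhere
    if all(x >= -tol for x in d2):
        return "accelerating"     # second difference is never negative
    if all(x <= tol for x in d2):
        return "decelerating"     # second difference is never positive
    return "mixed"


if __name__ == "__main__":
    n = 10
    print(benefit_shape([k / n for k in range(n + 1)]))
    print(benefit_shape([k ** 2 for k in range(n + 1)]))
    print(benefit_shape([k ** 0.5 for k in range(n + 1)]))
```

The tolerance only guards against floating-point rounding when the benefit values are not integers.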
The linear payoff assumptions in Equations (19) and (20) mean that the conditions ∆²b(k) = 0 (Equation (30)) and ∆c(k) = 0 (Equation (31)) are satisfied for all k. Below we examine how the evolution of generosity and paradoxical behavior may be affected by relaxing Equations (30) and (31).

Evolution of Generosity with Non-Linear Payoff
Constant Cost with Non-Linear Benefit
Consider the case when there is no restriction on the benefit function (except Inequalities (1)–(4)), that is, Equation (30) is not necessarily true. For the moment, we assume Equation (31). From Equations (13) and (20), we have Equation (32). Using Conditions (1) and (20), we obtain Condition (33), which gives Condition (34). Thus, we obtain a necessary condition for f(k) > 0, namely Inequality (35). The right-hand side of Inequality (35) decreases and approaches (n + 1)/2 with increasing N. Hence, from Equation (17), d_k = 1 holds in the best reactive strategy only if k > (n − 1)/2. This means that any strategy d that cooperates when half or fewer of the opponents have cooperated in the previous round cannot be the best reactive strategy.

Variable Cost with Non-Decelerating Benefit
Let us now remove the restriction on the cost function, that is, we consider the case when Equation (31) does not necessarily hold. Instead of Equation (30), we assume ∆²b(k) ≥ 0 for all k (Inequality (36)), which means that the benefit from cooperation increases with the number of cooperators in either linear or accelerative fashion. From Equations (13), (26) and (28), we have Condition (37). Using Inequality (36), we obtain Condition (38). From Conditions (37) and (38), we have Condition (39). Meanwhile, using Conditions (1), (25), (26) and (28), we obtain Condition (40). Therefore, combining Conditions (39) and (40), we obtain Condition (41). Hence, as in the case of constant cost, Inequality (35) is necessary for f(k) > 0, which again means that the best reactive strategy must defect when half or more of the opponents have defected in the previous round.

A Numerical Example
The results for the above two special cases are consistent with our finding in the analysis under the linear payoff assumption. However, the result may be altered qualitatively in the absence of any restriction on the benefit or cost functions. As an illustration, consider the payoff functions given by Equations (42) and (43), which satisfy Inequalities (1)–(4) but are consistent with neither Equation (30) nor Equation (31). To be specific, let us set N = 400, n = 30, and δ = 0.8. Putting them into Equation (13), we obtain Equation (44). In this case, Inequality (16) is met, and as shown in Figure 1b, f(k) > 0 holds if and only if k ≥ 14. Hence, from Equation (17), the best reactive strategy (d*) cooperates if and only if 13 or more of the 29 opponents have cooperated in the preceding round (Equation (45)). Recall that d_k is defined as the probability with which an individual cooperates in a given round, provided that k of the n − 1 "opponents" have cooperated in the preceding round. This demonstrates that there exist situations in which a strategy that cooperates when half or more of the opponents have defected in the previous round can be the best reactive strategy.

Evolution of Paradoxical Behavior with Non-Linear Payoff
Non-Increasing Cost and Non-Linear Benefit
Consider the case when there is no restriction on the benefit function (except Inequalities (1)–(4)) and, thus, Equation (30) is not necessarily true. As for the cost function, we partially relax Equation (31) by assuming instead that ∆c(k) ≤ 0 for all k, which means that the cost of cooperation either is constant or decreases with increasing number of cooperators. Under this assumption, Equation (13) can be rewritten, yielding Conditions (48)–(50). From Conditions (48)–(50), we have Inequality (51). This means that if f(k) > 0 then f(m) > 0 for any m > k. Hence, from Equation (17), a paradoxical strategy cannot be the best reactive strategy.

A Numerical Example
Our analysis thus far has shown that a paradoxical strategy cannot be the best reactive strategy so long as the cost of cooperation is either unaltered or decreased by increasing number of cooperators. However, this may not be the case if the cost of cooperation sometimes increases with the number of cooperators. Let us provide a numerical example in which the best reactive strategy exhibits a paradoxical behavior. We set N = 400, n = 30, δ = 0.8, and b = 1 and assume that Equation (30) is met. Instead of Equation (31), here we assume the cost function given by Equation (52), for which Inequalities (1)–(4) are satisfied. This represents a case when the cost of cooperation is larger in the presence of more cooperators in the group. Intuitively, in such a situation, a paradoxical strategy that cooperates only when there are relatively few cooperators may be more advantageous than non-paradoxical strategies, which cooperate whenever the number of cooperators exceeds a threshold. Indeed, since Inequality (16) is met and f(k) is non-monotonic in this case (Figure 1c), the best reactive strategy (d*) is given from Equation (17) by Equation (53). That is, the best reactive strategy cooperates when there are 17 or more opponents who have cooperated in the previous round, except when the number of cooperators is 19 or 20, in which case it defects. The relative advantages of cooperation and defection are determined by a balance between the following two factors. On the one hand, obviously, cooperation is more advantageous when the associated cost is smaller. On the other hand, given that the cost is constant, the relative advantage of cooperation increases with the number of cooperators in the group. In the current example, there is a rise of the cost at k = 20 (see Equation (52)), which can be compensated for only when the number of cooperating opponents increases to 21 or more.
The second and third best strategies following Equation (53) are given by Equations (54) and (55), respectively. For w = 0.001, the fixation probabilities of the strategies in Equations (53)–(55) under the small-w approximation are 0.00251170, 0.00251155, and 0.00251154, respectively.

The Best Reactive Strategy under Moderate Selection Intensity
Thus far we have assumed that selection is sufficiently weak (w ≪ 1) so that the fixation probabilities can be approximated by Equation (9). Here we numerically obtain the exact fixation probabilities using Equation (8) without the weak selection assumption to examine how this may affect the identity of the best reactive strategy.
For simplicity, we make the linear payoff assumption, Equations (19) and (20). Figure 2 illustrates how the best reactive strategy (d*), obtained by using Equation (8), changes depending on the values of δ and w for parameter values b = 9, c = 1, N = 50, and n = 10. For this parameter setting, a comparison of fixation probabilities calculated from Equation (9) would indicate TFT_6 as the best reactive strategy when δ > δ* holds, where δ* ≈ 0.49, and d_ϕ = d_0 = 0 when δ < δ*. This gives a good approximation of the exact numerical solution for w = 0.001 as shown in Figure 2. However, when w is larger, the approximation is no longer valid. Figure 2 shows that there exist situations in which a strategy that cooperates when half or more of the opponents have defected in the previous round can be the best reactive strategy, which cannot be the case in the limit of weak selection. A reactive strategy's fixation probability is determined by its payoffs relative to the unconditional defector at various frequencies (i = 1, 2, ..., N − 1 in Equation (8)). Among the N − 1 values, a relative payoff at a lower frequency has a greater impact on the fixation probability. Hence, if, as numerically suggested by Kurokawa et al. (2010), the relative payoff at a low frequency tends to be larger for more generous reactive strategies, this is likely to favor the evolution of generosity. This effect might be more pronounced when selection, rather than random drift, plays a greater role. These considerations are, all in all, in accord with our numerical example suggesting that more generous reactive strategies are selectively favored when selection is more intense. It was also numerically shown that δ can affect the identity of the best reactive strategy, while in the limit of weak selection δ does not affect what the best reactive strategy is as long as it satisfies Inequality (16). In addition, within this particular range of parameter values, we did not find any case in which a paradoxical strategy is the best reactive strategy.

Discussion
We have investigated stochastic evolutionary dynamics of a population in which an individual's fitness is determined by the n-player repeated PD. We have compared the fixation probabilities of different reactive strategies when they appear as a rare mutant in a population of unconditional defectors. Reactive strategies in our analysis are described by a vector d = (d_ϕ, d_0, d_1, ..., d_{n−1}), where d_ϕ represents the probability with which the strategy cooperates in the first round and d_j represents the probability with which the strategy cooperates in a given round of a repeated game when j of the n − 1 opponents have cooperated in the previous round. To specify the best reactive strategy, that is, the one attaining the maximum fixation probability, we have considered the set Ω of all reactive strategies for which d_ϕ is either 0 or 1 and each d_j is either 0 or 1. In a repeated PD, after one round is played, another round will be played with probability δ. Under the assumption of weak selection, we have specified a threshold of δ below which all strategies satisfying d_ϕ = d_0 = 0 are the best reactive strategies. We have also found that when δ exceeds the threshold, d_{k−1} of the best reactive strategy is solely determined by the sign of f(k), as given by Equation (13), which is interpreted as a weighted sum of the cost and benefit of cooperation when k − 1 opponents have cooperated in the preceding round.
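As a minimal sketch of the strategy space described above, the set Ω can be enumerated as all 0/1 vectors of length n + 1. The helper names are hypothetical, and the convention used for TFT_a (cooperate in the first round, then cooperate if and only if at least a of the n − 1 opponents cooperated) is an assumption chosen to agree with TFT_{n−1} = (1, 0, 0, ..., 0, 1).

```python
from itertools import product


def all_reactive_strategies(n):
    """Enumerate Omega: every vector (d_phi, d_0, ..., d_{n-1}) with 0/1 entries."""
    return list(product((0, 1), repeat=n + 1))


def tft(a, n):
    """TFT_a under the assumed convention: cooperate in the first round, then
    cooperate iff at least a of the n - 1 opponents cooperated previously."""
    return (1,) + tuple(1 if j >= a else 0 for j in range(n))


omega = all_reactive_strategies(4)
alld = (0,) * 5  # unconditional defector for n = 4

print(len(omega))      # 32 strategies for n = 4
print(tft(3, 4))       # (1, 0, 0, 0, 1), i.e., TFT_{n-1} for n = 4
print(alld in omega)   # True
```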
The present study extends our previous analysis (Kurokawa et al. (2010) [19]) by considering a broader set of reactive strategies and non-linear payoff functions. We also investigate how robust the indicated identity of the best reactive strategy is to deviations from the weak selection assumption. Kurokawa et al. (2010) specified the best reactive strategy from n reactive strategies classified as TFT_a, which constitute a subset of Ω, under the conventional linear payoff assumption. One of their findings was that any TFT_a that tolerates defection by more than half of the group members can never be the best reactive strategy. The present study has shown, on the one hand, that under the linear payoff assumption, the previous finding holds true even for all reactive strategies in Ω. For non-linear payoff functions, on the other hand, we have shown that the previous finding does not always hold: when the benefit increases in a decelerating manner with the number of cooperators, the best reactive strategy can tolerate defection by more than half of the opponents. We have also found that a strategy tolerating defection by more than half of the opponents can be the best reactive strategy when selection is not weak.
We have also demonstrated that a paradoxical strategy, in which d_k > d_{k+1} holds for at least one k, can be the best reactive strategy when the cost of cooperation increases with the number of cooperators in a group. As far as we know, it has not previously been pointed out that a conditional cooperator can sometimes do better by behaving paradoxically in competition with defectors. A potentially relevant observation is that human participants in laboratory experiments of the repeated PD sometimes behave as if they were following a paradoxical strategy (e.g., [58][59][60][61]), though these studies assume linear payoff functions. The present study suggests that it may be of interest to investigate the effect of the shape of the payoff functions on the occurrence of paradoxical behaviors in the laboratory setting.
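The definition of a paradoxical strategy can be checked mechanically. The following sketch, with a hypothetical helper name, tests whether d_k > d_{k+1} holds for some k among the entries d_0, ..., d_{n−1} (the first-round entry d_ϕ is ignored) and counts the paradoxical members of the deterministic strategy set for n = 4.

```python
from itertools import product


def is_paradoxical(d):
    """d = (d_phi, d_0, ..., d_{n-1}); paradoxical if d_k > d_{k+1} for some k."""
    reactions = d[1:]  # drop the first-round entry d_phi
    return any(x > y for x, y in zip(reactions, reactions[1:]))


n = 4
omega = list(product((0, 1), repeat=n + 1))
paradoxical = [d for d in omega if is_paradoxical(d)]

print(is_paradoxical((1, 0, 0, 0, 1)))  # False: cooperation is monotone in j
print(is_paradoxical((1, 0, 1, 0, 1)))  # True: d_1 = 1 > d_2 = 0
print(len(paradoxical), "of", len(omega))
```

For n = 4, only the five non-decreasing patterns of (d_0, ..., d_3), combined with either value of d_ϕ, are non-paradoxical, so 22 of the 32 strategies are paradoxical.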
So far, we have examined the fixation probability of strategy d when it appears as a single mutant in a population of ALLD. In Appendix B we also compare the fixation probability of ALLD when introduced as a single mutant in a population of reactive strategy d, across all d in Ω. A reactive strategy associated with a lower fixation probability of ALLD is regarded as more robust against invasion by ALLD. The reactive strategy that minimizes the fixation probability of ALLD turns out to be TFT_{n−1}, that is, d = (1, 0, 0, ..., 0, 1), when δ is larger than the threshold specified by Inequality (16), and the strategies satisfying d_ϕ = d_0 = 0 otherwise. Further, we investigate the ratio of the fixation probability of d in a population of ALLD to the fixation probability of ALLD in a population of d, across all d in Ω. As shown in Appendix C, the reactive strategy maximizing the ratio of the fixation probabilities is TFT_{n−1} when δ is larger than the threshold, and the strategies satisfying d_ϕ = d_0 = 0 otherwise. These results are consistent with our earlier analysis [19], even though the present study considers paradoxical strategies and non-linear payoff functions, which were not considered in the previous study.
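For readers who wish to experiment with the ratio of fixation probabilities, the sketch below checks numerically a standard property of the Moran process: the ratio of the two fixation probabilities equals the product of the mutant-to-resident fitness ratios over all intermediate frequencies. It assumes that the fixation probabilities take the standard Moran-process form; the fitness values `f` and `g` are arbitrary hypothetical numbers, not payoffs derived from the n-player repeated PD.

```python
from itertools import accumulate
from math import isclose, prod
from operator import mul


def fixation_probability(ratios):
    """rho = 1 / (1 + sum_k prod_{i<=k} ratios[i-1]), the standard Moran form."""
    return 1.0 / (1.0 + sum(accumulate(ratios, mul)))


N = 20
# Hypothetical fitnesses when i mutants are present (i = 1, ..., N - 1).
f = [1.0 + 0.01 * i for i in range(1, N)]          # mutant type (strategy d)
g = [1.0 + 0.005 * (N - i) for i in range(1, N)]   # resident type (ALLD)

# Single mutant of type d in a resident population of ALLD.
rho_d = fixation_probability([gi / fi for fi, gi in zip(f, g)])

# Single ALLD in a population of d: j ALLD individuals means i = N - j of type d.
rho_alld = fixation_probability(
    [f[N - j - 1] / g[N - j - 1] for j in range(1, N)]
)

ratio = rho_d / rho_alld
product_of_ratios = prod(fi / gi for fi, gi in zip(f, g))
print(ratio, product_of_ratios)
```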
Natural selection is regarded as favoring a mutant reactive strategy replacing a population of ALLD if the fixation probability of the reactive strategy exceeds 1/N, which is the fixation probability under neutral evolution. The reactive strategies that are the best when δ is so small that Inequality (16) is not satisfied (i.e., those with d_ϕ = d_0 = 0) always have the fixation probability 1/N. Thus, when Inequality (16) is not met, the best reactive strategies and ALLD are selectively neutral. On the other hand, when δ is large enough that Inequality (16) is satisfied, there are other reactive strategies, including the best one, with a fixation probability larger than 1/N. Hence, in this case, the best reactive strategy is always selectively favored over ALLD. See Appendix D for a further analysis of the case assuming linear payoff functions.
Thus far, our focus has been on the best reactive strategy, which maximizes the fixation probability. However, it is also of interest to examine the characteristics of other reactive strategies whose fixation probabilities are relatively high, and the differences in fixation probability among these top-ranked strategies. For example, Table 2 gives a list of the ten best strategies in the case of n = 10 assuming Equations (19) and (20). Figure 3 illustrates the fixation probabilities of the 32 possible strategies in the case of n = 4 for various values of δ, under the assumption of Equations (19) and (20). The value of δ makes a difference in the order of the strategies. Note that, as shown in Appendix A, the value of δ does not affect the order of the strategies belonging to a given subset Ω_{d_ϕ, d_0} of Ω.

Table 2. A list of the ten best reactive strategies for n = 10 under the linear payoff assumption of Equations (19) and (20). Vector d and the corresponding value of the fixation probability minus 1/N are given.

Figure 1. Functional forms of f(k) for various payoff assumptions. The horizontal and vertical axes represent k, the number of cooperating individuals in a given round among the n group members, and f(k), a function of k as defined by Equation (13), respectively. Parameter values used are N = 400 and n = 30. (a) The conventional linear payoff assumption, Equations (19) and (20); b = 24 and c = 1. (b) A non-linear payoff assumption, Equations (42) and (43). (c) A non-linear payoff assumption, Equations (20) and (52); b = 1.

Figure 2. The effects of w and δ on the identity of the best reactive strategy. The horizontal axis represents w (0.001 to 0.161 at intervals of 0.02) and the vertical axis represents δ (0.05 to 0.95 at intervals of 0.05). The fixation probabilities of strategies are obtained without assuming weak selection (i.e., using Equation (8)). The parameter values used are N = 50, n = 10, b = 9, and c = 1.

Figure A1. Geometric representations of Inequalities (A34) and (A36) for n = 3. Each reactive strategy d = (d_ϕ, d_0, d_1, d_2) is represented by two points on the xy-plane, P(d_ϕ, d_0, d_1, d_2) and Q(d_ϕ, d_0, d_1, d_2). The empty square represents point M, which is given by x = y = 1 − 1/δ and lies on the line y = x (x < 0) (the broken line). The thick line is the "critical line," which passes through M and has the slope b/(nc). For a given strategy d = (d_ϕ, d_0, d_1, d_2), Inequality (A34) is satisfied if and only if P(d_ϕ, d_0, d_1, d_2) is below the critical line, showing that d = (1, 0, 0, 1) and (1, 0, 1, 1) are the only reactive strategies that are selectively favored when replacing a population of ALLD. Similarly, Inequality (A36) holds true if and only if Q(d_ϕ, d_0, d_1, d_2) is below the critical line. Parameter values used are δ = 0.8 and b/c = 2.4. The asterisks indicate that the corresponding element of d can be either 0 or 1.