# When and How Does Mutation-Generated Variation Promote the Evolution of Cooperation?


## Abstract


## 1. Introduction

## 2. Model

**C**) or to defect (**D**). Both players execute their action simultaneously. Individuals receive a payoff that depends on their own choice and on the choice of their opponent, as given in the following payoff matrix:

| | C | D |
|---|---|---|
| C | R | S |
| D | T | P |

**T** > **R** > **P** > **S** and 2**R** > **T** + **S**. The consequence is that defection generates the higher individual payoffs, **T** (> **R**) and **P** (> **S**), while mutual cooperation maximizes the common payoff. In the frPD game, the Prisoner’s Dilemma game is repeated a fixed number (= r) of times. The total number of deterministic strategies adjusted to an frPD game is finite (i.e., $2^{2^{r}-1}$; [19]). Cressman [19] provides a general analysis of evolutionary frPD games in the absence of mutation. In a one-shot game (r = 1), there are only two strategies: the strategy that defects (= d) and the strategy that cooperates (= c). It follows from the payoff matrix that, for any population composition of the two strategies, individuals with strategy d always generate a higher average payoff than individuals with strategy c. Consequently, strategy d evolves towards fixation in infinite populations, i.e., towards the state in which mutual defection is observed in all PD games. For much the same reason, in mutation-free evolutionary frPD games (r > 1), polymorphic populations composed of players with all deterministic strategies evolve towards states in which the players exclusively defect [18,19]. Unconditional defectors (AllD) obtain fitness dominance ($w_{\mathrm{AllD}} \geq w_i$ for all strategies i; $w_i$ is the fitness of strategy i) during this process of convergence towards full defection [19].
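As a quick sanity check of the two payoff conditions and of the strategy count $2^{2^{r}-1}$, a minimal Python sketch follows. The function names are illustrative and do not come from the paper; the example payoffs {T, R, P, S} = {5, 3, 1, 0} correspond to one of the parameter sets used in the simulations.

```python
def is_prisoners_dilemma(T, R, P, S):
    """Check the two payoff conditions T > R > P > S and 2R > T + S."""
    return T > R > P > S and 2 * R > T + S


def n_deterministic_strategies(r):
    """Number of deterministic strategies in an r-round game: 2^(2^r - 1)."""
    return 2 ** (2 ** r - 1)


if __name__ == "__main__":
    print(is_prisoners_dilemma(T=5, R=3, P=1, S=0))
    for r in (1, 2, 3, 4):
        print(r, n_deterministic_strategies(r))
```

For r = 3 and r = 4 the count is 128 and 32768, respectively, which is the factor of 256 between the X3 and X4 set sizes referred to in the Results.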

The average payoff from game interactions for a strategy i individual is $\overline{p_i} = \sum_{j \in s} p_{ij} f_j$, where $p_{ij}$ is the payoff that a strategy i individual generates in interactions with a strategy j individual, $f_j$ is the frequency of strategy j, and s represents the strategy set (i.e., the collection of the considered strategies). For generality, we assume that the average fitness of strategy i individuals is determined by the sum of the payoff from the game and the payoff from other “background” activities, i.e., $w_i = \overline{p_i} + K$, where the background fitness K is the fitness component unrelated to the game (in the simulations, we used non-negative integers for K), which in this paper is assumed to be identical for all individuals. Fitness determines reproductive success but not survival abilities. Consequently, all players have, independent of their performance, the same expected number of pairwise interactions in the game over their lifetime.
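The fitness definition can be illustrated with a short Python sketch for the one-shot game. The helper names and the chosen numbers (payoffs 5, 3, 1, 0 and K = 5) are illustrative assumptions; only the formulas $\overline{p_i} = \sum_j p_{ij} f_j$ and $w_i = \overline{p_i} + K$ are taken from the text.

```python
def average_payoffs(p, f):
    """p[i][j]: payoff of strategy i against strategy j; f[j]: frequency of j."""
    return [sum(p_ij * f_j for p_ij, f_j in zip(row, f)) for row in p]


def fitness(p, f, K=0.0):
    """w_i = average game payoff of strategy i plus background fitness K."""
    return [p_bar + K for p_bar in average_payoffs(p, f)]


if __name__ == "__main__":
    T, R, P, S = 5, 3, 1, 0           # illustrative payoff values
    p = [[R, S],                      # row 0: strategy c against (c, d)
         [T, P]]                      # row 1: strategy d against (c, d)
    for f_c in (0.1, 0.5, 0.9):
        w_c, w_d = fitness(p, [f_c, 1 - f_c], K=5)
        print(f_c, w_c, w_d)
```

Because K is added to every strategy, it leaves the ranking of the fitness values unchanged; in the one-shot game the defector has the higher fitness at every population composition.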

#### 2.1. Strategies

**C** and thereafter repeats the previous action of the opponent, is contained in ‘TfTx’ sets. The strategy TfTx behaves as TfT in the first x rounds and unconditionally defects in the remaining (r − x) rounds. ‘TfTx’ sets contain all TfTx strategies with x ∈ {0, 1, …, r − 1, r}. The extremes represent AllD (x = 0) and TfT (x = r). In the absence of mutation, AllD evolves towards fixation in any polymorphism of the ‘TfTx’ strategies [18]. Hence, in our analyses below, we use the fitness of AllD as the benchmark for measuring the relative success of other strategies when mutation-generated variation is introduced in the model.

{**D**, **C**} in response to specific perceived actions of an opponent. The length of the string of letters depends on the number of rounds of the frPD game. The first letter of the code specifies the initial action (i.e., the action played in the first round of the game). The second and third letters of the code represent the responses to the initial action **D** and **C**, respectively, of an opponent. If a third round is played in the game, we need to add four more positions to code for the response of a strategy to any of the four possible combinations of actions by the opponent in the first and second round. For this example of the 3-round game, we describe the four possible combinations as ‘action sequences’ {**DD**, **DC**, **CD**, **CC**}, where the action sequence describes the actions taken by the opponent in the order they occurred, i.e., **DC** means that the opponent played defect in the first round and cooperate in the second round. Hence, the letters at the fourth to seventh positions in the code represent the responses to action sequences {**DD**, **DC**, **CD**, **CC**}. In a similar vein, for the 4-round game, letters at the eighth to fifteenth positions of the code represent responses to action sequences {**DDD**, **DDC**, **DCD**, **DCC**, **CDD**, **CDC**, **CCD**, **CCC**}, etc. As an example for the game with r = 3, the strategy cddcdcc cooperates in the first round (the first letter of the string is c), unconditionally defects in the second round (the second and third letters of the string are both d), and defects in the third round only if the opponent defected in the first round and cooperated in the second round, i.e., played action sequence **DC** (the fourth to seventh letters of the string are all c, except for the fifth letter, which codes for action **D** when the opponent played action sequence **DC**). In notations of strategy groups, we use dots to mark code positions at which the strategies of the group can differ, i.e., in this notation a dot can be replaced with either d or c. A special example is formed by the strategies ({X1, X2, X3, X4} $\to$ {d, dd., dd.d…, dd.d…d… …}), which signify the groups of strategies that exclusively defect in interactions among each other. We call these strategies “defectors”. Populations exclusively composed of defectors are the only type of populations in which action **C** is never executed (i.e., full defection by all players).
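The coding scheme can be made concrete with a small Python sketch that decodes a strategy string and plays one frPD game. The function names are hypothetical. The indexing rule (the block for round k starts at position $2^{k-1}$, and within a block the opponent's action sequences are ordered with **D** before **C**) is read off from the description above.

```python
def next_action(code, opp_history):
    """Return 'c' or 'd' for the coming round, given the opponent's past actions."""
    k = len(opp_history)                 # number of rounds already played
    offset = 0
    for a in opp_history:                # read the history as a binary number (d = 0, c = 1)
        offset = 2 * offset + (a == "c")
    return code[(2 ** k - 1) + offset]   # 0-based index into the strategy code


def play(code_i, code_j, r):
    """Play r rounds; return the action strings of strategy i and strategy j."""
    hist_i, hist_j = "", ""
    for _ in range(r):
        a_i = next_action(code_i, hist_j)
        a_j = next_action(code_j, hist_i)
        hist_i += a_i
        hist_j += a_j
    return hist_i, hist_j


if __name__ == "__main__":
    # The example strategy from the text against an opponent that plays d, then c.
    print(play("cddcdcc", "dccdddd", 3))
    # Two X3-defectors (pattern dd.d...) never cooperate with each other.
    print(play("ddcdccc", "dddddcd", 3))
```

In the first example the opponent's first two actions form the sequence **DC**, so cddcdcc defects in round 3, in line with the description of this strategy.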

#### 2.2. Mutation

$u_i$ as the fraction of mutants that express strategy i. The distributions of such fractions (u) are constants if the probabilities with which mutants express any of the possible strategies in the given strategy set are independent of the parental strategy. Otherwise, the u distributions depend on the population composition. We refer to the latter as variable u distributions.

#### 2.3. Evolutionary Dynamics

$f_i'$ of strategy i in the next generation is determined as

$u_i$ is the fraction of mutants that carry strategy i. The frequency dynamics in the continuous-generation model (cgm) follow the replicator-mutator dynamics [21]:

$w_{\mathrm{AllD}} \geq w_i$ of the evolutionary attractor to set a benchmark for the evolutionary effect of mutation: recurrent mutation significantly affects evolutionary frPD games whenever the fitness relation $w_i > w_{\mathrm{AllD}}$ is either persistently or periodically observed for at least one strategy i. Behavioral differences cause the differences in fitness between strategies i and AllD. Therefore, because AllD is the strategy that always defects, $w_i > w_{\mathrm{AllD}}$ implies that strategy i employs at least some cooperation. Beyond that, the observation $w_i > w_{\mathrm{AllD}}$ does not carry information about the frequency with which strategy i or the remainder of the population executes cooperation. However, frequencies of cooperative behaviors may positively correlate with mutation rates without challenging the fitness dominance of AllD [22]. For example, mutation could frequently generate the unconditional cooperator strategy (AllC). The expected outcome of an increase in mutation rate would then be an increased execution of cooperation but also an increased fitness of AllD. If significant effects ($w_i > w_{\mathrm{AllD}}$) and increased cooperation co-occur, we take this as an indication that cooperation evolves due to a change in the direction of selection. Please note that significant effects might emerge without cooperation being amply executed.
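The update equation of the discrete-generation model (dgm) is not preserved in this text. The sketch below therefore uses the standard discrete replicator-mutator map, $f_i' = (1-\mu) f_i w_i / \overline{w} + \mu u_i$, as an explicit assumption rather than as the authors' exact formulation. The stopping rule (all frequency changes below $10^{-8}$) follows Appendix B; all function names are illustrative.

```python
def fitness(p, f, K):
    """w_i = sum_j p[i][j] * f[j] + K."""
    return [sum(p_ij * f_j for p_ij, f_j in zip(row, f)) + K for row in p]


def dgm_step(f, p, u, mu, K=0.0):
    """One generation of the ASSUMED map f_i' = (1 - mu) f_i w_i / w_mean + mu u_i."""
    w = fitness(p, f, K)
    w_mean = sum(f_i * w_i for f_i, w_i in zip(f, w))
    return [(1 - mu) * f_i * w_i / w_mean + mu * u_i
            for f_i, w_i, u_i in zip(f, w, u)]


def iterate_to_equilibrium(f, p, u, mu, K=0.0, tol=1e-8, max_gen=100_000):
    """Iterate until all frequency changes between generations fall below tol."""
    for gen in range(max_gen):
        f_next = dgm_step(f, p, u, mu, K)
        if max(abs(a - b) for a, b in zip(f_next, f)) < tol:
            return f_next, gen + 1
        f = f_next
    return f, max_gen


if __name__ == "__main__":
    # One-shot game (X1): strategies (c, d), uniform u, illustrative payoffs.
    p = [[3, 0], [5, 1]]
    f_eq, generations = iterate_to_equilibrium([0.5, 0.5], p, [0.5, 0.5], mu=0.01)
    print(f_eq, generations)
```

Under these assumptions the equilibrium frequency of the cooperator stays only slightly above the mutational inflow 0.5μ, which is the qualitative pattern reported for X1-populations in the Results.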

#### 2.4. Simulation Statistics

$a_{x,ij} = 1$ if strategy i cooperates in round x against strategy j and $a_{x,ij} = 0$ otherwise. The average cooperation per frPD game is given by $\overline{C} = \sum_{i=1}^{r} \overline{C_i}$ (in case of non-equilibrium dynamics, $\overline{C}$ is averaged over specified ranges of generations). As 0 ≤ $\overline{C}$ ≤ r, a population with $\overline{C}$ ~ 1 is interpreted as fairly cooperative if r = 1 and as fairly uncooperative if r = 100. For comparisons of evolutionary frPD games with different r-values, we thus use the average number of **C** executions per Prisoner’s Dilemma game, $\overline{C}/r$. The average payoff per frPD game generated with payoff **P** is given by $\overline{p_P} = P \sum_{i,j \in s} a_{P,ij} f_i f_j$, where $a_{P,ij}$ is the number of times strategy i generates payoff **P** from strategy j. For payoffs **T**, **S**, and **R**, we analogously define the averages $\overline{p_T}$, $\overline{p_S}$, and $\overline{p_R}$.
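These statistics can be computed directly from pairwise games, as in the following Python sketch. The formula for the per-round average is not fully preserved above; the code assumes $\overline{C_x} = \sum_{i,j} a_{x,ij} f_i f_j$ by analogy with the payoff averages. Function names are hypothetical, and the strategy decoding follows the coding scheme of Section 2.1.

```python
def next_action(code, opp_history):
    """Decode the response of a strategy code to the opponent's past actions."""
    offset = 0
    for a in opp_history:
        offset = 2 * offset + (a == "c")
    return code[(2 ** len(opp_history) - 1) + offset]


def play(code_i, code_j, r):
    """Return the action strings of strategies i and j over r rounds."""
    hist_i, hist_j = "", ""
    for _ in range(r):
        a_i, a_j = next_action(code_i, hist_j), next_action(code_j, hist_i)
        hist_i, hist_j = hist_i + a_i, hist_j + a_j
    return hist_i, hist_j


def cooperation_per_round(codes, f, r):
    """Assumed form: C_bar_x = sum_ij a_x,ij * f_i * f_j for each round x."""
    c_bar = [0.0] * r
    for code_i, f_i in zip(codes, f):
        for code_j, f_j in zip(codes, f):
            hist_i, _ = play(code_i, code_j, r)
            for x, a in enumerate(hist_i):
                if a == "c":
                    c_bar[x] += f_i * f_j
    return c_bar


def payoff_component(codes, f, r, own, opp, value):
    """Average payoff per game from one payoff type, e.g. own='d', opp='c' for T."""
    total = 0.0
    for code_i, f_i in zip(codes, f):
        for code_j, f_j in zip(codes, f):
            hist_i, hist_j = play(code_i, code_j, r)
            count = sum(1 for pair in zip(hist_i, hist_j) if pair == (own, opp))
            total += count * f_i * f_j
    return value * total


if __name__ == "__main__":
    codes, f, r = ["ddddddd", "cdddddd"], [0.5, 0.5], 3   # AllD and TfT1
    c_bar = cooperation_per_round(codes, f, r)
    print(c_bar, sum(c_bar), sum(c_bar) / r)
    print(payoff_component(codes, f, r, "d", "c", 5))      # T-component
```

The example population is an arbitrary half-and-half mix of AllD and TfT1, chosen only to keep the numbers easy to follow by hand.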

## 3. Results

$w_i > w_{\mathrm{AllD}}$). In the first subsection, we analyze the contributions to the significant impacts that result from direct effects of mutation and from the indirect effects of mutation-induced population compositions. In the second subsection, simulations of Xr-populations are used to assess the relevance of indirect effects in the absence of direct effects.

#### 3.1. Effects of Mutation on the Evolution of ‘TfTx’ Sets and of Xr Sets

$\pi_i = \sum_{j \in s} u_j p_{ij}$ from interactions with mutants. The average fitness of strategy i individuals is $w_i = (1 - \mu)\theta_i + \mu\pi_i + K$, where $\theta_i$ is the average payoff generated from interactions with non-mutants. We interpret the occurrence of inequalities $\pi_i > \pi_{\mathrm{AllD}}$ as direct effects and of inequalities $\theta_i > \theta_{\mathrm{AllD}}$ as indirect effects if they coincide with the observation of significant effects ($w_i > w_{\mathrm{AllD}}$). Direct effects and indirect effects are not mutually exclusive. In the following, the discussion of indirect effects focuses on their emergence in cases where direct effects are excluded (i.e., $\pi_{\mathrm{AllD}} \geq \pi_i$ for all i $\in$ s and at all population compositions).
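Subtracting the fitness expression of AllD from that of strategy i makes the two contributions explicit. The following is only a rearrangement of the definitions above; K cancels because it is identical for all individuals.

```latex
w_i - w_{\mathrm{AllD}}
  = (1-\mu)\,\bigl(\theta_i - \theta_{\mathrm{AllD}}\bigr)
  + \mu\,\bigl(\pi_i - \pi_{\mathrm{AllD}}\bigr)
```

Hence, for 0 < μ < 1, whenever direct effects are excluded ($\pi_{\mathrm{AllD}} \geq \pi_i$), a significant effect $w_i > w_{\mathrm{AllD}}$ requires $\theta_i > \theta_{\mathrm{AllD}}$, i.e., an indirect effect.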

$\pi_i$, i.e., the payoff from interactions with mutants is independent of the population composition. Then, direct effects can be excluded if ($\pi_i - \pi_{\mathrm{AllD}}$) ≤ 0 for all strategies i. Direct effects can occur (and are inevitable for sufficiently high μ-values) if ($\pi_i - \pi_{\mathrm{AllD}}$) > 0 for at least one strategy i. For variable u distributions, the averages $\pi_i$ are functions of population compositions. These compositions are also functions of the mutation rate μ. As a consequence, there is no simple expression for when direct effects can emerge. We focus on analytical results assuming constant u distributions and only briefly discuss the more complicated case of variable u distributions.

$\pi_{\mathrm{TfT}x} - \pi_{\mathrm{AllD}} = \sum_{i=0}^{x-1} \left(\pi_{\mathrm{TfT}(i+1)} - \pi_{\mathrm{TfT}i}\right)$ (note that in ‘TfTx’ notation, AllD is TfT0). The adjacent strategies TfTx−1 and TfTx perform identically against mutants expressing strategies TfTy for which y ≤ x − 2. TfTx individuals generate one additional round of mutual cooperation from interactions with mutants expressing strategies TfTy for which y ≥ x. TfTx individuals are exploited by TfTx−1 mutants on a single occasion, and they do not exploit TfTx mutants in round x + 1. Consequently, strategy TfTx is more effective in interactions with mutants than TfTx−1 ($\pi_{\mathrm{TfT}x} - \pi_{\mathrm{TfT}(x-1)} > 0$) if

$u_{\mathrm{TfT}0} = u_{\mathrm{TfT}1} = \dots = u_{\mathrm{TfT}r}$), the distribution of the π values has a single peak at $\pi_{\mathrm{TfT}x}$, where x is the highest integer for which the inequality (r + 1 − x)(**R** − **P**) > **T** − **S** is satisfied. As a consequence, direct effects can be obtained for uniform distributions by manipulating μ if r(**R** − **P**) > **T** − **S** (i.e., $\pi_{\mathrm{TfT}1} - \pi_{\mathrm{AllD}} > 0$). The expectation that changes in conditions yielding increased x-values also result in increased execution of cooperation at the evolutionary equilibrium was confirmed in a set of simulations.
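The statement about uniform u distributions can be checked numerically. The Python sketch below plays every TfTx against every TfTy, averages the payoffs with equal weights, and compares the location of the maximum with the highest x satisfying (r + 1 − x)(R − P) > T − S. It does not reconstruct the general inequality that is missing above. The function names and the choice r = 10 are illustrative assumptions.

```python
def tftx_action(x, round_no, opp_prev):
    """TfTx: play Tit-for-Tat during the first x rounds, defect afterwards."""
    if round_no > x:
        return "d"
    return "c" if round_no == 1 else opp_prev


def tftx_payoff(x, y, r, pay):
    """Total payoff of TfTx against TfTy in an r-round game."""
    total, prev_x, prev_y = 0, None, None
    for k in range(1, r + 1):
        a_x = tftx_action(x, k, prev_y)
        a_y = tftx_action(y, k, prev_x)
        total += pay[(a_x, a_y)]
        prev_x, prev_y = a_x, a_y
    return total


def pi_uniform(r, T, R, P, S):
    """Mean payoff of each TfTx (x = 0..r) against uniformly distributed mutants."""
    pay = {("c", "c"): R, ("c", "d"): S, ("d", "c"): T, ("d", "d"): P}
    return [sum(tftx_payoff(x, y, r, pay) for y in range(r + 1)) / (r + 1)
            for x in range(r + 1)]


def predicted_peak(r, T, R, P, S):
    """Highest x in 1..r with (r + 1 - x)(R - P) > T - S; 0 if there is none."""
    return max((x for x in range(1, r + 1) if (r + 1 - x) * (R - P) > T - S),
               default=0)


if __name__ == "__main__":
    r, T, R, P, S = 10, 5, 3, 1, 0
    pi = pi_uniform(r, T, R, P, S)
    print(pi.index(max(pi)), predicted_peak(r, T, R, P, S))
```

For the payoff values of the X3 simulations with **P** = 0.05 and r = 3, the same functions can be called with `pi_uniform(3, 5, 3, 0.05, 0)`.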

$\rho_{ij}$ is the action sequence (of length r) that strategy i triggers from strategy j, and $\rho_i$ is the entire set of responses $\rho_{ij}$ (j $\in$ Xr) of strategy i. Given the comprehensiveness of Xr sets, it follows that, for an arbitrary set $\rho_i$ (i $\in$ Xr), the same number of respective responses is found for each of the $2^{r}$ action sequences (i.e., $\rho_i$ and $\rho_j$ (i ≠ j) are two permutations of the same set of sequences). The consequence is that, with uniform u distributions, the mean behaviors of mutants are not influenced by the strategy of the opponents. In that case, it can be inferred from the payoff dominance of **D** over **C** that AllD generates the absolute highest mean payoff from mutants ($\pi_{\mathrm{AllD}} > \pi_i$).
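Both claims can be verified by brute force for the X3 set (128 strategies). The Python sketch below counts the action sequences that a given strategy triggers from all opponents and computes $\pi_i$ under a uniform u distribution. Function names are hypothetical, and the payoffs {T, R, P, S} = {5, 3, 1, 0} are one of the parameter sets used in the simulations.

```python
from collections import Counter
from itertools import product


def next_action(code, opp_history):
    """Decode the response of a strategy code to the opponent's past actions."""
    offset = 0
    for a in opp_history:
        offset = 2 * offset + (a == "c")
    return code[(2 ** len(opp_history) - 1) + offset]


def play(code_i, code_j, r):
    """Return the action strings of strategies i and j over r rounds."""
    hist_i, hist_j = "", ""
    for _ in range(r):
        a_i, a_j = next_action(code_i, hist_j), next_action(code_j, hist_i)
        hist_i, hist_j = hist_i + a_i, hist_j + a_j
    return hist_i, hist_j


def all_strategies(r):
    """All deterministic strategy codes of the r-round game (the Xr set)."""
    return ["".join(bits) for bits in product("dc", repeat=2 ** r - 1)]


def triggered_sequences(code_i, r):
    """Count the action sequences that strategy i triggers from every opponent in Xr."""
    return Counter(play(code_i, code_j, r)[1] for code_j in all_strategies(r))


def pi_uniform(r, pay):
    """Mean payoff of every strategy against uniformly distributed mutants."""
    codes = all_strategies(r)
    result = {}
    for i in codes:
        total = 0
        for j in codes:
            hist_i, hist_j = play(i, j, r)
            total += sum(pay[pair] for pair in zip(hist_i, hist_j))
        result[i] = total / len(codes)
    return result


if __name__ == "__main__":
    pay = {("c", "c"): 3, ("c", "d"): 0, ("d", "c"): 5, ("d", "d"): 1}
    pi = pi_uniform(3, pay)
    print(max(pi, key=pi.get), pi["ddddddd"])
    print(triggered_sequences("cdcdcdc", 3))   # sequences that TfT triggers
```

Each of the eight action sequences is triggered equally often regardless of the focal strategy, so every strategy faces the same mean mutant behavior and AllD, which never pays the cost of cooperating, obtains the highest $\pi$.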

$u_i$ values for the conditional strategies is symmetric because the behavior of unconditional strategies is not influenced by the opponent. In Appendix A, we define the space of symmetric u distributions. Note that, for both symmetric and asymmetric distributions, increasing the share of unconditional strategies tends to favor $\pi_{\mathrm{AllD}}$, as AllD expresses best-response behavior to unconditional strategies. As outlined for the uniform distributions, direct effects can be excluded for all symmetric distributions. Hence, direct effects emerge only if strategies can trigger distinct mean mutant behaviors (i.e., the key characteristic of asymmetric distributions).

$u_i$ values of ‘TfTx’ sets are formed from u distributions of Xr sets by setting the $u_i$ values to zero for strategies outside the ‘TfTx’ sets). It is apparent that, for direct effects to occur, average encounters with mutants should be inefficient for AllD but efficient for certain other strategies, i.e., mutants should tend to conditionally defect in interactions with AllD and should tend to conditionally cooperate in interactions with certain other strategies. Examples are distributions (such as ‘TfTx’) for which mutants tend to express reciprocal behaviors [20,23].

#### 3.2. Simulations of Xr-Populations

{**T**, **S**, **R**} = {5, 0, 3}, while varying the mutual defection payoff **P** = {0.05, 0.3, 1} and the mutation rate μ = {0.0001, 0.001, 0.01, 0.1}. For these parameter combinations, Table 2 shows whether populations evolve to an equilibrium or not (equilibrium conditions are described in Appendix B). For all settings, {X1, X2}-populations (i.e., playing the one-round and the two-round game) evolve to equilibrium (Table 2). The table shows that for **P** = {0.05, 0.3}, no equilibrium is attained in the evolution of certain X3-populations and of certain X4-populations.

$f_{\mathrm{AllD}} > f_i$ for i ≠ AllD); this characteristic applies to all observed equilibrium populations in our study. Furthermore, all observed equilibrium strategy frequencies $f_i$ are identical for the continuous-generation and the discrete-generation models. At equilibrium, dominance of AllD implies that the strategy also has fitness dominance. We do not find persistent indirect effects in the populations that do not reach equilibrium. Consequently, we do not find persistent indirect effects in the simulations.

**P** = 1, the {X1, X2, X3, X4}-populations evolve to equilibrium for all mutation rates (Table 2). For rates μ = {0.001, 0.01, 0.1}, Table 3a shows the average number of **C** executions per Prisoner’s Dilemma game ($\overline{C}/r$) in these equilibrium populations. For each setting, these averages increase with the mutation rate. The $\overline{C}/r$-values of X1-populations (Table 3a) are only slightly higher than the inflow of cooperator (c) mutants (~0.5μ). The execution of cooperation can thus be attributed to c-mutants. The table shows for each mutation rate that the $\overline{C}/r$-values of {X2, X3, X4}-populations are approximately three times higher than those of X1. We attribute this difference to the fact that sets {X2, X3, X4} contain conditional strategies. Table 3b shows that evolution to an equilibrium is found in simulations of X3-populations using the background fitness values K = {0, 5, 20}. Along K = {0, 5, 20}, we find an increase in mean cooperation for each rate μ (Table 3b).

**P** = 0.05), {X3, X4}-populations do not converge to equilibrium in the simulations with the two lowest mutation rates. For the intermediate **P**-value of Table 2, this phenomenon is also observed for X3-populations at the lowest rate and for X4-populations at the three lowest rates. Because the X3 set is 256 times smaller than the X4 set, X3-populations are more convenient to study. This is why we mainly study non-equilibrium behavior in X3-populations.

**P** = 0.05, Figure 1a shows the mean execution of cooperation per frPD game ($\overline{C}$) along μ = {0.00001, 0.0001, 0.001, 0.01, 0.1}. For the lowest rate and for the two highest rates, these means are sampled at equilibrium. As mentioned, the equilibrium frequencies are not affected by the choice of the generation model (i.e., dgm or cgm). Hence, the $\overline{C}$-values of the two models are identical in Figure 1a for each of these rates. After transient phases, the populations at rates μ = {0.0001, 0.001} evolve in cycles. As an example, consider the strategy dynamics at rate μ = 0.001 in Figure 1b for the dgm and in Figure 1c for the cgm. Table 4 lists the strategies with max($f_i$) > 0.1 during the cycles for these two figures. For mutation rates μ = {0.0001, 0.001}, the $\overline{C}$-values in Figure 1a are averaged over one cycle period. The $\overline{C}$-values are identical whether populations are initialized with $f_{\mathrm{AllD}} = 1$ or with a uniform frequency distribution. For both mutation rates, the averages $\overline{C}$ are higher if sampled over dgm-cycles than if sampled over cgm-cycles (Figure 1a). The figure also shows that, for both models, the $\overline{C}$-values are higher in the cycling populations than in the equilibrium populations at μ = $10^{-5}$. For the dgm at rates μ = {0.0001, 0.001} and for the cgm at rate μ = 0.001, the $\overline{C}$-values are also higher than the equilibrium values found at the higher rate μ = 0.01 (Figure 1a). Consequently, for both types of generation models, an optimum in $\overline{C}$ exists within the interval $10^{-5}$ < μ < 0.01.

**C** actions for each round of the game ($\overline{C_i}$, i = {1, 2, 3}). Cooperation is more intensively executed during TfT1 phases, especially in round 1 (Figure 2a). The relatively longer TfT1 phases in the dgm-populations (compare Figure 1b with Figure 1c) explain why the $\overline{C}$-values are higher in dgm-populations than in the corresponding cgm-populations (Figure 1a at μ = {0.0001, 0.001}).

**P**, **T**, and **R**, Figure 2b shows the dynamics of the mean payoff values per frPD game (i.e., $\overline{p_P}$, $\overline{p_T}$, and $\overline{p_R}$). Steep increases in the generation of **T**-payoffs (Figure 2b) mark the onset of invasions by TfT1 (Figure 1b). Defectors like AllD generate this payoff in the first round when interacting with TfT1, and defectors are the dominant opponents of this strategy at the onset of invasions (Figure 1b). The increase in the generation of **T**-payoffs is therefore partly explained by defectors triggering this payoff from TfT1. For TfT1, these first-round interactions seem disadvantageous, but this disadvantage is evidently compensated because TfT1 invades.

**P**, **T**, or **R** (i.e., $P\sum_{i\in\{1,2,3\}} (1-\overline{C_i})^{2}$, $T\sum_{i\in\{1,2,3\}} (1-\overline{C_i})\,\overline{C_i}$, and $R\sum_{i\in\{1,2,3\}} \overline{C_i}^{2}$). For payoff **T**, the observed value is higher than the expected value (Figure 2b) over the dominance phase of TfT1 (Figure 1b). These differences between observed and expected values are caused by the conditional behaviors in rounds 2 and 3. Hence, we propose that the invasions of TfT1 are fueled by triggering **T**-payoffs in these rounds. At the onset of invasions, AllD is the dominant strategy (Figure 1b) and defection is the predominant behavior (Figure 2). Defectors (in contrast to non-defectors) are not penalized when interacting with AllD, and they can therefore be expected to perform better than other strategies in AllD-dominated populations. The strategy TfT1 generates **T**-payoffs from the twelve defectors {ddcd…, dddd.c.}. Game interactions between these defectors and TfT1 indeed significantly contribute (data not shown) to the increases of $\overline{p_T}$ (Figure 2b).
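The count of twelve defectors can be reproduced by enumerating the X3 set. The Python sketch below selects all strategies that match the defector pattern dd.d... and counts the rounds in which TfT1 (cdddddd) defects against a cooperating opponent. The helper names are hypothetical; the dot notation follows Section 2.1.

```python
from itertools import product


def next_action(code, opp_history):
    """Decode the response of a strategy code to the opponent's past actions."""
    offset = 0
    for a in opp_history:
        offset = 2 * offset + (a == "c")
    return code[(2 ** len(opp_history) - 1) + offset]


def play(code_i, code_j, r):
    """Return the action strings of strategies i and j over r rounds."""
    hist_i, hist_j = "", ""
    for _ in range(r):
        a_i, a_j = next_action(code_i, hist_j), next_action(code_j, hist_i)
        hist_i, hist_j = hist_i + a_i, hist_j + a_j
    return hist_i, hist_j


def matches(code, pattern):
    """True if code fits a group pattern in which '.' stands for either d or c."""
    return len(code) == len(pattern) and all(
        p in (".", c) for c, p in zip(code, pattern))


def t_payoffs_for(code_i, code_j, r):
    """Number of rounds in which strategy i defects while strategy j cooperates."""
    hist_i, hist_j = play(code_i, code_j, r)
    return sum(1 for pair in zip(hist_i, hist_j) if pair == ("d", "c"))


def x3_defectors():
    """All X3 strategies matching the defector pattern dd.d..."""
    codes = ("".join(bits) for bits in product("dc", repeat=7))
    return [c for c in codes if matches(c, "dd.d...")]


def t_givers(focal="cdddddd", r=3):
    """X3-defectors from which the focal strategy (default TfT1) obtains a T-payoff."""
    return [c for c in x3_defectors() if t_payoffs_for(focal, c, r) > 0]


if __name__ == "__main__":
    print(len(x3_defectors()), len(t_givers()))
    print(sorted(t_givers()))
```

The defectors that yield a **T**-payoff are exactly those that answer a first-round **C** with cooperation (ddcd…) or, failing that, answer the sequence **CD** with cooperation (dddd.c.).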

(**P** − **S**)/(**T** − **P**) (~0.01 in the simulation of Figure 1b). This condition is fulfilled over the entire cycle period in Figure 1b, but the population state deviates from full defection due to mutation. In this state, AllD obtains the highest benefit from interactions with mutants (i.e., $\mu(\pi_{\mathrm{AllD}} - \pi_{\mathrm{TfT}1}) > 0$). Thus, the invasion conditions in the simulations should be more stringent than those derived in the appendix. Before the onset of the invasions, the population does converge towards a state of full defection (Figure 2) and thus towards the conditions underlying the analysis in the appendix. In our opinion, the invasions of TfT1 in the simulations are fueled by interactions with defectors {ddcd…, dddd.c.}, just like in the analysis. That AllD subsequently regains dominance, thereby closing the cycle (Figure 1b,c), is in line with the expectations from the selection dynamics of evolutionary frPD games [18,19].

$\pi_{\mathrm{AllD}} - \pi_{\mathrm{TfT}1}$) > 0).

^{4} generations due to constraints on computation time. Consequently, the data obtained do not allow definitive conclusions on the nature of non-equilibrium X4-dynamics. Over the simulation periods, chaotic dynamics occur for the X4-populations with non-equilibrium dynamics in Table 2. For example, in Figure 3, the X4-frequency dynamics at {**P**, μ} = {0.3, 0.01} exhibit a transient period of ~2000 generations, after which alternations of dominance by strategies {AllD, ddcdddddddddddd, TfT1, dddddcddddddddd} emerge. The population therefore expresses periodic indirect effects. As in Figure 1b,c, strategies AllD and TfT1 in Figure 3 become periodically dominant, and the dominance phases of strategies {AllD, TfT1, dddddcddddddddd} have fairly regular lengths (Figure 3).

**T**-payoff from X2-defector ddc, two **T**-payoffs from X3-defectors ddcdc.., and three **T**-payoffs from X4-defectors ddcdc..d…c….

## 4. Discussion

#### 4.1. Direct and Indirect Effects of Mutation-Generated Variation

$\pi_i$) from the interactions with mutants. In nature, u distributions are probably variable; for example, the u distribution is variable if mutation swaps single code positions rather than modifying entire codes/strategies (as in our study). Variable u distributions would complicate the analysis of direct effects because the population composition has to be considered (whereby genotype × mutation interactions, i.e., genotypes differing in their propensity to mutate, would further complicate this analysis).

$\pi_i$-values. For example, it is not clear to us whether $\pi_i$-values are still constants with constant u distributions. The fraction of mutants would definitely deviate from μ. However, had we considered differences in viability (rather than fertility), we cannot think of a reason why our qualitative findings with respect to the ‘TfTx’ sets, the Xr sets, and the symmetric u distributions would have changed.

#### 4.2. Direct Effects as a Mechanism Promoting the Evolution of Cooperation

#### 4.3. Indirect Effects as a Mechanism Promoting the Evolution of Cooperation

## 5. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Appendix A. The Symmetric u Distributions

$u_{x1}$, unconditional cooperation with probability $u_{x2}$, and conditional behavior with probability $u_{x3}$ (= 1 − $u_{x1}$ − $u_{x2}$). In case of the latter mutation event, the probability that the mutant carries either code determining conditional round x behavior requires the following property: the chance that the mutant defects (cooperates) in this round is independent of the previous action sequence played by the opponent. This condition is met if either of these codes is carried by the mutant with equal probability.

$u_{xD}$(1 − $u_{xC}$) and mutates otherwise. Conditional round x behavior always mutates. Depending on the parental genotype, we therefore find three forms of mutation events. For each form, mutation proceeds analogously to that described for constant symmetric u distributions (whereby the three probabilities {$u_{x1}$, $u_{x2}$, $u_{x3}$} can differ between the forms).

## Appendix B. Equilibrium Conditions

$f_i$) in the dgm are less than $10^{-8}$ between consecutive generations. Note that the corresponding cgm-populations evolve to the same frequency distributions for all observed equilibria. The approached equilibria are identical for initial X1-populations {$f_D$ = 0, $f_D$ = 0.5, $f_D$ = 1}. The equilibrium {X2, X3}-populations are identical whether initiated with $f_{\mathrm{AllD}}$ = 1 or with uniform frequency distributions. Only the dgm is implemented to simulate X4-populations, and all X4-populations are initiated with $f_{\mathrm{AllD}}$ = 1. X4-equilibrium is assumed if all differences $f_i' - f_i$ decrease over consecutive generations for a period of 2000 generations.

## Appendix C. Invasion Condition of TfT1 in X3-Defector Populations

(**S** + 2**P**) from interactions with defector A individuals, payoff (**S** + 2**T**) from interactions with the defector B subgroup ddcd.c., and payoff (**S** + **P** + **T**) from interactions with the other defector B subgroup. The combined frequency of defector B strategies is defined as $f_B$. We calculate the most stringent condition for invasion of TfT1, i.e., assuming that the defectors ddcd.c., from which TfT1 generates the higher payoff, are absent. Then, TfT1 individuals should obtain above-average payoffs if $f_B$(**S** + **P** + **T**) + (1 − $f_B$)(**S** + 2**P**) > 3**P**. Consequently, negligibly small $f_{\mathrm{TfT}1}$-values increase if $f_B$ > (**P** − **S**)/(**T** − **P**).
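The threshold follows from rearranging the inequality: $f_B(T - P) > P - S$. A short Python sketch evaluates it for the payoff values of Figure 1 (**T** = 5, **P** = 0.05, **S** = 0) and checks the inequality on either side of the threshold. The function names are illustrative.

```python
def tft1_mean_payoff(f_B, T, P, S):
    """Payoff of a rare TfT1 individual in a defector population in which a
    fraction f_B of opponents yields S + P + T and the rest yields S + 2P."""
    return f_B * (S + P + T) + (1 - f_B) * (S + 2 * P)


def resident_payoff(P):
    """Payoff that defectors generate among each other in the 3-round game."""
    return 3 * P


def invasion_threshold(T, P, S):
    """Minimum defector-B frequency above which rare TfT1 increases."""
    return (P - S) / (T - P)


if __name__ == "__main__":
    T, P, S = 5, 0.05, 0
    threshold = invasion_threshold(T, P, S)
    print(threshold)
    for f_B in (threshold / 2, threshold * 2):
        print(f_B, tft1_mean_payoff(f_B, T, P, S) > resident_payoff(P))
```

With these values the threshold is 0.05/4.95, i.e., about 0.01, matching the value quoted for the simulation of Figure 1b.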

## References

1. Abrams, P.A. Modelling the adaptive dynamics of traits involved in inter- and intraspecific interactions: An assessment of three methods. Ecol. Lett. **2001**, 4, 166–175.
2. Barta, Z. Individual variation behind the evolution of cooperation. Philos. Trans. R. Soc. B **2016**, 371, 20150087.
3. McNamara, J.M.; Barta, Z.; Houston, A.I. Variation in behaviour promotes cooperation in the prisoner’s dilemma game. Nature **2004**, 428, 745–748.
4. McNamara, J.M.; Barta, Z.; Fromhage, L.; Houston, A.I. The coevolution of choosiness and cooperation. Nature **2008**, 451, 189–192.
5. Eriksson, A.; Lindgren, K. Cooperation driven by mutations in multi-person Prisoner’s Dilemma. J. Theor. Biol. **2005**, 232, 399–409.
6. Traulsen, A.; Hauert, C.; De Silva, H.; Nowak, M.A.; Sigmund, K. Exploration dynamics in evolutionary games. Proc. Natl. Acad. Sci. USA **2009**, 106, 709–712.
7. McNamara, J.M.; Leimar, O. Variation and the response to variation as a basis for successful cooperation. Philos. Trans. R. Soc. B **2010**, 365, 2627–2633.
8. Wedekind, C.; Milinski, M. Human cooperation in the simultaneous and the alternating Prisoner’s Dilemma: Pavlov versus Generous Tit-for-Tat. Proc. Natl. Acad. Sci. USA **1996**, 93, 2686–2689.
9. Fischbacher, U.; Gächter, S.; Fehr, E. Are people conditionally cooperative? Evidence from a public goods experiment. Econ. Lett. **2001**, 71, 397–404.
10. Camerer, C.F. Behavioral Game Theory: Experiments in Strategic Interaction; Princeton University Press: Princeton, NJ, USA, 2003.
11. Camerer, C.F.; Fehr, E. When does “economic man” dominate social behavior? Science **2006**, 311, 47–52.
12. Henrich, J.; McElreath, R.; Barr, A.; Ensminger, J.; Barrett, C.; Bolyanatz, A.; Cardenas, J.C.; Gurven, M.; Gwako, E.; Henrich, N.; et al. Costly punishment across human societies. Science **2006**, 312, 1767–1770.
13. Herrmann, B.; Thöni, C.; Gächter, S. Antisocial punishment across societies. Science **2008**, 319, 1362–1367.
14. van den Berg, P.; Molleman, L.; Junikka, J.; Puurtinen, M.; Weissing, F.J. Human cooperation in groups: Variation begets variation. Sci. Rep. **2015**, 5, 16144.
15. Dantzer, B.; Rubenstein, D.R. Introduction to Symposium: The Developmental and Proximate Mechanisms Causing Individual Variation in Cooperative Behavior. Integr. Comp. Biol. **2017**, 57, 560–565.
16. Ule, A.; Schram, A.; Riedl, A.; Cason, T.N. Indirect Punishment and Generosity toward Strangers. Science **2009**, 326, 1701–1704.
17. Swakman, V.; Molleman, L.; Ule, A.; Egas, M. Reputation-based cooperation: Empirical evidence for behavioral strategies. Evol. Hum. Behav. **2016**, 37, 230–235.
18. Nachbar, J.H. Evolution in the finitely repeated prisoner’s dilemma. J. Econ. Behav. Organ. **1992**, 19, 307–326.
19. Cressman, R. Evolutionary stability in the finitely repeated prisoner’s dilemma game. J. Econ. Theory **1996**, 68, 234–248.
20. Axelrod, R.; Hamilton, W.D. The evolution of cooperation. Science **1981**, 211, 1390–1396.
21. Page, K.M.; Nowak, M.A. Unifying Evolutionary Dynamics. J. Theor. Biol. **2002**, 219, 93–98.
22. Willensdorfer, M.; Nowak, M.A. Mutation in evolutionary games can increase average fitness at equilibrium. J. Theor. Biol. **2005**, 237, 355–362.
23. Trivers, R.L. The evolution of reciprocal altruism. Q. Rev. Biol. **1971**, 46, 35–57.
24. Axelrod, R. The Evolution of Cooperation; Basic Books: New York, NY, USA, 1984.
25. Lindgren, K. Evolutionary phenomena in simple dynamics. In Artificial Life II; Farmer, J.D., Langton, C.G., Rasmussen, S., Taylor, C.A., Eds.; Addison-Wesley: Redwood City, CA, USA, 1992; pp. 295–312.
26. Hauert, C.; Schuster, H.G. Effects of increasing the number of players and memory steps in the iterated Prisoner’s Dilemma, a numerical approach. Proc. R. Soc. Lond. B **1997**, 264, 513–519.
27. Van Veelen, M.; García, J.; Rand, D.G.; Nowak, M.A. Direct reciprocity in structured populations. Proc. Natl. Acad. Sci. USA **2012**, 109, 9929–9934.
28. Van Veelen, M. Robustness against indirect invasions. Games Econ. Behav. **2012**, 74, 382–393.
29. Clutton-Brock, T. Cooperation between non-kin in animal societies. Nature **2009**, 462, 51–57.
30. Nowak, M.A. Five rules for the evolution of cooperation. Science **2006**, 314, 1560–1565.
31. Newton, J. Evolutionary game theory—A renaissance. Games **2018**, 9, 31.
32. Angus, S.D.; Newton, J. Emergence of shared intentionality is coupled to the advance of cumulative culture. PLoS Comput. Biol. **2015**, 11.
33. Newton, J. Shared intentions: The evolution of collaboration. Games Econ. Behav. **2017**, 104, 517–534.

**Figure 1.** (**a**) Average execution of cooperation per game ($\overline{C}$) as a function of mutation rate in X3-simulations of dgm-populations (white) and of cgm-populations (gray). For rates μ = {$10^{-5}$, 0.01, 0.1}, averages are sampled at equilibrium, and for rates μ = {0.0001, 0.001}, averages are sampled over a cycle period (see panels (**b**,**c**)). Fixed parameters: **P** = 0.05, **T** = 5, **S** = 0, **R** = 3, K = 0; (**b**) For the conditions of panel (**a**), the frequency dynamics of an evolving dgm-population at rate μ = 0.001. 1: ddddddd (AllD); 2: cdddddd (TfT1); 3: ddddddc; 4: ddcdddd; 5: cdcdddd (TfT2); 6: dcddddd; 7: dccdddd; (**c**) For the conditions of panel (**a**), the frequency dynamics of an evolving cgm-population at rate μ = 0.001. Panels (**b**,**c**) use the same line code.

**Figure 2.** Both panels show behavioral statistics of the evolving population in Figure 1b. (**a**) Average execution of cooperation in round 1 (${\overline{C}}_{1}$: red), in round 2 (${\overline{C}}_{2}$: blue), and in round 3 (${\overline{C}}_{3}$: green) as a function of time; (**b**) Average payoff generated per game with payoff **P** (red), payoff **T** (green), and payoff **R** (blue) as a function of time. Dashed lines represent the respective expected values (i.e., $P\sum_{i\in \{1,2,3\}}{(1-{\overline{C}}_{i})}^{2}$, $T\sum_{i\in \{1,2,3\}}(1-{\overline{C}}_{i}){\overline{C}}_{i}$, and $R\sum_{i\in \{1,2,3\}}{\overline{C}}_{i}^{2}$) and solid lines represent the respective observed values (i.e., $\overline{{p}_{P}}$, $\overline{{p}_{T}}$, and $\overline{{p}_{R}}$).
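The expected values named in the Figure 2 caption can be evaluated directly from the per-round cooperation averages. The sketch below is only an illustration: the function name `expected_payoff_components` and the cooperation rates passed to it are hypothetical, and reading the three sums as the payoffs expected when the two players' actions in a round are statistically independent is an interpretation, not a statement taken from the caption. The default payoffs are the fixed parameters of Figure 1.

```python
def expected_payoff_components(coop_rates, P=0.05, T=5.0, R=3.0):
    """Expected per-game payoff contributed by P, T and R outcomes.

    coop_rates holds the average cooperation level in each round,
    i.e. (C_1, C_2, C_3) for a three-round game.
    """
    # Both players defect in round i with probability (1 - C_i)^2.
    exp_P = P * sum((1 - c) ** 2 for c in coop_rates)
    # The focal player defects while the opponent cooperates: (1 - C_i) * C_i.
    exp_T = T * sum((1 - c) * c for c in coop_rates)
    # Both players cooperate in round i with probability C_i^2.
    exp_R = R * sum(c ** 2 for c in coop_rates)
    return exp_P, exp_T, exp_R


# Hypothetical cooperation levels, chosen only for illustration.
exp_P, exp_T, exp_R = expected_payoff_components([0.6, 0.4, 0.1])
print(exp_P, exp_T, exp_R)
```

Comparing such values with the observed averages $\overline{{p}_{P}}$, $\overline{{p}_{T}}$, and $\overline{{p}_{R}}$ shows how far the realized pairings deviate from the independent-action baseline.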

**Figure 3.** The frequency dynamics of an X4-simulation at {**P**, μ} = {0.3, 0.01}; other conditions as in Figure 1. The four vertical lines indicate the positions of the respective dominance phases of the strategies {AllD, ddcdddddddddddd, TfT1, dddddcddddddddd}.

Symbol | Definition |
---|---|
f_{i} | frequency of strategy i |
p_{ij} | payoff of a strategy i individual from interactions with a strategy j individual |
s | the strategy set (i.e., the collection of the considered strategies) |
$\overline{{p}_{i}}=\sum_{j\in s}{p}_{ij}{f}_{j}$ | average payoff from game interactions for a strategy i individual |
${w}_{i}=\overline{{p}_{i}}+K$ | average fitness of strategy i individuals |
K | game-unassociated fitness component, i.e., background fitness |
$\overline{w}=\sum_{i}{f}_{i}{w}_{i}$ | average fitness of the population |
μ | mutation rate |
u_{i} | fraction of mutants that carry strategy i |
r | number of rounds in the frPD |
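The fitness quantities in the notation list can be made concrete with a small worked example. The sketch below assumes the one-shot game (r = 1) with the two strategies c and d, the payoff values **T** = 5, **R** = 3, **P** = 1, **S** = 0, and an arbitrary illustrative population with equal frequencies; the function name `fitness` is hypothetical.

```python
# Payoff matrix p_ij for the one-shot PD, strategies ordered [c, d].
R, S, T, P = 3.0, 0.0, 5.0, 1.0
payoff = [
    [R, S],  # c against c, c against d
    [T, P],  # d against c, d against d
]


def fitness(payoff, freqs, K=0.0):
    """Return (average payoffs, fitness values, population mean fitness)."""
    # Average payoff of strategy i: sum over j of p_ij * f_j.
    avg_payoff = [
        sum(p_ij * f_j for p_ij, f_j in zip(row, freqs)) for row in payoff
    ]
    # Fitness of strategy i: average game payoff plus background fitness K.
    w = [p_bar + K for p_bar in avg_payoff]
    # Mean fitness of the population: sum over i of f_i * w_i.
    w_bar = sum(f_i * w_i for f_i, w_i in zip(freqs, w))
    return avg_payoff, w, w_bar


# Illustrative frequencies (an assumption): half cooperators, half defectors.
avg_payoff, w, w_bar = fitness(payoff, [0.5, 0.5], K=0.0)
print(avg_payoff)  # [1.5, 3.0]
print(w_bar)       # 2.25
```

In this example strategy d obtains the higher fitness (3.0 versus 1.5), and because **T** > **R** and **P** > **S** the same ordering holds for any frequencies of the two strategies. Adding a positive K raises all fitness values by the same amount and therefore shrinks the relative fitness differences.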

**Table 2.** Observed type of dynamics in the final phases of simulations of four Xr sets (top row) at three mutual defection payoffs (first column). For each {Xr, **P**} combination, simulations were performed at the mutation rates μ = {0.0001, 0.001, 0.01, 0.1}. The letters {n, e, E} of each four-letter string give, in this order, the dynamics found in the simulations at the respective rates; n: non-equilibrium dynamics, e: equilibrium in which only defectors obtain above-average fitness, and E: equilibrium in which non-defectors obtain above-average fitness (for example, eeeE means that an equilibrium is found at all four rates, with non-defectors attaining above-average fitness only at rate μ = 0.1). Fixed parameters: **T** = 5, **S** = 0, **R** = 3, K = 0.

Payoff **P** | X1 | X2 | X3 | X4 |
---|---|---|---|---|
P = 0.05 | eeee | eeee | nnEE | nnEE |
P = 0.3 | eeee | eeee | nEEE | nnnE |
P = 1 | eeee | eeee | eeeE | eeeE |
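The four-letter strings of Table 2 can be expanded mechanically into one entry per mutation rate. The following is a minimal sketch of that reading rule; the names `RATES`, `MEANING`, and `decode` are hypothetical helpers, not part of the original study.

```python
# Mutation rates in the order used by the four-letter strings of Table 2.
RATES = (0.0001, 0.001, 0.01, 0.1)

# Meaning of each letter as defined in the table caption.
MEANING = {
    "n": "non-equilibrium dynamics",
    "e": "equilibrium, only defectors above average fitness",
    "E": "equilibrium, non-defectors above average fitness",
}


def decode(code):
    """Map a string such as 'nnEE' to a {mutation rate: dynamics} dict."""
    if len(code) != len(RATES):
        raise ValueError("expected one letter per mutation rate")
    return {rate: MEANING[letter] for rate, letter in zip(RATES, code)}


# Entry for X3 at P = 0.3 in Table 2.
for rate, dynamics in decode("nEEE").items():
    print(rate, dynamics)
```

For the X3 entry at P = 0.3, this yields non-equilibrium dynamics at μ = 0.0001 and equilibria with above-average fitness for non-defectors at the three higher rates.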

**Table 3.** Average number of executed **C** actions per PD game ($\frac{\overline{C}}{r}$) found in populations at equilibrium, as a function of mutation rate. The top row gives the mutation rates, the first column the Xr sets, and the second column the background fitness values K. Part a: varying strategy set for K = 0; part b: varying background fitness for strategy set X3. The $\frac{\overline{C}}{r}$-values from equilibrium populations in which non-defectors obtain above-average fitness are given in italics. Fixed parameters: **P** = 1, **T** = 5, **S** = 0, **R** = 3.

Xr set | K | μ = 0.001 | μ = 0.01 | μ = 0.1 |
---|---|---|---|---|
a | | | | |
X1 | K = 0 | 0.0005 | 0.0051 | 0.0577 |
X2 | K = 0 | 0.012 | 0.012 | 0.141 |
X3 | K = 0 | 0.0014 | 0.0148 | 0.1844 |
X4 | K = 0 | 0.0013 | 0.013 | 0.1712 |
b | | | | |
X3 | K = 0 | 0.0014 | 0.0148 | 0.1844 |
X3 | K = 5 | 0.0038 | 0.0382 | 0.3294 |
X3 | K = 20 | 0.0109 | 0.1002 | 0.43 |

**Table 4.** List of strategies that obtain peak frequencies higher than 0.1 (max(f_{i}) > 0.1) within the cycle phases of the dgm-dynamics depicted in Figure 1b. The code representation of each strategy (conventional name in brackets) is given in the second column. The peak frequency within the cycle phases is given in the third column. The peak frequency within the cycle phases of the cgm-dynamics of Figure 1c is given in the fourth column.

No. | Strategy | max(f_{i}), dgm | max(f_{i}), cgm |
---|---|---|---|
1 | ddddddd (AllD) | 0.921 | 0.921 |
2 | cdddddd (TfT1) | 0.544 | 0.544 |
3 | ddddddc | 0.201 | 0.201 |
4 | ddcdddd | 0.168 | 0.168 |
5 | cdcdddd (TfT2) | 0.166 | 0.165 |
6 | dcddddd | 0.159 | 0.157 |
7 | dccdddd | 0.103 | 0.102 |

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Spichtig, M.; Egas, M. When and How Does Mutation-Generated Variation Promote the Evolution of Cooperation? *Games* **2019**, *10*, 4.
https://doi.org/10.3390/g10010004
