When and how does mutation-generated variation promote the evolution of cooperation?

It has been hypothesized that mutation-generated variation in behavior can drive the evolution of cooperation. In this article, we distinguish between two effects of mutation that can produce this outcome in evolutionary games of the finitely repeated Prisoner’s Dilemma in infinite asexual populations. First, we consider a particular setting with strategies that play Tit-for-Tat for the first x rounds and unconditionally defect thereafter, and in which mutation only alters x. For this regime, we show how cooperation can evolve through the direct effect of mutation, i.e., the instantaneous impact of mutational variation before selection can act on these mutants. This direct effect suffices to explain earlier findings that behavioral variation can promote the evolution of cooperation. However, we question the generality of these findings because, for many (if not most) mutation regimes, the direct effect of mutation is largest for unconditional defectors (AllD). Overall, mutation is therefore expected to hamper rather than promote the evolution of cooperation. We identify special conditions (e.g., intermediate mutation rates) under which this negative direct effect of mutation on cooperation can be countered by an indirect effect of mutation, i.e., the fitness impact that individuals obtain from interactions with descendants of mutants. Simulations for these conditions show that populations dominated by AllD are invaded by cooperative strategies despite the positive direct effect of mutation on AllD. Thus, here it is the indirect effect of mutation on cooperative strategies that drives the evolution of cooperation. The population evolves towards a state in which the action ‘cooperate’ is executed more frequently, but this is achieved not by individuals triggering reciprocity (‘genuine altruism’) but by individuals exploiting the willingness of others to cooperate (‘exploitative altruism’).
By highlighting the distinction between direct and indirect effects of mutation, we provide a new perspective on how mutation-generated variation alters frequency-dependent selection.

M. Spichtig, M.W. Sabelis, M. Egas
Institute for Biodiversity and Ecosystem Dynamics (IBED), University of Amsterdam, P. O. Box 94248, 1090 GE Amsterdam, The Netherlands


Introduction
Evolution is driven by selection acting on heritable phenotypic variation. The amount of phenotypic variation can be described as a function of selection modulating the variation generated by mutation, recombination, and environmental factors. If selection is frequency-dependent, it can in turn be described as a function of the distribution of phenotypic variants in the population. A standard approach in evolutionary ecology is to ignore the selective effects of phenotypic variation by reducing the analysis to the interaction dynamics of an invader phenotype in an otherwise homogeneous resident population (Abrams 2001). This approach is defensible if the addition of other phenotypic variants constitutes no more than noise on the selective interactions between these two phenotypes. Under a regime of frequency-dependent selection, however, increasing phenotypic variation may alter the outcome of evolutionary analyses qualitatively. For example, increasing (mutation-generated) variation in behavior can promote the evolution of cooperation (McNamara et al. 2004, 2008; Eriksson and Lindgren 2005; Traulsen et al. 2009; McNamara and Leimar 2010).
The existence of behavioral variation is generic in studies of the evolution of cooperation (e.g., Wedekind and Milinski 1996; Fischbacher et al. 2001; Camerer 2003; Camerer and Fehr 2006; Henrich et al. 2006; Herrmann et al. 2008). Hence, it may provide a general explanation for the evolution of cooperation. However, this prediction by McNamara et al. (2004, 2008), Eriksson and Lindgren (2005), and Traulsen et al. (2009) is derived from simulations of asexual populations. Each of these simulations is based on a particular choice of mutation regime, i.e., a mutation rate (μ), a set of strategies that mutation can generate, and a rule for how mutant strategies tend to differ from the parental strategies (variation from recombination and from environmental factors is ignored). Given the relative lack of knowledge on the genetic underpinning of behavioral traits, arbitrary choices are inevitable. In asexual organisms, mutation is the source of heritable variation, and the mutation regime determines the expressed (heritable) variation. Therefore, we investigate whether variation-promoted evolution of cooperation is robust to the choice of mutation regime.
To analyze the effects of the mutation regime on the evolutionary dynamics, we start from a narrow-sense definition of mutants: mutants are the individuals with a genotype distinct from that of their parent, i.e., a mutation event occurred (transforming the genotype) during the generation of the individual. As a consequence of this narrow-sense definition, faithfully replicated offspring of mutants are not mutants themselves (in contrast to the use of the term in 'mutant-wild type' contexts, where faithfully replicated offspring of mutants remain categorized as the mutant type). Because selection acts on variation after it is generated by mutation, mutants constitute the fraction of the population that has not yet been affected by selection. We refer to the direct effect of mutation when it concerns the average fitness contributions (π) generated from interactions with the narrow-sense mutants. Of course, the fitness impact of mutation goes beyond that of the mutants alone, as long as they (faithfully) produce descendants. We refer to this impact as the indirect effect of mutation when it concerns the average fitness contributions (θ) generated from interactions with non-mutants (each with an ancestry of variable length to a mutant). Thus, π refers to the direct effect of mutation, i.e., the fitness impact of the mutation-generated variation not yet affected by selection, and θ refers to the indirect effect of mutation, i.e., the fitness impact of the mutation-generated variation after it has been subject to selection.
In this paper, we show that the distinction between direct and indirect fitness effects provides insight into the impact of mutation-generated variation on the evolution of cooperation. Our results show that (1) earlier findings on the impact of behavioral variation on the evolution of cooperation are no more than special cases with a positive direct effect of mutation, (2) cooperation can evolve through an indirect effect of mutation-generated variation even if unconditional defectors benefit most from the direct effect, and (3) in cases with positive indirect effects of mutation, the selected strategies in the population exploit cooperative acts.

Model
We use the evolutionary game version of the finitely repeated Prisoner's Dilemma game (frPD; Nachbar 1992; Cressman 1996) to study the impact of different mutation regimes. In the Prisoner's Dilemma (PD) game, individuals engage in pairwise interactions in which they choose to cooperate (C) or to defect (D). Both players execute their actions simultaneously. Individuals receive a payoff that depends on their own choice as well as on the choice of their opponent, as given in the following payoff matrix (entries are the payoffs of the row player):

$$\begin{array}{c|cc} & C & D \\ \hline C & R & S \\ D & T & P \end{array}$$

The payoffs follow the relations T > R > P > S and 2R > T + S. The consequence is that defection generates the higher individual payoff against either action of the opponent (T > R against a cooperator, P > S against a defector), while mutual cooperation maximizes the common payoff. In the frPD game, the Prisoner's Dilemma game is repeated a fixed number of times (r). The total number of deterministic strategies in an frPD game is finite, namely $2^{2^r - 1}$ (Cressman 1996). Cressman (1996) provides a general analysis of evolutionary frPD games in the absence of mutation. The strategy that defects (d) and the strategy that cooperates (c) are the two strategies of a one-shot game (r = 1). It follows from the payoff matrix that, for any population composition of the two strategies, individuals with strategy d always generate a higher average payoff than individuals with strategy c. Consequently, strategy d evolves towards fixation in infinite populations, i.e., the state in which mutual defection is observed in all PD games. For much the same reason, in mutation-free evolutionary frPD games (r > 1), polymorphisms composed of all deterministic strategies evolve towards states in which the players exclusively defect (Nachbar 1992; Cressman 1996). Unconditional defectors (AllD) obtain fitness dominance ($w_{\mathrm{AllD}} \ge w_i$ for all strategies i, where $w_i$ is the fitness of strategy i) during this process of convergence towards full defection (Cressman 1996).
In our model, we assume infinite, asexual populations composed of deterministic strategies. The code used for the strategies is described below. Strategy i individuals generate the average payoff $p_i = \sum_{j \in s} p_{ij} f_j$ from game interactions, where $p_{ij}$ is the payoff that strategy i individuals generate in interactions with strategy j individuals, $f_j$ is the frequency of strategy j, and s represents the strategy set (i.e., the collection of the considered strategies). The average fitness of strategy i individuals is defined by $w_i = p_i + K$, where the background fitness K is the game-unassociated fitness component (in the simulations, we used positive integers for K), which is identical for each individual. Fitness determines reproductive success but not survival. Consequently, all players have, independent of their performance, the same expected number of pairwise interactions over their lifetime in the game.
We analyze evolutionary changes in this model using both discrete generations and continuous (overlapping) generations. In the discrete-generation model (dgm), the frequency $f_i'$ of strategy i in the next generation is determined as

$$f_i' = (1-\mu)\,\frac{f_i w_i}{\bar w} + \mu u_i,$$

where $\bar w = \sum_{j \in s} f_j w_j$ is the average fitness of the population, μ is the mutation rate, and $u_i$ is the fraction of mutants that carry strategy i. The frequency dynamics in the continuous-generation model (cgm) follow the replicator-mutator dynamics (Page and Nowak 2002):

$$\dot f_i = (1-\mu)\, f_i w_i + \mu u_i \bar w - f_i \bar w.$$
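Both update rules can be sketched in a few lines of plain Python. This is a minimal illustration, not the simulation code behind the results below: it assumes the dgm update $f_i' = (1-\mu) f_i w_i/\bar w + \mu u_i$, the replicator-mutator right-hand side $(1-\mu) f_i w_i + \mu u_i \bar w - f_i \bar w$, a payoff matrix `payoff[i][j]` $= p_{ij}$, and a constant u distribution. The function names and the example values (T = 5, R = 3, P = 1, S = 0, K = 1, μ = 0.01) are our own illustrative choices.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def fitness(payoff: Matrix, f: Sequence[float], K: float) -> List[float]:
    """w_i = p_i + K, with p_i = sum_j p_ij f_j."""
    n = len(f)
    return [sum(payoff[i][j] * f[j] for j in range(n)) + K for i in range(n)]


def dgm_step(payoff: Matrix, f: Sequence[float], u: Sequence[float],
             mu: float, K: float) -> List[float]:
    """Discrete generations: f_i' = (1 - mu) f_i w_i / wbar + mu u_i."""
    w = fitness(payoff, f, K)
    wbar = sum(fi * wi for fi, wi in zip(f, w))
    return [(1 - mu) * fi * wi / wbar + mu * ui for fi, wi, ui in zip(f, w, u)]


def cgm_rhs(payoff: Matrix, f: Sequence[float], u: Sequence[float],
            mu: float, K: float) -> List[float]:
    """Replicator-mutator: df_i/dt = (1 - mu) f_i w_i + mu u_i wbar - f_i wbar."""
    w = fitness(payoff, f, K)
    wbar = sum(fi * wi for fi, wi in zip(f, w))
    return [(1 - mu) * fi * wi + mu * ui * wbar - fi * wbar
            for fi, wi, ui in zip(f, w, u)]


# One-shot PD (r = 1); index 0 = d, index 1 = c (illustrative payoffs).
T, R, P, S = 5.0, 3.0, 1.0, 0.0
pd = [[P, T],   # d against (d, c)
      [S, R]]   # c against (d, c)
u = [0.5, 0.5]  # uniform, constant u distribution
f = [0.5, 0.5]
for _ in range(2000):
    f = dgm_step(pd, f, u, mu=0.01, K=1.0)
print(f)  # d dominates; c persists only through mutation-selection balance
```

Because $\sum_i u_i = 1$, the dgm step keeps the frequencies summing to one and the cgm right-hand side sums to zero, which is a quick consistency check on any implementation of these rules.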

Analysis
We use the characteristic $w_{\mathrm{AllD}} \ge w_i$ of the evolutionary attractor to set a benchmark for the evolutionary effect of mutation: recurrent mutation significantly affects evolutionary frPD games whenever the fitness relation $w_i > w_{\mathrm{AllD}}$ is observed, persistently or periodically, for at least one strategy i. Behavioral differences cause the differences in fitness between strategies i and AllD. Therefore, because AllD always defects, $w_i > w_{\mathrm{AllD}}$ implies that strategy i employs at least some cooperation. Beyond that, the observation $w_i > w_{\mathrm{AllD}}$ carries no information about how frequently strategy i or the remainder of the population cooperates. However, frequencies of cooperative behaviors may correlate positively with mutation rates without challenging the fitness dominance of AllD (Willensdorfer and Nowak 2005). For example, mutation could frequently generate the strategy unconditional cooperator (AllC). An increase in mutation rate would then be expected to increase the execution of cooperation, but also the fitness of AllD. If significant effects ($w_i > w_{\mathrm{AllD}}$) and increased cooperation co-occur, we take this as an indication that cooperation evolves due to a change in the direction of selection. Note that significant effects might emerge without cooperation being executed frequently.

Statistics
In our simulations, we sample the following behavioral statistics. The average cooperation in round x is given by

$$\bar C_x = \sum_{i \in s} \sum_{j \in s} f_i f_j\, a_{x,ij},$$

where $a_{x,ij} = 1$ if strategy i cooperates in round x against strategy j and $a_{x,ij} = 0$ otherwise. The average cooperation per frPD game is given by

$$\bar C = \sum_{x=1}^{r} \bar C_x$$

(in the case of non-equilibrium dynamics, $\bar C$ is averaged over specified ranges of generations). As $0 \le \bar C \le r$, a population with $\bar C \approx 1$ is interpreted as fairly cooperative if r = 1 and as fairly uncooperative if r = 100. For comparisons of evolutionary frPD games with different r-values, we therefore use the average number of C executions per Prisoner's Dilemma game, $\bar C / r$. The average payoff per frPD game generated with payoff P is given by

$$P_p = P \sum_{i \in s} \sum_{j \in s} f_i f_j\, a_{P,ij},$$

where $a_{P,ij}$ is the number of times strategy i generates payoff P from strategy j. For payoffs T, S, and R, we analogously define the averages $T_p$, $S_p$, and $R_p$.
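The behavioral statistics can be computed directly from a table of action sequences. The sketch below is illustrative, not the authors' code: it assumes the action strings `actions[i][j]` that each strategy i plays against each j are already known, and it reads the payoff statistics as weighting each count $a_{k,ij}$ by the payoff value k. The two-strategy population (half AllD, half TfT1 with r = 3) and the payoff values are hypothetical.

```python
from typing import Dict, List, Sequence

# actions[i][j] is the action string (e.g. "CDD") that strategy i plays against j.
Actions = Sequence[Sequence[str]]
KIND = {("C", "C"): "R", ("C", "D"): "S", ("D", "C"): "T", ("D", "D"): "P"}


def round_cooperation(actions: Actions, f: Sequence[float]) -> List[float]:
    """C_x = sum_i sum_j f_i f_j a_{x,ij}, one entry per round x."""
    n, r = len(f), len(actions[0][0])
    return [sum((f[i] * f[j] for i in range(n) for j in range(n)
                 if actions[i][j][x] == "C"), 0.0)
            for x in range(r)]


def payoff_statistics(actions: Actions, f: Sequence[float],
                      values: Dict[str, float]) -> Dict[str, float]:
    """value_k * sum_i sum_j f_i f_j a_{k,ij} for each payoff type k in T, R, P, S."""
    n = len(f)
    totals = {k: 0.0 for k in values}
    for i in range(n):
        for j in range(n):
            # Pair i's actions against j with j's actions against i, round by round.
            for own, other in zip(actions[i][j], actions[j][i]):
                k = KIND[(own, other)]
                totals[k] += f[i] * f[j] * values[k]
    return totals


# Hypothetical r = 3 population: half AllD (index 0), half TfT1 (index 1).
actions = [["DDD", "DDD"],   # AllD against (AllD, TfT1)
           ["CDD", "CDD"]]   # TfT1 against (AllD, TfT1)
f = [0.5, 0.5]
cx = round_cooperation(actions, f)
c_bar = sum(cx)
print(cx, c_bar / 3)  # [0.5, 0.0, 0.0] 0.16666666666666666
print(payoff_statistics(actions, f, {"T": 5.0, "R": 3.0, "P": 1.0, "S": 0.0}))
# {'T': 1.25, 'R': 0.75, 'P': 2.25, 'S': 0.0}
```

All cooperation in this toy population happens in round 1, and the only T-payoff is the one AllD obtains from TfT1 in that round.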

Strategies
We assume that genotypes determine strategies. In certain mutation regimes, neutral mutations can occur, i.e., parent and mutant offspring do not share the genotype but do share the strategy. The mutation rate μ determines the fraction of offspring that do not carry the parental genotype. We refer to these offspring as mutants. We define $u_i$ as the fraction of mutants that express strategy i. The distributions of such fractions (u) are constant if the probabilities with which mutants express any of the strategies are independent of the parental strategy. Otherwise, the u distributions depend on the population composition; we refer to the latter as variable u distributions.
Strategies are defined by their responses to each possible sequence of actions of an opponent in the game. We encode strategies by strings of the symbols {d, c}, which represent the actions {D, C} in response to the perceived actions of an opponent. The first position of the code gives the initial action. The second and third positions represent responses to the initial action D, respectively C, of an opponent. The fourth to seventh positions represent responses to the action sequences {DD, DC, CD, CC}, the eighth to fifteenth positions represent responses to the action sequences {DDD, DDC, DCD, DCC, CDD, CDC, CCD, CCC}, etc. As an example for the game with r = 3, the strategy cddcdcc cooperates in the first round, unconditionally defects in the second round, and defects in the third round only if the opponent defected in the first round and cooperated in the second round (i.e., played the action sequence DC). In notations of strategy groups, we use dots to mark code positions at which the strategies of the group can differ, i.e., in this notation a dot can be replaced with either d or c. A special example is formed by the strategy groups ({r1, r2, r3, r4} → {d, dd., dd.d..., dd.d...d.......}), which comprise the strategies that exclusively defect in interactions among each other. We call these strategies 'defectors'. Populations composed exclusively of defectors are the only type of population in which action C is never executed (i.e., full defection by all players).
With the code explained above all deterministic frPD strategies can be described. We use the name rX for the strategy sets that contain all deterministic strategies, whereby X represents the number of rounds of the frPD game. Simulations comprise the rX sets {r1, r2, r3, r4} which contain respectively {2, 8, 128, 32768} strategies. For technical reasons due to the size of the set, only dgm-simulations were performed for the r4 set.
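The encoding can be made concrete with a small decoder. This is a sketch under our own assumptions: histories of length k occupy code positions $2^k - 1$ to $2^{k+1} - 2$ (0-based), ordered as binary numbers with D = 0 and C = 1 and the first action most significant, which reproduces the orders {D, C}, {DD, DC, CD, CC} and DDD, DDC, DCD, ... described above. The helper names `response`, `play` and `all_strategies` are hypothetical.

```python
from itertools import product
from typing import List, Tuple


def response(code: str, opp_history: str) -> str:
    """Action ('C' or 'D') of a strategy given the opponent's previous actions.

    Histories of length k occupy positions 2**k - 1 ... 2**(k+1) - 2, ordered as
    binary numbers with D = 0 and C = 1 (first action = most significant bit).
    """
    k = len(opp_history)
    value = 0
    for a in opp_history:
        value = 2 * value + (a == "C")
    return code[(2 ** k - 1) + value].upper()


def play(code_i: str, code_j: str, r: int) -> Tuple[str, str]:
    """Action sequences of i and j in an r-round frPD game."""
    hist_i, hist_j = "", ""
    for _ in range(r):
        a_i, a_j = response(code_i, hist_j), response(code_j, hist_i)
        hist_i, hist_j = hist_i + a_i, hist_j + a_j
    return hist_i, hist_j


def all_strategies(r: int) -> List[str]:
    """The rX set: every string over {d, c} of length 2**r - 1."""
    return ["".join(bits) for bits in product("dc", repeat=2 ** r - 1)]


print([len(all_strategies(r)) for r in (1, 2, 3)])  # [2, 8, 128]
# cddcdcc against an opponent that plays D, then C: it defects in round 3.
print(play("cddcdcc", "ddcdddd", 3))  # ('CDD', 'DCD')
```

The set sizes follow $2^{2^r - 1}$, so r4 already contains 32768 strategies, which is why exhaustive enumeration quickly becomes expensive.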
Besides the rX sets, we use a type of strategy set proposed by Nachbar (1992). We term these sets 'TfTx'. One strategy contained in 'TfTx' sets is 'Tit for Tat' (TfT; Axelrod and Hamilton 1981), which starts with C and thereafter repeats the opponent's previous action. The strategy TfTx behaves as TfT in the first x rounds and unconditionally defects in the remaining (r − x) rounds. 'TfTx' sets contain all TfTx strategies with x ∈ {0, 1, …, r}. The extremes represent AllD (x = 0) and TfT (x = r). In the absence of mutation, AllD evolves towards fixation in any polymorphism of the 'TfTx' strategies (Nachbar 1992). McNamara et al. (2004) altered the frPD game such that players can end the game after any round (whereby both players receive an identical payoff for each unplayed round). We do not implement this altered frPD game, to avoid the consequent increase in the number of deterministic strategies and to take the work of Nachbar (1992) and Cressman (1996) as the starting point of our analysis (assuming it can be extrapolated to the game of McNamara et al. 2004). Hence, we consider 'TfTx' sets as analogous to the sets used by McNamara et al. (2004). To confirm this analogy, we repeated the simulations of McNamara et al. (2004) using 'TfTx' sets and retrieved qualitatively the same results as McNamara et al. (2004) did with their sets (Spichtig, data not shown).

Results
In the absence of recurrent mutation, AllD obtains fitness dominance at the evolutionary attractors for 'TfTx' strategy sets (Nachbar 1992) and for rX strategy sets (Nachbar 1992; Cressman 1996). Hence, in our analyses we use the fitness of AllD as the benchmark for measuring the relative success of other strategies when mutation-generated variation is introduced in the model. Specifically, the evolutionary impacts of recurrent mutation are significant whenever strategies persistently or periodically exceed the fitness of AllD ($w_i > w_{\mathrm{AllD}}$). In the first subsection, we analyze the contributions to these significant impacts that result from direct effects of mutation and from indirect effects of mutation-induced population compositions. In the second subsection, simulations of rX-populations are used to assess the relevance of indirect effects in the absence of direct effects.

Effects of mutation on the evolution of 'TfTx' sets and of rX sets
Populations can be subdivided into a fraction μ of mutants and a fraction (1 − μ) of non-mutants.
Strategy i generates the average payoff $\pi_i = \sum_{j \in s} p_{ij} u_j$ from interactions with mutants. The average fitness of strategy i individuals is $w_i = (1-\mu)\theta_i + \mu\pi_i + K$, where $\theta_i$ is the average payoff generated from interactions with non-mutants. We interpret the occurrence of inequalities $\pi_i > \pi_{\mathrm{AllD}}$ as direct effects and of inequalities $\theta_i > \theta_{\mathrm{AllD}}$ as indirect effects if they coincide with the observation of significant effects ($w_i > w_{\mathrm{AllD}}$). Direct effects and indirect effects are not mutually exclusive. We expect indirect effects generally to co-emerge with direct effects. In the following, the discussion on indirect effects focuses on their emergence in cases where direct effects are excluded (i.e., $\pi_{\mathrm{AllD}} \ge \pi_i$ for all $i \in s$ and at all population compositions).
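The decomposition of fitness into a direct (mutant) and an indirect (non-mutant) component can be checked numerically. In this sketch, which is our own illustration, `g` is a hypothetical strategy distribution among non-mutants and `u` the distribution among mutants, so the whole population is $f = (1-\mu) g + \mu u$; the one-shot PD payoffs (T = 5, R = 3, P = 1, S = 0) are illustrative.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def mean_payoffs(payoff: Matrix, dist: Sequence[float]) -> List[float]:
    """sum_j p_ij dist_j for every strategy i."""
    return [sum(p * d for p, d in zip(row, dist)) for row in payoff]


def decomposed_fitness(payoff: Matrix, g: Sequence[float], u: Sequence[float],
                       mu: float, K: float) -> Tuple[List[float], List[float], List[float]]:
    """Return (theta, pi, w) with w_i = (1 - mu) theta_i + mu pi_i + K.

    g: strategy distribution among non-mutants (hypothetical input),
    u: strategy distribution among mutants.
    """
    theta = mean_payoffs(payoff, g)  # payoffs from non-mutants (indirect effect)
    pi = mean_payoffs(payoff, u)     # payoffs from mutants (direct effect)
    w = [(1 - mu) * th + mu * p + K for th, p in zip(theta, pi)]
    return theta, pi, w


# One-shot PD, index 0 = d (AllD), index 1 = c; illustrative payoffs.
pd = [[1.0, 5.0],
      [0.0, 3.0]]
theta, pi, w = decomposed_fitness(pd, g=[0.9, 0.1], u=[0.5, 0.5], mu=0.01, K=1.0)
print(pi)  # [3.0, 1.5]: with uniform u, AllD gains most from mutants
```

The resulting $w_i$ coincide with $\sum_j p_{ij} f_j + K$ computed for the whole population, which is all the decomposition claims; its value lies in attributing the fitness difference to $\pi$ or $\theta$.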
For constant u distributions, strategies generate fixed returns $\pi_i$, i.e., the payoff from interactions with mutants is independent of the population composition. Then, direct effects can be excluded if $\pi_i - \pi_{\mathrm{AllD}} \le 0$ for all strategies i. Direct effects can occur (and are inevitable for sufficiently high μ) if $\pi_i - \pi_{\mathrm{AllD}} > 0$ for at least one strategy i. For variable u distributions, the averages $\pi_i$ are functions of the population composition, which is itself a function of the mutation rate μ. As a consequence, there is no simple expression for when direct effects can emerge. We focus on analytical results assuming constant u distributions and only briefly discuss the more complicated case of variable u distributions.
For 'TfTx' sets, the difference in performance between TfTx and AllD in interactions with mutants can be written as the telescoping sum

$$\pi_{\mathrm{TfT}x} - \pi_{\mathrm{AllD}} = \sum_{i=1}^{x} \left( \pi_{\mathrm{TfT}i} - \pi_{\mathrm{TfT}(i-1)} \right)$$

(note that in 'TfTx' notation, AllD is TfT0). The adjacent strategies TfT(x−1) and TfTx perform identically against mutants expressing strategies TfTy with y ≤ x − 2. TfTx individuals obtain one additional round of mutual cooperation from interactions with mutants expressing strategies TfTy with y ≥ x. TfTx individuals are exploited once by TfT(x−1) mutants, and they do not exploit TfTx mutants in round x + 1. Consequently, strategy TfTx is more effective than TfT(x−1) in interactions with mutants if

$$\pi_{\mathrm{TfT}x} - \pi_{\mathrm{TfT}(x-1)} = (R-P)\sum_{y=x}^{r} u_{\mathrm{TfT}y} - (P-S)\,u_{\mathrm{TfT}(x-1)} - (T-P)\,u_{\mathrm{TfT}x} > 0.$$

From this inequality it follows that for uniform 'TfTx' u distributions (i.e., $u_{\mathrm{TfT}0} = u_{\mathrm{TfT}1} = \dots = u_{\mathrm{TfT}r}$), the distribution of the π values has a single peak at $\pi_{\mathrm{TfT}x}$, where x is the highest integer for which the inequality (r + 1 − x)(R − P) > T − S is satisfied. As a consequence, direct effects can be obtained for uniform distributions by manipulating μ if r(R − P) > T − S (i.e., $\pi_{\mathrm{TfT}1} - \pi_{\mathrm{AllD}} > 0$). The expectation that changes in conditions that increase this x-value also increase the execution of cooperation at the evolutionary equilibrium was confirmed in a set of simulations (Spichtig, data not shown).
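The single-peak result for uniform 'TfTx' u distributions can be verified by brute force. The sketch below, with hypothetical helper names, plays every TfTx against every TfTy round by round (TfT in the first x rounds, defection afterwards), averages over a uniform u distribution, and compares the location of the peak with the highest x satisfying (r + 1 − x)(R − P) > T − S. The payoffs T = 5, R = 3, S = 0 follow Table 3-1; P = 1 is one of its P-values.

```python
from typing import List


def tftx_action(x: int, k: int, opp_history: List[str]) -> str:
    """TfTx in round k (0-based): Tit-for-Tat while k < x, defect afterwards."""
    if k >= x:
        return "D"
    return "C" if k == 0 else opp_history[-1]


def tftx_payoff(x: int, y: int, r: int, T: float, R: float, P: float, S: float) -> float:
    """Total payoff of TfTx against TfTy over r rounds."""
    table = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}
    hx: List[str] = []
    hy: List[str] = []
    total = 0.0
    for k in range(r):
        ax, ay = tftx_action(x, k, hy), tftx_action(y, k, hx)
        total += table[(ax, ay)]
        hx.append(ax)
        hy.append(ay)
    return total


def pi_uniform(r: int, T: float, R: float, P: float, S: float) -> List[float]:
    """pi_TfTx for a uniform u distribution over TfT0 (= AllD), ..., TfTr."""
    n = r + 1
    return [sum(tftx_payoff(x, y, r, T, R, P, S) for y in range(n)) / n
            for x in range(n)]


def predicted_peak(r: int, T: float, R: float, P: float, S: float) -> int:
    """Highest x with (r + 1 - x)(R - P) > T - S, or 0 (AllD) if none."""
    return max([x for x in range(1, r + 1) if (r + 1 - x) * (R - P) > T - S],
               default=0)


pi = pi_uniform(3, 5.0, 3.0, 1.0, 0.0)
print(pi)                                                        # [6.0, 6.25, 6.0, 5.25]
print(pi.index(max(pi)), predicted_peak(3, 5.0, 3.0, 1.0, 0.0))  # 1 1
```

With r = 3 the condition r(R − P) > T − S holds (6 > 5), so TfT1 earns more than AllD from mutants, i.e., this mutation regime produces a direct effect.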
For rX sets, the following property should be noted. The response $\rho_{ij}$ is the action sequence (of length r) that strategy i triggers from strategy j, and $\rho_i$ is the entire set of responses $\rho_{ij}$ (j ∈ rX) of strategy i. Given the comprehensiveness of rX sets, it follows that, for an arbitrary set $\rho_i$ (i ∈ rX), each of the $2^r$ action sequences occurs equally often (i.e., $\rho_i$ and $\rho_j$ (i ≠ j) are two permutations of the same set of sequences): the r code positions of j that are reached in a game against i can take every combination of d and c, independently of j's remaining positions. The consequence is that, with uniform u distributions, the mean behaviors of mutants are not influenced by the strategy of the opponents. In that case, it can be inferred from the payoff dominance of D over C that AllD generates the highest mean payoff from mutants ($\pi_{\mathrm{AllD}} > \pi_i$ for all i ≠ AllD).
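For small r, the permutation property and its consequence can be confirmed exhaustively. This sketch restates the hypothetical `response`/`play`/`all_strategies` helpers (assuming the code-position order defined in the Strategies subsection) so that it runs on its own; `responses_are_uniform` checks, for every i, that the responses $\rho_{ij}$ cover all $2^r$ sequences equally often, and `pi_uniform` computes $\pi_i$ for a uniform u distribution over r2 with illustrative payoffs T = 5, R = 3, P = 1, S = 0.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple


def response(code: str, opp_history: str) -> str:
    """Action for a history; positions ordered by length, then binary (D=0, C=1)."""
    value = 0
    for a in opp_history:
        value = 2 * value + (a == "C")
    return code[2 ** len(opp_history) - 1 + value].upper()


def play(code_i: str, code_j: str, r: int) -> Tuple[str, str]:
    hist_i, hist_j = "", ""
    for _ in range(r):
        a_i, a_j = response(code_i, hist_j), response(code_j, hist_i)
        hist_i, hist_j = hist_i + a_i, hist_j + a_j
    return hist_i, hist_j


def all_strategies(r: int) -> List[str]:
    return ["".join(bits) for bits in product("dc", repeat=2 ** r - 1)]


def responses_are_uniform(r: int) -> bool:
    """True if, for every i, the responses rho_ij (j in rX) hit each of the
    2**r action sequences equally often."""
    strategies = all_strategies(r)
    expected = len(strategies) // 2 ** r
    for i in strategies:
        counts = Counter(play(i, j, r)[1] for j in strategies)
        if len(counts) != 2 ** r or set(counts.values()) != {expected}:
            return False
    return True


def pi_uniform(r: int, T: float, R: float, P: float, S: float) -> Dict[str, float]:
    """pi_i for a uniform u distribution over the whole rX set."""
    table = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}
    strategies = all_strategies(r)
    result = {}
    for i in strategies:
        total = 0.0
        for j in strategies:
            a_i, a_j = play(i, j, r)
            total += sum(table[pair] for pair in zip(a_i, a_j))
        result[i] = total / len(strategies)
    return result


print([responses_are_uniform(r) for r in (1, 2, 3)])  # [True, True, True]
pis = pi_uniform(2, 5.0, 3.0, 1.0, 0.0)
print(max(pis, key=pis.get))  # ddd, i.e. AllD
```

Because every mutant response sequence is equally likely whatever i does, playing C in any round can only lower the expected payoff, so AllD comes out on top.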
For the uniform distributions analyzed above, the mean behaviors of mutants are not influenced by the strategy of the opponents. We refer to such u distributions with unbiased average mutant behaviors as symmetric, and to alternative u distributions with biased average mutant behaviors as asymmetric. This distinction is useful because the uniform u distributions of rX sets are not the only symmetric ones. For example, any distribution with uniform $u_i$ values for the conditional strategies is symmetric, because the behavior of unconditional strategies is not influenced by the opponent. In Appendix 3-1, we define the space of symmetric u distributions. Note that, for both symmetric and asymmetric distributions, increasing the share of unconditional strategies tends to favor $\pi_{\mathrm{AllD}}$, as AllD plays the best response to unconditional strategies. As outlined for the uniform distributions, direct effects can be excluded for all symmetric distributions. Hence, direct effects emerge only if strategies can trigger distinct mean mutant behaviors (i.e., the key characteristic of asymmetric distributions).
Symmetric u distributions for rX sets are a special case. The 'TfTx' sets, as shown above, allow for direct effects, and they represent asymmetric distributions (the $u_i$ values of 'TfTx' sets are formed from u distributions of rX sets by setting the $u_i$ values to zero for strategies outside the 'TfTx' sets). For direct effects to occur, encounters with mutants should on average be inefficient for AllD but efficient for certain other strategies, i.e., mutants should tend to conditionally defect in interactions with AllD and to conditionally cooperate in interactions with certain other strategies. Examples are distributions (such as 'TfTx') for which mutants tend to express reciprocal behaviors (Trivers 1971; Axelrod and Hamilton 1981).

Simulations of rX-populations
To gain insight into indirect effects, we performed simulations using the rX strategy sets {r1, r2, r3, r4} with uniform u distributions. As discussed in the previous subsection, direct effects are excluded with uniform u distributions. For these parameter combinations, Table 3-1 shows whether or not populations evolve to an equilibrium (equilibrium conditions are described in Appendix 3-1). For all settings, {r1, r2}-populations (i.e., those playing the one-round and the two-round game) evolve to equilibrium (Table 3-1). The table shows that for P ∈ {0.05, 0.3}, no equilibrium is attained in the evolution of certain r3-populations and of certain r4-populations.
The equilibrium populations described in Table 3-1 are dominated by AllD (i.e., f AllD > f i for i ≠ AllD) -this characteristic applies to all observed equilibrium populations in our study. Furthermore, all observed equilibrium strategy frequencies f i are identical for both continuous and discrete-generation models. At equilibrium, dominance of AllD implies that the strategy also has fitness dominance. We do not find persistent indirect effects in the populations that do not reach equilibrium. Consequently, we do not find persistent indirect effects in the simulations.
The observed cooperation in the populations of Table 3-2 is maintained by mutation-selection balance, because direct and indirect effects are absent. This interpretation of the $\bar C/r$ data is straightforward for the r1-populations. For the populations with repeated games, cooperation can be argued to be disadvantageous, because non-AllD individuals would increase their fitness by switching to AllD. However, we emphasize that in several {r3, r4}-populations, non-defectors obtain above-average fitness at equilibrium (Tables 3-1 and 3-2). The potential for the evolution of conditional behavior in repeated games ({r2, r3, r4}) seems to reduce selection against cooperation (cooperation levels are higher for these sets than for r1; see Table 3-2a). As expected, a similar effect can be attributed to increasing background fitness K (Table 3-2b). Table 3-1 shows that for the lowest mutual defection payoff (P = 0.05), {r3, r4}-populations do not converge to equilibrium in the simulations with the two lowest mutation rates. For the intermediate P-value of Table 3-1, this phenomenon is also observed for r3-populations at the lowest rate and for r4-populations at the three lowest rates. Because their strategy set is 256 times smaller, r3-populations are more convenient to study, which is why we mainly study non-equilibrium behavior in r3-populations.

Table 3-1. Each letter in the strings represents the dynamics found in simulations at the respective mutation rate; n: non-equilibrium dynamics, e: equilibrium in which only defectors obtain above-average fitness, and E: equilibrium in which non-defectors obtain above-average fitness (for example, eeeE means that equilibrium is found at all four rates, with non-defectors attaining above-average fitness only at rate μ = 0.1). Fixed parameters: T = 5, S = 0, R = 3, K = 0.

              r1     r2     r3     r4
  P = 0.05    eeee   eeee   nnEE   nnEE
  P = 0.3     eeee   eeee   nEEE   nnnE
  P = 1       eeee   eeee   eeeE   eeeE

For P = 0.05, Fig. 3-1a shows the mean execution of cooperation per frPD game ($\bar C$) for μ ∈ {0.00001, 0.0001, 0.001, 0.01, 0.1}. For the lowest rate and for the two highest rates, these means are sampled at equilibrium. As mentioned, the equilibrium strategy frequencies are identical for the continuous- and discrete-generation models. Hence, for each of these rates, both models yield the same $\bar C$-value in Fig. 3-1a. After transient phases, the populations at rates μ ∈ {0.0001, 0.001} evolve in cycles. As an example, consider the strategy dynamics at rate μ = 0.001 in Fig. 3-1b for the dgm and in Fig. 3-1c for the cgm. Table 3-3 lists the strategies with $\max(f_i) > 0.1$ during the cycles for these two figures. For mutation rates μ ∈ {0.0001, 0.001}, the $\bar C$-values in Fig. 3-1a are averaged over one cycle period. The $\bar C$-values are the same whether populations are initialized with $f_{\mathrm{AllD}} = 1$ or with a uniform frequency distribution. For both mutation rates, the averages $\bar C$ are higher if sampled over dgm-cycles than if sampled over cgm-cycles (Fig. 3-1a). The figure also shows that, for both models, the $\bar C$-values are higher in the cycling populations than in the equilibrium populations at $\mu = 10^{-5}$. The $\bar C$-values are also higher than the equilibrium value found at the higher rate μ = 0.01, for the dgm at rates μ ∈ {0.0001, 0.001} and for the cgm at rate μ = 0.001 (Fig. 3-1a). Consequently, for both types of generation models, an optimum in $\bar C$ exists within the interval $10^{-5} < \mu < 0.01$. The strategy dynamics in Fig. 3-1b,c resemble those in the corresponding simulations with the lower mutation rate μ = 0.0001. All four cycles show (as in Fig. 3-1b,c) alternations of phases with dominance of AllD followed by phases with dominance of TfT1 (cdddddd). As can be inferred from these dynamics, AllD and TfT1, respectively, have the highest fitness when they invade the populations. Consequently, these populations express periodic indirect effects. In Fig. 3-2, we give behavioral statistics from the simulation of Fig. 3-1b. Fig. 3-2a shows the dynamics of the mean number of executed C actions in each round of the game ($\bar C_x$, x ∈ {1, 2, 3}). Cooperation is executed more intensively during TfT1 phases, especially in round 1 (Fig. 3-2a). The relatively longer TfT1 phases in the dgm-populations (compare Fig. 3-1b with 3-1c) explain why $\bar C$-values are higher in dgm-populations than in the corresponding cgm-populations (Fig. 3-1a at μ ∈ {0.0001, 0.001}).

Table 3-3. List of strategies that obtain peak frequencies higher than 0.1 ($\max(f_i) > 0.1$) within the cycle phases of the dgm-dynamics depicted in Fig. 3-1b. The code representation (conventional name in brackets) of the strategies is given in the second column. The peak frequency within the cycle phases is given in the third column. The peak frequency within the cycle phases of the cgm-dynamics of Fig. 3-1c …

For the three payoffs P, T, and R, Fig. 3-2b shows the dynamics of the mean payoff values per frPD game (i.e., $P_p$, $T_p$, and $R_p$). Steep increases in the generation of T-payoffs (Fig. 3-2b) mark the onset of invasions by TfT1 (Fig. 3-1b). Defectors like AllD generate this payoff in the first round when interacting with TfT1, and defectors are the dominant opponents of this strategy at the onset of invasions (Fig. 3-1b). The increase in the generation of T-payoffs is therefore partly explained by defectors triggering this payoff from TfT1. For TfT1, these first-round interactions seem disadvantageous, but this disadvantage is evidently compensated, because TfT1 invades. Fig. 3-2b additionally shows the dynamics of the expected average payoff values generated per game from receiving payoff P, T, or R. For payoff T, the observed value is higher than the expected value (Fig. 3-2b) over the dominance phase of TfT1 (Fig. 3-1b). These differences between observed and expected values are caused by the conditional behaviors in rounds 2 and 3.
Hence, we propose that the invasions of TfT1 are fueled by triggering T-payoffs in these rounds. At the onset of invasions, AllD is the dominant strategy (Fig. 3-1b) and defection is the predominant behavior (Fig. 3-2). Defectors (in contrast to non-defectors) are not penalized when interacting with AllD, and they can therefore be expected to perform better than other strategies in AllD-dominated populations. The strategy TfT1 generates T-payoffs from the twelve defectors {ddcd..., dddd.c.}. Game interactions between these defectors and TfT1 indeed contribute significantly (data not shown) to the increases of $T_p$ (Fig. 3-2b).
In Appendix 3-1, we derive the invasion condition for a single TfT1 individual in a population state with full defection. We find that such invasion occurs if the combined frequency of the defectors {ddcd..., dddd.c.} exceeds (P − S)/(T − P) (≈ 0.01 in the simulation of Fig. 3-1b). This condition is fulfilled over the entire cycle period in Fig. 3-1b, but the population state deviates from full defection due to mutation. In this state, AllD obtains the highest benefit from interactions with mutants (i.e., $\mu(\pi_{\mathrm{AllD}} - \pi_{\mathrm{TfT1}}) > 0$).
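The invasion argument for r = 3 can be reproduced by enumeration. This is a sketch under stated assumptions, not the derivation of Appendix 3-1: the resident population consists only of defectors (so residents earn rP against each other), the TfT1 invader is too rare to affect resident payoffs, and the helper names and the frequency values in the example are hypothetical. In this setting, the exact condition is that the frequency-weighted number of T-payoffs TfT1 obtains exceeds (P − S)/(T − P); because each of the twelve defectors yields at least one T-payoff, their combined frequency exceeding the threshold suffices.

```python
from itertools import product
from typing import Dict, List, Tuple

# Illustrative parameters matching Table 3-1 with P = 0.05 (r = 3).
T, R, P, S = 5.0, 3.0, 0.05, 0.0
r = 3
ALLD, TFT1 = "ddddddd", "cdddddd"
TABLE = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}


def response(code: str, opp_history: str) -> str:
    """Action for a history; positions ordered by length, then binary (D=0, C=1)."""
    value = 0
    for a in opp_history:
        value = 2 * value + (a == "C")
    return code[2 ** len(opp_history) - 1 + value].upper()


def play(code_i: str, code_j: str) -> Tuple[str, str]:
    hist_i, hist_j = "", ""
    for _ in range(r):
        a_i, a_j = response(code_i, hist_j), response(code_j, hist_i)
        hist_i, hist_j = hist_i + a_i, hist_j + a_j
    return hist_i, hist_j


def game_payoff(code_i: str, code_j: str) -> float:
    a_i, a_j = play(code_i, code_j)
    return sum(TABLE[pair] for pair in zip(a_i, a_j))


def t_payoffs(code: str) -> int:
    """Number of T-payoffs TfT1 earns against `code`."""
    a_tft1, a_other = play(TFT1, code)
    return sum(x == "D" and y == "C" for x, y in zip(a_tft1, a_other))


def rare_tft1_advantage(freqs: Dict[str, float]) -> float:
    """Payoff of a rare TfT1 minus the resident payoff r*P, for a resident
    population made up only of defectors (given as code -> frequency)."""
    return sum(f * (game_payoff(TFT1, c) - r * P) for c, f in freqs.items())


codes: List[str] = ["".join(b) for b in product("dc", repeat=2 ** r - 1)]
defectors = [c for c in codes if play(c, ALLD)[0] == "D" * r]
twelve = [c for c in defectors if t_payoffs(c) >= 1]
threshold = (P - S) / (T - P)
print(len(defectors), len(twelve), round(threshold, 4))  # 16 12 0.0101
print(rare_tft1_advantage({"ddcdddd": 0.02, ALLD: 0.98}) > 0)  # True
```

The enumeration recovers exactly the twelve defectors matching ddcd... or dddd.c., four of which (ddcd.c.) give TfT1 two T-payoffs.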
Thus, the invasion conditions in the simulations should be more stringent than those derived in Appendix 3-1. Before the onset of the invasions, the population does converge towards a state of full defection (Fig. 3-2) and thus towards the conditions underlying the analysis in Appendix 3-1. In our opinion, the invasions of TfT1 in the simulations are fueled by interactions with defectors {ddcd…, dddd.c.}, just like in the analysis. That AllD subsequently regains dominance, thereby closing the cycle (Fig. 3-1b,c), is in line with the expectations from the selection dynamics of evolutionary frPD games (Nachbar 1992;Cressman 1996).
In Tables 3-1 and 3-2, we mark the equilibria (E in Table 3-1 and italic numbers in Table 3-2) in which non-defectors have above-average fitness. The strategy TfT1 has the highest fitness among the non-defectors in these equilibria. Furthermore, these equilibria emerge at the higher mutation rates (Tables 3-1 and 3-2), possibly because mutation benefits TfT1 in these equilibria (e.g., by generating opponents among the defectors {ddcd..., dddd.c.}). However, invasion by this strategy is prevented, in part because AllD is the strategy that benefits most from interactions with mutants ($\mu(\pi_{\mathrm{AllD}} - \pi_{\mathrm{TfT1}}) > 0$).
The r4-simulations are more computation-intensive than the r3-simulations, and we restricted them to 10^4 generations because of constraints on computation time. Consequently, the data obtained do not allow definitive conclusions on the nature of non-equilibrium r4-dynamics. Over the simulation periods, chaotic dynamics occurs for the r4-populations with non-equilibrium dynamics (Table 3-…). As in Fig. 3-1b,c, strategies AllD and TfT1 in Fig. 3-3 become periodically dominant, with dominance phases of strategies {AllD, TfT1, dddddcddddddddd} that have fairly regular lengths (Fig. 3-3).
Across the sets {r1, r2, r3, r4}, Table 3-1 shows that all populations evolve to equilibrium in the two smallest sets {r1, r2}, and that non-equilibrium dynamics occurs more frequently in set r4 than in set r3 (e.g., the r3-population evolves to equilibrium under the conditions of Fig. 3-3). We interpret this observation as an indication that increasing the number of rounds (r) widens the parameter range for which periodic indirect effects emerge. This interpretation agrees with our intuition, because TfT1 obtains one T-payoff from the r2-defector ddc, two T-payoffs from r3-defectors ddcdc.., and three T-payoffs from r4-defectors ddcdc..d…c….
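How the number of T-payoffs available to TfT1 can grow with r may be illustrated with a small game simulator. This Python sketch assumes an encoding that this section does not spell out: an rX strategy is a string of 2^r − 1 actions giving the response to every possible opponent history, shortest histories first, with each history read as a binary number (d = 0, c = 1, earliest move most significant). Under this reading the r2-defector ddc opens with d and then copies the opponent's first move. The helpers `move`, `tft_x` and `t_payoffs` are hypothetical names, and TfTx is encoded as TfT for the first x rounds followed by defection.

```python
def move(strategy, opp_history):
    """Action ('c' or 'd') of an encoded strategy, given the opponent's moves so far."""
    index = 0
    for a in opp_history:
        # Read the history as a binary number: d = 0, c = 1, earliest move first.
        index = 2 * index + (1 if a == "c" else 0)
    # Responses to histories of length k start at position 2**k - 1.
    return strategy[2 ** len(opp_history) - 1 + index]


def tft_x(x, rounds):
    """Hypothetical encoding of TfTx: TfT for the first x rounds, then defect."""
    chars = []
    for k in range(rounds):
        for index in range(2 ** k):
            if k >= x:
                chars.append("d")
            elif k == 0:
                chars.append("c")  # TfT opens with cooperation
            else:
                # Copy the opponent's last move (the least significant bit).
                chars.append("c" if index % 2 == 1 else "d")
    return "".join(chars)


def t_payoffs(s1, s2, rounds):
    """Number of rounds in which s1 defects while s2 cooperates (T-payoffs to s1)."""
    h1, h2 = [], []  # moves played so far by s1 and s2
    count = 0
    for _ in range(rounds):
        m1, m2 = move(s1, h2), move(s2, h1)
        if m1 == "d" and m2 == "c":
            count += 1
        h1.append(m1)
        h2.append(m2)
    return count


print(tft_x(1, 3))                           # cdddddd
print(t_payoffs(tft_x(1, 2), "ddc", 2))      # 1
print(t_payoffs(tft_x(1, 3), "ddcddcd", 3))  # 2
```

Here ddcddcd is one member of the ddcd… group with its wildcards filled in; under the assumed encoding it cooperates after TfT1's opening c and again after TfT1's history (c, d), which is where the second T-payoff comes from.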

Discussion
Whereas empirical ecologists typically observe wide behavioral variation, theoretical ecologists tend to ignore or minimize behavioral variation in their models in order to make their analyses tractable. In this paper, we provide a method to analyze effects of behavioral variation on evolutionary dynamics, and apply it to the evolution of cooperation. We present a model in which behavioral variation is on the one hand subject to a restriction because probabilistic strategies are excluded, yet on the other hand comprehensive because all deterministic strategies are taken into account [see Axelrod (1984), Lindgren (1992), Hauert and Schuster (1997) and van Veelen and Garcia (2011) for earlier studies of evolutionary repeated games with comprehensive strategy sets]. We first discuss the method of analysis, and then when and how behavioral variation affects the evolution of cooperation.

Direct and indirect effects of mutation-generated variation
We consider direct and indirect effects of mutation in a frequency-dependent selection environment. Direct effects are fitness effects that emerge from (game) interactions with mutants, and indirect effects are fitness effects that emerge from interactions with descendants of mutants. Direct effects are mainly determined by the mutation regime (as reflected in the u distribution of mutants over all possible strategies). Indirect effects are additionally influenced by selection on the progeny of mutants. In both direct and indirect effects, the invading strategies depend on the presence of other strategies [see van Veelen (2012) for an analysis of invasions that depend on other strategies].
In our analysis of direct and indirect effects of mutation in asexual populations, we take advantage of simple cladistics: each mutant founds a clade consisting of the clonal descendants that faithfully inherit the mutant's genome. Direct effects are caused by interactions with founders and indirect effects by interactions with descendants. Modeling sexual reproduction would complicate the cladistics, because recombination between clades generates new behavioral variants by a route other than mutation. Cultural transmission, however, could have cladistics as simple as those of our asexual model: a system with innovators and imitators could to some degree be analogous to our system with mutants and their descendants (non-mutants).
To analyze direct effects for large strategy sets, we assume constant u distributions, which result in constant returns (π_i) from interactions with mutants. In nature, u distributions are probably variable: for example, the u distribution is variable if mutation alters single code positions rather than replacing entire codes/strategies (as in our study), because the mutants produced then depend on the parent strategy. Variable u distributions would complicate the analysis of direct effects because the population composition would then have to be considered (and genotype × mutation interactions, i.e., genotypes differing in their propensity to mutate, would complicate this analysis further).
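One way to make the constant returns π_i explicit is to split each individual's interaction partners into residents and newly arisen mutants. The Python sketch below assumes, for illustration only, that a fraction μ of partners are fresh mutants drawn from the u distribution and the rest are residents with frequencies x; the exact update rule of the model is not given in this section. Under this assumption the mutant term contributes μπ_i to the fitness of strategy i, so two strategies differ by μ(π_i − π_k) in that term, mirroring the expression μ(π_AllD − π_TfT1) used in the Results. The function `fitness`, the background fitness value and the toy two-round payoff matrix (with T = 5, R = 3, P = 1, S = 0) are hypothetical.

```python
def fitness(i, x, u, A, mu, K):
    """Fitness of strategy i when a fraction mu of partners are fresh mutants.

    x: resident frequencies, u: u distribution of mutants,
    A[i][j]: game payoff of i against j, K: background fitness.
    """
    n = len(x)
    from_residents = sum(A[i][j] * x[j] for j in range(n))
    pi_i = sum(A[i][j] * u[j] for j in range(n))  # constant whenever u is constant
    return K + (1 - mu) * from_residents + mu * pi_i


# Toy two-round game between AllD (index 0) and TfT1 (index 1),
# with illustrative payoffs T=5, R=3, P=1, S=0.
A = [[2, 6],
     [1, 4]]
u = [0.5, 0.5]
x = [1.0, 0.0]  # resident population of AllD only
w_alld = fitness(0, x, u, A, mu=0.01, K=1.0)
w_tft1 = fitness(1, x, u, A, mu=0.01, K=1.0)
print(round(w_alld, 3), round(w_tft1, 3))  # 3.02 2.015
```

In this toy case π_AllD = 4 and π_TfT1 = 2.5, so the mutant term alone favors AllD by 0.01 × (4 − 2.5) = 0.015. Because u is fixed, the π_i could be computed once and reused in every generation.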
We also assume that fitness differences concern differences in fertility. Had we considered fitness differences in viability, determining the π_i-values would be more complicated; for example, it is not clear to us whether the π_i-values would still be constant under constant u distributions, and the fraction of mutants would certainly deviate from μ. Nevertheless, we see no reason why our qualitative findings with respect to the 'TfTx' sets, the rX sets and the symmetric u distributions would change if fitness differences concerned viability rather than fertility.
Indirect effects require sufficiently strong frequency-dependent selection. Indeed, indirect effects do not emerge in our simulations above certain values of the background fitness K (higher K weakens selection). To demonstrate indirect effects, our method is best applied to a system whose selection environment has a unique attractor in the absence of mutation. Ideally, this attractor is also the strategy that benefits most from interactions with mutants; otherwise, direct effects could blur indirect effects.

Direct effects as a mechanism promoting the evolution of cooperation
We use an evolutionary game version of the finitely repeated Prisoner's Dilemma (frPD) because, in the absence of mutation, it has a unique attractor: unconditional defection (AllD) (Cressman 1996). This provides a straightforward criterion for testing what happens when mutation is included: mutation has significant effects whenever at least one strategy persistently or periodically achieves a higher fitness than AllD.
We show conditions for direct effects when mutation produces 'TfTx' strategies (Nachbar 1992), which permit the evolution of cooperation. We further show that the evolution of cooperation through direct effects is excluded for rX strategy sets (Cressman 1992) with a uniform u distribution. For the latter analysis, we define a class of u distributions, the symmetric u distributions (see Appendix 3-1; uniform u distributions of rX sets are examples), for which the average (conditional) behavior of mutants is independent of the opponent strategy. Whenever the opponent strategy does not influence the average behavior of mutants, AllD is the opponent strategy that receives the highest payoff from interactions with mutants: since the mutants' future moves then do not, on average, respond to the opponent's moves, defecting maximizes the payoff in every round (T > R and P > S). Hence, direct effects are excluded for symmetric u distributions.
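The argument can be checked numerically for the smallest non-trivial set. The Python sketch below uses the two-round set r2 with a uniform u distribution over its eight strategies and computes, for each opponent strategy, the average payoff π from games against mutants. It assumes that an rX strategy is a string of 2^r − 1 actions, one per opponent history (shortest histories first, each read as a binary number with d = 0 and c = 1), and illustrative payoffs T = 5, R = 3, P = 1, S = 0; neither is taken from this section, and all names are hypothetical.

```python
from itertools import product

# Illustrative Prisoner's Dilemma payoffs (assumption): T > R > P > S.
T, R, P, S = 5, 3, 1, 0
PAYOFF = {("c", "c"): R, ("c", "d"): S, ("d", "c"): T, ("d", "d"): P}


def move(strategy, opp_history):
    """Action of an encoded strategy given the opponent's moves so far."""
    index = 0
    for a in opp_history:  # binary history: d = 0, c = 1, earliest move first
        index = 2 * index + (1 if a == "c" else 0)
    return strategy[2 ** len(opp_history) - 1 + index]


def game_payoff(s1, s2, rounds):
    """Total payoff of s1 against s2 in the finitely repeated game."""
    h1, h2, total = [], [], 0
    for _ in range(rounds):
        m1, m2 = move(s1, h2), move(s2, h1)
        total += PAYOFF[(m1, m2)]
        h1.append(m1)
        h2.append(m2)
    return total


rounds = 2
# The r2 set: every string of 2**rounds - 1 = 3 actions.
strategies = ["".join(s) for s in product("cd", repeat=2 ** rounds - 1)]
# Uniform u distribution: each strategy is equally likely to be the mutant,
# so pi[s] is the average payoff of opponent s against a random mutant.
pi = {
    s: sum(game_payoff(s, m, rounds) for m in strategies) / len(strategies)
    for s in strategies
}
best = max(pi, key=pi.get)
print(best, pi[best])  # ddd 6.0
```

Every strategy that cooperates at some reachable history earns less than ddd (AllD), because under a uniform u distribution each position of a mutant's code is c or d with equal probability, independently of the opponent's earlier moves.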
The 'TfTx' sets exemplify how to create mutation regimes that can elicit direct effects resulting in the evolution of cooperation. The TfTx strategies are contained in the corresponding rX set, and the rX sets are subsets of the (infinite) space of probabilistic strategies adapted to frPD games. We introduce the symmetric u distributions because they constitute a boundary in the u distribution space: they separate the subspace where mutants cooperate most often with AllD from the subspace where mutants cooperate most often with another strategy. Direct effects can only emerge in the latter subspace, depending on the conditions of the frPD game and the mutation rate. Note that the emergence of direct effects does not imply that the population evolves to a state where cooperation is frequently executed (e.g., if the strategy benefiting most from the mutants is a defector).
Direct effects that result in the evolution of cooperation emerge only in a fraction of the u distribution space. We can only speculate about the size of this fraction. Even if this fraction is tiny, it may be important for the evolution of cooperation if u distributions in natural systems fall into this category. However, empirical evidence for direct reciprocity in general is scarce (Clutton-Brock 2009), let alone evidence for particular mutation regimes. In any case, for evolutionary games with a clear attractor strategy, such as AllD in evolutionary frPD games, we conjecture that this strategy is the most likely to benefit most from the mutants. Consequently, the evolution of cooperation by direct effects is possible, but we predict that it is not very likely in general.
The study of McNamara et al. (2004) inspired our definition of direct effects. In our view, the evolution of cooperation in the studies of McNamara et al. (2008), Eriksson and Lindgren (2005) and Traulsen et al. (2009) is explained by direct effects. The strategy sets used in these studies are only part of a much larger set of (deterministic) strategies, just like the 'TfTx' sets in relation to the rX sets. Furthermore, if their mutation regimes were replaced by regimes comprising broader strategy sets, we would expect the unconditional defectors in their evolutionary games to benefit most from interactions with mutants. Hence, we think that the mutation-induced promotion of the evolution of cooperation described by McNamara et al. (2004, 2008), Eriksson and Lindgren (2005) and Traulsen et al. (2009) is a rather special outcome. If, however, behavioral variation is not caused by mutation but is culturally inherited, then cooperation may evolve under a wider set of conditions, provided that humans choose from 'TfTx' strategy sets and disregard most rX strategies.

Indirect effects as a mechanism promoting the evolution of cooperation
In simulations of rX sets with uniform u distributions (i.e., a condition without direct effects), we observe periodic indirect effects (Fig. 3-1b,c and 3-3). We find several conditions where such effects emerge (Table 3-1) at intermediate mutation rates (Fig. 3-1a). The periodic indirect effects all show a similar pattern (Fig. 3-1b,c and 3-3): a population dominated by AllD is invaded by TfT1 and vice versa, giving rise to cycles of alternating dominance. We suggest that the invasion of TfT1 is due to a group of defectors defined by {ddcd…, dddd.c.} (see also Appendix 3-1C). The behavior of these defectors towards other defectors is similar to the dominant behavior in AllD-dominated populations: they defect in all rounds of the game. Therefore, these defectors are less vulnerable to the (AllD-influenced) selection in such populations and, at mutation-selection balance, are depleted less than other rX strategies. TfT1 exploits the deviations of certain defectors from the behavior of AllD (see Appendix 3-1C). As a consequence, the execution of cooperation increases during TfT1 invasions (Fig. 3-2a), and the average execution of cooperation can exceed the value expected from mutation-selection balance (Fig. 3-1a).
We only found periodic indirect effects. We suspect that, for evolutionary frPD games without direct effects, persistent indirect effects emerge only in special parameter regions, if they exist at all. This is because persistent indirect effects would require the AllD individuals to have a strong enough effect on the fitness of others even before AllD fully achieves fitness dominance.
The periodic indirect effects observed in our study resulted in periodic increases in the execution of the action 'cooperate' (Fig. 3-2a). But one may ask whether this increase constitutes co-operation in the sense of individuals mutually helping each other. If TfT invades a population otherwise composed of AllD, then after the first round the TfT players cooperate only with other TfT players. Cooperation therefore mostly takes place among individuals with the same phenotype (i.e., TfT players), and the mutual cooperation payoff is generated more often than expected. Such positive assortment is known as a fundamental principle for the evolution of cooperation (Nowak 2006). In our simulations, we find the opposite: decisive executions of cooperation take place between individuals with different phenotypes, i.e., TfT1 and the defectors {ddcd…, dddd.c.}, and the mutual cooperation payoff is generated less often than expected (Fig. 3-2b). Furthermore, in the interactions between TfT1 and these defectors, it does not pay for the defectors to stick to their strategy, as they would fare better by playing unconditional defection. On the other hand, the average payoff increases during TfT1 invasions (Fig. 3-2b), as is typical for the evolution of cooperation.
TfT is among the large set of strategies that invade in the wake of TfT1 invasions (Fig. 3-1b,c and 3-3). This conditional cooperator might play a more pronounced role in frPD games with more than four rounds (as increasing r improves the performance of TfT). Unfortunately, we cannot check this prediction, because the sheer size of the corresponding rX sets (e.g., 2147483648 strategies in a game with five rounds) makes simulations with five or more rounds infeasible.
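The quoted figure of 2147483648 follows if an rX strategy specifies one action for every possible history of opponent moves, which is our assumption about how the rX sets are constructed: an r-round game has 2^r − 1 such histories (of lengths 0 to r − 1), giving 2^(2^r − 1) strategies. A short Python check (the function name is hypothetical):

```python
def rx_set_size(r):
    """Number of deterministic strategies in an r-round frPD (sketch).

    Assumes one action is fixed for every possible opponent history of
    length 0, 1, ..., r - 1, i.e. for 2**r - 1 histories in total.
    """
    histories = 2 ** r - 1
    return 2 ** histories


for r in range(1, 6):
    print(r, rx_set_size(r))
# 1 2
# 2 8
# 3 128
# 4 32768
# 5 2147483648
```

The jump from 32768 strategies with four rounds to more than two billion with five rounds shows how quickly the strategy space outgrows simulation.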

Conclusions
By analyzing direct effects, we explain the phenomenon of mutation-promoted evolution of cooperation described earlier by McNamara et al. (2004, 2008), Eriksson and Lindgren (2005) and Traulsen et al. (2009). However, we argue that this phenomenon is probably a rather special case: the strategy favored by selection in the absence of mutation (AllD in our study) is most likely also the strategy with the highest benefit from interactions with mutants (direct effect). In such cases, cooperation can still evolve as an indirect effect of mutation-generated variation, under a limited set of conditions. The resulting cooperation dynamics, however, show exploitation of cooperative acts rather than mutual cooperation.
The study of McNamara et al. (2004) is seminal in highlighting the importance of behavioral variation in evolutionary dynamics. Theoreticians have avoided this topic because behavioral variation complicates model analysis. To facilitate such analysis, our method to separate direct and indirect effects of behavioral variation is a useful