Multi-Lineage Evolution in Viral Populations Driven by Host Immune Systems

Viruses evolve in the background of host immune systems that exert selective pressure and drive viral evolutionary trajectories. This interaction leads to different evolutionary patterns in antigenic space. Examples observed in nature include the effectively one-dimensional escape characteristic of influenza A and the prolonged coexistence of lineages in influenza B. Here, we use an evolutionary model for viruses in the presence of immune host systems with finite memory to obtain a phase diagram of evolutionary patterns in a two-dimensional antigenic space. We find that, for small effective mutation rates and mutation jump ranges, a single lineage is the only stable solution. Large effective mutation rates combined with large mutational jumps in antigenic space lead to multiple stably co-existing lineages over prolonged evolutionary periods. These results combined with observations from data constrain the parameter regimes for the adaptation of viruses, including influenza.


Introduction
Different viruses exhibit diverse modes of evolution [1][2][3][4], from relatively slowly evolving viruses that show stable strains over many host generations such as measles [5], to co-existing serotypes or strains such as noroviruses [6] or influenza B [7,8], and quickly mutating linear strains such as most known variants of influenza A [9]. Despite the different patterns of evolutionary phylogenies and population diversity, all viruses share the common feature that they co-evolve with their hosts' immune systems. The effects of the co-evolution depend on the mutation timescales of the viruses and the immune systems, the ratio of which varies for different viruses. However, in the simplest setting, the population of hosts exerts a selective pressure on the viral population, resulting in the evolution of the viral population towards more distant areas of antigenic space from the host population. Here, we explore this mutual dynamics in a model of viruses that evolve in the background of host immune systems. While several previous studies of pathogen-immune dynamics have focused on specific systems [2,4,5,8,[10][11][12][13][14][15][16], here we study generic evolutionary patterns [3,17]. Specifically, we are interested in how the host immune cross-reactivity and memory control the patterns of viral diversity.
These evolutionary processes lead to joint dynamics that has often been modeled by so-called Susceptible-Infected-Recovered (SIR) approaches to describe the host population [18,19], possibly coupled with a mutating viral population. In their simplest form, these models have successfully explained and predicted the temporal and historical patterns of infections, such as measles [5], where there are little mutations, or dengue, where enhancement between a small number of strains can lead to complex dynamics [20]. These methods have been important in helping develop vaccination and public health policies.
Apart from a large interest in the epidemiology of viruses [10], a large extension of SIR models has also tackled questions on the role of complete and partial cross-coverage, and how that explains infection patterns for different viruses [2,11], the role of spatial structure on infections [8], as well as antigenic sin [12,21]. Most of these questions were asked with the goal of explaining infection and evolutionary patterns of specific viruses, such as dengue [11,12], influenza [4,[13][14][15] or Zika [16]. Here, we take a more abstract approach, aimed at understanding the role of immunological cross-reactivity and mutation distance in controlling the evolutionary patterns of diversity.
At the same time, the wealth of samples collected over the years, aided by sequencing technologies, has allowed for data analysis of real evolutionary histories for many types of viruses. One of the emerging results is the relatively low dimensionality of antigenic space-an effective phenotypic space that recapitulates the impact of host immune systems on viral evolution. Antigenic mapping, which provides a methodology for a dimensionality reduction of data [9] based on phenotypic titer experiments, such as Hemaglutanin Inhibition (HI) assays for influenza [22], has shown that antigenic space is often effectively low-dimensional. For example, influenza A evolution is centered on a relatively straight line in antigenic space [15]. This form suggests that at a given time influenza A strains form a quasispecies of limited diversity in antigenic space, with escape mutations driven by antigenic pressure moving its center of mass [3,8,17].
We focus on a simplified model of viral evolution in a finite-dimensional space that delineates evolutionary patterns with different complexity of coexisting lineages. Recent models of these dynamics have focused only on the linear evolutionary regime relevant of influenza A [8] or have used an infinite-dimensional representation of antigenic space [3]. Here, we also model immune memory in more detail, while keeping a simplified infection dynamics with a small number of model parameters. Unlike in previous approaches, we assume a finite memory timescale. While our treatment does not account for many features of host-immune dynamics (as discussed in Sections 2 and 4), it offers a stepping stone to future more in-depth analysis of the role of host repertoires. Our analysis is motivated by different evolutionary trends observed in influenza: the single strain of influenza A compared to the two stably co-existing lineages of influenza B. Using these observations as a starting point, we study a generic model which assumes that immune receptors can recognize and remember several viruses, additionally to virus mutation and immune-driven selection and we show that these elements are sufficient to obtain specific evolutionary patterns. This model is stripped of many of the details that are undoubtably important for the specific case of influenza, such as seasonal variability, geographic and temporal niches, cross-infections between species, etc. However, thanks to its generality, our model shows that the different evolutionary trends can be obtained without calling upon niches or subpopulations, and it can be generalized to a range of fast-evolving viruses that cause acute, single species host infections. Our goal is not to model the evolution of any specific virus but to identify the conditions under which different evolutionary trends emerge.

The Model
We implement a stochastic agent based simulation scheme to describe viral evolution in the background of host immune systems. Its main ingredients are sketched in Figure 1. We fix the number of hosts to describe a large reservoir N = 10 7 and do not consider host birth-death dynamics. The number of hosts is chosen to be large, since we are not considering the possibility of extinction of the host reservoir. Hosts can get infected by a given viral strain if they are not already infected by it (equivalently to susceptible individuals in SIR models) in a way that the infection probability depends on the hosts' infection history. Hosts are defined by the set of immune receptors they carry.
We work in a two-dimensional, antigenic space, where each viral strain and each immune receptor in every host is a point in a 2D phenotypic space. This phenotypic space is motivated both by antigenic maps [9] and shape space used in immunology to describe the effective distance between immune receptors and antigen [23][24][25][26][27][28][29]. The recognition probability of viruses by immune receptors is encoded in a cross-reactivity kernel f (r) that depends on the distance between the virus and the receptor in this effective 2D space. We take f (r) = e −r/d to be an exponential function with parameter d that determines the cross-reactivity-the width of immune coverage given by a specific receptor [14].    Figure 1. Phenotypic space and key ingredients of the evolutionary model. During an infection, a virus attempts to infect on average R 0 hosts, however not all infections are successful. The immune repertories of some hosts can clear the virus (case of host 3) since their cross-reactivity kernels from existing memory receptors confer protection. However, if the host does not have protection against the infecting virus (case of host 2), the host becomes infected. After the infection, this host acquires immunity against the infecting virus. Since the virus can mutate within a given host (host 1), the infecting virus can be a mutated variant (case of host 2) with probability P mut = 1 − e −µt I and the ancestral strain that infected host 1 with rate 1 − P mut = e −µt I (case of host 3). The cross-reactivity kernel is taken to be an exponential function f (r) = exp(− r d ), meaning that viruses are recognized by receptors if they are closer in phenotypic space. Jumps are in a random direction and their size is distributed according to a Gamma distribution of mean σ and shape parameter 2. The dimensionless raio σ/d controls the ability of viruses to escape immunity. We assume no selection within one host.
All hosts start off with naive immune systems, implemented as a uniformly zero immune coverage in phenotypic space. If a host is infected by a virus, after the infection, a new immune receptor is added to the host repertoire with a phenotypic position equivalent to the position of the infecting viral strain. Hosts have finite memory and the size of the memory pool of each host immune system M determines the maximum number of receptors in a host repertoire, corresponding to the last M viral strains that infected that host. This constraint can also be seen as the amount of resources that can be allocated to protect the host against that particular virus. In this work we set M=5.
A new infection lasts a fixed time of t I = 3 days before the infected host tries to infect a certain number of new hosts (among those that are not already infected), drawn from a Poisson distribution with average R 0 . The timescale of three days is motivated by the fact that an acute infection typically lasts about a week, but transmission usually occurs early on during the infection. At this time, the infection in the initial host is cleared and a memory immune receptor is added to its repertoire as explained above. During an infection, a virus can mutate in the host with a rate µ. Since we concentrate on the low mutation limit, µt I 1, we limit the number of per-host mutations to at most one. Following [8,17], a mutation in a virus with phenotype a produces a mutant with phenotype b with probability density ρ(a → b) = (1/2π)(4r ab /σ 2 )e −2r ab /σ (Gamma distribution with shape factor 2), where r ab is the Euclidean distance between a and b, so that the average mutation effect is σ. As a result, the newly infected individual can be infected with the same ("wild-type") virus that infected the previous individual with Poisson rate e −µt I , or by a mutant virus with probability P mut = 1 − e −µt I for each infection event.
Not all transmission attempts lead to an infection. When a virus attempts to infect a host, an infection takes place with probability f (r), where f is the cross-reactivity kernel defined above and r is the distance in the 2D phenotypic space between the infecting viral strain and the closest receptor in the host repertoire. If the host repertoire is empty, the infection takes place with probability one. The viral mutation jump size and the cross-reactivity kernel set two length scales in the phenotypic space, σ and d ( Figure 1). Their dimensionless ratio σ/d is one of the relevant parameters of the problem. In this work, we kept d fixed and then varied σ to explore their ratio. We do not explicitly consider competition between immune receptors within hosts, or complex in-host dynamics. Table 1 summarizes the variables used in the model and the main equations.

Initial Conditions and Parameter Fine-Tuning
We simulate several cycles of infection and recovery, keeping track of the phenotypic evolution of viruses and immune receptors throughout time by recording the set of points describing viruses and receptors in phenotypic space at each time, as well as what immune receptors correspond to each host. Once every 360 days, we save a snapshot with the coordinates of all the circulating viruses. In addition, we save the phylogenetic tree of a subsample of the viruses.
In order to quickly reach a regime of co-evolution with a single viral lineage tracked by immune systems, we set initial conditions so that the viral population is slightly ahead of the population of immune memories. Details of the initial conditions are given in Appendix A.1).
Viruses can survive for a long time only because of an emergent feedback phenomenon that stabilizes the viral population when R 0 is fixed, as explained in Section 3.2. Even with that feedback, R 0 needs to be fined-tuned to obtain stable simulations. With poorly tuned parameters, viruses go extinct very quickly after an endemic phase, as also noted in [8]. The detailed procedure for setting R 0 is described in Appendix A.2. Roughly speaking, R 0 needs to be chosen so that the average effective number of infected people at each transmission event is equal to 1, or R 0 p f = 1, where p f is the average probability that each exposure leads to an infection. We further require that the fraction of infected hosts tends towards a target value,f i , which acts as an additional parameter in our model. To do this, R 0 is first adaptively adjusted at each time as: where p f is averaged over the past 1000 transmission events, and f i the current fraction of infected hosts. After that first equilibration stage, R 0 is frozen to its last value. Despite the explicit feedback (∝f i − f i ) being removed, the population size is stabilized by the emergent feedback. As a result, the virus population is stable for long times for a wide range of parameter choices (see Section 3.2).
To have more control over our evolution experiment, we also analyze a variant of the model where we keep constraining the viral population size, constantly adjusting R 0 using Equation (1) for the whole duration of the simulation (100 years). In this way, the fraction of infected hosts f i is stabilized around the averagef i .
Simulations were analyzed by grouping viral strains into lineages using a standard clustering algorithm, as described in Appendix C.1. The traces in each lineage were analyzed to evaluate their speed and variance in phenotypic space, as well as their angular persistence time (see Appendix C.2 for details). We built phylogenetic trees from subsamples of strains as detailed in Appendix C.3.

Detailed Mutation Model
We also considered a detailed in-host mutation model, in which we explicitly calculate the probability of producing a new mutant within a host. We present this model in detail in Appendix B for the case where only one mutant reaches a high frequency during the infection time and we compare the results of this model to the simplified fixed mutation rate model described above.
The general idea is that we consider a population of viruses that replicate with rate α and mutate with rate µ resulting in a non-homogeneous Poisson mutation rate µe αt . The replication rate is the same for all mutants, i.e., there is no selection within one host and the relative fraction of the mutants depends only on the time at which the corresponding mutation arose.
For the case when only one mutation impacts the ancestral strain frequency, we simply calculate the time of the mutation event and use it to find the probability that an invader mutant reaches a certain frequency at the end of the infection. We then randomly sample the ancestral or mutant strain according to their relative frequencies at the end of the infection to decide which one infects the next host.

Modes of Antigenic Evolution
Typical trajectories in phenotypic space show different patterns depending on the model parameters. In the following, we describe a ballistic ( Figure 2A(i-iii)), a diffusive ( Figure 2B(i-iii)), a transient splitting ( Figure 2C(i-iii)), and a stable splitting ( Figure 2D(i-iii)) regime and delineate the corresponding regions of the µ − σ parameter space. Here, we present these four regimes and show sample evolutionary trajectories and corresponding phylogenic trees. We quantify these trajectories and describe the parameter regimes in which these appear in Figure 3 and 4.
Ballistic regime. In this regime of one-dimensional evolution, viruses mutate locally forming a concentrated cluster of similar individuals, called a lineage. Successful mutation events, which take the viral strains away from the regions of antigenic space protected by host immune systems, progressively move the lineage forward ( Figure 2A). For small values of the mutation rate and small mutation jump sizes, the trajectory in phenotypic space is essentially linear, with new mutants always growing as far away as possible from existing host immune systems, which themselves track viruses but with a delay. The delayed immune pressure creates a fitness gradient for the virus population, which forms a traveling fitness wave [3,30,31] fueled by this gradient. A similar linear wave scenario was studied in one dimension by Rouzine and Rozhnova [17].
Diffusive regime. As we increase the mutation jump range, the trajectories lose their persistence length and the trajectories in phenotypic space start to turn randomly, as new strains are less sensitive to the pressure of host immune systems ( Figure 2B).
Both ballistic and diffusive regimes lead to phylogenetic trees with one main trunk and a short distance to the last common ancestor (Figure 2A(ii,iii) and Figure 2B(ii,iii)). The mean time to the most common ancestor TMRCA is the same in these two regimes (Figure 2A(iii) and Figure 2B(iii)). This trend is characteristic of influenza A evolution and has been discussed in detail in Ref. [8]. When viruses evolve in a single lineage, the phylogenetic tree shows a single trunk dominating evolution. When viruses split into more lineages, the phylogenetic tree shows different lineages evolving independently. Each lineage diffuses in phenotypic space with a persistence length that depends itself on the model parameters. In these simulations, viral population size is not constrained, but parameters are tuned to approach a target fraction of infected hosts,f i = 10 −3 . Parameters are Transient splitting regime. Alternatively, we observe a bifurcation regime, where at a certain point in time two mutants form two new co-existing branches, roughly equidistant from both each other and the ancestral strain in antigenic space ( Figure 2C). Each branch has similar characteristics as the single lineage in the one-dimensional, evolution of Figure 2A(i) and B(i). These co-existing branches give rise to phylogenetic trees with two trunks ( Figure 2C(iii)). In the example shown in Figure 2C(iii), the two lineages stably co-exist for~20 years, leading to a linear increase of the distance to the last common ancestor, until one of them goes extinct, returning the evolution to one dominant lineage with small distances to the last common ancestor ( Figure 2C(ii)).
For each parameter point we simulated 100 independent realizations. Stable splitting regime. The two branches can stably co-exist for over~80 years ( Figure 2D, only the first 50 years are shown), starting with similar trends as in the example in Figure 2C(i), not returning to the one dominant lineage regime, but even further branching in a similar equidistant way at later times (not shown). This trend leads to evolutionary trees with multiple stable trunks ( Figure 2D(iii)), with local diversity within each of them and a linear increase of the distance to the last common ancestor over long times ( Figure 2D(ii)).

Stability
The mean extinction time of viral populations depends on the parameter regime ( Figure 3). A stable viral population is achieved in the σ d regime thanks to stabilizing feedback [3]: if viruses become too abundant, they drag the immune coverage onto the whole viral population, and the number of viruses decreases since infecting a new host becomes harder. As a result, the relative advantage of the fittest strains with respect to the bulk of the population decreases as more hosts are protected against all viruses. This feedback slows down the escape of viruses to new regions of antigenic space and the adaptation process. Conversely, when the virus abundance drops, the population immune coverage is slower in catching up with the propagating viruses. The fittest viral strains gain a larger advantage with respect to the bulk and this drives viral evolution faster towards new antigenic regions and higher fitness, increasing the number of viruses.
This stabilizing feedback is very sensitive to the speed and amplitude of variation. Abrupt changes or big fluctuations in population size can drive the viral population to extinction. Because of this, viruses often go extinct very quickly after an endemic phase [3,8], as is proposed to have been the fate of the Zika epidemic [3]. Here, we focus on the stable evolutionary regimes, starting from a well equilibrated initial condition as explained in Section 2.2.

Phase Diagram of Evolutionary Regimes
Our results depend on three parameters: the mutation rate µ, the mutation jump distance measured in units of cross-reactivity σ/d, and the target fraction of infected individuals in the population,f i . The observed evolutionary regimes described in Figure 2 depend on the parameter regimes, as summarized in the phase diagrams presented in Figure 4 for various fractions of infected hostsf i (panels i-iv).
The mean number of distinct stable lineages increases with both the mutation rate and the mutation jump distance ( Figure 4A). Because the process is stochastic, even in regimes where multiple lineages are possible, particular realizations of the process taken at particular times may have one or more lineages. The fraction of time when the population is made of a single lineage (chosen rather than the fraction of runs with a single lineage, which strongly depends on simulation time) decreases with mutation rate and jumping distance ( Figure 4B), while the rate of formation of new lineages increases ( Figure 4C). All three quantities indicate that large and frequent mutations promote the emergence of multiple lineages. This multiplicity of lineages arises when mutations are frequent and large enough so that two simultaneous escape mutants may reach phenotypic positions that are distant enough from each other so that their sub-lineages stop feeling each other's competition and become independent.
Increasing the mutation rate or the mutation jump distance alone is not always enough to create a multiplicity of lineages. For smallf i = 5 × 10 −4 ( Figure 4A(i)) and moderate jump sizes, the single-lineage regime is very robust to a large increase in the mutation rate, meaning the cross-immunity nips in the bud any attempt to sprout a new lineage from mutations with small effects, however frequent they are.
Coalescence times ( Figure 4D) give a measure of the number of mutations to the last common ancestor, and are commonly used in population genetics to characterize the evolutionary dynamics. In the case of a single lineage, coalescence times are short, corresponding to the time it takes for an escape mutation furthest away from the immune pressure to get established in the population. However, when there are multiple lineages, the coalescence time corresponds to the last time a single lineage was present. Such an event can be very rare when the average number of lineages is high, leading to very large coalescence times. Accordingly, the coalescence time increases with lineage multiplicity, and thus with mutation rate and jump size.
In general, large target fractions of infected hosts,f i , lead to more lineages on average and a higher probability to have more than one lineage. Increasing the number of infected individuals increases the effective mutation rate and allows the virus to explore evolutionary space faster. This rescaling allows more viruses to find niches and increases the chances of having co-existing lineages. While an increased fraction of infected hosts may also limit the virgin exploration space where viruses can attack non-protected individuals, this effect may be negligible when the target fractionsf i are small as considered here.

Incidence Rate
When viruses split into lineages, the implicit feedback mechanism described earlier to explain stability remains valid for each cluster independently (unless the number of independent lineages exceeds the immune memory pool M). As a result each lineage can support roughly a fractionf i of the hosts, which defines a "carrying capacity" of each lineage. As a result, the viral population size, also known as incidence rate, is proportional to the number of lineages ( Figure 5). However, the incidence fluctuates with time, with clear bottlenecks when a new cluster is founded.

Speed of Adaptation and Intra-Lineage Diversity
Whether there is a single lineage or multiple ones, each lineage moves forward in phenotypic space by escaping the immune pressure of recently infected and protected hosts lying close behind. We examined the speed of adaptation and the diversity of lineages of viral diversity present at a given time ( Figure 6). We calculated the speed of adaptation in units of cross-reactivity radii d per year by taking, for each lineage, the difference in the two-dimensional phenotypic coordinate of the average virus at time points one year apart. We quantified the diversity by approximating the density of each lineage at a given time by a Gaussian distribution in two-dimensional phenotypic space and calculating its variance along the direction of the lineage adaptation in phenotypic space.
The speed of adaptation increases with the mutational jump size σ, and also shows a weak dependence on the mutation rate µ. The variance in the viral population also increases with the jump size, and in general scales with the speed of adaptation. Fisher's theorem states that the speed of adaptation is proportional to the fitness variance of the population. A correspondance between speed and variance in phenotypic space is thus expected if fitness is linearly related to phenotypic position.
While such a linear mapping does not hold in general in our model, the immune pressure does create a nonlinear and noisy fitness gradient, which can explain this scaling between speed and diversity.

Antigenic Persistence
While lineage clusters tend to follow a straight line, their direction fluctuates as escape mutants can explore directions that are orthogonal to the main direction of the immune pressure. For this reason, while the phylogenetic trees in the ballistic and diffusive regimes in Figure 2A,B are very similar (quantified by the same value of TMRCA compared to the transient splitting and stable splitting regimes), the sample evolutionary trajectories look very different. In Figure 7, we plot the rate at which trajectories turn, changing their direction by at least 30 degrees (see Appendix C.2). As noted in Figure 2, small mutation jump sizes σ favor long periods of linear motion and low turn rates. As σ increases, the turn rate increases.  Several factors affect the turn rate as measured from the simulations. A lineage splitting induces a turn, and regions of phase space where multiple lineages are possible favor short persistence times. The same goes for population extinction: regimes where the population extinction rate is higher do not allow us to observe long persistence times, masking the dependence of the turn rate on µ. Generally, we expect lineage clusters to undergo more angular diffusion in phenotypic space as mutations become more important (large σ). Mutants can explore new regions of the phenotypic space, causing the population to stochastically turn while keeping a cohesive shape. On the other hand, lower mutation rates may mean that fewer mutants will do this exploration, increasing stochasticity in cluster dynamics and effectively increasing the turn rate. In this regime of stochastic turning, predicting the phenotype of future viral strains is much harder than in the linear regime.

Dimension of Phenotypic Space
We explored the effect of phenotypic space dimensions on our results. In Figure 8, we plot the average number of neighbors of a given viral strain within distance r from that strain (for short distances so that only pairs from the same lineage are considered). This measure scales as r D for the cumulative number of neighbors plotted in Figure 8, where D = 2 is the dimension of phenotypic space, as expected for a uniformly distributed cluster of strains in finite dimension. By contrast, that number would be expected to scale exponentially with r for a neutral process in infinite dimensions. These results suggest, that in low dimensions, which seem to be the experimentally valid limit, the dimension of the space does restrict the dynamics and cannot be neglected. However, we are unable to separate the effects of selection and phenotypic space dimensionality. It also implies that lineages form dense, space-filling clusters in phenotypic space. We expect this result to hold for any reasonably low dimension, and will break down in high dimension.

Robustness to Details of Intra-Host Dynamics and Population Size Control
To test whether a detailed treatment of intra-host viral dynamics would affect our results, we also considered a detailed mutation model, where we calculate the probability of producing a mutation within each individual (see Appendix B). Specifically, we compare the model that calculates the probability of having a mutated strain given in Equation (A11) to the simplified model with the mutation rate implemented as discussed above. As we see from Figures 9 and 10, the general evolutionary features are the same as for the simplified model: the probability of multi-lineage trajectories increases with increasing µ and σ, as does the lineage splitting rate and the speed of adaptation. The diversity in phenotypic space in the direction parallel ( Figure 10B) to the direction of motion increases with the mutation jump size, as expected, as well as the turn rate ( Figure 9D).
Lastly, we asked how our results would be affected by strictly constraining the viral population size (as explained in Section 2.2), rather than letting it fluctuate under the control of the emergent negative feedback. The corresponding phase diagrams show the same evolutionary regimes as a function of µ and σ/d ( Figure A1), and the same general dependencies on model parameters of the speed of adaptation ( Figure A2) and turn rate ( Figure A3), as with a fluctuating population.

Discussion
Our model describes regimes of viral evolution with different complexity: one strain dominates (Figure 2A,B), two dominant strains coexist over timescales longer than the host lifetime ( Figure 2C), or multiple strains coexist in a stable way ( Figure 2D). The single-strain regime clearly maps onto influenza A. Influenza B evolution, which is split into the Victoria or Yamagata sublineages, is consistent with prolonged ( Figure 2C) or stable coexistence ( Figure 2D). We can use our results to characterize the differences in the evolutionary constraints acting on the adaptive processes of influenzas A and B. Our results suggest that the combination of mutation rate and effective mutation jump distance in influenza A must be smaller than in influenza B. Since the mutation rates are similar, this means that the effect of mutations at sites in influenza B has a larger phenotypic effect. Alternatively, the effective number of infected individuals per transmission event (R 0 in classical SIR models, equal to R 0 p f in our model) could be larger in influenza B than influenza A. Another possibility is that, since lineage splitting happens stochastically, the difference between the two species is just due to different random realizations.
Our goal was to show that a simple model without additional elements such as the introduction of geographic, demographic or spatial niches can reproduce different evolutionary trends observed in fast evolving viruses. If we consider any specific virus, these specific elements become important for explaining the detailed patterns of evolution. For example, for the flu virus, the seasonal and geographical correlations, as well as the existence of animal reservoirs for human infections and the travel patterns of humans are necessary to predict the global spread of the virus. These additional features lead to a wealth of specific behaviors, but our analysis shows that the stable co-existence of different strains emerges from evolutionary considerations without the need to invoke these additional features.
Our model has the following ingredients: infected hosts pass on infections, viruses mutate, we work in two-dimensional phenotypic space, immune receptors can recognize different viruses (are cross-reactive) and the immune system updates its memory based on the viruses it has seen. Eliminating cross-reactivity and immune memory would result in viruses growing freely, without feeling the immune pressure. In this situation, we would not observe the lineage splitting caused by avoiding immune hosts. Similarly, a one-dimensional model cannot lead to lineage splitting [17,30,31].
We note that co-existing lineages can be obtained in models of evolving populations with weekly interacting niches without any selection pressure (of immune origin or any other)-the number of lineages will simply correspond to the number of niches we assume, each population will evolve according to neutral (e.g., Wright-Fisher) dynamics and the distance between the niches will depend on where we locate them. Unless there is a niche substructure, we will not observe additional within niche splittings. Therefore, observing subsequent lineages within data (which to the best of our knowledge has not been observed) would suggest selection-induced splitting as opposed to pure niches. Our goal was not to explore such niche-induced lineages.
Based on the evolutionary regime, it has been observed that our model could be used to constrain unknown parameters-in particular viral systems, such as the mutation rate or typical effect of mutations. The evolutionary mode also depends crucially on the cross-immunity range d, which could be tested using neutralization assays.
A more detailed comparison between our models and data, which includes virus-specific features, would require refining the mapping between sequence data and phenotypic space. Antigenic maps are a step in this direction, as well as high-throughput genotype-phenotype experiments that map viral strains into virulence phenotypes and similar experiments that map immune receptor sequences into measures of antigen recognition [32]. For our model, the mean extinction times of viruses are plotted in Figure 3. For example, for a mutation rate of µ ≤ 0.001, mutation jump size σ/d < 0.05 and a mean number of infected individuals off i = 10 −3 , viruses survive on average less than 50 years.
Multiple co-existing lineages have been observed in the flu [7]. The question remains if the multiple lineages are self-generated due to population level immune pressure. One test of this scenario is to map the evolutionary regimes where we expect splitting. However, this may not directly validate or falsify the idea due to the mapping problems described above, and also the fact that lineage splitting is a stochastic event, so a lack of splitting in one sample does not mean it cannot happen. An alternative test of the idea of self-generated niches could be performed in synthetic clustered regularly interspaced short palindromic repeats (CRISPR)-phage synthetic evolutionary systems [33]. Averaging over many realisations of the evolutionary experiments, and varying the protection level of the bacteria could help make the mapping between the parameters and increase the observation rate of splitting events.
Our model is applicable to acute infections that spread within one species due to a rapidly evolving virus at the population level and the host clears within a short timescales compared to its lifetime. For this reason, it is a possible model of flu spreading but neither of HIV evolution, which mainly evolves in hosts, nor DNA viruses or slow evolving RNA viruses such as measles. Here, we mainly discuss our results in the context of influenza evolution; however, a detailed comparative study of how often the different evolutionary trends are observed across fast evolving viruses is an interesting future direction.
Our model shares similarities with previously considered models of viral evolution [3,8], while focusing on distinct questions. Among differences in modeling details, our hosts have finite memory capacity and forget past strains after some time, compared to infinite memory assumed in past work. Comparing our simulations with Ref. [3,8] in their relevant regimes, we do not see noticeable differences in the main trends of evolution, which suggests that the effects of losing memory are quantitative rather than qualitative at the population scale, at least for the parameters' regimes that were inspected. The need for revaccination against certain even slowly evolving viruses (although these are not the type of studies here) suggests that the timescales for memory loss can be variable and some antigens stimulate lifelong memory, while the memory repertoire against other antigens decays more rapidly. We assume exponentially decaying cross-reactivity, similarly to [3] (although it is linearized in their analysis). By contrast, the authors in [8] use a linear cross-reactivity, but this minor difference is unlikely to influence the results. The authors in [8] focused specifically on the question of explaining the single dominant lineage in influenza A evolution. While the existence of lineage bifurcations was acknowledged in Ref. [8], this regime was not explored. Instead, a more detailed geographical model was considered, with migrations between different geographical zones. For the single lineage regime, with the addition of seasonal niches, the authors in [8] report a decreased extinction rate compared to our model, as one would expect from classical models. The authors in [3] asked a similar question that we did about the conditions under which strain bifurcations may occur, but in the context of an infinite antigenic space. The general trends seem to be independent of the dimensionality of the space, since both models recover the same behavior. However, the exact scaling laws reported in Ref. [3] seem to be more sensitive to the model assumptions. Lastly, while we also considered a more detailed model of intra-host influenza evolution, we found that it could be mapped onto an effective model of viral transmission with mutations, with little impact on the results.
Two main effective parameters control the evolutionary patterns: the effective mutation rate and the mutational jump size, measured in units of the cross-reactivity radius. The effective mutation rate is a combination of the actual mutation rate per host, and the mean number of infected hosts at each cycle: larger fractions of infected individuals lead to more opportunities for the virus to escape host immunity, and faster viral adaptation as a whole. Additionally, a feedback mechanism is observed between the host immune systems and the viruses: too successful viruses infect many hosts, effectively speeding up the rate at which the susceptible host reservoir is depleted, and mounting up the immune pressure. Our model does not include host death, since we assume we are in the limit of very large host reservoirs. Accounting for host extinction may lead to a different interesting problem that has been explored using SIR models [5,34]. In the context of our model, however, host death would effectively amount to reducing the hosts' immune memory capacity M.
The effects of dimensionality on the observed evolutionary trajectories are worth discussing in more detail. The infinite dimensional model is similar in spirit to the infinite sites model of sequence evolution: infinite dimensions mean that there is always a direction for the virus to escape to. Conversely, low dimensions result in an effectively stronger feedback of the host immune systems on the possible trajectories of the escaping virus. This generates effective mutation and jump rates that depend on the primary parameters in a nonlinear way, with possibly different effects in different parameter regimes. We also observe a breakdown of the scaling of observables such as the coalescence time and the mean number of co-existing lineages with µσ 2 (see Figure A4), as would be predicted by the diffusion limit of the traveling wave framework [31,35]. These results indicate that the discreteness of mutations matter. The effective dimensionality of the phenotypic space depends on the parameters, going from effectively one in the linear regime to the dimension of the phenotypic space in the splitting regime. We expect that our results generalize to higher dimensions than 2, with each splitting event leading to a new direction in phenotypic space and increasing the effective dimension of the viral population.
In summary, a detailed exploration of the mutation rate and jump distance, as well as the fraction of infected individuals, allowed us to understand the constraints that lead to different modes of antigenic evolution and, in particular, lineage splitting at different rates and with different survival times of new (sub-)lineages. Observed bifurcations are rare in nature, which puts an evolutionary constraint on the adaptation process.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Simulation Details
Appendix A. 1

. Initialization
We initialize all simulations in an immune coverage background that favors the evolution of one dominant antigenic lineage. We draw viral positions uniformly from a rectangle with bottom-left and top-right corners positioned at (−3σP mut , 0) and (3σP mut , σ). Each host is initialized with one immune receptor as a point in antigenic space, which grants localized protection. The initial memory repertoires of the different hosts are drawn uniformly from a rectangle with bottom-left and top-right corners positioned at (−3σP mut , −5 σ f i P mut ) and (3σP mut , 0), wheref i is the target fraction of infected hosts, determining the number around which the viral population is stabilized (see Section 2.2) and the timescale with which all hosts add (or renew) an immune receptor to their repertoire. In order to lose memories of the artificial initial conditions, we let the system evolve until 99% of the host population have been infected by a virus, so that most hosts have added at least one strain to their repertoires before recording any data.

Appendix A.2. Control of the Number of Infected Hosts
We studied two versions of the same model, one constraining the viral population size strictly, the other letting it fluctuate. In the latter case, we still have to constrain population size for an initial transient in order to reach a well equilibrated initial condition.
We control the virus population size through the fraction of infected hosts around a target value off i . We modify R 0 -the average number of new hosts that are drawn to be infected in a given transmission event-based on the current fraction of infected hosts f i at each time: where p f is the probability of a successful infection at a transmission event, i.e., the probability that a new host is susceptible to the infecting viral strain. We evaluate its average p f over segments of 1000 transmission events.
On average, Using Equation (A1), we find that the average fraction of infected hosts f i (t) is governed by a logistic map with fix pointf i , effectively producing a process where the viral population growth is limited by an effective carrying capacity Nf i .

Appendix B. Detailed Mutation Model
We present the detailed in-host mutation model, in which we explicitly find the probability of producing a new mutant within an infected host. We assume that the immune system responds only to the first viral strain it sees, and that all viruses see the immune system in the same way, undergoing the same deterministic dynamics, i.e., evolution is neutral within one host. This intra-host neutral selection holds if the characteristic mutation jump size is smaller than the cross reactivity length, σ d, which is the case for our simulations. We consider this mutation-proliferation process up to time t I .
We call the total viral population v tot , the first viral invader that is the first viral strain infecting one host, v 0 , and the new mutants, appearing with size 1, v j . These three quantities (neglecting the discreteness of the process) grow deterministically as function of time t as: where i j denotes the indexes of the viral mutants originated from mutant j (if any) and t i j indicates the times at which such mutations arose (Θ(x) is the Heaviside function, = 0 for x < 0 and 1, otherwise). Each mutation jumps to new phenotypic coordinates. From these equations, the relative mutants fractions are The mutation process from any virus present in the viral pool is a non homogeneous Poisson process with rate µe αt . The probability of having n mutations up to the time t is: with The time t 1 of the first mutation event is distributed as: In our simulations, we assume that all mutations other than the first are negligible, that is, we can have more than one mutation, but those after the first do not significantly affect the relative fraction, therefore we have only one mutant. The mutant fraction in the population is x 1 (t) = e −αt 1 if t > t 1 . Knowing the distribution of the first mutation times t 1 , we can calculate the probability distribution of the mutant fraction x 1 at the time of the transmission event t I : In the simulations, we fixed the growth rate to α = 4 day −1 .

Appendix C. Analysis of Simulations
Appendix C.1. Lineage Identification In order to analyze the organization of viruses in phenotypic space, for each saved snapshot, we take the positions of a subset of 2000 viruses and then cluster them into separate lineages through the python scikit-learn DBSCAN algorithm [36,37] with the minimal number of samples min_samples = 10. The parameter defines the maximum distance between two samples that are considered to be in the neighborhood of each other. We perform the clustering for different values of and select the value that minimizes the variance of the 10th nearest neighbor distance (the clustering results are not sensitive to this choice). From the clustered lineages, we can easily obtain a series of related observables, such as the number of lineages and the fraction of time in which viruses are clustered in a single lineage (Figure 4). A split of a lineage into two new lineages is defined when two clusters are detected where previously there was one, and the two new cluster centroids are farther away than the sum of the maximum distances of all the points in each cluster from the corresponding centroid. We impose this extra requirement in order to reduce the noise from virus subsampling and the clustering algorithm. A cluster extinction is defined when a cluster ceases to be detected from one snapshot to the next.    Figure 7 of the main text but with constant fixed population sizef i = 8 × 10 −4 , 10 −3 , and 1.5 × 10 −3 (from left to right, panels i to iii). Phase diagrams as a function of mutation rate µ and mutation jump size σ for rate of turns of the trajectories.