Ecological Strategies for Resource Use by Three Bromoviruses in Anthropic and Wild Plant Communities

Ecological strategies for resource utilisation are important features of pathogens, yet have been overshadowed by stronger interest in genetic mechanisms underlying disease emergence. The purpose of this study is to ask whether host range and transmission traits translate into ecological strategies for host-species utilisation in a heterogeneous ecosystem, and whether host utilisation corresponds to genetic differentiation among three bromoviruses. We combine high-throughput sequencing and population genomics with analyses of species co-occurrence to unravel the ecological strategies of the viruses across four habitat types. The results show that the bromoviruses that were more closely related genetically did not share similar ecological strategies, but that the more distantly related pair did. Shared strategies included a broad host range and more frequent co-occurrences, which both were habitat-dependent. Each habitat thus presents as a barrier to gene flow, and each virus has an ecological strategy to navigate limitations to colonising non-natal habitats. Variation in ecological strategies could therefore hold the key to unlocking events that lead to emergence.


Introduction
Forecasting plant-virus disease risk is a necessity, as viruses cause the largest fraction of emerging diseases of plants [1]. Central to virus emergence are changes in the virus host range, that is, the number of host species used by a virus [2][3][4]. Host-range evolution is incompletely understood, as it depends on multiple factors, some intrinsic to the virus, such as genetic traits that determine its fitness in different hosts, and some extrinsic to the virus, related to ecological strategies (i.e., ecological and epidemiological traits) that determine host-plant resource utilisation (e.g., the potential for encountering hosts) across plant communities [5][6][7]. The degree of similarity in resource utilisation (niche overlap) by species, in respect to community structure [8], can be articulated by the co-occurrence of viruses in individual host plants across habitats of an ecosystem. Experimental analyses of intrinsic factors in plant-virus host-range evolution have focussed on understanding specificity of infection, which implies that the fitness of a virus varies across its potential hosts. These studies have underscored the relevance of across-host fitness trade-offs in host-range evolution that occur when adaptation to one host involves fitness penalties in another. Causes of across-host fitness trade-offs are pleiotropic effects of, and epistatic interactions among, host range mutations, on one side, and host factors on the other [9,10].
Ecological factors in host-range evolution are less amenable to experimentation (but see [11,12]), and their analysis in wild plant communities is still very limited, because viruses have been studied primarily as pathogens of crops [13][14][15]. Host-range evolution to a subset of potential hosts depends on transmission, species interactions, and heterogeneity in resources [16,17] and, ultimately, on the distribution and movement of viruses across complex communities [18,19]. Despite the use of high-throughput sequencing (HTS) to ically injure leaves or flowers [44,45]. Seed transmission of PZSV, possibly through the maternal tissues and through the pollen, has been shown in tomatoes, the experimental host Nicotiana glutinosa, and the natural wild host Diplotaxis erucoides [37,38]. The few characterised isolates show high sequence identity of the genome sequences, but differ in experimental host range, and in their capacity of seed transmission in different hosts [38].
In this work we characterise the host range, the genetic structure, and the diversity of CMV, TAV, and PZSV in plant communities of a heterogeneous ecosystem in central Spain that occupy habitats with different levels of human intervention. With these data we test the hypothesis that the different host range and transmission traits of these three viruses translate into different ecological strategies for host-plant resource utilisation across plant communities. Results show that the three viruses have strong habitat-specificity in host resource utilisation. The ecological strategies associated with habitat specificity differ in the way host species are utilised by the three viruses. Variation in host use within virus species also depends on habitat type.

Description of Study Sites and Sampling
We conducted 78 collections at 23 sampling sites between July 2015 and June 2017 in the Vega del Tajo-Tajuña agricultural region of south-central Spain ( Figure 1). These collections produced 6291 individual samples of 271 plant species distributed over four habitats, each with distinct cover types. The first two of the four habitats constitute what we term the 'anthropic habitats.' Crops (Crop) are annual monocultures and may include assemblages of wild plant species. Crops are left fallow between seasons or rotated. Narrow boundaries that separate crops (Edge) are relatively permanent communities of wild species that partially benefit from nutrient and water supplementation in adjacent crops. The second two habitat categories, which we term 'non-anthropic', are firstly evergreen oak forests (Oak) that are the region's primary habitat for wild species. Oak is dominated by Quercus species with an understory of sclerophyllous shrubs and grasses. Structurally distinct from Oak are abandoned, undisturbed and patchy areas of successional scrubland (Wasteland) interspersed among the other habitats. The terms collection, site, and habitat represent three levels of experimental unit. Four sites each of Oak and Wasteland, with individual collection produced from the re-sampling of sites of these habitats, were visited in autumn and spring over two growing seasons. Collections from Edge and Crop, with four and eleven sites, respectively, were produced from visits during spring, summer, and autumn [46]. Plant species with an abundance of 5 or more individuals in at least one of the four habitats were retained for HTS, as these are central to transmission among habitats. Abundant species that occurred in a single habitat were also retained. The sample of 271 plant taxa was reduced to 118 species for HTS.

Detection of Viral Reads and Estimation of Host Range
Individual RNA extracts of leaf samples from the same site, collection, and plant species were pooled to obtain a single library preparation. In total, 323 libraries of 2037 pooled extracts were sent for sequencing (for details see [28]). Paired-end reads of 125 or 150 nt. were sequenced on Illumina HiSeq platforms. All reads were provided with Phred quality scores greater than Q30 and trimming of adapter contamination conducted with cutadapt v1.8.3 [47]. Genomic references of ssRNA viruses were accessed from NCBI's Viral Genome Browser (https://www.ncbi.nlm.nih.gov/genomes/, accessed on 1 December 2018) and used to construct a local BLAST database. Blast+ version 2.2.29 [48] was used to identify virus OTUs derived from the HTS libraries. The results of the BLAST queries of each library were merged with taxonomic and study site details.
The detection of an operational taxonomic unit (OTU) was considered credible if a read met the following criteria: (1) having a BLAST query coverage of 100%; (2) a query length of 125 nucleotides (together referred to as HTS2C detections). Validation of the BLAST credibility criteria for HTS detections was undertaken with RT-PCR. Primers were  (Table A1) for detection of these viruses in a subset of 208 randomly chosen libraries that represented 69 plant species from all four habitats. The specificity of primers was verified by Sanger sequencing of the amplicons from 3 to 5 RT-PCR-positive libraries of different plant species. Additionally, RT-PCR primers were assessed for specificity with five libraries from five plant species that were unlikely hosts of CMV, TAV, or PZSV, that is, libraries absent from the list of HTS2C detections, with no amplification in any case. Combined HTS and RT-PCR protocols were used to estimate host range, defined as the number of host species used by a pathogen.

Detection of Viral Reads and Estimation of Host Range
Individual RNA extracts of leaf samples from the same site, collection, and plant species were pooled to obtain a single library preparation. In total, 323 libraries of 2037 pooled extracts were sent for sequencing (for details see [28]). Paired-end reads of 125 or 150 nt. were sequenced on Illumina HiSeq platforms. All reads were provided with Phred quality scores greater than Q30 and trimming of adapter contamination conducted with cutadapt v1.8.3 [47]. Genomic references of ssRNA viruses were accessed from NCBI's Viral Genome Browser (https://www.ncbi.nlm.nih.gov/genomes/accessed 1 December 2018) and used to construct a local BLAST database. Blast+ version 2.2.29 [48] was used to identify virus OTUs derived from the HTS libraries. The results of the BLAST queries of each library were merged with taxonomic and study site details.
The detection of an operational taxonomic unit (OTU) was considered credible if a read met the following criteria: (1) having a BLAST query coverage of 100%; (2) a query length of 125 nucleotides (together referred to as HTS2C detections). Validation of the BLAST credibility criteria for HTS detections was undertaken with RT-PCR. Primers were designed from Coat Protein (CP) gene region references of CMV Subgroup I, TAV, and PZSV with the NCBI Primer BLAST tool (Table A1) for detection of these viruses in a subset of 208 randomly chosen libraries that represented 69 plant species from all four habi-

Virus Set Memberships and Co-Occurrence Simulations
As there was more than one HTS library prepared for a single host species, the aggregation of counts from multiple libraries of the same host species may mistakenly conflate single infections into co-occurrences. Virus species set memberships (Figure 2), implemented with the R [49] package eulerr [50], were conducted by scoring the presence or absence of each of the three viruses either in each single host species or HTS library.
The C-score index ("checkerboard score") was used to compare virus species cooccurrence patterns [51] among populations defined by habitat, season, or site. The larger the C-score, the fewer shared sites (i.e., host species) there are. The matrix-wide average can contain individual pairs of species that are segregated, random, or aggregated. Two simulation algorithms that differ in how host species are treated were used to test whether these patterns differed from random occurrences across hosts, libraries, seasons, and sites. Absence-presence matrices treated host species as either equiprobable (Sim2), or where the probabilities of occurrence at sites is proportional to the observed virus richness at the site (Sim4). Viruses are assumed to colonise host species randomly with respect to one another. The two algorithms were selected and compared as they have the lowest error frequencies.
The co-occurrence simulations were implemented with the R package EcoSimR [52].
currence patterns [51] among populations defined by habitat, season, or site. The larger the C-score, the fewer shared sites (i.e., host species) there are. The matrix-wide average can contain individual pairs of species that are segregated, random, or aggregated. Two simulation algorithms that differ in how host species are treated were used to test whether these patterns differed from random occurrences across hosts, libraries, seasons, and sites. Absence-presence matrices treated host species as either equiprobable (Sim2), or where the probabilities of occurrence at sites is proportional to the observed virus richness at the site (Sim4). Viruses are assumed to colonise host species randomly with respect to one another. The two algorithms were selected and compared as they have the lowest error frequencies. The co-occurrence simulations were implemented with the R package Eco-SimR [52].

Genetic Diversity and Population Genomic Analysis
Reads that satisfied the HTS2C credibility criteria were aligned with their respective reference genome using the Burrows-Wheeler Alignment (BWA) algorithm [53]. Variant calling analysis was conducted with the binary alignment map (bam) files generated from the BWA output, and implemented with BCFtools [54]. The genetic diversity analyses and population genomics were performed with the R package PopGenome [55].
A subset of read sequence libraries with alignments that produced a sufficient depth (Table A3) were used for population genomic analyses. As whole genomes may not be recovered at even depths [56], libraries with at least 10× coverage of the genome were included in the analyses. Pilot analyses of genetic diversity of CMV, in which libraries with progressively lower coverage were included, indicated that libraries with as low as 10× coverage depth could be used for estimating genetic diversity without introducing errors or bias. Genetic diversity of CMV at the site and habitat levels was estimated with 38 libraries from four habitats, while information from 28, 20 and 39 Edge libraries for CMV, TAV and PZSV, respectively, was used for analyses at site level. The nucleotide diversity (π), that is, the average pairwise difference between all possible pairs of individuals in a sample, was used as a measure of genetic diversity (π = S/L where S is the number of sites with more than one nucleotide variant and L is the number of nucleotides in the sequence). Whole-genome consensus sequences (WGCS) were used to estimate within-population genetic diversities using the Tamura 3-parameter model [57], with standard errors of each estimate based on 1000 bootstrap replicates, as implemented in MegaX [58]. Missing data was allowed when it accounted for not more than 30% of the sequence. Populations were defined either by habitat or study site.
A sliding-window analysis was used to calculate nucleotide diversity along the genome (i.e., averaging across several variants). The approach allows for a systematic examination of localised nucleotide diversity across the genome. As the genomes of CMV, TAV, PZSV are tripartite, π was estimated over the full length of each segment. A window size of 1000 nt was used for all viruses, with a shift of 5 nt along the genome made for each estimate of π. Heatmaps were implemented with the R package adegenet v2.1.5 [59] and used to visualise single nucleotide polymorphisms (SNPs) of genotypes in respect to the respective reference genomes of each virus.

Population Genetic Structure
To investigate the strength of association between CMV genotypes and habitat, level of anthropic disturbance (i.e., Crop with Edge and Oak with Wasteland), site, or host species observations (i.e., traits), phylogenetic inference and Bayesian tip association significance (BaTS) tests [60] were conducted. The BaTS analysis was used to test the observations inferred from the phylogenies, and generated an association index (AI) and parsimony score (PS). The AI index measures the overall strength of association between genotypes at tips of phylogenetic trees and traits (the imbalance of internal nodes of the phylogeny), and the PS index measures the degree of homoplasy between genotypes in the phylogenetic tree and tip values. Thus, low AI values represent strong phylogeny-trait association and the PS value is inversely related to the strength of tip-character association [60,61]. The BaTS approach repeatedly simulates associations under the null hypothesis that characters at the tips are randomly distributed across the phylogeny. The resulting p-values follow a uniform distribution, and the type 1 error of the test will be correct for all levels of statistical significance. Significant AI and PS p-values indicate that the observed association and level of homoplasy between genotypes and traits is unlikely to have occurred by chance alone. Whole-genome consensus sequences were aligned with the CLUSTALW algorithm [62] implemented in MegaX. The General Time Reversible model (GTR) with gamma distributed substitution rates and invariant sites allowed was implemented in Mr. Bayes. Posterior probabilities and mean branch lengths of Bayesian consensus phylogenies were derived from 30,000 post-burnin trees. A random sample of 2000 posterior post-burnin trees from a Bayesian inference implemented with MrBayes v3.2.7 [63], were used to account for phylogenetic uncertainty in the tip-association analysis. Four Markov chains were run for 20 million generations, sampling each chain every 500 trees. Convergence and posterior parameter distributions were assessed using the MCMC Tracer Analysis Tool v1.7.1.

Detection of Viral Reads and Estimation of Host Range
The detections of OTUs by HTS2C and RT-PCR approaches are summarised in Table 1 according to each host species and the number of libraries pooled per host. The random subset of libraries selected for RT-PCR accounted for 90% of the host species used for HTS2C detections. Correspondence between the host-range estimates by habitat derived  (Table A2), confirming the accuracy of HTS2C detections. Thus, for subsequent ecological analyses, host range was derived from the HTS2C counts. The detections by the HTS2C approach resulted in host-range estimates of CMV, TAV and PZSV as 88, 38, and 67, respectively, at the ecosystem level. Mean percentage identity of local BLAST queries between reads and reference genomes of the three viruses under the HTS2C criteria were between 92.7% to 98.9% (Table A4). The observed host range of TAV (n = 38) was half that of PZSV (n = 67) and was broadest in CMV (n = 88). The three viruses utilised host species mostly in Edge that had the highest host-species richness among the habitats (Table 1 and Figure A1). Host-species resource utilisation by each of the viruses differed with greater similarity observed between CMV and PZSV ( Figure A2). The realised host range of the three viruses (Table 1 and Figure 2) in each of the habitats was lowest for TAV in Wasteland, with the highest observed for PZSV in Edge. The broadest realised host range in all habitats apart from Edge was observed in CMV. The relatively narrow host range of TAV goes some way to explaining the rarity of this virus in the ecosystem.

Virus Set Memberships and Co-Occurrence Simulations
Independent of the host-range breadth is the co-occurrence or specificity of the viruses in respect to particular host species. As RNA extracts of individual samples of host species were pooled to produce HTS libraries, infection may have occurred in any one of the individual samples that made up the pool. Figure 2 shows counts of each virus in terms of observations made either in a host species or HTS library. Although it is likely that some counts of a species did not include true cases of co-occurrence in an individual library, the counts of set memberships produced similar distributions. Several patterns are evident: (1) infections either by CMV or PZSV in the absence of any of the other two viruses occurred at a higher frequency than for TAV; (2) the number of infections either by CMV or PZSV, in the absence of the other two viruses, varied between habitats; and (3) multiple infections of any combination of all three viruses occurred at the highest frequency in Edge. Furthermore, the number of observations of multiple infection tended to be similar across the sites of Edge ( Figure A3, top), and indicated that counts (i.e., spatial variation) in any category of the set memberships were relatively consistent. The exception was the variation in the co-occurrence of both CMV and PZSV between site L1 and the other three sites. Seasonal (i.e., temporal) variation in the occurrence of the viruses in Edge ( Figure A3, bottom) was substantial especially in the high proportion of infections by all three viruses that occurred in spring. The pattern of seasonal variation in virus occurrence differed according to the habitat ( Figure A3 (bottom) and Figure A4), with CMV and PZSV mostly occurring in single infections during the spring in Oak and Wasteland, respectively. Also, autumn infections of CMV were most frequent in Edge, in co-occurrence with PZSV, and in Wasteland, in single infections.
Simulations in co-occurrence were run to test whether the set-membership observations agreed with differences in aggregation and segregation of viruses among host species of each habitat. The C-score for Edge was significantly lower (p-value = 0.017) than expected (Table 2), which indicated aggregation of virus species pairs in a proportion of host species of this habitat. This result agreed with the high proportion of multiple infection observed in Edge (Figure 2). Similarly, each of the four Edge sites had significantly lower (p-value < 0.001) than expected C-scores (not shown). The high frequency of multiple infections by all three viruses in Edge shown by the set memberships, indicated that the viruses co-occur in a proportion of host species in this habitat. The higher-than-expected C-score of Oak and Wasteland indicates a significant proportion (p-value ≤ 0.001) of virus species pairs are segregated and tend not to co-occur in host species of the non-anthropic habitats. Together, each of the three viruses exhibited distinct patterns of occurrence across the habitats. PZSV occurred in host species in Edge without the presence of the other viruses that tended to co-occur in this habitat. Similar host utilisation by CMV and PZSV ( Figure A2) was a result of the low abundance (or titre) of TAV across all habitats. TAV was only present in Oak without co-occurring with either of the other viruses. Lastly, host communities in Edge by far supported the highest frequency of multiple infection. Table 2. Co-occurrence simulation tests for virus-sharing host species among the habitats. For any particular species pair, the larger the C-score, the more segregated the pair, with fewer shared host species.

Population
Observed C-Score

Genetic Diversity Analysis
Variant calls were used to infer SNPs and assess nucleotide diversity for each virus genomic segment and population. Mean π at the site level (Table A5) was between 0 and 0.007 for all segments and viruses except for RNA2 of PZSV at mean π = 0.01, and indicated consistent low genetic diversity among sites. At the habitat level, CMV had higher π than at the site level, particularly in RNA3 in Crop and Edge habitats (Table 3). A sliding-window analysis at the site level of localised nucleotide diversity (π) across segments of CMV (Figures 3 and A5), TAV ( Figure A6), and PZSV ( Figure A7), indicated the highest diversity was observed in the RNA3 segment of CMV and the lowest in the RNA1 and RNA 3 segments of PZSV. Both CMV and TAV had higher nucleotide diversity in their RNA3 genome compared to the RNA1 and RNA2 segments. For CMV, nucleotide diversity trended in similar ways across each segment for each habitat. A Kruskal-Wallis test showed that ranks of π did not differ significantly across the RNA segments. The nucleotide diversity trends of TAV and PZSV across the segments for Edge sites was not as clear, particularly for RNA1 and RNA2. By comparison to that of CMV, the very low values of π in these instances indicate that variation was driven by mutations at a very few sites. Together, similar trends of localised variation in π across the genomic segments, in instances where nucleotide variation was not minimal, shows that ecosystem-and site-level factors do not correspond to variation in genetic diversity. The observed variation in π between habitats or sites of each virus may not be independent of sample size.

Population Genetic Structure
Whole-genome consensus sequences from 38 libraries of CMV were used to infer phylogenetic relationships among virus genotypes and their habitat affiliations as the reads of TAV and PZSV could only be assembled for Edge sites. The general patterns of branching ( Figure 4) are consistent with the clustering of genotypes by habitat, with most major clades including multiple host species from taxonomically distant families. In all three RNAs, isolates from Oak cluster separately from those of Edge plus Crop. The strong support for branching along the backbone of the phylogeny of each genomic segment indicates Crop and Edge (and instances of Wasteland) isolates that evolved from a variant that occurred in Oak. The next most-derived clades are isolates that largely occurred in Crop and Edge, implying these variants re-colonised Crop (and Oak and Wasteland) from host species of Edge. A very long branch in the RNA3 segment indicated a relatively divergent clade from Edge and Crop, which were split between Convolvulus arvensis and Cucumis melo, respectively. Overall, the sliding-windows analysis and phylogenetic reconstructions show similar evolutionary dynamics. The variation in the genetic diversity across the genomes of all RNA segments and for each RNA between habitats is indicative either of fluctuations in the abundance of each segment or of sample size. Although isolates from Wasteland were rare, in each segment there was phylogenetic over-dispersion of isolates from this habitat, which was consistent with colonisation cycling among the other three habitats. Overall, the inferences are consistent with habitat-specificity more so than host-species specificity in host resource utilisation by CMV.
Bayesian tip-association significance testing for all three RNAs of CMV showed significantly (p < 0.0001) strong (Table 4) associations that indicated a strong effect of anthropic disturbance, or habitat, on genetic diversity. The strong tip associations indicated by the AI statistic showed a high degree of coherence among isolates sampled from the same habitat, which were descended from very few ancestral genotypes.

Discussion
Understanding the ecological strategies that modulate host range is central to forecasting disease risk and virus emergence [4,64]. Most metagenomic studies either explore the richness of viromes or do not address explicitly plant-virus ecological strategies [20][21][22][23][24][25][26][27]. We have focussed on the interactions of multiple plant viruses and higherorder interactions among them in respect to heterogeneity in the resources available from host-plant communities.
Here we analyse the ecological strategies for resource use of three viruses, CMV, TAV and PZSV, that share features of particle and genome structure and gene expression, as they belong to the same family, Bromoviridae [29]. Common features of their single-stranded, messenger sense, tripartite RNA genomes are encoding in the two larger genome segments, RNA1 and RNA2, proteins 1a and 2a, respectively, which are involved in genome replication and are part of the viral replicase [65]. Dicistronic RNA3 encodes the MP, with the function of gating plasmodesmata, and necessary for cell-to-cell and systemic movement within the infected plant [66], and the CP, also required for systemic movement and for virus dispersal as it forms the virus particles [29,66]. Despite similarities in genomic structure and gene function, CMV, TAV and PZSV differ in traits relative to host range and mode of transmission, which we hypothesise will translate into different ecological strategies for host-plant resource utilisation across plant communities.
Our study is based on virus detection by HTS of rRNA-depleted plant RNA in 323 libraries of 118 plant species from communities growing in four different habitats, more (Crop and Edge) or less (Oak and Wasteland) anthropic, in a heterogeneous ecosystem in central Spain. As HTS detection of virus OTUs involves uncertainty from different factors that include the sensitivity of the detection method, the proportion of false positives produced, or the high proportion of unknown virus species expected in wild plant communities [23,67], detection was confirmed by RT-PCR with virus-specific primers on a random subset of 208 libraries from 69 plant species. The very good correspondence between RT-PCR and HTS detection validated HTS2C detection, which was the basis for further analyses.
According to our hypothesis, we show that differences in ecological traits translate into variation in how host resources are used. Thus, CMV and PZSV, which are less related taxonomically as they belong to different genera and are transmitted by different mechanisms, share a more similar pattern of host-species utilisation than CMV and TAV (Table 1, Figure A2), which belong to the same genus, are both vectored by aphids, and are so closely related genetically as to be able to form stable hybrids by reassortment of genomic segments [30]. The three viruses had their broadest realised host range at Edge, but while TAV and PZSV are mainly Edge viruses, with about 75% of their hosts in this habitat, only about half of the hosts of CMV were from Edge, with a large fraction of hosts from the less anthropic habitats, Oak and Wasteland (Table 1, Figure 2). The patterns of virus detection in single occurrence and in co-occurrence over space (habitats) and time (seasons) (Figures 2, A3 and A4) show that the three viruses have different ecological strategies that are habitat-dependent. While in Edge CMV and TAV co-occur with PZSV in most hosts, PZSV has a set of non-shared hosts in this habitat and hosts unique for CMV and TAV occur mostly in the less anthropic habitats. The relatively consistent maintenance of the viruses across space, shown across sites of Edge, contrasts with the temporal variation through the seasons. While most PZSV detections occur in spring in all habitats, most TAV detections in all habitats are from autumn. CMV detections are most frequent in spring in Edge and Oak, and in autumn in Wasteland. Maintenance of CMV and PZSV in the autumn is mostly in Edge co-occurrences or, for CMV, as single occurrences in Wasteland. This underscores that infection of host plants and host plants of specific habitats by the three viruses is dynamic.
Despite the different function the four habitats have in the ecology of these viruses, Edge stands out as a reservoir community with a high virus-host richness, where virushost interactions are mostly organised by interactions among the three viruses in shared hosts. These features of Edge agree with our previous results from the analyses of different plant-virus interaction data sets [7,28]. Edges have features that may explain these results. As Edges benefit from water and nutrient supplied to nearby crops, they provide plant communities with a more stable environment over the year than the wild habitats, where plant growth and reproduction ceases during winter and summer. As a consequence, the temporal variation of plant cover and biomass over the year is less than in the other habitats [68]. However, the role of Edge as a reservoir was not apparent in an analysis of patterns of infection of four contact-transmitted tobamoviruses in the same ecosystem using the same set of HTS libraries [67], as detections were not more numerous at Edge than at the less anthropic habitats. Thus, the reservoir role of Edge might also be explained, at least in part, by its role in maintaining insect populations, which has been repeatedly shown for pollinators and for predators and parasitoids of agricultural pests, and could well be so for insect vectors of viruses [69][70][71]. Lastly, Edge may be a sink for aphid populations, as the first stimulus for aphid alighting in plants is visual cues in the 500-600 nm wave length (yellow-green) that are best perceived in contrast with other colours, such as the brown colour of bare soil [72,73]. The patterns of virus co-occurrence are not explained solely on the basis of virus-host ranges (TAV mostly occurs in infections with the other viruses, and a majority of all co-occurrences involve PZSV rather than CMV). This observation is not explained by shared mechanisms of virus transmission as most co-occurrence involves PZSV, which is not aphid-transmitted. This suggests higher order interactions are a factor in host-species utilisation. It is tempting to speculate on the role of virus infections on the behaviour of aphids and other insects, and on the growth of their populations. As for other aphid-transmitted viruses, infection by CMV has been shown to alter plant volatile composition in different host species, making infected plants more attractive to aphids and to pollinators, also altering the behaviour of insect herbivores, predators, and parasitoids [74][75][76][77][78][79]. Thus, plants initially infected by CMV, the virus with the broadest host range, could attract aphids that transmit TAV, and thrips and other pollinators that would carry pollen from PZSV-infected plants. Of course, we cannot assume that other viruses outside the attention of this study do not have a part in the observed patterns of infection.
The dissimilarity in ecological strategies across habitats is not congruent with either the low genetic variation of each virus in the different habitats, nor the low genetic variation of CMV across segments and habitats. Although there was clearly localised variation in regions of the segments that differed between them, the trends along the segments were relatively consistent among habitats. The exceptions were the relatively high nucleotide diversity of RNA3 of CMV in Crop and Wasteland (Table 3), the two habitats where CMV isolates appear phylogenetically over-dispersed at RNA 3 ( Figure 4). RNA3 encodes proteins related to host colonisation and transmission that, a priori, could show more variation across hosts than proteins involved in the replication of the RNA genome. However, low genetic diversity of the three viruses regardless of their host range, and the limited role of host species, compared to habitat, in shaping the genetic structure of the CMV population, argue against a major role of host specificity, and across-host fitness trade-offs, in the observed host ranges and patterns of infection.
For CMV, data allowed a more detailed analysis of population structure. The sample size of isolates used to estimate π was larger for Edge than for Oak or Crop, in that order, yet the variation in π of CMV in Edge shown by the sliding-window analysis was flat compared to that observed for the other habitats ( Figure 3). The low genetic diversity of isolates infecting the melon crop and perennial or annual host species that grow along the year in Edge is consistent with early epidemiological analyses carried in the south of France and NE USA (reviewed in [80]) and more recently in central Spain [68] that indicated that wild plants growing in proximity to crops assure the year-round presence of the virus for crop infection. That Wasteland had a large variation may be the result of a small sample with extreme pair-wise differences compared to the Edge sample. This assertion is supported by the phylogenetic inferences ( Figure 4) with isolates from Wasteland being phylogenetically over-dispersed within clades that predominate in one of the other habitats. This phylogenetic pattern is consistent with sporadic and successful colonisation from other habitats associated with population bottlenecks, which would counter virus adaptation to host or environment due to genetic drift [81]. Closely related isolates from Oak tend to be basal in the phylogenies of the three RNAs. This is indicative of genotypes from Oak successfully colonising all the other habitats, which occasionally re-colonise Oak.

Conclusions
In summary, our study has shown that genetically closely related viruses do not have to share similar ecological strategies for resource utilisation. The two less-related viruses, CMV and PZSV, had similar strategies by comparison with the closely related pair, CMV-TAV, and this strategy included a broad host range. A broad host range permitted wider resource-use opportunities among host species, but also a higher frequency of co-occurrence with the two other viruses in the host species they had in common, and this was habitat dependent; most co-occurrences were observed in the anthropic habitats. Each habitat thus presents as a barrier to gene flow, and each virus species has an ecological strategy to navigate host-species resource heterogeneity. The variation in ecological strategies, or rate at which they evolve, exhibited by a virus should correlate with the dimensions of its reservoir (niche) available across an ecosystem. The ecological strategies (e.g., phenotypic plasticity [82]) of each virus species will determine the potential for ecological fitting [83] and in overcoming barriers to colonisation of a non-natal habitat. Identification of variation in ecological strategies may therefore hold the key to predicting emergence events.   Figure A5: title; Sliding-windows analysis of nucleotide diversity (π) across the consensus genomes of CMV segments at Edge sites (sliding-window size was 1000 nt). Figure A6: title; Sliding-windows analysis of nucleotide diversity (π) across the consensus genomes of TAV segments at Edge sites (sliding-window size was 1000 nt). Figure A7: title; Sliding-windows analysis of nucleotide diversity (π) across the consensus genomes of PZSV segments at Edge sites (sliding-window size was 500 nt).       Figure A5. Sliding-windows analysis of nucleotide diversity (π) across the consensus genomes of CMV segments at Edge sites (sliding-window size was 1000 nt). Figure A5. Sliding-windows analysis of nucleotide diversity (π) across the consensus genomes of CMV segments at Edge sites (sliding-window size was 1000 nt).  Figure A6. Sliding-windows analysis of nucleotide diversity (π) across the consensus genomes of TAV segments at Edge sites (sliding-window size was 1000 nt). Figure A6. Sliding-windows analysis of nucleotide diversity (π) across the consensus genomes of TAV segments at Edge sites (sliding-window size was 1000 nt).  Figure A7. Sliding-windows analysis of nucleotide diversity (π) across the consensus genomes of PZSV segments at Edge sites (sliding-window size was 500 nt). Figure A7. Sliding-windows analysis of nucleotide diversity (π) across the consensus genomes of PZSV segments at Edge sites (sliding-window size was 500 nt).  Table A3: Coverage depth of libraries used for genetic analyses. Coverage = (total read length x total number of reads)/(genome size).; Table A4: Mean and range of percentage identity of BLAST queries (with HTS2C criteria upper panel and 6C lower panel) between reads and the reference genomes of CMV, TAV, and PZSV.; Table A5: Nucleotide diversity (π) at Edge sites.  Mean ± SE π 0.006 ± 0.001 0.008 ± 0.004 0.004 ± 0.002 0.005 ± 0.001 n.a. = populations had insufficient information at this genomic segment.