Generated Randomly and Selected Functionally? The Nature of Enterovirus Recombination

Genetic recombination in RNA viruses is an important evolutionary mechanism. It contributes to population diversity, host/tissue adaptation, and compromises vaccine efficacy. Both the molecular mechanism and initial products of recombination are relatively poorly understood. We used an established poliovirus-based in vitro recombination assay to investigate the roles of sequence identity and RNA structure, implicated or inferred from an analysis of circulating recombinant viruses, in the process. In addition, we used next-generation sequencing to investigate the early products of recombination after cellular coinfection with different poliovirus serotypes. In independent studies, we find no evidence for a role for RNA identity or structure in determining recombination junctions location. Instead, genome function and fitness are of greater importance in determining the identity of recombinant progeny. These studies provide further insights into this important evolutionary mechanism and emphasize the critical nature of the selection process on a mixed virus population.


Introduction
The rapid evolution of RNA viruses is attributable to both the error prone nature of viral RNA-dependent RNA polymerases (RdRp) and the extensive exchange of genetic information achieved through the processes of recombination and-in segmented viruses-reassortment. Together, these are important drivers of virus evolution, often linked to the emergence of novel pathogens and disease outbreaks [1][2][3][4]. Recombination predominates in single-stranded positive-sense (mRNA-sense) RNA viruses and is proposed to occur via distinct replicative or nonreplicative mechanisms. In the latter, the formation of a full-length genome likely involves processing and ligation by cellular enzymes, and the detailed mechanism and biological significance remain unclear [5,6]. In contrast, replicative recombination between two compatible virus genomes coinfecting the same cell is well-studied [7][8][9][10].
The importance of recombination in nature is typified by the frequent isolation of recombinant forms of members of the enterovirus genus of the family Picornaviridae. Poliovirus and other picornaviruses have been extensively used to study recombination and the underlying mechanisms involved [8,11]. These include the initial demonstration of the copy-choice strand-transfer event and numerous, but often conflicting, studies Viruses 2022, 14 interpreting the influence of RNA sequence and structure on the generation of viable recombinants [10,[12][13][14][15][16][17]. To facilitate the analysis of early events in recombination, we previously established an in vitro assay (the CRE-REP assay) that solely yields recombinant viruses [18].
To directly address the potential influence of RNA structure and sequence identity on the strand-transfer event in recombination, we have exploited the CRE-REP assay and modified the input genomes. By independently analyzing genomes containing extensive regions of sequence identity, or templates modified to contain regions of high or low levels of RNA structure, we were unable to demonstrate any significant influence on the proportion of recombination events occurring in these modified regions. We extended these studies to investigate the early RNA products arising from a natural coinfection of cells with two serotypes of poliovirus using deep sequencing. In contrast to the generally clustered range of recombinants found using the CRE-REP assay [18], these were distributed throughout the genomic region analyzed. Based on our analyses, we propose that neither sequence identity nor RNA structure are important primary determinants in recombination.

Virus and Cell Culture
HeLa cells and L929 mouse fibroblast cells were maintained in Dulbecco's Modified Eagle Medium (DMEM) supplemented with 10% heat-inactivated FBS (FBS-DMEM). Poliovirus type 1 (Mahoney) and type 3 (Leon) were recovered following transfection of in vitro transcribed RNAs from full-length cDNAs. Additional virus stocks for replication of competent PV3-2A/2B junction-modified viruses FLC/PV3 L , FLC/PV3 H , and FLC/PV3 1 (see below) were similarly generated and all virus stocks quantified by plaque assay on HeLa cells. Growth kinetics of viruses were determined following synchronous infection of HeLa cells at a multiplicity of infection (MOI) of 5 in serum-free media (SF-DMEM). Unabsorbed virus was removed by washing with PBS and cells were incubated in fresh FBS-DMEM at 37 • C, 5% CO 2 . Supernatants containing virus were harvested at various time points postinfection and quantified by plaque assay. Recombinant viruses isolated from CRE-REP assays were biologically cloned by limit dilution in 96-well plates of confluent HeLa cells. Virus-containing supernatant was removed after 3 days incubation at 37 • C, 5% CO 2 and stored at −80 • C, and the remaining cell monolayer stained with a 0.1% crystal violet solution. Virus supernatants from the highest dilutions causing complete cytopathic effect (CPE) were utilized in further analysis.
For coinfection studies, poliovirus type 1 and type 3 were used to coinfect confluent HeLa cells in four T175 flasks at an MOI of 10 of each virus. The cells were incubated for 30 min at 37 • C before the virus was removed and the cells were washed with PBS. Fresh DMEM media was added and the flasks were incubated at 37 • C for 5 h. After media removal, cells were washed with PBS, trypsinized, and pooled in 2-mL DMEM media, giving a total of 7.6 × 10 7 cells. The cells was lysed by three freeze-thaw cycles to extract virus and the debris was discarded. The supernatant containing the viruses was filtered through 0.2-µm filters and the resulting filtrate was used for RNA extraction, followed by RT-PCR and NGS sequencing.

Design of Modified 2A/2B Junction Sequences
A 450 nt sequence-equivalent to nucleotides 3599 to 4045-of both PV1 and PV3 was redesigned using the software package SSE [19] to generate sequences with the minimum and maximum amount of RNA structure possible by maintaining amino acid sequence and divergence between PV1 and PV3. The native PV1 and PV3 sequences were scrambled using the CDLR algorithm in SSE ( Figure S1). This randomly scrambles the codon order, disrupting any sequence-dependent underlying RNA structure, whilst maintaining the coding, codon usage, and dinucleotide frequencies identical to those of the input sequence. CDLR randomization was used to generate a large panel of variant sequences that retained these three characteristics. Each was processed using custom-written Perl scripts to deter-Viruses 2022, 14, 916 3 of 14 mine the sequence divergence from the input. Of those that were within 1% divergence (6230 PV1-and 6303 PV3-derived CDLR scrambles), the predicted RNA structure stability was calculated by determining the MFED (the minimal free energy difference [20]. MFED is a measure of the intrinsic folding energy of an RNA sequence and is determined by comparing an input sequence with 99 sequence-order randomized variants. The sequence identity of the 10 highest and lowest energy sequences was analyzed, and representative sequences with similar sequence identity were selected for construction as cDNAs (Table S1).
The same 450 nt sequence was altered for sequence identity by replacing the PV1 sequence with that of PV3 and vice versa. The sequence exchange was not 100% as there were 7 amino acid differences between PV1 and PV3 in this region-2A residues 87, 116, 123, and 2B residues 3, 22, 26, and 30. At these amino acid positions, the sequence was not altered between PV1 and PV3. Synthesis of all modified 450 nt DNA fragments was carried out by GeneArt (Fisher Scientific, Loughborough, UK) and provided as sequence-verified plasmid DNA.

CRE-REP Assay, Plasmids, In Vitro RNA Transcription and Transfection
The CRE-REP assay used to generate recombinant viruses has been described previously [18]. Wild type cDNAs of PV1 donor template, pRLucWT, and PV3 acceptor template, pT7/SL3, were used as a background for the generation of the 2A/2B modified cDNAs. Standard molecular biology techniques were used to replace the 450 nt sequence spanning the 2A/2B boundary with the synthesized DNAs described above to generate the CRE-REP cDNA variants referred to as PV1 L , PV1 H , PV1 3 , PV3 L , PV3 H , and PV3 1 . In addition, the relevant PV3 sequences were placed into a background of the replication-competent PV3 cDNA, pT7/FLC, for growth kinetics analysis. All cDNAs were confirmed by sequence analysis prior to use.
Plasmids were linearized with ApaI (pRLucWT background) or SalI (pT7/SL3 or pT7/FLC background) and RNA transcribed using a HiScribe T7 High Yield RNA Synthesis kit (NEB, NEB, Hitchin, UK) following the manufacturers' protocol. RNA transcripts were DNaseI (NEB, NEB, Hitchin, UK) treated to remove residual template DNA and column purified using a GeneJET RNA Purification Kit (Fisher Scientific, Loughborough, UK) prior to spectrophotometric quantification. For all CRE-REP variant assays, equimolar amounts of both template RNAs (based on 250 ng of acceptor) were prepared with Lipofectamine 2000 (Fisher Scientific, Loughborough, UK) in a 3:1 Lipofectamine 2000:RNA ratio as per manufacturers' protocol and transfected into 80% confluent L929 cell monolayers. Virus supernatant was recovered at 30 h post-transfection and virus quantified by plaque assay on HeLa cells. For virus stocks, 1 µg of RNA was transfected into 80% confluent HeLa cell monolayers using Lipofectamine 2000 as above, and virus supernatant harvested at 12 h post-transfection and quantified by plaque assay on HeLa cells.

RNA Extraction and RT-PCR
Viral RNA was isolated from CRE-REP clarified cell culture supernatant samples using a GeneJET RNA Purification Kit (Fisher Scientific, Loughborough, UK) and reverse transcribed at 42 • C using an oligo dT primer and SuperScript II reverse transcriptase (Fisher Scientific, Loughborough, UK) as per manufacturers' protocol. The region of recombination (VP1 to 2C) was amplified using primers PV3-F (5 -GCAAACATCTTCCAACCCGTCC-3 ) and PV1-R (5 -TTGCTCTTGAACTGTATGTAGTTG-3 ) and Taq polymerase (Fisher Scientific, Loughborough, UK) with an initial denaturing at 94 • C for 3 min, followed by 35 cycles of 94 • C for 45 s, 50 • C for 30 s, 72 • C for 60 s, and a final extension at 72 • C for 10 min. Following poliovirus coinfection, total RNA was extracted from cells by QIAamp Viral RNA Mini Kit (Qiagen, Redwood City, CA, USA) and reverse transcribed as above. Subsequently, 5 µL from the cDNA synthesis reaction was used in the PCR amplification. The region of recombination was amplified using primers PV3-F2 (5 -CTCCAAAGTCCGCATTTACA-3 ) and PV1-R2 (5 -ATCAGGTTGGTTGCTACA-3 ) and Taq polymerase (Promega, Promega, Southampton, UK) with an initial denaturing at 95 • C for 2 min, followed by 35 cycles of 95 • C for 1 min, 58.1 • C for 30 s, 72 • C for 1.5 min, and a final extension at 72 • C for 5 min.

NGS Library Preparation
Illumina Nextera XT DNA kit (Illumina, Illumina Way, CA, USA) was used to prepare the NGS samples for sequencing as per the manufacturer instructions. Briefly, 1 ng from the amplicons was simultaneously fragmented and tagged with unique adapter sequences, followed by 10 cycles of PCR, before loading into the MiSeq instrument (Illumina, Illumina Way, CA, USA). The generated reads were uploaded to Illumina BaseSpace (Illumina, Illumina Way, CA, USA), a cloud-based genomics analysis and storage platform that directly integrates with all Illumina sequencers.

Nomenclature
The terms precise and imprecise refer to whether the resulting recombinant genome is full length (precise) or contains insertions or deletions (imprecise) ( Figure S2). We additionally defined the recombination junction as unambiguous, where the sequence between the donor and acceptor strands was dissimilar, or ambiguous, where it was located within a short region of sequence identity in the parental genomes ( Figure S2). We deliberately avoid use of the inexact term 'homology', which is variously used to indicate identity or relatedness. In two divergent recombining sequences, junctions can occur in regions of sequence identity or at positions where there is sufficient sequence divergence (whilst still being homologous) to unambiguously define the junction between the donor and acceptor sequences. Similarly, insertions or deletions can generate junctions where the origin of the sequences around the junction cannot be definitively assigned to the donor or recipient strand due to limited sequence identity around the junction.

Influence of RNA Structure and Sequence on CRE-REP Recombination
The CRE-REP assay [18] utilizes a poliovirus type 1 (PV1) luciferase-encoding subgenomic replicon as the polymerase donor and a poliovirus type 3 (PV3) genome bearing a lethal mutation in the cis-acting replication element (CRE) located within the 2C-coding region [19,20] as the acceptor ( Figure 1A). Cotransfection of in-vitro-generated RNA from both templates only yields viable recombinants if a suitable strand-transfer event 5 to the functional CRE element in the donor and 3 to the P1 capsid-coding region in the acceptor template occurs. We previously demonstrated that viable recombinants had junctions that predominantly clustered in one of two regions, spanning either the VP1/2A or 2A/2B polyprotein cleavage boundaries (Cluster 1 and Cluster 2, respectively, hereafter known as C1 and C2). We reasoned that by modifying the sequence identity or RNA structure within the C2 region of the donor or acceptor templates, we could investigate the local influences of sequence or RNA structure on recombination using the unmodified C1 region as an internal control. For example, a positive influence on recombination would yield a higher ratio of C2:C1 recombinants than observed in a control assay with unmodified templates.
We exploited the redundancy of the amino acid triplet code to design 450 nt sequences, centered on the 2A/2B cleavage site, with high or low levels of RNA structure (see Materials and Methods). All designed sequences exhibited the same level of sequence identity when compared with their parental sequence (85.5%) and retained the same level of sequence identity as present between the parental PV1 and PV3 CRE-REP partners (77.6 +/− 1%; Table S1). In each case, the encoded proteins were identical to the respective native PV1 donor or PV3 acceptor templates. The resulting modified RNA templates used in CRE-REP assays were as follows: (1) the PV3-derived acceptor modified PV3 L and PV3 H (where L and H refer to low and high RNA structure, respectively); (2) the PV1-derived donor-modified PV1 L and PV1 H ( Figures 1B and S3). We exploited the redundancy of the amino acid triplet code to design quences, centered on the 2A/2B cleavage site, with high or low levels of RN (see Materials and Methods). All designed sequences exhibited the same level identity when compared with their parental sequence (85.5%) and retained the of sequence identity as present between the parental PV1 and PV3 CRE-REP pa +/− 1%; Table S1). In each case, the encoded proteins were identical to the respe PV1 donor or PV3 acceptor templates. The resulting modified RNA templa CRE-REP assays were as follows: (1) the PV3-derived acceptor modified PV3 (where L and H refer to low and high RNA structure, respectively); (2) the P donor-modified PV1 L and PV1 H ( Figures 1B and S3).
Similarly, we engineered donor and acceptor templates such that the s region was ~98% identical. This was achieved by directly exchanging the PV with that of PV3, or vice versa, with the exception of retention of the seven a that differ between PV1 and PV3 in this region (see Materials and Methods) t ruption of any cis interactions involving these residues. The resulting modified plates for use in CRE-REP assays were PV31 (a PV3-derived acceptor containi Similarly, we engineered donor and acceptor templates such that the same 450 nt region was~98% identical. This was achieved by directly exchanging the PV1 sequence with that of PV3, or vice versa, with the exception of retention of the seven amino acids that differ between PV1 and PV3 in this region (see Materials and Methods) to avoid disruption of any cis interactions involving these residues. The resulting modified RNA templates for use in CRE-REP assays were PV31 (a PV3-derived acceptor containing a stretch of PV1 sequence) and PV13 (a PV1-derived donor containing a stretch of PV3 sequence; Figures 1B and S3). We confirmed that the sequences introduced to the CRE-REP donor and acceptor templates had no significant effect on virus replication ( Figure S4) and, therefore, should not influence the underlying replicative recombination process.
To generate recombinant virus populations, we conducted CRE-REP assays in parallel. For each assay, one unmodified template (either donor or acceptor) was paired with one modified donor or acceptor template. Assays are referred to by the name of the modified template of the RNA pair. Donor and acceptor RNAs were cotransfected into murine L929 cells (permissive but not susceptible to poliovirus) in equimolar amounts and the virus-containing supernatant harvested 30 h post-transfection. In addition, we transfected each donor and acceptor RNA individually into L929 cells and confirmed that, as expected, none were alone able to generate infectious virus (data not shown). A total of three independent cotransfections were carried out for each assay and the yield of recombinant virus determined by plaque assay ( Figure S5). The unmodified CRE-REP control assay (WT) generated an average of 2.1 × 10 3 pfu/mL of recombinant viruses. The total yield of recombinants from CRE-REP assays with template modifications did not significantly differ from the WT control assay, demonstrating no obvious advantage or disadvantage in recombinant generation resulting from the modifications.

Analysis of Recombination Junctions in the Modified CRE-REP Assay
The overall yield of recombinants ( Figure S5) includes all those that map in C1 and C2 combined. To determine whether there were any changes in the ratio of recombination events occurring within the C1 and C2 regions as a result of the introduced modifications, we sequenced a representative sample of recovered viruses. The supernatants from the triplicate transfections of each assay were pooled to generate a diverse virus population and minimize founder effects. A total of 420 recombinants (60 from each CRE-REP assay) were isolated by limit dilution. Following viral RNA extraction and reverse transcription, the entire recombination region was amplified by PCR and sequenced to determine the location and features of each recombination junction ( Figure 2A). each donor and acceptor RNA individually into L929 cells and confirmed that, pected, none were alone able to generate infectious virus (data not shown). A total o independent cotransfections were carried out for each assay and the yield of recom virus determined by plaque assay ( Figure S5). The unmodified CRE-REP contro (WT) generated an average of 2.1 × 10 3 pfu/mL of recombinant viruses. The total y recombinants from CRE-REP assays with template modifications did not significan fer from the WT control assay, demonstrating no obvious advantage or disadvan recombinant generation resulting from the modifications.

Analysis of Recombination Junctions in the Modified CRE-REP Assay
The overall yield of recombinants ( Figure S5) includes all those that map in C2 combined. To determine whether there were any changes in the ratio of recomb events occurring within the C1 and C2 regions as a result of the introduced modific we sequenced a representative sample of recovered viruses. The supernatants fr triplicate transfections of each assay were pooled to generate a diverse virus pop and minimize founder effects. A total of 420 recombinants (60 from each CRE-REP were isolated by limit dilution. Following viral RNA extraction and reverse transcr the entire recombination region was amplified by PCR and sequenced to determ location and features of each recombination junction (Figure 2A).  recombinant within the population of precise (black) or imprecise-insertion (blue) recombinants (see Nomenclature, materials and method). (B) Junctions were defined as precise or imprecise and graphed according to location in Cluster 1 or Cluster 2. Unflattened data were compared to flattened data using Fisher's Exact Test and showed no significant difference in the distribution of recombinants. (C) Junctions were defined as for (B) and graphed by individual CRE-REP assay. Each modified assay was compared to wild type using Fisher's Exact Test and showed no significant difference in the distribution of recombinants, either between assays or unflattened and flattened data sets.
After sequencing, 119 samples were discarded due to ambiguities in the sequence that indicated the virus population was not clonal. Approximately one third of the remaining recombinants were present more than once in the dataset. These identical sequences could either arise independently due to unique recombination events or may reflect early recombination products that have undergone additional rounds of replication (founder effects), thus increasing their proportion in the mixed virus progeny. Since the experimental design could not discriminate between these, we considered the data both in their entirety (unflattened) and after discarding identical sequences (flattened).
In the flattened data (n = 205), 58% of all junctions mapped to the C1 region while 42% mapped to the C2 region. No statistical difference (Fisher's Exact Test) was observed when compared with the unflattened data (n = 301), where 53% mapped to C1 and 47% to C2 ( Figure 2B). Further analysis of the data demonstrated that there were no statistically significant (Fisher's Exact Test) favored or unfavored modified assay pairings that altered the distribution of junctions between the C1 and C2 regions when compared with the unmodified WT templates ( Figure 2C).
These results suggested that neither local sequence identity nor the gross level of RNA structure play a major role in influencing the location of recombination junctions within viable recombinants that were selected, packaged, and subsequently propagated.

Isolation of Recombinants Following Virus Coinfection
If, as we suggest, RNA sequence and structure do not influence the process of recombination, we would expect the junctions to have been functionally selected-i.e., on viability or relative fitness-from a much more diverse spectrum of crossover events. Such a diverse population potentially includes both in-frame insertions and deletions and out-of-frame junctions. Of these, insertions have been previously reported [18] but deletions or out-offrame recombinants are highly unlikely to be present amongst viable recombinant progeny. To investigate recombinant genome diversity further, we developed an assay based on the next-generation sequencing (NGS) analysis of a mixed virus population arising after coinfection with poliovirus serotypes 1 and 3.
HeLa cells were coinfected at an MOI of 10 and intracellular viral RNA was harvested 5 h postinfection (hpi), a duration previously shown to allow the production and detection of intracellular recombinants [21], whilst minimizing the selection and analysis of genomes that have been packaged or initiated further rounds of infection. Purified RNA was reverse transcribed and PCR amplified across the region spanning Clusters 1 and 2 before analysis by gel electrophoresis (Figure 3A). A distinct product of~1.3 kb, representing the expected size of a precise recombination product, was observed, together with both larger (up to~3 kb) and smaller (~0.45 kb) products. In addition, there was extensive diffuse background staining between these products suggestive of a range of different-sized PCR products ( Figure 3A). Preliminary analysis of seven randomly-selected cloned PCR products by Sanger sequencing (Table S2) showed a diversity of recombination junctions and encouraged us to study the population of molecules using NGS.

The Complexity of Recombination Events Revealed by NGS Sequence Analysis
After the recombination-specific amplification of the viral RNA 5 hpi, the amplicons were fragmented before being sequenced on a MiSeq Illumina platform, generating~1.5 million reads of 250 nt in length. We developed a pipeline based upon the aligner Bowtie2 [22] and the Viral Recombination Mapper (ViReMa) [23] to analyze recombination junctions, using it as previously described [24]. Briefly, the entire dataset was aligned to WT PV3 and PV1 reference sequences using Bowtie2 and those that matched perfectly, i.e., without the recombination junction, were eliminated. Remaining reads were analyzed using ViReMa to identify the recombination junction (Table 1).

The Complexity of Recombination Events Revealed by NGS Sequence Analysis
After the recombination-specific amplification of the viral RNA 5 hpi, the amplicons were fragmented before being sequenced on a MiSeq Illumina platform, generating ~1.5 million reads of 250 nt in length. We developed a pipeline based upon the aligner Bowtie2 [22] and the Viral Recombination Mapper (ViReMa) [23] to analyze recombination junctions, using it as previously described [24]. Briefly, the entire dataset was aligned to WT PV3 and PV1 reference sequences using Bowtie2 and those that matched perfectly, i.e., without the recombination junction, were eliminated. Remaining reads were analyzed using ViReMa to identify the recombination junction (Table 1).   We identified~27 × 10 3 recombinant junction sequences (Table 1) in the dataset, defining 323 unique junctions, between PV3 and PV1. There were no recombinant junctions in the control dataset. Identified recombinant junctions were initially grouped into those with no sequence duplication or deletion (so-called precise junctions [18], n = 203; see nomen-clature in Materials and Methods), those with sequence insertions (imprecise insertions, n = 35), or those with sequence deletions (imprecise deletions, n = 85) ( Figure 3B). The two groups of imprecise junctions exhibited a range of different-sized insertions (1-3150 nt) or deletions (1-896 nt; Figure 3B). All 323 recombinants were mapped using parallel coordinate diagrams, which clearly showed that recombination junctions, whether precise or imprecise, were distributed throughout the analysis region ( Figure 3C), with no evidence for the clustering around the encoded polyprotein cleavage sequences seen in previous CRE-REP analysis [18].
To validate these observations, we performed a technical replicate by independently extracting total cell RNA from the original coinfection sample ( Figure S6, Table 1). Despite fewer NGS reads overall, and a concomitant reduction in ViReMa-detected junctions, we were still able to identify hundreds of recombinants between PV3 and PV1. Correlation analysis of the NGS reads for each junction in the two replicates, normalized to the total reads per junction, demonstrated that the method produced statistically reproducible results (Spearman R = 0.88). All subsequent analyses used the first sequenced population containing 323 unique junctions.

Analysis of Recombination Junctions Following NGS
The NGS analysis of junction sequences provides an independent approach to determine the randomness, or otherwise, of the recombination process. To formally test whether the observed crossover junctions between PV3 and PV1 were randomly distributed across the targeted region, we compared the precise junction locations with a modelled dataset containing randomly generated precise junctions. The comparison used a sliding cumulative score across the window of recombination to determine the occurrence of recombination events at each nucleotide position ( Figure 4A). We observed no significant difference between our NGS dataset and the randomly generated dataset (p = 0.06; Mann-Whitney U test), supporting our contention that precise recombination junctions are randomly located throughout the targeted region.
Viruses 2022, 14, x FOR PEER REVIEW 1 an in silico library containing every possible imprecise recombinant within the tar region showed a significant positive correlation (R = 0.8, p < 0.01). This suggests th precise recombination tracks closely with the random model and indicates no role f quence identity in their generation. Finally, we mapped all the imprecise recombina their genome positions and could not detect any correlation between the NGS rea quency and the length of insertion/deletion sequences, reading frame, or location o genome ( Figure 5B). While the above analysis strongly suggested that recombination junctions wer domly distributed, it remains a possibility that localized regions of RNA structure influence junction location. To investigate these possibilities, we applied mean fo energy difference (MFED) analysis [20] to all detected recombinants of all types. In paring the MFED values from either the positive-or negative-strand to the num recombination junctions occurring within the same sliding window, we were una detect any correlation between the presence, or absence, of predicted RNA structur the location of recombination junctions ( Figure 6). This was in agreement with the REP assay analysis ( Figure 2) and further supports our view that RNA structure do influence the location of recombination junctions.
Taken together, these studies suggest that recombination does not preferential cur within regions of RNA structure or sequence identity between templates. We pr that recombination per se is a random process but is followed by a rigorous selecti 'functional' genomes [18,27]. It is the analysis of the latter that has produced the prev view of the importance of sequence identity or structure in their generation, where this study-specifically examining early recombinant genomes-we cannot find evi supporting their involvement.  The occurrence of recombination on the genome was given a score of 1 while no occurrence was 0. The number of recombination occurrences was calculated in a cumulative way, i.e., the occurrence of recombination at each position was added to the occurrence at the previous position. This was applied to two populations; the precise recombinants from the virus coinfection sample and a random recombination model generated by Excel. The cumulative recombinant count (y-axis) was plotted against the locations on the PV1 (donor) amplicon (x-axis). (B) An identical sequence length (IS) of 4 nt, which hypothetically occurred twice in the genome, was used as an example. The rectangles correspond to a part of the targeted region of PV1 and PV3, the lines represent recombinants and their lengths denote the NGS read, i.e., the longer the line, the more NGS reads. The identical sequence is placed within a smaller rectangle and the red Ns represent a mismatch of any other nucleotides. Looking at a possible real situation in the left panel, the Precise Recombinants (PRs) occurred at different sites within the identical sequence with different reads. ViReMa will always report 1 PR per IS-something similar to the middle panel. In our linear model, we convert ViReMa model into the right panel, where junction counts equal the number of sequence identity length + the mismatch, and NGS reads are equal for all junctions. (C) Individual counts of identical sequences (IS) of different lengths were summed within the amplified region and plotted along with the count of precise junctions that mapped to each different length of sequence identity. For each of the latter the average number of junctions per nucleotide was calculated, assuming that recombination was equally likely to occur at each position within identical donor and template sequences. The linear regression analysis was performed to compare the slopes between the predicted-data model (fitted line) and the expected-data model (perfect line); since the slopes of both lines are nearly identical, only one line can be seen in the figure.
As short regions of localized sequence identity have previously been suggested to influence the recombination process [10,12,25,26], we next investigated a role for such sequences in shaping our NGS dataset. Within the targeted region, there are 219 short regions of sequence identity between 2 and 32 nt in length. ViReMa maps junctions to the first divergent nucleotide 3 to a region of identity [21], which makes it impossible to determine the exact nucleotide-within the identical sequence-at which strand transfer occurred. Due to this, we made the assumption that recombination was equally likely to occur between any nucleotide within a region of sequence identity and its adjacent 3 nucleotide ( Figure 4B). We normalized the recombinants per nucleotide on this assumption; for example, if a precise junction with 100 NGS reads mapped 3 to a 4 nt region of aligned PV3 and PV1 sequences, this equated to five precise junctions each represented by 20 reads ( Figure 4B).
Notably, there were no recombinants that mapped to the most extensive region (32 nt) of sequence identity between the parental genomes. Of the remaining 218 regions (2-21 nt), we normalized the reads as described above to generate the observed-data model and compared it to the predicted-data model (a straight line based on our assumption- Figure 4B, right panel) using linear regression analysis. We found no significant difference (p = 0.93) between the observed and expected models ( Figure 4C), suggesting-for precise junctions at least and at this time point of the coinfection of poliovirus serotypes 1 and 3-that recombination junctions are distributed independently of sequence identity.
We additionally failed to detect enrichment of short regions of sequence identity at imprecise junctions ( Figure 5A). Correlation analysis between the experimental data and an in silico library containing every possible imprecise recombinant within the targeted region showed a significant positive correlation (R = 0.8, p < 0.01). This suggests that imprecise recombination tracks closely with the random model and indicates no role for sequence identity in their generation. Finally, we mapped all the imprecise recombinants to their genome positions and could not detect any correlation between the NGS read frequency and the length of insertion/deletion sequences, reading frame, or location on the genome ( Figure 5B).
(IS) of different lengths were summed within the amplified region and plotted along with the count of precise junctions that mapped to each different length of sequence identity. For each of the latter the average number of junctions per nucleotide was calculated, assuming that recombination was equally likely to occur at each position within identical donor and template sequences. The linear regression analysis was performed to compare the slopes between the predicted-data model (fitted line) and the expected-data model (perfect line); since the slopes of both lines are nearly identical, only one line can be seen in the figure.  The influence of RNA structure on recombination. The MFED was calculated for a 100 nt sliding window (at 30 nt increments) for both PV1 and PV3 positive and negative strands (dashed and solid lines, respectively) and plotted against the number of total recombinants within the same sliding window. We chose a 100 nt window, as we found from our previous studies [19,20,28] that decreasing the sliding window size would introduce noise to the system attributed to the appearance and disappearance of small stem-loop structures, making the MFED values less conclusive. Spearman's correlation coefficient was calculated between the recombinant count and the MFED values on either strand. While the above analysis strongly suggested that recombination junctions were randomly distributed, it remains a possibility that localized regions of RNA structure may influence junction location. To investigate these possibilities, we applied mean folding energy difference (MFED) analysis [20] to all detected recombinants of all types. In comparing the MFED values from either the positive-or negative-strand to the number of recombination junctions occurring within the same sliding window, we were unable to detect any correlation between the presence, or absence, of predicted RNA structure and the location of recombination junctions ( Figure 6). This was in agreement with the CRE-REP assay analysis ( Figure 2) and further supports our view that RNA structure does not influence the location of recombination junctions.
Taken together, these studies suggest that recombination does not preferentially occur within regions of RNA structure or sequence identity between templates. We propose that recombination per se is a random process but is followed by a rigorous selection for 'functional' genomes [18,27]. It is the analysis of the latter that has produced the prevailing view of the importance of sequence identity or structure in their generation, whereas, in this study-specifically examining early recombinant genomes-we cannot find evidence supporting their involvement.  The imprecise recombinants were counted at each identical length (x-axis) and the percentage proportion related to the total imprecise junctions was calculated (y-axis). The flattened data of the imprecise recombinants generated from the coinfection experiments (grey) were plotted against a random control (black), where every possible imprecise junction was created and classified using a custom Perl script. (B) In the upper panel, the locations of the detected imprecise recombinants on the PV1 genome were plotted against the NGS read count. In the lower panel, the size of either the insertions or deletions of the detected recombinants were plotted against NGS read count. Each dot represents a unique recombinant. Figure 6. The influence of RNA structure on recombination. The MFED was calculated for a 100 nt sliding window (at 30 nt increments) for both PV1 and PV3 positive and negative strands (dashed and solid lines, respectively) and plotted against the number of total recombinants within the same sliding window. We chose a 100 nt window, as we found from our previous studies [19,20,28] that decreasing the sliding window size would introduce noise to the system attributed to the appearance and disappearance of small stem-loop structures, making the MFED values less conclusive. Spearman's correlation coefficient was calculated between the recombinant count and the MFED values on either strand. Figure 6. The influence of RNA structure on recombination. The MFED was calculated for a 100 nt sliding window (at 30 nt increments) for both PV1 and PV3 positive and negative strands (dashed and solid lines, respectively) and plotted against the number of total recombinants within the same sliding window. We chose a 100 nt window, as we found from our previous studies [19,20,28] that decreasing the sliding window size would introduce noise to the system attributed to the appearance and disappearance of small stem-loop structures, making the MFED values less conclusive. Spearman's correlation coefficient was calculated between the recombinant count and the MFED values on either strand.

Discussion
Recombination is an important evolutionary process in the single-stranded positivesense RNA viruses [29,30]. It is probably ubiquitous and may be a necessary rescue mechanism to accommodate, and escape, the build-up of deleterious mutations that arise from the rapidly processive [31], error-prone polymerase [32,33]. Despite this, the molecular mechanisms underlying recombination remain relatively poorly understood. Several previous studies have implicated RNA secondary structure and particularly sequence similarity in determining the location of recombination junctions [12][13][14]34,35]. However, while it appears that this interpretation is widely accepted [10,12,25,26], the studies reported here suggest that the early products of recombination are not significantly influenced by RNA sequence identity or structure, necessitating a rethink of the defining features of this important process.
By modifying our CRE-REP in vitro recombination assay [18], we investigated the influence of sequence identity or RNA structure in the parental genomes on the location of recombination junctions. We observed no significant variation in the distribution of recombinants occurring between the control region (Cluster 1) and the modified region (Cluster 2; Figure 2). Had either RNA sequence identity or structure been a significant influence on the mechanism of recombination, we would have expected the distribution of recombinants to be skewed towards the Cluster region that favored polymerase templateswitching events. As all assays maintained a distribution similar to the wild type CRE-REP assay, we inferred that neither sequence identity nor structure were a major influence on recombination.
By design, the CRE-REP assay generates viable progeny viruses for analysis and, as such, does not allow the initial population of hybrid genomes, from which viable recombinants are selected, to be characterized. We therefore went on to analyze the RNA population from HeLa cells coinfected with wild type PV1 and PV3. Despite the analysis of several hundred recombinant junctions by NGS (Figure 3), we could not detect any correlation between sequence identity (Figures 4 and 5) or RNA structure ( Figure 6) and the enrichment of recombination junctions at any one particular site. Consequently, we propose that such regions undergo recombination at a rate no higher than would be expected by chance.
We have recently shown that the resolution of imprecise recombinants to wild type length genomes occurs via repeated template-switching events and is highly dependent on virus fitness [27]. This new understanding of resolution may help explain the discrepancy between the current prevailing view on recombination and the conclusions of this study if, as we suspect, the roles of sequence identity and structure are involved in genome function and fitness, rather than recombination per se. For example, studies that map crossover junctions to structured regions of recombinant genomes may conclude that they are causative, when the association may actually be correlative, and reflective of the genome modularity and relatively high fitness of a crossover in that specific region. Similarly, with genomes that exhibit variably extensive regions of sequence identity within an area in which recombinants are mapped, the functional selection may result in a clustering of crossovers in a region of sequence identity, not because of the identity per se, but because the resulting genome has enhanced fitness and is capable of competing better with others in the population.
A number of interesting questions remain about this important evolutionary mechanism, which will require further study. If recombinants are generated at random as we propose, what are the predominant functional criteria that determine their selection? At a more mechanistic level, how does the template-switching event occur? Although our analysis strongly suggests that the process is random, there remains the possibility that regions of microidentity may influence the process. There remains the interesting possibility that precise and imprecise junctions could be generated by independent mechanisms. We consider this unlikely; a sequence and RNA structure-independent process, coupled with functional selection, can account for all of the range of crossover junctions we identify and all that have been reported in the literature, and would fit with the laws of parsimony.