Recombination Events and Conserved Nature of Receptor Binding Motifs in Coxsackievirus A9 Isolates

Coxsackievirus A9 (CVA9) is an enterically transmitted enterovirus and one of the most pathogenic type among human enteroviruses. CVA9 isolates use a distinctive RGD (Arg-Gly-Asp) motif within VP1 capsid protein that defines its ability to bind to integrin receptor(s) for cellular entry. To investigate CVA9 evolution and pathogenicity, genetic relationships and recombination events were analyzed between 54 novel clinical isolates of CVA9, as well as 21 previously published full length CVA9 sequences from GenBank. Samples were investigated by partial sequencing of the novel VP1 and 3Dpol genes, as well as including the corresponding areas from GenBank sequences. Phylogenetic analyses were combined with clinical data in a further attempt to analyze whether sequence evolution reflects CVA9 pathogenicity in the phylogenies. Furthermore, VP1 gene was also analyzed for receptor binding sites including the RGD motif and the putative heparan sulfate (HS) site. Analysis of the 559-nucleotide-long VP1 sequences identified six clades. Although most of the strains within each clade showed geographical clustering, the grouping pattern of the isolates in the analysis of the VP1 gene was strikingly different from grouping of 3Dpol, which suggests that recombination events may have occurred in the region encoding the nonstructural proteins. Inclusion of clinical data did not provide any evidence of symptom based phylogenetic clustering of CVA9 isolates. Amino acid sequence analysis of the VP1 polypeptide demonstrated that the RGD motif was fully conserved among the isolates while the putative HS binding site was only found in one isolate. These data suggest that integrin binding is essential for virus tropism, but do not explain the symptom repertoire.


Introduction
Human enteroviruses, small positive-sense, single-stranded RNA viruses in the genus Enterovirus, family Picornaviridae, are subgrouped into four species, Enterovirus A-D, with altogether 116 virus types including polioviruses, coxsackie A and B viruses, echoviruses and numbered enteroviruses. Typing of human enteroviruses is based on genetic distances between VP1 sequences, representing the most heterogenous viral protein. Enteroviral VP1 is the only picornaviral protein linked to serotyping [1,2]. Enteroviruses are common human viruses and endemic in many developing countries. Although most enteroviral infections are subclinical, they cause a spectrum of diseases including mild upper respiratory illness (common cold), febrile rash (hand, foot, and mouth disease and herpangina), aseptic meningitis, pleurodynia, encephalitis, acute flaccid paralysis (paralytic poliomyelitis), and neonatal sepsis-like disease [3,4].
Genetic analysis of coxsackievirus A9 (CVA9) revealed that, despite its coxsackie A virus-like pathogenicity in newborn mice, it is genetically more closely related to CV-B viruses than to other

Clinical Specimens and Symptoms
The novel CVA9 samples used in the study originated from Finland (n = 16), the Netherlands (n = 30), Denmark (n = 1), and Northern America (n = 7). Sample details including symptoms are presented in Table 1, which also shows data for the 21 full length CVA9 isolates obtained from GenBank for this study.

Amplification of VP1 and 3Dpol Regions
RNA from CVA9 samples was extracted using a QIAamp MinElute Virus Spin Kit (Qiagen GmbH, Düsseldorf, Germany) following the manufacturer's instructions. Primers (OS/OAS and IS/IAS) for VP1 and 3Dpol and modified semi-nested PCR protocol were described previously [35]. In short, reverse transcription (RT) step was performed in 10 µL volume and contained 0.5 µL ImProm II (Promega, Madison, WI, USA), 1 µL OAS primer at 1 µM final concentration, ImProm II buffer, RNasin, and ddH 2 O. Samples were incubated at 42 • C for 60 min. Five microliters of RT sample was added to 25 µL PCR1 reaction, which included 0.5 µL Platinum Taq High Fidelity enzyme, 2.5 µL primers at 1 µM final concentration, buffer, and ddH 2 O. Cycling conditions for both VP1 and 3Dpol amplification were 95 • C for 3 min, then 35 cycles of denaturation at 95 • C for 20 s, annealing at 50 • C for 30 s and extension at 72 • C for 60 s. Final extension stage was for 5 min. In snPCR2, 1 µL of PCR1 reaction product was used.

Nucleotide Sequencing
Amplified DNA from RT-PCR reaction was treated with ExoSAP mixture and directly sequenced using 3730xl DNA Analyzer (Eurofins GATC, Köln, Germany) with the inner sense or antisense primer used for amplification (Eurofins Genomics, Ebersberg, Germany). Overlapping sequences were assembled with Staden Package, trimmed for equal length and aligned with Clustal omega implemented in Seaview program [36], and subjected to evolutionary analyses. All newly generated sequences were deposited in the GenBank database under the accession numbers MN493979-MN494086.

Phylogenetic Analysis
Bootstrapped phylogenetic trees for the VP1 and 3Dpol regions were constructed using the MEGA6 software package. Maximum likelihood (ML) and maximum composite likelihood (MCL) methods were used in the phylogenetic tree construction using the Kimura 2-parameter substitution model [37] with 500 bootstrap replicates. Data used in the analysis included all codon positions, with pairwise deletion for missing data. A Markov Chain Monte Carlo (MCMC) method, implemented in the BEAST software package version 1.8.2 [38], was used for temporal phylogenetic analysis of the data. For the Viruses 2020, 12, 68 5 of 15 VP1 region, dated sequences were analyzed with a chain length of 60 million, sampling every 6000 states, under the SRD06 substitution model [39] with the assumptions of a relaxed molecular clock model and a constant size coalescent. Other parameters were initially manually checked and adjusted, with final optimization done during the burn-in period. For the 3Dpol region, BEAST analysis was run with a chain length of 90 million to ensure good quality control parameters. Otherwise, the process for the 3Dpol region analysis was carried out the same way as for the VP1 region. Outputs from BEAST for both regions were analyzed within the TRACER v1.6 program to ensure convergence through graphical checks, as well as adequate quality control parameters of posterior distribution (ESS > 200). Final phylogenetic trees from BEAST outputs were constructed with the TreeAnnotator v1.8.2 software to find the maximum clade credibility tree from all the sampled tree states. Finally, the obtained trees were visualized with the FigTree v1.4.2 software.

RGD and HS Binding Site Analysis
To extract the RGD motif from the sample sequences, the raw inner antisense (IAS) and inner sense (IS) files were loaded into Unipro UGENE program [40] where VP1 open reading frames (ORFs) containing the motif were extracted if the sequence data were found to cover the area adequately. These sequences were later translated and trimmed to a 20 aa window around the motif for further analysis with the BioEdit 7.2.5 software [41]. Possible unresolved bases found in some sequences, likely due to RGD motif's position in the C-terminal end and the Sanger sequencing method used, were manually fixed by using the data from the original sequencing chromatograms. Eventually, a total of 64 sequences were used in the final RGD motif analysis, including the prototype Griggs (GenBank accession number D00627.1) sequence, as well as all available full length CVA9 sequences from GenBank.
The putative HS binding site region was extracted from the originally aligned sequence data with a 29 aa window around the VP1-T132 position as suggested in previous studies [20,21]. The resulting alignment was run through protein BLAST using the non-redundant protein sequence database, excluding CVA9 sequences, with the aim of identifying other potential enteroviruses with present HS binding sites. From the top 250 returned hits, the non-redundant enterovirus sequences were combined with the putative HS region from the full length CVA9 GenBank sequences included in this study and subjected for further analysis.

Phylogeny of CVA9 VP1 and 3Dpol Regions
A data set of 75 coxsackievirus A9 isolate sequences collected between 1959 and 2016 were analyzed for their VP1 and 3Dpol gene phylogenies. In detail, the complete data set consisted of sequences from three distinct geographical regions: Europe (n = 48), Americas (n = 11), and Asia-Pacific (n = 15), as well as the CVA9 prototype Griggs sequence. Full breakdown of details regarding the isolates is shown in Table 1.
The temporal phylogenetic analyses done with the BEAST v1.8.2 software package that visualize estimates of CVA9 divergence times are seen in Figure 1. Temporal phylogenies give clues to the evolutionary rate, as well as the time to most recent common ancestor (tMRCA) for the whole data set of VP1 and 3Dpol sequences. The CVA9 VP1 region showed a mean substitution rate of 4.1 × 10 −3 substitutions/site/year (95% HPD range 3.1-5.0 × 10 −3 ) (Table 2). Additionally, VP1 dated the tMRCA back to approximately 1889 (1856-1918). The 3Dpol region showed a mean substitution rate of 3.4 × 10 −3 substitutions/site/year (95% HPD range 2.3-4.6 × 10 −3 ), with the tMRCA dating further back to approximately 1814 (1726-1886) ( Table 2). The data may be somewhat biased due to the size of the sample collection and temporal and spatial differences.   1 Estimated year of tMRCA (95% CI in parentheses). 2 Estimated mean substitution rates (95% HPD range in parentheses).
The topology of the VP1 phylogeny obtained from BEAST analysis is showing strong clustering by country of origin, and to a degree by collection year, which is similar to what was observed with the initial ML trees. The Asia-Pacific sequences in clade C1 ( On the side of the 3Dpol phylogeny, the tree topology is immediately more disorganized compared to the VP1 topology ( Figure 1). While some clusters are still seen forming by geographic area, the clusters are noticeably smaller compared to VP1. Additionally, there are a higher number of individual sequences not positioning in distinct clades, with noticeably higher tMRCA estimates. . Clade C4 of Dutch sequences was largely intact in the 3Dpol phylogeny, as seen in clade C4.1. Two exceptions were seen with 63-7-NL positioning closer to C3.1, and 66-9-NL positioning closer C1.1. C5, the last clade from VP1 that contained primarily Dutch isolates, was found intact in the 3Dpol phylogeny as seen in clade 5.1. The Dutch sequences from the 1980s that were already seen dispersing around the tree on the VP1 side, are similarly seen dispersing around different clades in the 3Dpol phylogenetic tree, with no clear clustering by collection year seen, contrary to most of the other isolates.
American isolates were previously seen in the VP1 phylogeny forming the clade C6, as well as positioning in the Asia-Pacific clade C1, with C6 containing US, Mexico, and Canadian sequences from the 1970s and 1980s, whereas C1 contained more recent US sequences from 2005 and 2016. Clade C6 was found largely intact in the 3Dpol phylogeny with two noticeable changes. First, the 2015 US sequence 15-7-US, that did not cluster with other American sequences in VP1, is now seen part of C6 on the 3Dpol side. Secondly, C6 lost the member 83-3-US, which positioned closer to C1.1 in the 3Dpol phylogeny.

Receptor Binding Site Analysis
Previous studies have shown that the arginine-glycine-aspartic acid (RGD) motif found in the VP1 capsid protein of CVA9 has a role in cell entry. This motif mediates cell entry also in other picornaviruses such as echovirus 9 (Barty strain), human parechoviruses 1, 2, 4, 5, 6, and foot-and-mouth disease viruses, by binding to several different previously identified integrins [42]. In the case of CVA9, the RGD motif has been found to recognize and bind to the αVβ3 and αVβ6 integrins. Analysis of the 64 CVA9 isolates found that all of them possessed the RGD motif ( Figure 2). Several different mutations around the RGD motif were seen. L/M mutations at position RGD + 1 and L/F mutations at position RGD + 4.
BLAST search and analysis of the putative heparin binding site, described in a previous study [21], resulted in twelve non-redundant enterovirus types that had the proposed motif (E3, E5, E6, E7, E11, E12, E16, E31, EVB74, EVB85, EVB93, and EVA119). The motif for the putative heparin binding site for the viruses obtained through BLAST, as well as the full length CVA9 sequences obtained from GenBank for this study can be seen in Figure 3. Results for the heparin binding site analysis for the other novel CVA9 isolates used in this study have been previously published by others [21]. From the BLAST results, three of the twelve enterovirus types (E5, E7, and E11) were already previously suggested to possess the heparin binding motif and the relevant T132R/K mutation, whereas the nine remaining species are new to our knowledge [20]. However, the specific E7 isolate obtained through BLAST did not have this mutation present. The T132R/K mutation suggested for being responsible for enabling heparin binding was found in one of the nine newly proposed enterovirus types, EVB85 with a T132R mutation. Five out of the eight remaining types showed T132/Q/S/D/N mutations at this position (E3, E12, E16, E31, and EVB74). None of the full length CVA9 sequences obtained from GenBank exhibited the T132R/K mutation thought to enable heparin binding. remaining species are new to our knowledge [20]. However, the specific E7 isolate obtained through BLAST did not have this mutation present. The T132R/K mutation suggested for being responsible for enabling heparin binding was found in one of the nine newly proposed enterovirus types, EVB85 with a T132R mutation. Five out of the eight remaining types showed T132/Q/S/D/N mutations at this position (E3, E12, E16, E31, and EVB74). None of the full length CVA9 sequences obtained from GenBank exhibited the T132R/K mutation thought to enable heparin binding.

Discussion
Virus epidemiological studies have proven to be valuable in understanding the nature of virus evolution and clinical features. For example, in the case of enteroviruses, extensive epidemiological works, such as virus identification and typing, have been done with the poliovirus during eradication process. Furthermore, in the modern world population mobility is at a level where pathogens can travel across the planet within a day. This high mobility of people has highlighted further need for understanding how pathogens have moved geographically and evolved along the way, which in turn has a relationship with epidemiological research [44].
In this study, 75 clinical isolates of coxsackievirus A9 (CVA9) from three geographical regions, collected between 1959 and 2016, were subjected to phylogenetic analysis in order to study the occurrence of recombination and possible links to CVA9 pathogenicity. Additionally, genomic

Discussion
Virus epidemiological studies have proven to be valuable in understanding the nature of virus evolution and clinical features. For example, in the case of enteroviruses, extensive epidemiological works, such as virus identification and typing, have been done with the poliovirus during eradication process. Furthermore, in the modern world population mobility is at a level where pathogens can travel across the planet within a day. This high mobility of people has highlighted further need for understanding how pathogens have moved geographically and evolved along the way, which in turn has a relationship with epidemiological research [43].
In this study, 75 clinical isolates of coxsackievirus A9 (CVA9) from three geographical regions, collected between 1959 and 2016, were subjected to phylogenetic analysis in order to study the occurrence of recombination and possible links to CVA9 pathogenicity. Additionally, genomic analysis of the RGD and HS receptor binding sites was carried out to shed light on possible mutations that might affect tropism and pathogenicity. Enteroviruses are among the most common disease-causing viruses in humans, and coxsackieviruses belonging to this family are a cause of wide range of diseases in humans, such as meningitis, myocarditis, respiratory, and gastrointestinal infections. Thus, understanding possible recombination events between isolates originating from different geographical areas will be important in understanding how the species has evolved and what effects it might have on epidemiology. Furthermore, closer analysis of receptor binding areas gives valuable information regarding virus cell entry mechanisms and tropism. Combining available clinical data with the phylogenetic and sequence level analyses will further allow us to speculate on the possible effects that CVA9 evolution has on pathogenicity.
Evidence of possible recombination events was gathered by building phylogenetic trees of the VP1 and 3Dpol genomic areas of all 75 CVA9 isolates. Within the VP1 area phylogenetic tree, six bootstrap supported clades were identified, which resulted in the isolates grouping largely by country of origin and collection year. The clades identified within VP1 were compared to the phylogenetic tree of the 3Dpol area, which showed evidence of incongruence in the tree topologies (Figure 1). This incongruence suggests possible recombination between CVA9 isolates in clade C1 and C6. Clade C1 contained sequences gathered from Finland in 2007-2008, Asia-Pacific region between 2006 and 2013, and a single US sequence from 2005. On the other hand, C6 contained isolates mainly from the Americas gathered between 1974 and 1988, as well as a Dutch sequence from 1979. Isolates from C1 were seen splitting into four smaller clades in the 3Dpol phylogeny, with C1.2 and C1. 3 showing closer relationship to C6.1. Furthermore, Finnish CVA9 sequences were seen dispersing to a noticeable degree in the 3Dpol phylogeny compared to VP1, with isolates from C2 dispersing on the 3Dpol side and showing incongruent positioning compared to their VP1 counterparts. Additionally, Dutch isolates from C3 were seen mixing with the Finnish sequences from C2, as is seen on the 3Dpol phylogeny with the breaking of C3 into C3.1 and C3.2 while exhibiting closer relationships with Finnish sequences 94-10-FI, 99-15-FI, and 99-16-FI. Furthermore, Dutch sequences from C4 were also seen positioning significantly closer to Finnish sequences from C2 in the 3Dpol phylogeny, as is evident by the positions of clades C4.1 and C2.1. Overall, the phylogenetic analysis shows signs of continuous recombination between the CVA9 isolates. The tMRCA results show a rather distant ancestor for the viruses, but this could be explained due to the nature of the data set. Importantly, it should be noted that tMRCA estimates with these data are extrapolatory in nature, as the temporal analyses were done without any old reference sequence available to calibrate the data further. Thus, conclusions based on the tMRCA estimate of the basal node of the phylogenetic tree should be done with care in mind. The involvement of sequence data from additional countries, and spanning several decades from each country, would give a more detailed overview of the evolution of these viruses.
Combining clinical data with phylogenetic analyses gives an opportunity to speculate on what possible effect sequence evolution might have on CVA9 pathogenicity, and if this effect would be extracted from the data. VP1 and 3Dpol phylogenies were generated to visualize any discernible correlations between the tree topologies of the isolates and clinical data. The data set contained 21 isolates from patients with symptoms of meningitis (08-6-FI, 08-7-FI, 08-8-FI, 08-9-FI, 09-1-CHN, 09-2-CHN, 60-3-NL, 61-4-NL, 61-5-NL, 70-14-NL, 71-15-NL, 72-16-NL, 73-18-NL, 73-19-NL, 79-22-NL, 83-3-US, 85-1-CA, 96-11-FI, 97-12-FI, 97-13-FI, and 97-14-FI). The majority of these isolates are positioned in clades C1, C2, C3, C5, and C6 in the VP1 phylogeny. Additionally, isolates 08-9-FI, 85-1-CA, 70-14-NL, and 71-15-NL are not part of any previously defined clade. Comparing this topology to the positioning of the isolates in the 3Dpol phylogeny, there is no clear shift in the placement of meningitis causing CVA9 isolates. The isolates are seen grouping similarly as on the VP1. That is, the meningitis causing isolates within the individual clades in the VP1 phylogeny form similar clusters in the 3Dpol phylogeny. Furthermore, there is no perceived effect of these clusters pooling together or closer to each other in any significant manner in either phylogeny based on symptoms, but rather the grouping is seen to be more related to the country of origin and collection dates as previously stated. In a similar fashion, no conclusions can be made on pathogenicity based on the phylogenies by looking at cases where the clinical isolates caused diarrhea in patients. A more complete record of clinical data for symptoms, as well as more longitudinal data, would to highlight evolutionary effects and could help in analyzing these possible effects.
Regarding the RGD motif, the leucine flanking RGD at position +1 has been previously found to stabilize integrin binding to the α5β1 and αVβ3 receptors in another picornavirus, the foot-and-mouth disease virus (FMDV) [44]. Additional studies have also shown the leucines at RGD + 1 and +4 to be important for stable binding to integrin αVβ6 [45]. The RGD surrounding area is shown to be highly variable with multiple mutations visible on both the nand C-terminal side of the motif. However, the exact effect of these mutations on CVA9 infection remains unknown. At the same time the highly conserved nature of the RGD motif is indicative of its likely role in the early steps of multicellular infection since αV receptors are primarily expressed in epithelial cells.
The heparin binding has been suggested to occur via a T-R/K mutation in the position 132 (T132R/K) of VP1 in CVA9. The T132R/K mutation was proposed to cluster positive charges around the 5-fold axis, thus allowing binding to heparan sulfate on the cell surface [20]. The T132R/K mutation identified in three other enterovirus types suggests that these types may use HS as a cell entry receptor. However, previous studies have shown that blocking of this site does not necessarily inhibit infectivity [21]. Thus, it can be assumed there being additional putative binding sites that warrant further study in order to gain a comprehensive view of the role of HS in receptor-mediated infection by CVA9. Additionally, BLAST analyses exploring this possibility showed three other previously known enteroviruses (E5, E7, and E11) and one previously unknown (EVB85) also possess the T132R/K mutation thought to be responsible for enabling HS binding. The five enterovirus types (E3, E12, E16, E31, and EVB74) discovered possessing a mutation other than T132R/K at this point included the mutations T132/Q/S/D/N. None of these amino acids possess a positive charge, with glutamine, serine, and asparagine having a neutral charge, and aspartic acid having a negative charge. Furthermore, studies have shown that heparan sulfate binding motifs are composed of sequences of the type XBBXBX and XBBBXXBX, where B is a basic residue, and X is somewhat neutral or hydrophobic residue [46]. None of the found Q/S/D/N mutations fit into the suggested model of the motif structure. However, ultimately their effect on CVA9 infection remains unexplored.
The RGD and HS binding motifs are both located in the VP1 gene, which encodes for one of the four capsid proteins and is a highly variable area of the genome. Furthermore, the VP1 area is used in virus typing and as such is an area of great interest when studying viral pathogenicity. Thus, any significant changes in these motifs would be of high interest if detected. In this data set, we show that RGD remains to exhibit conserved nature in clinical samples. The RGD motif was found to be fully conserved in all of the isolates, and while some mutations surrounding the immediate motif were found, the motif itself remained unchanged, both highlighting its importance in infection, but also diminishing its importance in changes of tropism. The putative heparin binding motif was found to be present in some enterovirus types, and the site has been proposed to play a role in enterovirus infection [20]. However, the T132R/K mutation thought to enable heparin binding is found to be very rare in these data, with none of the GenBank CVA9 sequences possessing the mutation, and only 10 out of 54 of the novel CVA9 sequences possessing a T132R/K mutation as published previously [21]. Consequently, any correlations cannot be speculated between changes in CVA9 pathogenicity and the changes, or lack thereof, of the RGD and HS binding motifs. Thus, changes seen in CVA9 pathogenicity must originate from elsewhere in the sequence.

Conclusions
In conclusion, the analysis presented in this study suggests that CVA9 exhibits continuous recombination, taking place over multiple decades as well as across vast geographical areas. The data set used in this study showed possible recombination events taking place by analyzing incongruences observed in the topologies of the phylogenetic trees. A clade containing CVA9 isolates from the Asia-Pacific region collected between 2006 and 2013, US isolates collected in 2005 and 2016, and Finnish isolates collected in the 2000s exhibited possible recombination with sequences collected from the US between 1970 and 1980. Dutch sequences collected between 1972 and 1979 and present in the clade C3 showed possible recombination with Finnish sequences from 1990s, which were within the clade C2 in VP1, and with some Dutch sequences from the early 1960s. In addition to phylogenetic recombination analyses, the clinical data was analyzed to visualize if the changes in tree topologies reflect changes in CVA9 pathogenicity. Based on the analyses of meningitis and diarrhea clinical data, no conclusions can be made from these data as there were no discernible changes in the phylogenies that would reflect the clinical symptoms. A more complete record of both clinical and longitudinal data is required to more accurately reflect sequence evolution and to make more educated conclusions on these connections. Furthermore, analysis of the RGD and HS receptor binding sites revealed that RGD remains conserved while the role of HS site is elusive. The RGD motifs highly conserved state in nature warrants further studies of its function in multicellular infection, as it has been shown not to be required for infection in certain cell lines [15,16,18]. Additionally, the observed mutations targeting leucines at positions RGD + 1 and +4 around the motif could have an effect on binding affinity. Although the highly conserved nature of RGD highlights its importance in infection, it does not bring forward any new information regarding CVA9 pathogenicity. Thus, any differences between pathogenicity of CVA9 isolates must be explained by sequence evolution outside RGD region. The putative heparin binding site, thought to be enabled by the VP1-T132R/K mutation, was found to be rare. The T-R mutation was found in only one of the 54 novel CVA9 isolates within this data set, whereas the T-K mutation was found in nine, as described in a previous study [21]. Additionally, the putative HS binding site was not found present in any of the full length CVA9 sequences gathered from GenBank for this study. BLAST analyses found that other enteroviruses could also possess the HS binding mutation, suggesting that echoviruses 5, 7, and 11 also have the motif in question, all of which were already described by a previous study [20]. However, previous inhibition studies have shown that CVA9 may possess additional unidentified HS binding sites within its sequence [21], making these sites a good target for further studies in order gain a more complete picture of the infection mechanisms of CVA9, as well as other enterovirus types.