Evolutionary Dynamics of Type 2 Porcine Reproductive and Respiratory Syndrome Virus by Whole-Genome Analysis

Porcine reproductive and respiratory syndrome virus (PRRSV), an important pathogen in the swine industry, is a genetically highly diverse RNA virus. However, the phylogenetic and genomic recombination properties of this virus are not yet fully understood. In this study, we performed an integrated analysis of all available whole-genome sequences of type 2 PRRSV (n = 901) to reveal its evolutionary dynamics. The results showed that there were three distinct phylogenetic lineages of PRRSV in their distribution patterns. We identified that sublineage 2.7 (L2.7), associated with a NADC30 cluster, had the highest substitution rate and higher viral genetic diversity, and inter-lineage recombination is observed more frequently in L2.7 PRRSV compared to other sublineages. Most inter-lineage recombination events detected are observed between L2.7 PRRSVs (as major parents) and L3.4 (a JXA1-R-related cluster)/L3.7 (a WUH3-related cluster) PRRSVs (as minor parents). Moreover, the recombination hotspots are located in the structural protein gene ORF2 and ORF4, or in the non-structural protein gene nsp7. In addition, a GM2-related cluster, L3.2, shows inconsistent recombination modes compared to those of L2.7, suggesting that it may have undergone extensive and unique recombination in their evolutionary history. We also identified several amino acids under positive selection in GP2, GP4 and GP5, the major glycoproteins of PRRSV, showing the driving force behind adaptive evolution. Taken together, our results provide new insights into the evolutionary dynamics of PPRSV that contribute to our understanding of the critical factors involved in its evolution and guide future efforts to develop effective preventive measures against PRRSV.


Introduction
Porcine reproductive and respiratory syndrome (PRRS) is a viral disease characterized by two overlapping clinical presentations, complications with reproduction and respiratory disease in pigs of any age [1,2]. This disease has brought enormous economic losses to swine production in recent years, particularly following the emergence of the highly pathogenic PRRS virus (HP-PRRSV) in China and has become an intractable problem for the development of the swine industry [3][4][5]. PRRSV belongs to the genus Arterivirus and has a positive-sense RNA virus with a full-length genome of approximately 15 kb, which is organized into at least ten overlapping open reading frames (ORFs): ORF1a, ORF1b, ORF2a, ORF2b, ORF3, ORF4, ORF5a, and ORF5-ORF7, whereas ORF2a, ORF2b, ORF3, ORF4, and ORFs 5-7 encode the viral structural proteins GP2, E, GP3, GP4, GP5, M, and N, Viruses 2021, 13, 2469 2 of 13 respectively [6]. ORF1a and ORF1b constitute 75% of the viral genome, encoding at least 16 mature non-structural proteins (nsp1a, nsp1b, nsp2-6, nsp2TF, nsp2N, nsp7α, nsp7β and nsp8-12) [7][8][9][10]. ORF5, encoding GP5, is one of the most variable regions of structural proteins in the PRRSV genome and is thus often used for phylogenetic analysis [11,12]. GP5 is the primary envelope protein, a glycoprotein of approximately 200 amino acids with an apparent molecular mass of 26 kDa [11]. There are 2-4 glycosylation sites, an N-terminal signal peptide directing protein synthesis to the rough endoplasmic reticulum (ER), and some antigenic determinants that induce neutralizing antibodies included in GP5 [13,14]. The presence of a major neutralization epitope in the N-terminal ectodomain implied GP5 is involved in receptor recognition [15]. Thus, GP5 is important for understanding the genetic relatedness and epidemiological status of PRRSV field isolates, and for advancing vaccine development [12,13,15].
The PRRS disease was first discovered in swine farms in the United States and Europe in the 1980s [16], with reports of its occurrence in China as early as 1996 [17]. With the emergence of HP-PRRSV strains; however, serious disease epidemics have occurred in China since 2006, and it has spread throughout the country with considerable genetic variation [5,[18][19][20]. PRRSV is a rapidly evolving virus, with over 900 whole-genomes of PRRSV issued in the last years. Thus, the two genotypes of PRRSV, European type (Type 1), and North American type (Type 2), share only about 60% nucleotide similarities at the genomic level and characterized by Lelystad virus and VR-2332, respectively [16,21]. One of the key questions on PRRSV arises from its genetic diversity, which is thought to have a direct impact on immunobiology, epidemiology, diagnosis, and vaccine efficacy [22]. The widespread epidemic, persistent recurrences, and significant losses because of this intractable disease call for surveillance on the disease. Constant surveillance on PRRSV occurrence is crucial to a better understanding of the epidemiological and evolutionary dynamics of co-circulating viral lineages [9,23]. Previous studies described a comprehensive picture of the diversity of type 2 PRRSVs and classified PRRSV sequences into nine lineages based on phylogenetic analyses of ORF5 sequences [12]. However, the evolution of complete PRRSV genomes is not fully established, and there is limited knowledge about how PRRSV lineage (especially regarding vaccine-related PRRSV lineage) influences the evolution of PRRSV. In the study, we performed a comprehensive analysis of PRRSV through phylogenetic analysis, Bayesian evolution analysis, recombinant analyses, and selective pressure analyses, which will help continually update our knowledge on circulating PRRSV. Understanding viral dynamics on a regional scale could provide important insights into the local evolutionary and ecological dynamics of PRRSV and aid in developing strategies that might diminish disease impact. Meanwhile, effective and safe vaccines should be developed and selected for the prevention and control of PRRS, considering the extensive genetic diversity of PRRSV.

Phylogenetic and Sequence Distance Analyses
The sequences of 75 type 1 PRRSV and 901 type 2 PRRSV were obtained from the National Center for Biotechnology Information (NCBI, https://www.ncbi.nlm.nih.gov, accessed on 1 November 2021) in August 2020 and used for analyzing phylogenetic and new lineage classification. These sequences consist of PRRSV field samples worldwide (e.g., China, the United States, and South Korea), live attenuated vaccine strains and laboratory attenuated strain. All information of sequences (GenBank ID; reported time; country; host) is classified in Table S1. A total of 976 genomic sequences were aligned using MAFFT [24]. Conserved sequence was selected with Gblocks using default parameters [25]. Multiple sequence alignments of the 976 PRRSV genomes were performed by using the genome sequence of Lactate dehydrogenase-elevating virus (GenBank accession no. NC001639) as an outgroup. To further identify the lineage classifications for all type 2 PRRSV strains circulating in the world, a complete genome phylogeny was reconstructed using IQ-TREE software. ML phylogenetic trees were constructed using IQ-TREE [26], and the best-fitting Viruses 2021, 13, 2469 3 of 13 nucleotide substitution model was determined automatically by the program following 1000 bootstrap replicates. The results were visualized using iTOL v.4 (http://itol.embl. de/ (accessed on 1 November 2021)) [27]. The p-distance analyses were assessed by MEGA X [28].

BEAST Analyses
Bayesian molecular clock phylogenetic analysis was performed by using BEAST 1.8 [29] and then processed through recombination detection using RDP v.4.96 [30]. Sequences suspected to be recombinants were removed. To ensure that these had sufficient temporal structure in alignment for reliable rate estimation, we first performed a regression of root-to-tip genetic distances on the ML tree against exact sampling dates using the TempEst [31], and anomalous sequences were removed based on the estimated root-totip distance. For each lineage, two independent runs of 100 million generations were performed, sampling every 10,000 generations and removing the first 15% as burn-in. Using BEAST, time-stamped data were analyzed under the uncorrelated lognormal relaxed molecular clock [32], the GTR nucleotide substitution model [33]. The Bayesian skyline coalescent tree prior was used. Convergence was assessed using TRACER 1.7 [34].

Recombinant Analyses
Potential recombinations within the whole genome sequences were screened using seven methods (RDP, GENECONV, MaxChi, Bootscan, Chimera, SiScan, and 3Seq) implemented in the Recombination Detection Program version 4 (RDP4) and set p < 0.001 to avoid inaccurate recombination events [30]. The breakpoints were also defined by RDP4, and recombination events were indicated when four of seven methods reported recombination signals. The recombination frequency for each gene was the percentage of the recombination events that occurred at genes from the total length of genes. Recombination events with only sufficient evidence (not partial or trace evidence in the same event) and both parental strains unambiguously identified (i.e., no missing parental strain scenario) were kept. The heat map was plotted using TBtools [35].

Phylogenetic Analyses of the Whole-Genome of PRRSV Strains
Recently, an increasing number of PRRSV whole-genome sequences have become available by next-generation sequencing (NGS) [38]. Here, we constructed a phylogenetic tree using IQ-TREE based on the 976 whole-genome sequences of PRRSV isolates, and PRRSVs can be classified into two major genotypes, type 1 (75 PRRSV isolates) and type 2 (901 PRRSV isolates) ( Figure 1A), which is consistent with previous phylogenetic studies [12,16,21]. To further gain insight into type 2 PRRSV, a phylogenetic tree was constructed based on the maximum likelihood phylogenetic analysis of 901 whole-nucleotide alignments, and type 2 PRRSV isolates can be divided into three major lineages, named as  (Figures 1B and S1). Most PRRSV strains in L2 were predominantly observed in the United States, while most PRRSV strains in L3 were reported in China. Moreover, inter-lineage strains showed considerable genetic diversity ( Figure 1C), indicating a meaningful classification system. Intra-lineage strains in L2 showed greater genetic diversity than those in L1 and L3, suggesting that different evolutionary events probably influenced these three different lineages. In addition, both L2 and L3 were further divided into seven sublineages (L2.1 to L2.7; L3.1 to L3.7). Among L2 PRRSVs, L2.7 was not only predominant in the United States but also gradually became dominant in China. Another remarkable result was that a large number of sequences belonging to L3 had "abnormally" low genetic diversity. In addition, we located two important live attenuated vaccines in the overall PRRSV whole genome phylogeny. MLV vaccine strain and its parental strain, VR-2332, belonged to lineage 2.1, along with a large number of PRRSV strains in the United States. However, JXA1-R vaccine strain belonged to lineage 3.4, which contained 107 sequences with a relatively low level of diversity ( Figure 2). Therefore, based on the number of strains and previous phylogenetic studies, [9,39] we selected three sublineages (L3.3, L3.4, and L3.7) of L3 and one sublineage (L2.7) of L2 for further study.

Intra-Sublineage Genetic Diversity and Temporal Analysis of Sublineages of Type 2 PRRSV
To investigate the genetic diversity of the major PRRSV sublineages (L2.7, L3.3, L3.4, and L3.7), we calculated the mean genetic distance of intra-sublineages and their distribution over time to illustrate the evolutionary event ( Figure 2). Our analysis of the major sublineages revealed two modes of genetic diversity distribution. One of the modes of genetic diversity distribution included strains with low genetic diversity. In particular, L3.3 and L3.4 accounted for a high proportion of all sublineages and became the dominant sublineages in 2006-08 (14.15% and 15.09%), whereas L3.7 strains were reported more frequently after 2009 and occupied a larger proportion of strains (53.33%), but with lower genetic diversity. Another significant point is that L3.4 had a relatively stable number at all times, which is unusual as a vaccine-like sublineage. This result was consistent with the fact that significant HP-PRRSV outbreaks were reported in China after 2006. In contrast, L2.7, the predominant sublineage circulating in 2012-2018, showed another type of genetic diversity distribution mode, which had multiple substitutions at the nucleotide sequence, resulting in diversification into high genetic diversity. L2.7 had not only a particular temporal distribution but also a characteristic geographic distribution, showing L2.7 in both China and the United States. In general, the study of the molecular epidemiological characteristics of L2.7, L3.3, L3.4, and L3.7 will play an important role in the prevention of PRRSV. To investigate the spatiotemporal relationships of L2.7, L3.3, L3.4, and L3.7, we performed Bayesian spatiotemporal speculation using BEAST 1.8. As shown in Figure 3, the viruses in L2.7 and L3.7 evolved at the higher substitution rate, while the viruses in L3.3 and L3.3 evolved at the lower substitution rate, suggesting that the rates of substitution varied among different sublineages. For the high substitution rate mode, the BEAST estimation of L2.7 and L3.7 obtained by the strict molecular clock model yielded an average substitution rate of 5.117 × 10 −3 and 2.484 × 10 −3 substitutions/site/year, with a 95% credible interval of 4.8646 × 10 −3 to 5.3617 × 10 −3 and 2.3525 × 10 −3 to 2.62 × 10 −3 substitutions/site/year.

Positively Selective Drives Adaptive Changes of PRRSV
Envelope glycoproteins GP2, GP4, and GP5 are the most variable structural proteins of PRRSV and are the most viral protein that most frequently induces neutralizing antibodies [6,40,41]. Five neutralizing epitopes (41-55 and 121-135 amino acids in GP2, 40-79 amino acids in GP4, 27-31 and 37-44 amino acids in GP5) have been identified in these envelope glycoproteins [13,17,[42][43][44][45]. GP5 also contains several N-glycosylation sites, two of which (N44 and N51) are highly conserved among PRRSV isolates [46,47]. The secondary structure and transmembrane helixes of GP2, GP4 and GP5 have been predicted by PRED-TMBB and PSIPRED servers [48,49]. Structurally, PRRSV GP5 (in type 2 PRRSV) has an N-terminal signal peptide (11-26 amino acids) directing protein synthesis to the rough endoplasmic reticulum (ER), passing the outer membrane three times, and has membrane-spanning strands built up-barrel structures (63-83 amino acids; 87-102 amino acids; 108-132 amino acids). The C-terminal part (132-200 amino acids) is most probably located in the cytosol and terminates inside the virus. PRRSV GP2 and GP4 (in type 2 PRRSV) have a similar structure which has an N-terminal signal peptide  Figure 4B). The positively selected sites for PRRSV GP2 in L2.7 were more in contrast to the L3.3, L3.4, and L3.7. These adaptive changes may have beneficial consequences for the existence of L2.7 PRRSV. However, in contrast to the L2.7, L3.3, and L3.7, the positively selected sites for PRRSV GP5 in L3.4 were higher, which means that viral variants from L3.4 may have higher fitness and are therefore preferred. Then, the result explains one of the reasons why L3.4 has a large number of strains but maintains "the low substitution rate mode". Thus, vaccines for PRRSV require continuous monitoring and surveillance on development stage or post-market stage, which can improve the conditions of the swine industry.
Previous studies have reported the occurrence of recombination in different PRRSV strains, but the frequency of such recombination events and the genome segments involved are not well known. To map and compare the probability of recombination along the PRRSV, the relative recombination rates (the number of recombination that occurred at a gene relative to the total gene length) of each gene were calculated. The locations of recombination breakpoints were determined according to the prototype type 2 PRRSV strain. The recombination breakpoints were distributed throughout the whole-genome. In all 296 recombination events, the high-frequency recombination regions were located in nsp7 region (4.04%) which is involved in PRRSV replication, and in the GP2 and GP4 region (4.89% and 5.05%, respectively) ( Figure 6). The high-frequency recombination regions of PRRSV in all recombination events genome-wide showed a similar fluctuation trend as in the L2.7 recombination events, suggesting that the recombination hotspots of PRRSV are consistent and would not change based on different lineages.

Discussion
Type 2 (or North American-like) PRRSV was first recorded in the United States in 1987, and its spread constitutes a risk for the swine industry, causing large economic losses each year [16]. Based on analyses of PRRSV GP5 sequences, previous studies described nine well-defined lineages for type 2 PRRSVs. Although the PRRSV ORF5 phylogeny has been frequently used to study genetic relationships, the full picture has hardly been revealed due to the limited information provided by PRRSV GP5. Thus, with the rapid growth of PRRSV whole genomes in databases, the phylogenetic analyses of PRRSV whole genomes needed to establish a reference sequence set for future genotyping studies. Here, we systematically investigated the genetic diversity, evolutionary dynamics, and recombination of type 2 PRRSV. The phylogenetic tree indicated that the complete PRRSV genomes were classified into three separate genogroups (L1, L2 and L3). The sequences of L2 and L3 accounted for a large proportion as the dominant lineages in different years, which is a suitable model to detect the genetic variation lineage of PRRSV. This study was the first to calculate the genetic diversity of type 2 PRRSV. Type 2 PRRSVs showed completely different genetic diversity in L2 and L3, suggesting that L2 and L3 are probably influenced by different evolutionary events. We investigated the reasons for this phenomenon, which would provide valuable information and new insights for PRRSV epidemic trends and control strategies.
From the perspective of molecular evolutionary dynamics, L2.7, representing by the currently prevalent isolate NADC30, is undergoing a reduction in genetic diversity, which is in line with the trend in which the diversity of a certain sublineage is reduced as it becomes the dominant population before the occurrence of an epidemic. Therefore, we predict that an outbreak of L2.7 may occur in the near future. Moreover, L3.7 experienced evolutionary events similar to L2.7 and kept stable in China. Our data suggested the need to protect against related viruses that have a high substitution rate mode. We would like to predict when the new PRRSV pathogen with high substitution rate mode re-emerges, which reminds us to constantly have surveillance on some lineages (L2.7 and L3.7). In addition, although several commercial live attenuated live vaccines have been widely used in China, PRRS is still severe in the swine industry. [51][52][53] The genetic diversity and complexity of PRRSV are further increasing. Some vaccine lineages (L3.4) are currently undergoing a "high plateau" in population size and positively selected results explain it. We hope our study could not only facilitate the correct selection of vaccines but also a light on directions for future vaccine development.
Recombination is a pervasive phenomenon among PRRSV isolates and is an important strategy for generation viral genetic diversity [39,51,54]. L1, L2, and L3 were systematically screened for inter-and intra-lineage recombination. Despite our limited understanding of the reason for recombination events in type 2 PRRSV, the recombinant analyses update our understanding of the recombination model of PRRSV, allowing us to understand the frequency of such recombination events. In the analysis of recombination, we reviewed the recombination modes in type 2 PRRSV in this study and found several differences in recombination patterns between L2 and L3. Our data reflected that recombinants in L2 evolved from a complex pattern (intra-and inter-lineages) of L2 + L1, L2 + L2, and L2 + L3, whereas recombinants in L3 evolved from a singular pattern (intra-lineages) dominated by L3 + L3. In addition, L2 PRRSVs were extremely active in the recombination event, which we attributed to some PRRSV genotypes that were more susceptible to recombination, particularly L2.7 PRRSVs (Figure S2), and co-circulate was also an important factor in preventing and controlling PRRSV difficultly. The minor parents of recombinants have gradually focused on L3.4 PRRSVs and provide adaptive evolutionary gene when it currently co-circulating with other lineages, which is an important reservoir for the genetic recombination of PRRSVs. There is a unique sublineage "L3.2" (intra-lineage recombination frequency at 65%) that underwent extensive recombination in the evolutionary history of L3 and is distinct from other sublineages.
In our analysis of recombination, we also examined recombination breakpoints in the PRRSVs sequenced in this study. Attractively, the recombination hotspots were located in the nsp7 region (4.04%), which is involved in the replication of PRRSV, and in the GP2 and GP4 region (4.89% and 5.05%).GP2 and GP4 are the main factors of PRRSV entry into cultured cells [39]. Recombination of PRRSV at these regions may be associated with an increase in replication capacity and cellular tropism, which is thought to facilitate survival and spread and ultimately drive the pathogenesis of PRRSV. As to what type of gene structure causes high recombination frequency, further exploration of this potential mechanism is needed. In short, our analysis would provide valuable information and new insights for PRRSV epidemic trends and control strategies. We hope it will help in the future prevention and control of PRRSV.
Finally, these results will help us to understand the recombination mode of PRRSV, which guides us to focus on the monitoring of the PRRSV recombinant virus. These facts prompt people to re-evaluate the strategies of prevention and control of PRRSV all over the world, and people become more cautious about massive vaccination. Maybe they can look for a new vaccine development strategy that is safe. Combining PRRSV recombination mode, positively selective, and molecular evolutionary dynamics would help with future prevention and control of PRRSV.

Data Availability Statement:
The datasets generated for this study are available on request to the corresponding author.