1. Introduction
The global community faces a surge of emerging pathogens, even though earlier scares over zoonotic viruses did not persist. The re-emergence of viral agents is a great threat and challenge for the global health community [1]. Over the last two decades, the world has experienced three outbreaks of coronaviruses with elevated morbidity rates. In December 2019, cases of pneumonia of unknown etiology were reported in Wuhan, Hubei Province, China, and attracted worldwide attention [2]. Researchers and the Chinese government responded swiftly, and after in-depth etiological and sequencing investigation, the International Committee on Taxonomy of Viruses named the agent severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [3].
SARS-CoV-2 belongs to the genus Betacoronavirus, a member of the subfamily Coronavirinae, which has four genera (Alphacoronavirus, Betacoronavirus, Gammacoronavirus, and Deltacoronavirus), in the family Coronaviridae, order Nidovirales (Figure 1).
Generally, CoVs are broadly distributed among humans, birds and other mammals, usually causing hepatic, enteric, neurologic and respiratory syndromes [4,5]. Four (229E, OC43, NL63 and HKU1) out of six human disease-causing CoVs are widespread, and in immunocompetent individuals they normally cause common cold symptoms [6]. The two other strains, severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV), were zoonotic in origin and linked with fatal illness [7].
In 2002–2003, an outbreak of severe acute respiratory syndrome caused by SARS-CoV began in Guangdong Province, China, and quickly spread to twenty-seven countries, infecting 8098 people and causing 774 deaths; it was declared the first pandemic of the 21st century [8]. A decade later, in 2012, MERS-CoV emerged in the Middle East and caused a severe respiratory disease, with 2494 confirmed human infections and 858 deaths [9]. In both epidemics, bats were identified as the original source of the virus. Human-to-human transmission of SARS-CoV-2, via droplets emitted when an infected person coughs or sneezes, appears to be more efficient than in earlier CoV outbreaks. SARS-CoV-2 thus appears more transmissible but less deadly than SARS-CoV. To date, 194,029 confirmed cases of human infection and 7873 deaths have been recorded across 164 countries worldwide [10].
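The "less deadly" comparison can be checked with simple arithmetic on the counts quoted above. The following is a minimal Python sketch that computes a crude case fatality ratio (deaths divided by confirmed cases) for each outbreak. The function name `crude_cfr` and the dictionary labels are illustrative assumptions, and a crude ratio taken during an ongoing outbreak is only a rough indicator.

```python
# Crude case fatality ratio (CFR) = deaths / confirmed cases, using the
# counts quoted in the text. These are snapshot figures, not final estimates.
OUTBREAKS = {
    "SARS-CoV (2002-2003)": (8098, 774),
    "MERS-CoV (since 2012)": (2494, 858),
    "SARS-CoV-2 (as quoted)": (194029, 7873),
}


def crude_cfr(cases: int, deaths: int) -> float:
    """Return deaths as a percentage of confirmed cases."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 100.0 * deaths / cases


cfr_by_outbreak = {name: crude_cfr(c, d) for name, (c, d) in OUTBREAKS.items()}

for name, cfr in cfr_by_outbreak.items():
    print(f"{name}: {cfr:.1f}%")
```

With the quoted counts, the crude ratio for SARS-CoV-2 comes out below those for SARS-CoV and MERS-CoV, which is consistent with the statement in the text.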
In natural populations, mutation, recombination, and reassortment are the main evolutionary processes that generate genetic diversity. The high incidence of homologous RNA recombination is one of the most fascinating features of CoV replication [11,12,13,14]. Kottier et al. reported the first experimental evidence of recombination in avian infectious bronchitis virus (IBV) [15], and additional studies have also concluded that IBV evolves through recombination [16,17,18,19,20,21]. Moreover, the evolution of murine hepatitis virus (MHV) through recombination was also confirmed experimentally [22]. These findings encouraged exploration of the probable role of recombination in the emergence of SARS-CoV. The current situation may act as a risk factor for severe disease and may pose a serious threat to human health. Owing to their wide distribution and increasing prevalence, frequent genome recombination, large genetic diversity and extensive human-animal interface, CoVs may emerge in humans from time to time through occasional spillover and recurrent cross-species infection events [7,23].
Because SARS-CoV-2 is an emerging virus, very limited information is available on its genetic diversity, its evolutionary ancestors and the possible routes of its transmission from the natural reservoir to humans. This study aimed to trace the evolutionary ancestors of SARS-CoV-2 and the evolutionary strategies (mutation, recombination or reassortment) adopted by the novel coronavirus.
3. Discussion
SARS-CoV-2 is a novel emerging contagious agent that has found its way into the human population. The outbreak of SARS-CoV-2 is the third pandemic of the 21st century, and it is still ongoing. The prediction of Fan et al. [24] that a future SARS- or MERS-like CoV epidemic would emerge in China with a probable bat source became reality when the first case of concentrated viral pneumonia was reported on December 30, 2019 in Wuhan, China [25]. Later, the novel coronavirus designated SARS-CoV-2 was found to be responsible for the outbreak of viral pneumonia in Wuhan [26].
Generally, emerging and re-emerging viral infections are caused by RNA viruses, since these viruses have high mutation rates that lead to rapid evolution and efficient environmental adaptation [27]. To date, very little is known about SARS-CoV-2. To understand its genetic diversity and potential origin, we performed a molecular phylogenetic analysis, which predicted that SARS and SARS-like CoVs were the ancestors of SARS-CoV-2. Two bat SARS-like CoVs (ZXC21 and ZC45) were the closest relatives of SARS-CoV-2 (Figure 2). Consequently, we infer that bats are the probable natural host of SARS-CoV-2. Previously, several bat CoVs were found to be able to infect humans without any intermediate host [28,29].
Rapid sequencing of SARS-CoV-2 gave the research community an opportunity to examine its genetic diversity, develop diagnostic tests and ultimately support vaccine production. The whole-genome sequence of SARS-CoV-2 retained ~80% nucleotide homology with SARS epidemic viruses. All the structural proteins were well conserved except for the spike glycoprotein, which showed a high rate of mutation in SARS-CoV-2 [30,31]. Our results demonstrated that SARS-CoV-2 shares ~81% amino acid similarity with SARS-CoV in the spike (S) protein (Table 1, Figure 4), a less conserved S protein pattern than that of other CoVs such as HKU3-CoV [32]. In-depth analysis of the receptor-binding domain (RBD) at the amino acid level showed that the SARS-CoV-2 RBD was 73% conserved relative to the RBD of the pandemic SARS-CoV (Figure S5). This degree of RBD conservation places SARS-CoV-2 between HKU3-4 (62.7% conserved), a bat virus that cannot use the human ACE2 receptor, and the divergent bat CoV rSHC014 (80.8%), whose spike is known to use the human ACE2 receptor for entry [29,33]. Moreover, the binding free energies of the complexes of the S-protein with human ACE2 were calculated, and the binding free energy for the Wuhan-Hu-1-CoV S-protein increased by 28 kcal mol⁻¹ compared with that of the SARS-CoV S-protein, representing more binding affinity to the human ACE2 receptor [34].
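Conservation percentages such as the RBD values quoted above are typically obtained from a pairwise alignment by counting identical columns. The following is a minimal Python sketch of that calculation. It assumes that columns containing a gap in either sequence are excluded, which may differ from the convention of the alignment tools used in this study. The function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of gap-free aligned columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy fragments for illustration only (not real RBD sequences).
toy_a = "ACDEFGHIK-LMNP"
toy_b = "ACDEYGHIKRLMNA"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}% identity over gap-free columns")
```

Applied to real aligned RBD sequences, the same function would give a figure comparable in kind to the percentages reported above, although the exact value depends on the alignment and on how gaps are treated.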
Moreover, a recent study revealed that a polybasic cleavage site is present at the S1-S2 junction of SARS-CoV-2; it allows effective cleavage by furin and other proteases and plays a role in viral host range and infectivity [35], whereas such polybasic cleavage sites have not been detected in other human betacoronaviruses [36]. An experimental investigation by Follis et al. with SARS-CoV demonstrated that insertion of a furin cleavage site at the S1-S2 junction increases cell-cell fusion [37]. Additionally, an effective cleavage site in the MERS-CoV spike enables bat MERS-like CoVs to infect human cells [38]. By comparison, in avian influenza viruses, rapid replication and transmission favor the acquisition of polybasic cleavage sites in the hemagglutinin (HA) protein, which serves a function similar to that of the coronavirus spike protein. In CoVs, insertion or recombination facilitates the acquisition of polybasic cleavage sites, which can transform low-pathogenicity forms into highly pathogenic forms [39]. The pangolin and bat betacoronaviruses sampled so far do not have polybasic cleavage sites. CoVs could have acquired the polybasic cleavage site through natural evolutionary mechanisms, because the virus must have both the mutations and the polybasic cleavage site for appropriate binding to the human ACE2 receptor. This would require a host population of high density, in which natural selection can act efficiently, and an ACE2-encoding gene that is akin to the human ortholog [40,41]. The recent study of Peng et al. suggested that the ancestors of SARS-CoV-2 may have jumped into humans, acquired these genetic features through adaptation and remained undetected during human-to-human transmission. Once acquired, these adaptations would have allowed the pandemic to take off and produce a sufficiently large number of cases to trigger the surveillance system that identified it [40,41].
Viruses usually adopt different strategies, including recombination, mutation and reassortment, which help them reach equilibrium in the final host. Owing to the low fidelity of reverse transcriptase and RNA-dependent RNA polymerase, RNA viruses are highly prone to point mutations, with point mutation rates of approximately 10⁻⁴ to 10⁻⁵ [42]. During the 2002 SARS-CoV epidemic, three mutations per RNA genome in each replication round were estimated (8.26 × 10⁻⁶ per nucleotide per day) [43]. The large population sizes and high mutation rates of RNA viruses often allow genotypes to adjust quickly to a rapidly changing environment. Mutations have specific effects on viral reproductive fitness: positive selection tends to fix beneficial alleles, while negative selection removes lethal and deleterious alleles from a population. Under these selective forces, the evolutionary routes of virus populations across sequence space can be traced [34]. Comparison of SARS-CoV and Wuhan-Hu-1-CoV showed more than 90% sequence conservation in the E, M and N proteins, with few point mutations (Table 1), whereas a higher rate of mutation was observed in the S protein of Wuhan-Hu-1-CoV, which shared ~81% identity (Table 1, Figure 4A). These results were in accordance with those of Xu et al. and Pradhan et al. [44,45].
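The per-site rates and the per-genome estimate quoted above can be related by a back-of-the-envelope calculation. The following is a minimal Python sketch under two loudly stated assumptions that are not given in the text: the 10⁻⁴ to 10⁻⁵ range is read as mutations per nucleotide per replication round, and the genome is taken to be roughly 30,000 nucleotides long. The function name is hypothetical.

```python
# Assumption: approximate coronavirus genome length in nucleotides.
GENOME_LENGTH_NT = 30_000


def expected_mutations_per_genome(rate_per_site: float, genome_length: int) -> float:
    """Expected point mutations per genome copy: per-site rate times length."""
    if rate_per_site < 0 or genome_length <= 0:
        raise ValueError("rate must be non-negative and length positive")
    return rate_per_site * genome_length


# Assumption: rates are per nucleotide per replication round.
low = expected_mutations_per_genome(1e-5, GENOME_LENGTH_NT)
high = expected_mutations_per_genome(1e-4, GENOME_LENGTH_NT)
print(f"about {low:.1f} to {high:.1f} expected mutations per genome per round")
```

Under these assumptions the upper end of the range is of the same order as the three mutations per genome per replication round cited for the 2002 SARS-CoV epidemic.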
Recombination and reassortment are powerful tools by which emerging viruses acquire novel antigenic combinations that may aid cross-species transmission. Recombination explores a larger fraction of sequence space than mutation alone, raising the probability of finding a genetic configuration that supports host adaptation [46]. It is important to note that numerous recently emerged RNA viruses involved in human disease exhibit active recombination or reassortment. Most RNA viruses enter a new host through cross-species transmission [47]. Recombination events in viruses are related to the discontinuous manner in which the RNA polymerase operates during the transcription of mRNAs. The viral RNA polymerase can switch between different RNA templates while synthesizing negative or positive RNA strands, which eventually results in RNA recombination that is either homologous or non-homologous [12]. In RNA viruses, this is called the copy-choice model of recombination [13,14]. A high recombination rate has been reported in CoVs [48]. This may be due to their large genome size, discontinuous transcription, and the presence of transcriptionally active subgenomic and full-length genomic RNAs. Co-infection of the same animal or cell with two CoVs can potentially facilitate crossing over. Recently, the emergence of a new recombinant of infectious bronchitis virus (IBV), a new type of CoV in turkeys, was reported. The genome sequence revealed that the S protein gene of this virus was derived by recombination from another CoV [49]. Recombination in the S protein is particularly significant, as it permits the virus to modify its surface antigenicity, escape immune surveillance in animals, and then adapt to a human host. We identified nine putative recombination patterns, which involve the spike glycoprotein, RdRp, helicase and ORF3a genes. Six of the nine recombination regions were located in the S gene (Table 2). Importantly, each recombinant region in this study was predicted by at least two methods (Table 2), following Posada, who recommended against relying on a single method [50]. These results agree with previous reports of recombination events between parent viruses in the evolution of avian-like and mammalian-like SARS-CoV [51,52].
When multiple viruses with segmented genomes infect the same animal or tissue simultaneously, the result can be new viral progeny carrying a genome set derived from multiple parents. This process, termed gene reassortment, is used by viruses for evolution [28]. For example, a typical influenza A virus has eight ssRNA segments, and reassortment among multiple influenza viruses, termed genetic "shift" or "antigenic shift", changes the influenza viral surface glycoproteins, including neuraminidase. Thus, the sequences of these virus strains diverge widely when a host animal cell is co-infected and the progeny is generated by reassortment or recombination [27].
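The combinatorial reach of reassortment can be illustrated by counting segment combinations. The following is a minimal Python sketch for two co-infecting influenza A viruses. It assumes, purely for illustration, that each of the eight segments is inherited independently from either parent and that every combination is viable, which is not the case in practice. The parent labels "A" and "B" and the function name are hypothetical.

```python
from itertools import product

# The eight genome segments of influenza A virus.
SEGMENTS = ("PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS")


def segment_constellations(parents=("A", "B")):
    """Enumerate every way to draw each segment from one of the parents."""
    return [
        dict(zip(SEGMENTS, combo))
        for combo in product(parents, repeat=len(SEGMENTS))
    ]


constellations = segment_constellations()
# A reassortant carries segments from more than one parent.
reassortants = [c for c in constellations if len(set(c.values())) > 1]
print(len(constellations), "constellations,", len(reassortants), "reassortants")
```

Two parents give 2^8 = 256 possible constellations, of which all but the two parental genotypes are reassortants.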
Taken together, we found that SARS-CoV-2 is a descendant of SARS/SARS-like coronaviruses and a close relative of the bat SARS-like CoVs ZXC21 and ZC45. We confirmed that mutations in different genomic regions of SARS-CoV-2 have a specific influence on viral reproductive adaptability, allowing genotypes to adjust and adapt quickly to a rapidly changing environment. Moreover, for the first time, we identified nine putative recombination patterns in SARS-CoV-2 that were undoubtedly important for evolutionary survival, permitting the virus to modify its surface antigenicity, escape immune surveillance in animals and then adapt to a human host. With these combined, naturally selected strategies, SARS-CoV-2 emerged as a novel virus in human society.
4. Materials and Methods
For molecular phylogenetic analysis, the whole-genome sequences of 53 viruses, including 10 SARS-CoV-2 genomes, were retrieved from NCBI through a BLASTn search, with Wuhan-Hu-1-CoV used as the reference (Table S1). All sequences were aligned using the MAFFT (v7.452) online server [53]. To select the nucleotide substitution model, the Bayesian information criterion (BIC) was calculated for the aligned sequences using jModelTest 2, and the substitution model with the minimum BIC value was used for phylogenetic inference (Table S2) [54]. The whole-genome sequence was treated as a single partition, and three chains of Bayesian analysis were run under the GTR+I+G substitution model. After the maximum allowed number of generations had been reached and the burn-in discarded (270030000), the optimal trees were pooled into a single tree file. Posterior probability values under the majority consensus rule were visualized.
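The model-selection step described above rests on the standard definition BIC = k ln(n) - 2 ln(L), where k is the number of free parameters, n the number of alignment sites and L the maximized likelihood. The following is a minimal Python sketch of choosing the model with the lowest BIC. The log-likelihoods, the alignment length and the parameter counts (substitution-model parameters only, with branch lengths omitted) are hypothetical values chosen for illustration, not results from this study or output from jModelTest 2.

```python
import math


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion: k * ln(n) - 2 * lnL (lower is better)."""
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


# Hypothetical fits: model name -> (log-likelihood, free parameters).
candidate_fits = {
    "JC": (-152_000.0, 0),
    "HKY+G": (-148_500.0, 5),
    "GTR+I+G": (-148_100.0, 10),
}
N_SITES = 30_000  # hypothetical alignment length

scores = {
    name: bic(lnl, k, N_SITES) for name, (lnl, k) in candidate_fits.items()
}
best_model = min(scores, key=scores.get)
print(best_model, round(scores[best_model], 1))
```

The penalty term k ln(n) grows with model complexity, so a richer model is preferred only when its gain in likelihood outweighs the extra parameters.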
Figure 3 was used to visualize the best tree and the likelihood phylogram was exported as a picture [55]. Multalin software was used to align and visualize the envelope, membrane, nucleocapsid, and spike glycoprotein regions of SARS-CoV, MERS-CoV and SARS-CoV-2 [56]. The amino acid conservation motifs of the receptor-binding domain (RBD) in the SARS-CoV and SARS-CoV-2 genomes were traced by MUSCLE alignment in MEGA X. The three-dimensional structures of the spike glycoproteins of SARS-CoV-2 and SARS-CoV were generated using the online server Protein Homology/analogY Recognition Engine V 2.0 (Phyre2) [57], and the structures were visualized and annotated using PyMOL [58]. To detect recombination events, whole-genome nucleotide sequences of seven viral strains (Wuhan-Hu-1-CoV; bat SARS-like W1V1, ZXC21 and ZC45; bat SARS GZ02 and RF1; and MERS) were aligned using ClustalW. As a preliminary step, the MaxChi and Chimaera algorithms implemented in the Recombination Detection Program (RDP5) were used to detect recombination events in the dataset [59]. Additionally, bootscan analyses and similarity plots were performed using SimPlot 3.5.1 [60] to confirm the potential recombination events suggested by RDP, with the whole-genome sequence of the Wuhan strain as the query and bat SARS-like, SARS and MERS sequences as potential parental sequences (Table S1). A PHI statistical test was applied to evaluate the significance of recombination evidence between closely and distantly related genomes. Furthermore, the recombination breakpoints and the major and minor parents of each recombinant were assessed using the RDP, Bootscan, MaxChi, Chimaera and 3Seq methods [59].
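The idea behind a similarity plot can be sketched as a sliding-window identity scan of a query against one candidate parent in a shared alignment. The following minimal Python example is an illustration of the principle only, not a reimplementation of SimPlot: it uses raw percent identity rather than a distance model, and the function name, window size, step size and toy sequences are hypothetical choices.

```python
from typing import Iterator, Tuple


def window_identity(
    query: str, reference: str, window: int = 200, step: int = 20
) -> Iterator[Tuple[int, float]]:
    """Yield (window midpoint, percent identity) along two aligned sequences."""
    if len(query) != len(reference):
        raise ValueError("aligned sequences must have equal length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        r = reference[start:start + window]
        # Ignore alignment columns with a gap in either sequence.
        pairs = [(a, b) for a, b in zip(q, r) if a != "-" and b != "-"]
        if not pairs:
            continue
        matches = sum(a == b for a, b in pairs)
        yield start + window // 2, 100.0 * matches / len(pairs)


# Toy alignment: identical first half, fully divergent second half.
toy_query = "A" * 40
toy_reference = "A" * 20 + "C" * 20
profile = list(window_identity(toy_query, toy_reference, window=10, step=10))
print(profile)
```

In a real analysis, a sharp change in which candidate parent is most similar to the query along the genome is what suggests a recombination breakpoint, and this is why the events were cross-checked with several independent methods.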