3.1. The N-Terminal Domain (NTD) of the SARS-CoV-2 Spike Protein Contains Divergent Loop Regions That Are Structurally Analogous to Those of MERS-CoV
Receptor binding to host cells is the initial step in virus infection, tissue tropism, and cell spread. Coronaviruses utilize complex patterns of receptor recognition to infect diverse host cells. COVID-19, caused by SARS-CoV-2, has become a growing global burden with high co-morbidity and mortality [26]. With a limited understanding of the diverse range of tissues targeted by the virus and its potential receptors, there is an immediate need to understand the SARS-CoV-2 entry mechanism and pathogenesis to develop effective therapeutics. Genome sequence-based phylogenetic analyses have addressed the evolutionary origin of SARS-CoV-2 and its similarity to SARS-CoV [27]. Additional studies highlight the high degree of similarity between the receptor-binding domains (RBDs) of SARS-CoV-2 and SARS-CoV, and their binding to the common human ACE2 receptor [28]. Despite all these similarities, SARS-CoV-2 is more infectious than SARS-CoV [29]. MERS-CoV remains the only known human-infecting coronavirus that utilizes a dual-receptor strategy during infection. To investigate whether SARS-CoV-2 could use a similar reciprocal-receptor strategy, we analyzed the phyletic relatedness of the spike glycoproteins from coronaviruses that are known to infect humans. As expected, the spike proteins of SARS-CoV and SARS-CoV-2 are highly similar and group together in the same clade (Figure 1a). The spike glycoprotein closest to the SARS clade is that of MERS-CoV, which suggests that MERS-CoV is more similar to the SARS clade than the other coronaviruses are. Additionally, the spike proteins of HCoV-229E and HCoV-NL63 form a separate clade (Figure 1a). Even though the spike proteins of HCoV-OC43 and MERS-CoV are distantly related, both bind host sialic acids as an alternate host receptor during infection. To date, nothing is known about the interaction of the SARS clade with host sialic acid receptors.
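As a rough illustration of how relatedness between aligned spike sequences can be quantified, the sketch below computes a simple p-distance (the fraction of differing residues over ungapped columns) and reports the closest pair. This is not the phylogenetic method used in this study; the function names and the toy sequences are hypothetical and were invented purely for illustration.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def closest_pair(alignment):
    """Return the pair of names (alphabetical order) with the smallest p-distance."""
    return min(
        combinations(sorted(alignment), 2),
        key=lambda pair: p_distance(alignment[pair[0]], alignment[pair[1]]),
    )


# Toy aligned fragments, invented for illustration only (not real spike sequences).
toy_alignment = {
    "virus_A": "MKTLLVAGSC",
    "virus_B": "MKTLLVAGTC",
    "virus_C": "MRT-LIAGSW",
}

print(closest_pair(toy_alignment))
```

A real analysis would start from a full multiple sequence alignment and use a proper tree-building method; the p-distance here only conveys the idea that sequences in the same clade differ at fewer aligned positions.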
Despite the 76% homology between their spike proteins, SARS-CoV-2 is more infectious than SARS-CoV, which suggests a possible structural or mechanistic difference [7]. One stark difference between their spike proteins is the presence of a furin-like cleavage site on the SARS-CoV-2 spike protein [30]. SARS-CoV-2 has 12 extra nucleotides, encoding four amino acids, upstream of the single Arg↓ cleavage site, forming a PRRAR↓SV sequence, which is a canonical furin-like cleavage site [30]. The presence of this furin-like cleavage site in SARS-CoV-2 has been proposed as a possible reason for its efficient spread compared with other betacoronaviruses [30]. Alternatively, by comparative sequence analysis of the NTD of the spike protein, we identified three extended regions in SARS-CoV-2 and MERS-CoV that are absent in SARS-CoV (Figure 1b). To determine whether the divergent loop regions form a functional module in the NTD of SARS-CoV-2, we modeled the full-length structure of the SARS-CoV-2 spike glycoprotein, strongly biased toward the cryo-EM structure of the SARS-CoV-2 spike protein (Figure 1c). The cryo-EM structures of the SARS-CoV-2 spike protein display a well-ordered, β-strand-rich NTD, the RBD, and the core helical domain [8]. Owing to their flexibility, several β-β loops in the NTD of the SARS-CoV-2 spike glycoprotein display almost no cryo-EM density even after B-factor sharpening. All unresolved β-β loops were modeled ab initio, and the model with the best DOPE score was further energy minimized and used for computational analyses as described in the methods section. Upon structural comparison of the modeled SARS-CoV-2 spike glycoprotein with that of SARS-CoV, we found a major difference in their β-β loop lengths. SARS-CoV-2 has longer β4-β5, β9-β10, and β14-β15 loops than SARS-CoV (Figure 1b). The β14-β15 loop is particularly interesting owing to its length and flexibility, which arise from interspersed glycine residues and a flanking poly-alanine region (Figure 1b). The β14-β15 loop of SARS-CoV-2 is reminiscent of the β-hairpin loop (Ser126-Ile140) of MERS-CoV (Supplementary Figure S1a) [11]. The MERS-CoV β-hairpin loop features a similar long arm that forms critical electrostatic anchor points for host sialoside receptor engagement and stability [11].
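The furin-like site discussed above can be illustrated with a short motif scan. The sketch below searches a peptide for the minimal furin recognition pattern R-x-x-R; the choice of this minimal pattern is an assumption made for illustration, since the text only states that PRRAR↓SV is a furin-like site. The helper that converts an in-frame nucleotide insert into a residue count is likewise hypothetical and simply encodes the three-nucleotides-per-codon rule.

```python
import re

# Minimal furin recognition pattern R-x-x-R (assumed here for illustration).
# The lookahead lets overlapping matches be reported as well.
FURIN_MINIMAL = re.compile(r"(?=(R..R))")


def find_furin_like_sites(peptide):
    """Return (0-based start index, matched motif) for every R-x-x-R occurrence."""
    return [(m.start(), m.group(1)) for m in FURIN_MINIMAL.finditer(peptide.upper())]


def residues_encoded(n_nucleotides):
    """Number of amino acids encoded by an in-frame insert of n_nucleotides."""
    if n_nucleotides % 3 != 0:
        raise ValueError("an in-frame insert must be a multiple of three nucleotides")
    return n_nucleotides // 3


# The heptapeptide quoted in the text; cleavage occurs after the final Arg of the motif.
print(find_furin_like_sites("PRRARSV"))
print(residues_encoded(12))
```

Scanning a full spike sequence with the same function would list every R-x-x-R occurrence, so any hit would still need structural context (for example, solvent exposure) before being treated as a plausible cleavage site.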
3.2. The Divergent Loop Regions within the SARS-CoV-2 NTD Form a Sialoside-Binding Pocket
Despite the presence of an RBD that binds host-cell receptors, several coronaviruses utilize host sugars as alternative receptors to bind and infect host cells. Such a dual-receptor binding mechanism allows enhanced infection and host-cell tropism [31]. To test the capacity of the SARS-CoV-2 spike protein to engage host-cell sialic acid receptors, we selectively docked 5-N-acetyl neuraminic acid (Neu5Ac), α2,3-sialyl-N-acetyl-lactosamine (2,3-SLN), α2,6-sialyl-N-acetyl-lactosamine (2,6-SLN), 5-N-glycolyl neuraminic acid (Neu5Gc), and sialyl Lewis X (sLeX) (Figure 2) onto the modeled structure of the SARS-CoV-2 NTD. The selected sialosides represent a large family of more than 500 human sialosides and have previously been shown to bind the S1A domain of MERS-CoV [11]. The amino acid residues Leu18-Gln23, His66-Thr78 of the β4-β5 loop, and Gly252-Ser254 of the β14-β15 loop form the sialoside-binding site in the SARS-CoV-2 spike protein (Figure 2 and Figure 3). Recently, Milanetti et al. also predicted a sialoside-binding pocket in the NTD of SARS-CoV-2 by surface iso-electron density mapping, further supporting our findings [32]. While the β4-β5 loop is involved in the engagement of all tested sialosides, the β14-β15 loop is specific to larger sialosides such as sLeX (Figure 2 and Figure 3). The predicted interacting sites of the tested sialosides are mapped in Figure 3. The presence of key electrostatic and hydrophobic interactions with each of these sialosides suggests the possibility of a physiological interaction with the NTD of SARS-CoV-2. The computational binding affinities of the SARS-CoV-2 NTD for diverse sialosides were compared with those of MERS-CoV (Supplementary Figure S2). For both SARS-CoV-2 and MERS-CoV, all sialosides were found to interact with and localize to the sialoside-binding pocket in the NTD (Supplementary Figures S2 and S3). These docking results are in strong agreement with the previously published cryo-EM structures of MERS-CoV bound to sialosides (Supplementary Figure S3) [11]. On the other hand, SARS-CoV displayed significant promiscuity and leaky binding with sialosides (Supplementary Figure S2c,d), which was expected since SARS-CoV is not reported to bind sialosides [11,17]. In particular, Neu5Ac and Neu5Gc occupied different regions within the SARS-CoV NTD, suggesting a possible lack of selectivity (Supplementary Figures S2c,d and S3c). Moreover, SARS-CoV displayed lower computational ligand-binding affinities for larger sialosides, including 2,3-SLN, 2,6-SLN, and sLeX (Supplementary Figure S2d). Overall, these results indicate that the SARS-CoV NTD has a lower affinity for host sialosides than the NTDs of SARS-CoV-2 and MERS-CoV.
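To make comparisons of computational binding affinities more concrete, the sketch below ranks ligands by docking score and converts a binding free energy into an approximate dissociation constant using ΔG = RT ln Kd. It assumes that the scores are free-energy estimates in kcal/mol, which the text does not state; the ligand names and score values are hypothetical placeholders, not results from this study.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_from_dg(dg_kcal_per_mol, temperature_k=298.15):
    """Approximate dissociation constant (M) from a binding free energy, Kd = exp(dG/RT)."""
    return math.exp(dg_kcal_per_mol / (R_KCAL * temperature_k))


def rank_ligands(scores):
    """Ligand names sorted from most to least favourable (most negative score first)."""
    return sorted(scores, key=scores.get)


# Hypothetical docking scores in kcal/mol, invented for illustration only.
toy_scores = {"ligand_1": -5.2, "ligand_2": -6.8, "ligand_3": -4.1}

for name in rank_ligands(toy_scores):
    print(name, f"{kd_from_dg(toy_scores[name]):.2e} M")
```

Docking scores are coarse estimates, so the converted values are best read as a relative ordering of ligands and not as measured affinities.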
Molecular dynamics simulation of the SARS-CoV-2 NTD-sialoside complexes highlights the flexibility of the β14-β15 loop (Figure 2f). The superimposition of all SARS-CoV-2 NTD-sialoside complexes produced shows an outward movement of the β14-β15 loop, allowing the sialoside-binding site to accommodate larger sialosides such as sLeX (Figure 2f and Supplementary Figure S2a). On the other hand, the spike protein of SARS-CoV features a shorter, nine-amino-acid β14-β15 loop (Figure 1b), which offers fewer degrees of freedom and a decreased capacity to engage host sialosides (Supplementary Figure S2c,d). In addition, a single-turn alpha helix (Thr20-Leu24) formed key interactions with all sialosides tested (Figure 3). Interestingly, the NTDs of MERS-CoV (Gln37-Phe40) [33] and SARS-CoV (Phe22-Val25) also display a single-turn helix, suggesting its involvement in sialoside stabilization but not in sialoside recruitment and selectivity (Supplementary Figures S1a and S2). Although the HCoV-OC43 spike glycoprotein lacks a single-turn helix, the same region interacts with sialosides [34]. Taken together, these findings suggest that the SARS-CoV-2 spike glycoprotein might have independently evolved to recognize sialosides using its longer NTD loop inserts.
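Loop movement between superimposed structures can be quantified per residue. The sketch below assumes that two structures have already been superimposed and that one Cα coordinate per residue is available; it then reports the residues whose Cα atoms moved by more than a cutoff. The function names, the 2.0 Å cutoff, and the toy coordinates are hypothetical and serve only to illustrate the idea.

```python
import math


def per_residue_displacement(coords_a, coords_b):
    """Distance (in the input units) between matching C-alpha atoms of two superimposed structures."""
    if coords_a.keys() != coords_b.keys():
        raise ValueError("both structures must contain the same residue numbers")
    return {res: math.dist(coords_a[res], coords_b[res]) for res in coords_a}


def moved_residues(coords_a, coords_b, cutoff=2.0):
    """Residue numbers whose C-alpha displacement exceeds the cutoff, in ascending order."""
    displacement = per_residue_displacement(coords_a, coords_b)
    return sorted(res for res, d in displacement.items() if d > cutoff)


# Toy C-alpha coordinates in angstroms, invented for illustration only.
unbound = {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0), 3: (7.6, 0.0, 0.0), 4: (11.4, 0.0, 0.0)}
bound = {1: (0.3, 0.0, 0.0), 2: (3.8, 3.0, 0.0), 3: (7.6, 0.0, 4.0), 4: (11.4, 0.5, 0.0)}

print(moved_residues(unbound, bound))
```

In practice the coordinates would be read from the superimposed models, and a contiguous run of displaced residues would indicate a mobile loop.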
In addition to interacting with host protein receptors, the NTD region of coronavirus spike glycoproteins has reciprocally evolved to recognize host sugars, including sialoside receptors [35]. The spike glycoproteins of some betacoronaviruses, such as the mouse hepatitis virus, interact exclusively with host protein receptors despite having a sugar-binding pocket in their NTD [35]. Unlike the mouse hepatitis virus, the bovine coronavirus spike glycoprotein does not have a canonical protein receptor and binds solely to 5-N-acetyl-9-O-acetyl neuraminic acid (Neu5,9Ac2) to infect host cells [31,33]. Among human-infecting coronaviruses, some, like HCoV-OC43 and HCoV-229E, bind exclusively to host sialosides, whereas others, like SARS-CoV, utilize only protein receptors to infect host cells. Recent studies suggest that MERS-CoV utilizes a dual-receptor mechanism to infect host cells, in which it binds both the human dipeptidyl peptidase-4 (DPP4) host protein receptor and host sialosides [11]. Such a unique evolutionary co-adaptation, which combines the recognition of both protein and sialoside receptors, provides an additional bridging mechanism for effective attachment, cellular entry, and tropism of the virus.
Thus, the acquired ability of SARS-CoV-2 to engage diverse sialosides as reciprocal host-cell receptors might be the reason for its high infectivity and broad tissue tropism. Taken together, the differential distribution of sialosides in the respiratory tract and other organs, along with limited ACE2 expression in human airway epithelia [9], would explain the differential infectivity, transmission, and tropism shown by SARS-CoV-2 [36]. In line with this, a recent preprint reports that the SARS-CoV-2 spike glycoprotein recognizes different siglecs (sialic acid-binding Ig-like lectins) and C-type lectins, suggesting that the spike glycoprotein participates in ACE2-independent infection pathways in immune cells [37]. In addition, the reported association between the human ABO blood group and COVID-19 susceptibility [38,39] may relate to modulation of the sialoside distribution pattern on the target membrane, possibly regulating SARS-CoV-2 transmission and tropism [40] despite its high affinity for ACE2. Once confirmed by biophysical and structural binding studies, the in silico structural analysis reported in this study will provide a basis for further research into the functional role of the reciprocal interaction of SARS-CoV-2 with host sialic acid during virus entry and spread.