The Spike Mutants Website: A Worldwide Used Resource against SARS-CoV-2

A large number of SARS-CoV-2 mutations in a short period of time has driven scientific research related to vaccines, new drugs, and antibodies to combat the new variants of the virus. Herein, we present a web portal containing the structural information, the tridimensional coordinates, and the molecular dynamics trajectories of the SARS-CoV-2 spike protein and its main variants. The Spike Mutants website can serve as a rapid online tool for investigating the impact of novel mutations on virus fitness. Taking into account the high variability of SARS-CoV-2, this application can help the scientific community when prioritizing molecules for experimental assays, thus, accelerating the identification of promising drug candidates for COVID-19 treatment. Below we describe the main features of the platform and illustrate the possible applications for speeding up the drug discovery process and hypothesize new effective strategies to overcome the recurrent mutations in SARS-CoV-2 genome.


Introduction
As of August 2022, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative agent of COVID-19, accounted for more than 599 million infections and more than six million deaths worldwide (https://covid19.who.int, accessed on 1 October 2020). The incessant rise in the number of cases despite the development of vaccines and the resulting immunization process reflects the impact of new variants of SARS-CoV-2 globally. Indeed, the evolution of SARS-CoV-2 was caused by the acquisition of several mutations since the pandemic started, as reported in the GISAID database (https://www.gisaid.org, accessed on 1 October 2020), which collected more than one million SARS-CoV-2 sequences. It is important to point out that the selection of random mutations stands out as one of the main mechanisms of acquiring resistance and represents a relevant phenomenon in viruses that mutate at high frequencies. RNA viruses, for instance, have a mutation rate estimated at 10 −4 per nucleotide per replication [1,2]. In particular, the mutations were classified as variants of concern (VOCs), ; and variants of alert (VOA) (https://www.who.int/en/activities/tracking-SARS-CoV-2-variants, accessed on 1 October 2020) [3,4]. However, this classification is dynamic, based on the spread of different variants over time and their clinical significance. In fact, these variants were able to emerge at the same time in multiple locations that are independent of each other, and then they were no longer circulating after a variable period. Chronologically, we have witnessed the emergence of B.1.1.7 in the United Kingdom (UK) [5], then B.1.351 in South Africa [6], followed by P.1 in Brazil [7], and B.1.617 in India [8]. These new variants show multiple mutations on their spike (S) glycoprotein and spread rapidly across the globe, resulting in more virulence. In December 2020, the B.1.1.7 variant was identified in the southeastern United Kingdom (UK), hence the name "UK variant", and it is marked by 23 genetic mutations, compared to the original genome sequence that was first detected in Wuhan, China.
As previously reported, other VOCs have also been isolated in South Africa, India, and Brazil and have been investigated for their enhanced contagiousness and resistance to neutralization. B.1.351 or the Beta variant that emerged in South Africa, is characterized by 18 mutations and 3 amino acid deletions in the S glycoprotein [9]. P.1 variant, also known as Gamma, was detected in December 2020 and has spread in an accelerated way across Manaus, Brazil. Another variant, identified as B.1.617, has been isolated from Maharashtra, India and is subdivided into three sub-lineages, such as B.1.617.1, B.1.617.2, and B.1.617.3 [10,11]. In November 2021, the emergence of the new variant detected in Botswana and South Africa, termed Omicron, alarmed the community due to the high number of changes in the spike protein, the increased transmission efficiency, and its ability to escape from neutralizing antibodies. However, clinical studies have reported that the rapidly spreading Omicron variant was less dangerous than its predecessor, the Delta variant [12]. The main differences found in this variant are to be attributed to the N-terminal domain (NTD): deletion of V143, Y144, and Y145 residues and a 3-amino-acid insertion of "EPE" at position 214. However, accumulating evidence suggests high transmission fitness due to its higher affinity to ACE2 receptors and efficient immune evasion, compared to the other VOCs [13].
To date, thanks to sequencing, the identification of emerging SARS-CoV-2 variants and sets of mutations potentially linked to changes in viral properties has been achieved. Due to these possible changes, the understanding of the functional consequences of VOCs and VOIs needs to rapidly expand. To do this, much of this knowledge should concern the impact of these variants in terms of conformational changes in the S glycoprotein, which could enhance receptor recognition and affect the production of neutralizing antibodies.
Although assessing changes in the interaction between SARS-CoV-2 epitopes and antibodies is difficult, the binding affinity between mutated S glycoprotein and both Tcell receptor (TCR) and ACE2 receptor could be predicted in silico to test the immune response and infectivity, respectively. In this regard, computational methods hold the potential to understand resistance mechanisms, shedding light on the elusive link between novel mutations present in the S glycoprotein and its targets. These methods exploit the available structural information on protein-ligand complexes and structural modeling of point mutations in the protein structure [14]. Reported examples of the use of structurebased methods include: (i) the application of molecular docking to predict resistance or susceptibility of antiviral targets to different inhibitors [15,16]; (ii) the use of molecular dynamics simulations (MDs) to investigate the impact of mutations on enzyme stability and binding affinity on the receptor [17,18] or to highlight long-range altered communications in wild-type (WT) and mutated S glycoproteins; (iii) the use of computational mutation scanning protocols to extract insights on free energy and binding affinity changes, resulting from the active site and non-active site mutations [19]. Even though these methods are constantly adding new pieces to the puzzle and opening opportunities in the understanding of drug resistance, they suffer from various drawbacks, such as being time consuming and offering limited predictive accuracy. As a result of such limitations, the primary challenge facing structure-based drug resistance prediction is to achieve an acceptable balance between prediction accuracy and computational efficiency to become both reliable and fast tools to be used in the clinical context [17]. In fact, some of the most recent reports describe the use of machine learning strategies merging both sequence and structural data in an attempt to achieve such a balance [20,21]. In this perspective, several platforms have been developed to assemble sequencing data, provide information and tools to visualize the mutations present in the SARS-CoV-2 structures [22,23], and distribute a plethora of experimental data and bioinformatic tools through the European COVID-19 Data Platform (https://www.covid19dataportal.org/, accessed on 1 October 2020) for different purposes. Another SARS-CoV-2 immuno-analytics platform has been developed to visualize multidimensional data to inform target selection in immunological research by combining genomic and whole-proteome analyses with in silico epitope predictions [24,25]. Recently, in the context of drug discovery, user-friendly platforms were designed to identify promising drugs and targets for COVID-19, such as COVID19db [26] and COVID Moonshot [27]. Moreover, the EU-funded project, Exscalate4CoV (https: //www.exscalate4cov.eu/, accessed on 1 October 2020), released additional web services, such as https://viralseq.exscalate4cov.eu/ and https://mediate.exscalate4cov.eu/ (accessed on 1 October 2020), to support the scientific community. The majority of created platforms collect data present in the literature and integrate databases that consider the SARS-CoV-2 targets, their mutations, and related clinical and immunological information. Unlike known platforms, we created an open access portal that provides original structural insights related to the SARS-CoV-2 spike variants. Our platform offers these mutated structures already modeled and previously equilibrated through molecular dynamic simulations, saving time and effort in the SARS-CoV-2 field of research. Therefore, we developed the Spike Mutants website (https://spikemutants.exscalate4cov.eu/, accessed on 1 October 2020), which contains the tridimensional coordinates of the mutated structures of the known VOCs and VOIs and their MDs trajectories. With the support of Exscalate4CoV, this platform acts as a solid starting point for more advanced studies, taking into account the impact of the mutations on the drug discovery process and the identification of effective strategies to plan new drugs and vaccines. Herein, we describe the main features of the portal and illustrate the possible applications for the scientific community.

Results
Unfortunately, with the rapid speed at which mutations occur in the S-protein, the crystal experimental structure of new variants is not feasible in a short period of time. In this context, computational models of the new variants can be developed for structural stability studies, design of new drugs, and baseline studies for the development of new vaccines, among others.
However, developing a realistic model of the new variants is a process that demands not only human time, but also computational time.
For a better understanding of the complexity of building the models of the new variants, we detail some of the steps of our workflow: (1) analysis of the sequences of the new variants in order to determine which mutations are the most significant to be modeled; (2) homology modeling for the construction of a first 3D system based on the crystallographic structure of WT; (3) insertion of the glycans to the model in the correct position; (4) construction of the topology taking into account the correct description of the glycans force field and the new glycan-protein bonds that should be inserted into the new models; (5) MD system setup and simulations; (6) MD analysis.
The complexity of this workflow is mainly due to the difficulty of inserting glycans covalently attached to the S protein and also to the equilibrium of the mutated systems, especially in the regions of insertion and deletion where secondary structures can be formed or undone.
For in silico experiments, an important aspect, which must not be neglected in the building of the 3D molecular model of the S glycoprotein of SARS-CoV-2, is the presence of numerous glycans on the surface of the protein. More specifically, the S glycoprotein of SARS-CoV-2 shows 18 N-glycosylation sequons [28] per monomer, two O-glycosylation [29] sites in the "head" (comprising S1/S2 subunits, residues 27-1140), and another four Nglycosylation sequons on the "stalk" portion of the protein (residues 1141-1234). Glycans not only determine the correct folding of the S glycoprotein but also shields the epitopes from antibody recognition [30], thus, modulating the structural and dynamical properties of protein [31]. In particular, the glycans on the "head" region of S glycoprotein are involved in the modulation of dynamics of the opening and closure of the protein [32][33][34] and in its capability to bind the ACE2 receptor.
All these modeled structures were modeled and equilibrated through MDs and were collected in an open access portal to make these structures were available for the scientific community in order to represent a solid starting point for more advanced studies.
The main page of the Spike Mutants portal (https://spikemutants.exscalate4cov.eu/, accessed on 1 October 2020) allows downloading the 3D structure of the spike trimeric protein in glycosylated form, as previously described [34], with its corresponding MD trajectory 100 ns long (see Figure 1) in PDB and GROMACS xtc format, respectively. field and the new glycan-protein bonds that should be inserted into the new models; (5) MD system setup and simulations; (6) MD analysis.
The complexity of this workflow is mainly due to the difficulty of inserting glycans covalently attached to the S protein and also to the equilibrium of the mutated systems, especially in the regions of insertion and deletion where secondary structures can be formed or undone.
For in silico experiments, an important aspect, which must not be neglected in the building of the 3D molecular model of the S glycoprotein of SARS-CoV-2, is the presence of numerous glycans on the surface of the protein. More specifically, the S glycoprotein of SARS-CoV-2 shows 18 N-glycosylation sequons [28] per monomer, two O-glycosylation [29] sites in the "head" (comprising S1/S2 subunits, residues 27-1140), and another four N-glycosylation sequons on the "stalk" portion of the protein (residues 1141-1234). Glycans not only determine the correct folding of the S glycoprotein but also shields the epitopes from antibody recognition [30], thus, modulating the structural and dynamical properties of protein [31]. In particular, the glycans on the "head" region of S glycoprotein are involved in the modulation of dynamics of the opening and closure of the protein [32][33][34] and in its capability to bind the ACE2 receptor.
All these modeled structures were modeled and equilibrated through MDs and were collected in an open access portal to make these structures were available for the scientific community in order to represent a solid starting point for more advanced studies.
The main page of the Spike Mutants portal (https://spikemutants.exscalate4cov.eu/, accessed on 1 October 2020) allows downloading the 3D structure of the spike trimeric protein in glycosylated form, as previously described [34], with its corresponding MD trajectory 100 ns long (see Figure 1) in PDB and GROMACS xtc format, respectively. The Spike Mutants website distributes 3D structures of SARS-CoV-2 S unglycosylated variants in three sections named "Variants of Concern", "Variants of Interest", and "Variants under Monitoring" (see Figure 2A-C, respectively) from which PDB structures can be freely downloaded. Figure 1. Snapshot of the upper portion of the Spike Mutants portal from where the 3D structure of the spike trimeric protein in glycosylated form and its corresponding MD trajectory 100 ns long can be freely downloaded.
The Spike Mutants website distributes 3D structures of SARS-CoV-2 S unglycosylated variants in three sections named "Variants of Concern", "Variants of Interest", and "Variants under Monitoring" (see Figure 2A-C, respectively) from which PDB structures can be freely downloaded. On our website, it is also possible to visualize a graphical representation of the phylogenetic tree using the maximum likelihood method based on whole genome sequences of SARS-CoV-2 (Figures 3 and S1) and a video of a 3D model of the S protein in interaction with the human receptor ACE2 embedded in a membrane bilayer ( Figure 2D). On our website, it is also possible to visualize a graphical representation of the phylogenetic tree using the maximum likelihood method based on whole genome sequences of SARS-CoV-2 ( Figure 3 and Figure S1) and a video of a 3D model of the S protein in interaction with the human receptor ACE2 embedded in a membrane bilayer ( Figure 2D).  The last section of the Spike Mutants website distributes 3D models of the main SARS-CoV-2 S variants in glycosylated form, accompanied by their related MD trajectories (see Figure 4). All major WHO-labeled variants are present, highlighted in colored boxes. Their complete list is reported in Table 1 with "WHO Label", lineage, and a list of mutations for each variant. In the same website section, additional sub-variants are distributed, highlighted with white boxes (Figure 4). The last section of the Spike Mutants website distributes 3D models of the main SARS-CoV-2 S variants in glycosylated form, accompanied by their related MD trajectories (see Figure 4). All major WHO-labeled variants are present, highlighted in colored boxes. Their complete list is reported in Table 1 with "WHO Label", lineage, and a list of mutations for each variant. In the same website section, additional sub-variants are distributed, highlighted with white boxes (Figure 4).   Note that the first residue in the models is number 27 and the higher is 1146, not considering the glycosyl groups. Therefore, the variants outside this range are reported only for the sake of completeness. * Not a WHO Label.
The modeling and simulation protocol is described in Materials and Methods.
Together with the 3D model, through the web portal there is also a short MD trajectory available, 100 ns long, for each analyzed variant. Even though a better understanding of the structural and dynamic perturbation of the system is obtained with longer simulations (see results with one microsecond reported in recent works) [34][35][36], we found that interesting features can be inferred by them.
The root main square fluctuations (RMSF) plot for some of the mutants in comparison to WT shows several examples ( Figure 5).
(i) The monomer in the UP position (monomer 2, red in Figure 1) has higher fluctuations in a larger portion of its RBD in all mutants, as compared to WT, but, in particular, in some systems: i.e., Kappa, Alpha, Iota1, Iota2, and Epsilon (panel A, B, G, H and I, respectively); (ii) Some of the mutants have higher RMSF peaks (higher than 10 Å) than WT and other mutants: Iota1, and Epsilon >12 Å; Iota2 > 10 Å (note that the y-axis has a different scale in panel G-I); (iii) All mutants have higher fluctuations in the 834-853 region, namely the fusion-peptide proximal region (FPPR), as compared to WT and nearly all in monomer 3 (green in Figure 1), with the exception of Ihu (where monomer 2 and monomer 3 have comparable RMSF) and Epsilon (where monomer 2 has higher RMSF); (iv) Several mutants have a higher RMSF peak at residue 681, at the boundary of S1 and S2, as compared to WT: i.e., Kappa, Ihu, Delta, and Iota1. In the context of the EXSCALATE4COV (E4C) (www.exscalate4cov.eu) project, the Spike Mutants website was first made available in January 2021 as a public resource. From then, 1575 researchers from all over the world (see Figure 6A) registered on the website and downloaded the distributed structures and MD trajectories, with the USA being the country with the largest number of users ( Figure 6B). In the context of the EXSCALATE4COV (E4C) (www.exscalate4cov.eu, accessed on 1 October 2020) project, the Spike Mutants website was first made available in January 2021 as a public resource. From then, 1575 researchers from all over the world (see Figure 6A) registered on the website and downloaded the distributed structures and MD trajectories, with the USA being the country with the largest number of users ( Figure 6B).

PEER
Regarding the main research objectives of the registered users, drug repurposing is the first subject, with 46% of users, while the vaccine is the second, with 20.9% of users (see Figure 6C). The reported numbers are constantly increasing. Regarding the main research objectives of the registered users, drug repurposing is the first subject, with 46% of users, while the vaccine is the second, with 20.9% of users (see Figure 6C). The reported numbers are constantly increasing.

Discussion
With drug development being a series of crucial decisions to be considered to select the right target and binding site and to analyze the mutational pattern, it seems evident that computational chemistry can support each of those evaluations by capturing the value of simulation data into useful predictive models. In the context of COVID-19, the SARS-CoV-2 proteome has been deeply studied. One just has to think of all the tridimensional structures deposited in the Protein Data Bank, thus, discovering its complex mechanism and seeing the binding interactions with different chemical entities to accelerate the drug discovery process. Instead, regarding the virus' genetic variability, the rate of onset of mutations on SARS-CoV-2 proteins makes it almost impossible to keep abreast. The emergence of new variants, as it is known, represents a major threat to public health. Indeed, they could compromise the therapeutic efficacy, developing resistance to antibodies and drugs [37]. Thus, the goal of this new platform is to provide a basis for further studies on how variants may influence the pandemic. Indeed, the scientific community could increase their knowledge in the fields of vaccines and antibodies by starting with the prepared models on our website and speeding up the drug repurposing and drug design of novel chemical entity processes. As evidence, this resource has recently been used to obtain effective results in the fields of neutralizing antibodies [38], studying the interaction

Discussion
With drug development being a series of crucial decisions to be considered to select the right target and binding site and to analyze the mutational pattern, it seems evident that computational chemistry can support each of those evaluations by capturing the value of simulation data into useful predictive models. In the context of COVID-19, the SARS-CoV-2 proteome has been deeply studied. One just has to think of all the tridimensional structures deposited in the Protein Data Bank, thus, discovering its complex mechanism and seeing the binding interactions with different chemical entities to accelerate the drug discovery process. Instead, regarding the virus' genetic variability, the rate of onset of mutations on SARS-CoV-2 proteins makes it almost impossible to keep abreast. The emergence of new variants, as it is known, represents a major threat to public health. Indeed, they could compromise the therapeutic efficacy, developing resistance to antibodies and drugs [37]. Thus, the goal of this new platform is to provide a basis for further studies on how variants may influence the pandemic. Indeed, the scientific community could increase their knowledge in the fields of vaccines and antibodies by starting with the prepared models on our website and speeding up the drug repurposing and drug design of novel chemical entity processes. As evidence, this resource has recently been used to obtain effective results in the fields of neutralizing antibodies [38], studying the interaction between spike and aptamers [39], and providing important hints about spike protein mutant variants' heparin interactions for potential resistance [40].
The Spike Mutants website provides a fast and easy way to understand the impact of the mutations on the binding affinity to human receptors and antibodies, aiming to enhance novel therapeutic and preventive solutions. Given the mutational profile's fast pace of growth in the SARS-CoV-2 spike, we will promptly update the platform.

Data Storage and Web Implementation
The Spike Mutants website was developed to ensure sharing of our results. The website is based on the common LAMP software stack (Linux, Apache, MariaDB, PHP). The frontend interface was built on bootstrap 4, jQuery, and HTML5 doctype. The embedded survey service is SurveyJS.

Modeling of S Glycoproteins
The 3D protein models of the WT and the mutated spike were initially built by homology modeling using the web tool SWISS-MODEL [35]. The reference sequence used to align and model the missing atoms of the pre-fusion form of the cryo-EM structure (PDB ID: 6VYB) [36] was NCBI YP_009724390.1 (UniProt: P0DTC2 SPIKE_SARS2). In this structure, the RBD of the second monomer was in the up conformation, also known as open conformation, which favors the interaction with the ACE2 receptor. The full protocol of the WT model generation was described by Tagliamonte et al. [29]. For the variants, the mutated sequences taken into account were in accordance with the WHO classification and were reported in Table 1.

Mutant Modeling
The glycosylation of N-and O-sites of the WT and the variants (residues 27-1146) was built by using the glycoprotein builder available at GLYCAM-Web (www.glycam.org), as described by Borocci et al. [34]. Asymmetric glycosylation of the asparagine residue of the three monomers of the WT and the variant protein was used, as reported in Tables S1-S3. The presence of steric clashes between atoms of protein and glycans was removed by changing the dihedrals of the N-glycosidic bond.

Glycosylation of S Proteins
The modeled spike glycoprotein structure of the WT and the variants (residues 27-1146) contains N-glycosylation [28] and O-glycans [29] sites and was constructed, as described by Borocci et al. [34]. Tables S1-S3 detail the glycans that are bounded with the spike amino acids.

Molecular Dynamics Simulation and Analyses
The full MD protocol is described in previous work [34]. Briefly, the forcefields Amber14SB [41], GLYCAM06 [42], and TIP3P [43] were employed to model the protein, the glycans, and the water molecules, respectively. The ACPYPE script [44] was used to convert the topology into the GROMACS format. All the MD simulations were run with GROMACS 2020.6 package [45]. The P-LINCS [46,47] algorithm was used to restrain h-bonds and the SETTLE [48] algorithm was employed to constrain the water molecules. The particle mesh Ewald method (PME) [49] was used to model the short-range interactions. The systems were firstly minimized in vacuum, applying harmonic potential restraints of 1000 kJ mol −1 nm −2 to the protein atoms. Then, cubic simulation boxes were created and neutralized. Unrestrained MD simulations were performed with a time step of 2 fs and with a length of 100 ns for each system. The 300 K temperature and the 1 bar pressure were controlled, respectively, by V-rescale [50] and Parrinello-Rahman barostat algorithms [51]. For the RMSF analyses, which were performed by the GROMACS RMSF tool, the first 5 ns of each trajectory were excluded. For the variants with deletion, dummy residues with null fluctuations were inserted in the RMSF plot to preserve the WT residue numbering. All the simulations and analyses were performed on the supercomputer Marconi-100, CINECA, Bologna, Italy.