Proteomic Approach for Comparative Analysis of the Spike Protein of SARS-CoV-2 Omicron (B.1.1.529) Variant and Other Pango Lineages

The novel SARS-CoV-2 variant Omicron (B.1.1.529) is under active investigation, and the WHO has characterized it as a variant of concern because of its higher transmissibility, highly contagious behavior, and vaccine breakthrough cases. Here, a comparative proteomic study was conducted on the spike protein and hACE2 for five lineages (α, β, δ, γ and Omicron). Docking of the spike protein with hACE2 was performed using HADDOCK, and PRODIGY was used to estimate the binding affinity of the complex with the lowest HADDOCK score. This was followed by superimposition of the variant-specific protein structures and calculation of the root mean square deviation (RMSD) values. This study reveals that Omicron forms a monophyletic clade. Further, as the α variant was the first variant to emerge after the Wuhan SARS-CoV-2 strain, it showed the lowest similarity to Omicron, indicating that Omicron has evolved recently and carries the greatest number of mutations. The α variant showed the highest binding affinity with hACE2, followed by the β and then the γ variant. Omicron showed the penultimate binding affinity, while the δ variant had the lowest binding affinity. This proteomics-based in silico analysis of the variable spike proteins of these variants will shed light on vaccine development and on the identification of mutations occurring in upcoming variants.


Introduction
SARS-CoV-2, the causative agent of COVID-19, is an enveloped virus with a single-stranded RNA genome [1]; COVID-19 was declared a global pandemic by the WHO in 2020, and the virus causes respiratory tract and gastrointestinal infections in its human host [2]. Owing to the high mutation rate, variants of the virus have been classified as Variants of Concern and Variants of Interest. Based on genetic changes in the spike protein, the Variants of Concern include five major variants: α, β, γ, δ and Omicron (B.1.1.529) [3]. The Omicron strain was largely responsible for pandemic conditions around the globe. It carries more mutations in the spike gene than any previous strain [4]; these mutations directly influence the structure and function of the spike protein and cause an aggressive stage of the disease, because spike proteins are responsible for host-pathogen interaction [5]. The spike protein of SARS-CoV-2 has two subunits (S1 and S2). S1 contains a receptor-binding domain (RBD) on the N terminus that binds to the receptor, while S2 has a fusion peptide with two heptad-repeat domains (HR1, HR2) on the C terminus, whose function is to destabilize the host cell membrane and enable entry [6,7].
The host cell has receptors such as hACE2 (human angiotensin-converting enzyme 2), C-type lectins, TIM1 (T cell immunoglobulin mucin domain-1), TAM (Tyro3, Axl, and MerTK), AXL (anexelekto), CD147 (cluster of differentiation 147) and TMPRSS-2 (transmembrane protease, serine 2), which facilitate the entry of SARS-CoV-2 [8]. hACE2 is an enzyme found on the cell membrane of type II alveolar cells (lungs), enterocytes (small intestine), and endothelial cells (arteries and veins), and it serves as the host cell membrane receptor and primary target for SARS-CoV-2 [9,10]. The interaction between the RBD of the S1 subunit and hACE2 is the early stage of SARS-CoV-2 infection in the host. In this interaction, 20 residues of hACE2 and 17 residues of the RBD form hydrophilic side-chain interactions [11]. Thirty mutations, 15 of which occur in the receptor-binding domain, as well as three small deletions and one minor insertion, define the variation of the Omicron spike protein [12].
In the present investigation, using different in silico tools, we examined the variability in sequence, structure, mutations and pathogenicity of the Omicron spike protein relative to the existing strains of SARS-CoV-2. Differences in transmissibility, which result from the interaction between human ACE2 and the spike protein of the α, β, γ, δ and Omicron variants, were assessed by molecular docking. Our study would open new avenues for identifying unpredicted mutations responsible for host-pathogen interaction [13].

Determination of Physicochemical Properties
The physical and chemical characteristics of the spike proteins of SARS-CoV-2 and its variants, such as molecular weight, number of amino acids, aliphatic index, theoretical pI, instability index, and grand average of hydropathy (GRAVY) [20], were computed with the Expasy ProtParam tool (https://web.expasy.org/protparam, accessed on 21 August 2022).
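Two of the quantities listed above can be reproduced from a sequence alone, which may help when checking ProtParam output. The following is a minimal sketch, not the ProtParam implementation: it assumes the standard Kyte-Doolittle hydropathy scale for GRAVY and the Ikai formula for the aliphatic index (mole percent of Ala, plus 2.9 times Val, plus 3.9 times Ile and Leu). The function names are illustrative.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _validated(sequence):
    """Upper-case the sequence and reject empty input or non-standard residues."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence is empty")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return seq


def gravy(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = _validated(sequence)
    return sum(KYTE_DOOLITTLE[res] for res in seq) / len(seq)


def aliphatic_index(sequence):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    seq = _validated(sequence)
    counts = Counter(seq)

    def mole_percent(res):
        return 100.0 * counts[res] / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))
```

Under these assumptions, a lower (more negative) GRAVY value corresponds to a more hydrophilic protein and a higher aliphatic index to a larger share of aliphatic side chains, which is how the two values are interpreted in the results.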

Phylogenetic Tree Construction and Primary Amino Acid Sequence Alignment
The α, β, δ, γ, and Omicron SARS-CoV-2 spike protein sequences were retrieved in FASTA format from the Protein Data Bank. The relationship between mutations in the spike protein and increased viral transmissibility was inferred from the evolutionary links among the spike protein sequences in a phylogenetic tree [24]. Multiple sequence alignment was performed with MUSCLE, and a distance-based neighbor-joining (NJ) phylogenetic tree of the protein sequences was constructed with 1000 bootstrap replicates in Molecular Evolutionary Genetics Analysis (MEGA-X) [25].

Comparative Analysis of the Secondary and Tertiary Structure of Omicron
The GOR (Garnier-Osguthorpe-Robson) method employs information theory and Bayesian statistics for protein secondary structure prediction. GOR IV was used to predict the secondary structure of the α, β, γ, δ, and Omicron variants [26,27]. Tertiary structure analysis was carried out using PDB templates for Omicron, α, β, γ, and δ. The SuperPose 10.1 web server (http://superpose.wishartlab.com, accessed on 25 August 2022), which is based on a modified quaternion eigenvalue technique, was used for pairwise structure alignment [28]. SuperPose reports the maximum deviation between tertiary structures, RMSD values, and difference distance plots and values for the superimposed molecules in numerical form. Superposition, or superimposition, is the technique of orienting an object until it can be placed directly on top of another object [29].

Protein-Protein Interactions
Protein-protein docking between the spike proteins of α, β, δ, γ, and Omicron and hACE2 was performed with the HADDOCK v2.4 server [30,31]. Blind docking was performed between each variant and hACE2 using the input parameters listed in Supplementary Table S2. In total, five docking runs were executed, and for every run the HADDOCK server generated 10 clusters of four poses each. For each of the five runs, the cluster with the lowest HADDOCK score was then selected for binding energy analysis with PRODIGY. PRODIGY (PROtein binDIng enerGY prediction) (https://wenmr.science.uu.nl/prodigy, accessed on 26 August 2022) is a set of online services for predicting binding affinity in biological complexes and identifying biological interfaces from crystallographic data [32]. Finally, the interacting residues of both chains, salt bridges, hydrogen bonds between residues of the two chains, and non-bonded interactions were calculated with PDBsum (http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum, accessed on 26 August 2022). PDBsum is a pictorial database that shows, at a glance, the components of each three-dimensional structure deposited in the Protein Data Bank (PDB) [33].
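The cluster-selection step described above (take, for each docking run, the cluster with the lowest HADDOCK score) can be sketched as follows. This is only an illustration of the selection rule: the `Cluster` class, the cluster names and the scores are hypothetical placeholders, not values from this study, and the sketch does not call the HADDOCK server.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """One docking cluster; the fields are illustrative, not a HADDOCK file format."""
    name: str
    haddock_score: float  # arbitrary units; a lower (more negative) score is better


def best_cluster(clusters):
    """Return the cluster with the lowest HADDOCK score from one docking run."""
    if not clusters:
        raise ValueError("no clusters supplied")
    return min(clusters, key=lambda cluster: cluster.haddock_score)


# Hypothetical scores for a single run, used only to show the selection rule.
example_run = [
    Cluster("cluster_1", -95.2),
    Cluster("cluster_2", -110.7),
    Cluster("cluster_3", -88.4),
]
selected = best_cluster(example_run)
print(selected.name)  # cluster_2 has the lowest score in this toy run
```

In practice the selected cluster's structure would then be passed on for binding energy analysis, as described in the text.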

Physical Parameters of Proteins
The α, β, δ, γ and Omicron spike proteins comprise 1116, 1258, 1261, 1256 and 1285 amino acids, respectively, with Omicron having the highest number. The isoelectric point (pI) is the pH at which the net charge of the protein is zero. The pI of α (5.66) is well below 7, indicating that it is more acidic than Omicron, whose pI (6.63) is close to 7. An instability index (II) of less than 40 predicts that a protein is stable; the II values of all variants indicate that the spike proteins are similarly stable. The aliphatic index (AI) reflects the relative volume occupied by aliphatic side chains in the protein. A high AI indicates greater thermal stability; the data indicate that α is the most thermostable and Omicron the least thermostable of the variants. GRAVY indicates hydropathicity: the lower the score, the greater the affinity for water; γ and Omicron show a stronger affinity for water than the other variants.

Prediction of Immune Properties
The number of exposed B-cell epitopes, which play a vital role in the binding of the antigen to immunoglobulin, varied from 33 to 40 (Table 1). Among all the variants, the α and Omicron variants show equal scores for protective antigen (0.4646) and antigenicity (0.717053) (Table 1). Predictions for C-cell epitopes ranged from 35 to 38. The immunogenicity prediction scores vary among the spike protein variants. The Omicron variant shows the highest number (27) of strong binders in T cells, which corresponds to an immunogenicity score of 0.49637 and suggests that the Omicron variant has greater virulence and transmissibility than the other variants.

Comparative Sequences and Phylogenetic Analysis of Omicron Spike Protein
Branch length indicates genetic change, i.e., the longer the branch, the more genetic change (divergence) has occurred. Omicron forms a sister group with β and shows the maximum divergence from α. Omicron is the most diverged from the other variants and has evolved most recently, with the greatest number of mutations. The percentage similarity between α and Omicron was 33.2%, and that between β and Omicron was 94.9%; δ and γ, when compared with Omicron, did not differ appreciably in similarity, at 95.2% and 95.1%, respectively (Figure 1A,B). The amino acid substitutions in Omicron compared with the α, β, δ and γ variants are described in Supplementary Table S1.
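A pairwise percentage of this kind can be computed from two rows of a multiple sequence alignment. The sketch below is an assumption about the calculation rather than the procedure used by MEGA-X: it computes percent identity, counting a column as a match only when both sequences carry the same residue, and it ignores columns that are gaps in both sequences. A similarity score that also credits conservative substitutions would give different numbers.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two rows of the same alignment.

    Columns where both rows are gaps are skipped; a gap opposite a
    residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap and res_b == gap:
            continue  # column carries no information for this pair
        compared += 1
        if res_a == res_b:
            matches += 1

    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy alignment rows (not spike sequences): 4 identical columns out of 6.
print(round(percent_identity("ACDE-G", "ACDQFG"), 1))
```

How gaps are treated is the main design choice here; excluding gapped columns entirely, for example, would raise the reported percentage for sequences that differ mostly by deletions.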

Secondary and Tertiary Structure Analysis
The Omicron spike protein has a 3.64%, 1.46%, 1.97% and 1.5% lower fraction of α-helix structure than the spike proteins of α, β, δ, and γ, respectively. The spike protein of α has a higher fraction of extended strand, by 5.88%, 6.32%, 5.05%, and 6.3%, than the spike proteins of β, δ, γ and Omicron, respectively. The spike protein of Omicron shows the highest deviation from the α spike protein, around 9.93%, and the lowest deviation from the β spike protein, about 1.97% (Table 2). SuperPose 10.1 was used to measure the maximum deviation in the tertiary structure of the Omicron spike protein relative to the other SARS-CoV-2 variants (Figure 2); these superpositions provide the RMSD values for α-carbons, backbone atoms and heavy atoms in both local and global forms. An RMSD value in angstroms, which represents the measured RMSD between the superposed molecules, is one of the seven forms of output produced by SuperPose; this value is reported both chain-wise and for the whole structure, in local and global forms (Tables 3 and 4).
RMSD is mainly used as a quantitative measure (in angstroms) of the similarity between two sets of superimposed atomic coordinates. According to the results, the α-Omicron pair (PDB IDs: 7CYD-7T9J) has the highest RMSD values, around 2.785 Å for α-carbons, 2.783 Å for the backbone and 2.903 Å for heavy atoms; these high RMSD values indicate that the spike protein of Omicron is structurally distinct from that of the α variant.
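For reference, the RMSD between two sets of N paired atoms is the square root of the mean squared distance between corresponding atoms. The following minimal sketch computes that quantity for coordinates that are assumed to be already superimposed and paired atom by atom; it does not perform the optimal superposition itself, which SuperPose handles with its quaternion-based method. The function name and the toy coordinates are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (same units as the input, e.g. angstroms) between paired 3D points.

    Assumes both structures are already superimposed and that
    coords_a[i] corresponds to coords_b[i].
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have equal length")
    if not coords_a:
        raise ValueError("no coordinates supplied")

    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy example with two atoms: one coincides, the other is displaced by 2 units.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
displaced = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(rmsd(reference, displaced))  # square root of (0 + 4) / 2
```

Restricting the paired atoms to α-carbons, backbone atoms or heavy atoms gives the three RMSD flavours reported in the text.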

Proteome-Based Mutational Analysis of Spike Protein Domains
The data above suggest that there was structural variation in the spike protein of SARS-CoV-2 and that the spike protein of the α variant deviated strongly from that of the Omicron variant. Mutational analysis identified altered amino acids in the spike protein domains (RBD and NTD). Alterations in the amino acid sequence of the RBD can significantly affect the binding affinity of S for hACE2 and, ultimately, SARS-CoV-2 infectivity. Although mutations occur throughout this region, direct interactions with potential ligands are still feasible because most of the mutations in this area are found on the surface of S [34] (Figure 3). Deep mutational studies are being conducted to determine whether single-site mutations in this region affect hACE2 affinity; these results might be contrasted with emerging concerns as of March 2022. The significant changes in the NTD of Omicron in comparison to other variants are T94I, G141D, and A66V; NTD changes are also found in the beta variant (A79D) and in the gamma variant (Y138D). The major changes in the RBD of Omicron in comparison to other variants are G337D, S371L, S373P, S374F, N440K, G446S, and S417N; these results suggest that the key mechanism driving the positive selection of mutations within the RBD is not the host receptor's binding affinity for S. Furthermore, the majority of mutations in this area modify the charge or hydrophobicity of the RBD, greatly increasing the likelihood that the virus may escape antibodies through altered epitope affinities or regional conformational changes that reduce epitope accessibility. Numerous factors, including widespread common mutations throughout the NTD subdomain, contribute to the positive selection of variants carrying mutations in the NTD of SARS-CoV-2 S. Although the NTD is the target of 35% of SARS-CoV-2 antibodies, only around one-third of these antibodies have a neutralizing effect [35].

Protein-Protein Interaction Analysis: (Spike-SARS-CoV-2)-hACE2
The binding of SARS-CoV-2 to the host receptor is a key factor in infectivity, transmission, and pathogenesis; hence, alterations in the structure of the spike protein domains (NTD and RBD) during the evolution of the virus would have a significant impact on these processes. Using HADDOCK 2.4, protein-protein docking was executed between the spike proteins of α, β, δ, γ, and Omicron and human ACE2 (hACE2). The binding affinity was calculated with PRODIGY. HADDOCK returned the 10 best clusters, and the one with the lowest HADDOCK score was used to calculate the binding affinity. According to the results, the Omicron variant shows the highest HADDOCK score and binding affinity compared with the other variants (Table 5). Further, interaction analysis was performed with PDBsum on the selected cluster from each docking run. In comparison to the other variants of SARS-CoV-2, the Omicron spike protein formed 32 hydrogen-bond interactions with hACE2, involving residues N417, Y449, Y453, L455 and N487. Additionally, the number of salt bridges increased from one to three when the RBD of the Omicron spike protein bound to hACE2. Notably, the N501Y substitution, previously reported for the α variant, also boosted the binding affinity of the Omicron variant because the number of hydrogen bonds and pi-cation interactions increased (Tables 6 and 7). In addition, mutations enhanced the binding affinity between the receptor-binding domain of the spike protein and hACE2, which suggests that the mutational changes in the RBD increased the pathogenesis and transmission of the Omicron variant.
Table 6. List of interacting RBD residues of the spike protein of different SARS-CoV-2 variants and hACE2 residues.
Table 7. Interaction analysis of the protein-protein docking of α-hACE2, β-hACE2, δ-hACE2, γ-hACE2 and Omicron-hACE2 through PDBsum, showing the number of hydrogen bonds and salt bridges.

Discussion
Various proteomics techniques are available for identifying and studying the interactions between host proteins and viral spike proteins and for understanding evolutionary lineages. Proteomics can be used to understand the intricate interaction of SARS-CoV-2 with the host cell. In this study, several computational approaches were used to compare SARS-CoV-2 variants (α, β, γ, δ and Omicron) on the basis of sequence, physicochemical properties, and structure, and to examine how these alter the interaction with the host receptor protein hACE2. The variants of SARS-CoV-2 show notable scores in terms of immunogenicity and antigenicity. In particular, the Omicron variant showed high antigenicity and a low number of exposed B-cell epitopes, which denotes the strongest bonding with an epitope and indicates the highest transmissibility.
As per earlier research, phylogenetic relationships are established between Omicron and the other variants based on the distance matrix [36]. Using the UPGMA algorithm, the mapping of the variable strains onto different branches was generated according to the rules of phylogenetic tree construction [37]. This study established the inference that Omicron forms a monophyletic clade [38]. The sequence variation, or mutation rate, establishes the Omicron variant as dissimilar to the α variant, as analyzed by polyphyletic classification based on the Neighbor-Joining methodology (MEGA-X) [39]; this establishes a probability for the rate of single nucleotide polymorphism, which directly causes changes in the sequence, structure, and function of Omicron variants.
After the position in the phylogenetic tree was determined, computational analysis of functional variability among the proteomes of the different variants showed a change in the surface charge of the Omicron spike protein compared with other variants due to mutation, which directly results in an increase in hydrophobic residues; this increase enhances the stability of the Omicron protein core [40,41], while the change in the amino acids of the RBD of the Omicron spike protein in comparison to other lineages affects the immune response and also the vaccine (Ab) interaction [42,43]. The transitions from proline 603 to lysine 730, aspartic acid 655 to valine 782, aspartic acid 669 to glycine 796, and many more in Omicron in comparison to α increase the positive charge and hydrophobicity, which improves its stability and its binding with hACE2 (owing to the negative charge of that protein) [44].
Secondary structure analysis shows an increase in α-helix content compared with the δ variant; a greater α-helix content provides conformational stability, which enhances transmissibility in the host [45]. Variation in the secondary structure correlates directly with the tertiary structure, which consists of variable domains regulating its binding with hACE2. The spike protein consists of the following domains, distributed according to amino acid position: residues 14-305 (N-terminal domain), 306-330 (C-terminal domain), 331-527 (receptor-binding domain), 686-815 (S1/S2 cleavage), 816-911 (fusion peptide), 912-984 (heptad repeat), and 1035-1147 (connector domain) [46]. Compared with the Omicron structure, the α variant spike protein has a reduced α-helix content in the NTD and RBD, while in the β variant fewer α-helices are present in the RBD; the γ structure is similar, and the δ RBD is consistent with the earlier Omicron strain.
Earlier research highlights the importance of protein-protein interaction, as the molecular interaction of the spike protein with hACE2 gives the virus access to the host cell [47]. Substitutions of amino acids in the spike protein RBD across the α, β, γ, δ, and Omicron strains, arising through mutation, increase the transmissibility and infectivity of the virus. Changes at RBD residues such as Leu455, Phe486, Glu493, Ser494, and Asn501 alter the binding of SARS-CoV-2 to the host cell [48]. This dynamic behavior can be studied through docking and binding energy analysis. The affinity between hACE2 and the spike protein is reflected in the dissociation constant (Kd): the smaller the value, the greater the affinity. The data suggest that Omicron has a higher affinity than β and γ, which relates directly to its infectivity.
As we know, humoral immunity may not be as effective as T cells in preventing the emergence of new coronavirus infections. Other research has also shown that CD8+ T cells may often target a range of SARS-CoV-2 antigens and identify epitopes from different viral antigens through a series of combinations of T-cell receptors (TCRs), which are critical for viral clearance, long-term immunity, and memory for protection [49], as well as for the competence of CD8+ T cells to prevent secondary infection. Because of its high specificity and ability to elicit a potent immune response, the SARS-CoV-2 spike protein has been the focus of vaccine development [50]. In particular, the RBD is frequently regarded as a crucial protein target for vaccine design and the creation of therapeutic neutralizing antibodies. In this study, T-cell MHC class I epitopes were predicted to evaluate the affinity between peptide and MHC molecule, which can inform future vaccine development. The qualitative affinity and the physical and chemical properties were then predicted, and immunogenic peptides were studied further for vaccine design. We focused on the sequence changes that occurred in the spike protein of Omicron; those changes will affect the binding of protein-based vaccines, an insight that is useful in vaccine development and design. The first protein-based vaccine was Nuvaxovid™ (NVX-CoV2373) (Novavax Inc., Gaithersburg, MD, USA), which comprises the full-length S protein and possesses common epitopes that may be able to protect against all SARS-CoV-2 virus strains [51]. Anhui Zhifei Longcom/Chinese Academy of Medicine (ZF2001) (Anhui Zhifei Longcom, China), COVAXX/United Biomedical Inc. (UB-612), and Clover Biopharmaceuticals/GSK/Dynavax (SCB-2019) are three other examples of protein-based vaccines [52].
Protein identification, quantification, protein-protein interactions, protein modifications, and localization can all be studied using proteomics tools, and all are part of proteomic complexity. Understanding the interaction between one protein from the SARS-CoV-2 virus (spike) and one from Homo sapiens (ACE2) opens new avenues for answering open questions of protein complexity related to the interaction between viruses and humans. In summary, our analysis shows that, across the variants compared, Omicron showed a greater affinity for human ACE2 than non-Omicron SARS-CoV-2.
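The link between the Kd value and binding affinity noted in this Discussion follows from the standard thermodynamic relation ΔG = RT ln(Kd), with Kd expressed relative to a 1 M standard state. The sketch below converts between a binding free energy in kcal/mol and a dissociation constant in mol/L. It is an illustration under stated assumptions: the default temperature of 25 °C, the function names and the example energies are illustrative choices, not values or settings taken from this study.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal / (mol * K)


def kd_from_delta_g(delta_g_kcal_mol, temperature_c=25.0):
    """Dissociation constant (mol/L) from a binding free energy (kcal/mol)."""
    temperature_k = temperature_c + 273.15
    return math.exp(delta_g_kcal_mol / (GAS_CONSTANT_KCAL * temperature_k))


def delta_g_from_kd(kd_molar, temperature_c=25.0):
    """Binding free energy (kcal/mol) from a dissociation constant (mol/L)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    temperature_k = temperature_c + 273.15
    return GAS_CONSTANT_KCAL * temperature_k * math.log(kd_molar)


# Hypothetical energies, chosen only to show the direction of the relation:
# the more negative free energy yields the smaller Kd, i.e. tighter binding.
for delta_g in (-10.0, -12.0):
    print(f"dG = {delta_g:6.1f} kcal/mol -> Kd = {kd_from_delta_g(delta_g):.2e} M")
```

Because the relation is exponential, a difference of about 1.4 kcal/mol at 25 °C corresponds to roughly a tenfold change in Kd, so small differences in predicted binding energy between variants translate into large differences in predicted Kd.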

Conclusions
The comparative analysis of the Omicron spike protein with other variants based on the hydropathy index would open up new avenues in research. At the same time, most of the mutations at the Omicron spike-hACE2 interface appear to diminish hACE2 binding affinity and may affect the binding interactions of upcoming variants; this conceivably arises from selection pressure to facilitate immune escape, as a considerable number of antibodies target the same interface. This study will also shed light on vaccine development programs and on the identification of mutations occurring in the spike protein of upcoming variants.
Author Contributions: M.J., N.P. and D.G. performed the computational experiments. M.J., N.G., M.K.S. and P.K. analyzed the data. D.G. and N.P. wrote the manuscript. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

Figure 1. (A-D) Spike protein sequence analysis of the α, β, δ, and γ variants with respect to the Omicron variant. Deletions, insertions, and mutations are marked with red blocks. (E) The phylogenetic tree was built with Molecular Evolutionary Genetics Analysis (MEGA-X). The visualization of Omicron with α, β, δ, and γ was done through MAFFT. Numeric values denote bootstrap values.

Figure 2. Secondary structure alignment of the spike protein of Omicron with the spike proteins of the α, β, δ and γ variants. (A) Secondary structure of the α variant spike protein, (B) secondary structure of the Omicron variant spike protein, (C) aligned secondary structure of α and Omicron spike protein amino acids at 123 sites with 25 deletion sites, (D) secondary structure of the β variant spike protein, (E) secondary structure of the Omicron variant spike protein, (F) aligned secondary structure of β and Omicron spike protein amino acids at 25 sites with 12 deletion sites, (G) secondary structure of the δ variant spike protein, (H) secondary structure of the Omicron variant spike protein, (I) aligned secondary structure of δ and Omicron spike protein amino acids at 24 sites with 9 deletion sites, (J) secondary structure of the γ variant spike protein, (K) secondary structure of the Omicron variant spike protein, (L) aligned secondary structure of γ and Omicron spike protein amino acids at 15 sites with 25 deletion sites.

Figure 3. Comparative mutation analysis of the NTD and RBD of the Omicron spike protein with respect to other Pango lineages. (A) PDB structure of the alpha variant spike protein (chain A); (F) NTD of the alpha variant spike protein depicting amino acid residue substitution in comparison to the alpha variant NTD; (G) RBD of the alpha variant spike protein depicting amino acid residue substitution in comparison to the alpha variant RBD; (B) PDB structure of the beta variant spike protein (chain A); (H) NTD of the beta variant spike protein depicting amino acid residue substitution in comparison to the alpha variant NTD; (I) RBD of the beta variant spike protein depicting amino acid residue substitution in comparison to the alpha variant RBD; (C) PDB structure of the gamma variant spike protein (chain A); (J) NTD of the gamma variant spike protein depicting amino acid residue substitution in comparison to the alpha variant NTD; (K) RBD of the gamma variant spike protein depicting amino acid residue substitution in comparison to the alpha variant RBD; (D) PDB structure of the delta variant spike protein (chain A); (L) NTD of the delta variant spike protein depicting amino acid residue substitution in comparison to the alpha variant NTD; (M) RBD of the delta variant spike protein depicting amino acid residue substitution in comparison to the alpha variant RBD; (E) PDB structure of the Omicron variant spike protein (chain A).

Table 1. Comparison of the immunological properties of the spike protein of SARS-CoV-2 variants.

Table 2. Deviation in the secondary and the tertiary structure of omicron spike protein compared to other SARS-CoV-2 variants through GOR IV.

Table 3. The maximum deviation in the tertiary structure of omicron spike protein with novel SARS-CoV-2 variants.

Table 4. The maximum deviation in the tertiary structure of omicron spike protein's different chains with novel SARS-CoV-2 variants.

Table 5. The interaction analysis of spike protein of SARS-CoV-2 variants with hACE2 through PRODIGY.