Prediction of SARS-CoV-2 Omicron Variant Immunogenicity, Immune Escape and Pathogenicity, through the Analysis of Spike Protein-Specific Core Unique Peptides

The recently discovered Omicron variant of the SARS-CoV-2 coronavirus has raised new global concern. In this study, we identified the Core Unique Peptides (CrUPs) that reside exclusively in the Spike protein of the Omicron variant and are absent from the human proteome. This created a new dataset of peptides, named SARS-CoV-2 CrUPs against the human proteome (C/H-CrUPs), and we analyzed their locations in comparison to the Alpha and Delta variants. In Omicron, 115 C/H-CrUPs were generated and 119 C/H-CrUPs were lost, almost four times as many as in the other two variants. In the Receptor Binding Motif (RBM), 8 mutations were detected, resulting in the construction of 28 novel C/H-CrUPs. Most importantly, in the Omicron variant, new C/H-CrUPs carrying two or three mutant amino acids were produced as a consequence of the accumulation of multiple mutations in the RBM. These C/H-CrUPs were not found in any other viral Spike variant. Our findings indicate that virus binding to the ACE2 receptor is facilitated by the herein identified C/H-CrUPs at contact-point mutations and Spike cleavage sites, while the immunoregulatory NF9 peptide is not detectably affected. Thus, the Omicron variant could escape immune-system attack, while its strong binding to the ACE2 receptor leads to highly efficient fusion of the virus with the target cell. However, the intact NF9 peptide suggests that Omicron exhibits reduced pathogenicity compared to the Delta variant.


Introduction
The SARS-CoV-2 virus has a high mutation frequency and has so far produced 63 different variants, 39 of which are considered the most predominant forms, including Delta, the dominant variant of the fourth pandemic wave [1]. Recently, a new variant, Omicron (B.1.1.529), was identified in South Africa. Compared with the original virus, Omicron carries 30 amino acid changes, three small deletions, and one small insertion in the Spike protein; 15 of these changes reside in the Receptor Binding Domain (RBD; amino acid residues 319 to 541) [2].
In our previous studies, we defined Unique Peptides (UPs) as peptides whose amino acid sequence appears in only one protein across a given proteome. We also introduced the term Core Unique Peptides (CrUPs) for the peptides of minimum length that appear in only one protein across a given proteome, thus providing a unique signature for the identification of a particular protein [3]. Therefore, any peptide, of any size, that contains a CrUP is a UP. Peptides longer than CrUPs that are built from consecutive CrUPs are termed Composite Core Unique Peptides (CmUPs). To date, our analyses of CrUPs in different species and organisms strongly suggest that CrUPs constitute a concrete group of peptides within a given proteome, with specialized properties and functions. We have therefore introduced the new term "Uniquome", defined as the total set of UPs of a given proteome, which serves as its unique molecular signature. To map the UP landscape of a proteome under examination, we developed a novel bioinformatics tool that incorporates big data analysis, and we applied it to analyze the Uniquomes of all model organisms. In Homo sapiens, the analysis of the 20,430 reviewed proteins identified 7,263,888 CrUPs, which constitute the human Uniquome (hUniquome) ([3] and Kontopodis et al., 2022 (manuscript in preparation)).
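To make the UP and CrUP definitions concrete, the following minimal Python sketch computes the CrUPs of one protein within a toy proteome. It scans peptides from shortest to longest and keeps those that occur in exactly one protein and contain no shorter CrUP already found. This is not the authors' tool: the two-protein proteome, exact substring matching and the minimum length of 4 residues are illustrative assumptions, and the names `is_unique` and `core_unique_peptides` are hypothetical.

```python
# Minimal sketch (not the authors' tool): UPs and CrUPs in a toy proteome.
# Assumptions: exact substring matching, minimum peptide length of 4,
# and a hypothetical two-protein proteome.


def is_unique(peptide: str, proteome: dict[str, str]) -> bool:
    """A Unique Peptide (UP) occurs in exactly one protein of the proteome."""
    hits = [pid for pid, seq in proteome.items() if peptide in seq]
    return len(hits) == 1


def core_unique_peptides(pid: str, proteome: dict[str, str], min_len: int = 4) -> set[str]:
    """CrUPs of one protein: UPs that contain no shorter UP."""
    seq = proteome[pid]
    crups: set[str] = set()
    # Scan from short to long peptides so every shorter CrUP is known first.
    for length in range(min_len, len(seq) + 1):
        for start in range(len(seq) - length + 1):
            pep = seq[start:start + length]
            if any(c in pep for c in crups):
                continue  # contains a CrUP: it is a UP, but not a core one
            if is_unique(pep, proteome):
                crups.add(pep)
    return crups


if __name__ == "__main__":
    toy = {"P1": "MKTAYIAKQR", "P2": "MKTAGIAKQW"}  # hypothetical sequences
    print(sorted(core_unique_peptides("P1", toy)))
```

Because a longer peptide of the same protein that contains a CrUP is itself a UP, the scan only needs to test peptides that contain none of the CrUPs found so far.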
Most importantly, to elucidate SARS-CoV-2 virus-host interactions, we further designed a novel bioinformatics platform to analyze the Core Unique Peptides (CrUPs) of the SARS-CoV-2 virus against the human proteome (C/H-CrUPs) [1]. C/H-CrUPs represent a completely new set of peptides: the shortest peptides of a viral proteome that do not exist in the human proteome [3]. Based on their properties, viral C/H-CrUPs could advance our knowledge of virus-host interactions, immune system response(s), and the infectivity and pathogenicity of the virus. Moreover, they can be used as antigenic and diagnostic peptides, and as potentially druggable targets for therapeutic treatments.
In the present study, we have identified, cataloged, and analyzed Omicron-specific C/H-CrUPs in order to illuminate the mechanisms controlling infectivity, immune escape, and pathogenicity of the new variant.

Methods
In our recent studies, we developed a bioinformatics tool that extracts the Core Unique Peptides (CrUPs) from a given proteome, thus creating its Uniquome (Figure S1) [1,3]. We have expanded this tool with a new feature that extracts the CrUPs of each individual protein of a given proteome (target) versus the proteins of a reference proteome. Like the initial implementation, this feature splits each protein of the target proteome into all possible peptides, from a minimum length of 4 amino acids to a maximum length of 100 amino acids, and searches them against the reference proteome. Each search excludes all peptides that contain a shorter peptide already identified as a CrUP (Figure S2).
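The target-versus-reference extraction can be sketched as follows, under stated assumptions: exact matching, the 4 to 100 residue length range given above, and a per-length set of reference peptides as the lookup structure. That lookup structure is an illustrative choice; the text does not describe the tool's internals. The function names are hypothetical.

```python
# Minimal sketch of the target-versus-reference extraction described above.
# Assumptions: exact matching and lengths 4-100 as stated in the text; the
# per-length k-mer index is an illustrative choice, not the authors' design.


def kmer_index(reference: list[str], k: int) -> set[str]:
    """All length-k peptides present anywhere in the reference proteome."""
    return {seq[i:i + k] for seq in reference for i in range(len(seq) - k + 1)}


def ch_crups(target_seq: str, reference: list[str],
             min_len: int = 4, max_len: int = 100) -> set[str]:
    """Shortest peptides of target_seq that are absent from the reference."""
    crups: set[str] = set()
    for k in range(min_len, min(max_len, len(target_seq)) + 1):
        ref_kmers = kmer_index(reference, k)
        unresolved = False  # was any length-k window found in the reference?
        for i in range(len(target_seq) - k + 1):
            pep = target_seq[i:i + k]
            if any(c in pep for c in crups):
                continue  # excluded: already contains a shorter C/H-CrUP
            if pep in ref_kmers:
                unresolved = True  # must be extended to longer peptides
            else:
                crups.add(pep)
        if not unresolved:
            break  # every longer window now contains a C/H-CrUP
    return crups


if __name__ == "__main__":
    toy_reference = ["MKTAGIAKQW", "TAYIQQ"]  # hypothetical "human" sequences
    print(sorted(ch_crups("MKTAYIAKQR", toy_reference)))
```

For a custom Spike proteome, the per-protein output would be obtained with something like `{name: ch_crups(seq, human_proteome) for name, seq in spike_variants.items()}`, where `spike_variants` and `human_proteome` are hypothetical in-memory collections. The early `break` relies on the fact that once every window of a given length contains a C/H-CrUP, every longer window does too.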
For the present study, we used this new feature of our tool. We created a "custom" proteome consisting of the sequences of all SARS-CoV-2 Spike protein variants and used it as the target versus the human proteome. The tool outputs the C/H-CrUPs of each protein of the target proteome, thus revealing the CrUPs of each Spike variant versus the human proteome.
Once we obtained these data, we ran a meta-analysis to determine how many C/H-CrUPs remained the same, were added, or were lost in each variant relative to the wild-type Spike protein. For this analysis, we first took the C/H-CrUPs identified in the wild-type sequence and checked for their presence among the C/H-CrUPs of each of the other variants. Only the amino acid sequence was considered, not the position at which it occurs within the protein. If the sequence was found, we considered the peptide to be the same; otherwise, we considered it lost in the examined variant. Next, we compared the C/H-CrUPs identified in each variant against those of the wild-type sequence. If a peptide was detected only among the variant's C/H-CrUPs, we considered it added. This meta-analysis also provided the position of each C/H-CrUP within the Spike protein, which we used to determine the region in which it resides (e.g., RBD, RBM, and S-cleavage sites, as obtained from the Stanford COVID-19 Database).
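Because position is ignored during the comparison, the same/lost/added classification reduces to set operations on peptide sequences. The minimal sketch below uses hypothetical toy sets. A separate helper, `start_positions` (a hypothetical name), then recovers the 1-based positions of each peptide for the subsequent region assignment.

```python
# Minimal sketch of the sequence-only meta-analysis (same / lost / added).
# Position is ignored for the comparison; positions are looked up
# afterwards only to assign regions.


def compare_to_wild_type(wt: set[str], variant: set[str]) -> dict[str, set[str]]:
    return {
        "same": wt & variant,   # wild-type C/H-CrUPs still present in the variant
        "lost": wt - variant,   # wild-type C/H-CrUPs absent from the variant's list
        "added": variant - wt,  # found only among the variant's C/H-CrUPs
    }


def start_positions(peptide: str, protein: str) -> list[int]:
    """1-based start positions of every (possibly overlapping) occurrence."""
    hits, i = [], protein.find(peptide)
    while i != -1:
        hits.append(i + 1)
        i = protein.find(peptide, i + 1)
    return hits


if __name__ == "__main__":
    wt_set = {"KTAY", "AYIA", "YIAK", "AKQR"}  # hypothetical toy data
    variant_set = {"KTAY", "AKQR", "NKPC"}
    print(compare_to_wild_type(wt_set, variant_set))
```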

Results and Discussion
Mapping the C/H-CrUPs Landscape of Spike Protein of the SARS-CoV-2 Omicron Variant
The SARS-CoV-2 virus appears to be highly mutable, having so far produced more than 60 distinct variants. To date, the most pathogenic form is the Delta variant (B.1.617.2), with 10 different sub-variants. Recently, a novel variant called Omicron has been identified. It is characterized by 30 amino acid changes, three small deletions, and one small insertion in the Spike protein, compared with the respective wild-type viral sequence (Figure S3) [2]. Of these genetic changes, 15 reside in the Receptor Binding Domain (RBD; amino acid positions 318 to 541), and two are located around the S-cleavage site(s) (Figure S3).
Bioinformatics analysis of the Omicron variant Spike protein showed that it contains 983 C/H-CrUPs, a number comparable to that of the wild-type Spike protein (987 C/H-CrUPs) and to the mean ± SD of Spike protein-specific C/H-CrUPs (983 ± 2 C/H-CrUPs) (Table 1). The Omicron variant Spike protein contains 34 mutations in total, the highest number identified among all virus variants. These mutations appear to have a dramatic effect on the Spike protein C/H-CrUP map. Compared with the wild-type Spike sequence, 115 new C/H-CrUPs were created and 119 C/H-CrUPs were lost, almost twice as many as in the Alpha variant (51 and 56 C/H-CrUPs, respectively) and almost four times as many as in the other variants (Table 1). The length distribution of these new C/H-CrUPs shows that most are 6 amino acids long (Figure S4).
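The Omicron-to-Alpha comparison above is simple arithmetic, and a length distribution such as the one in Figure S4 amounts to counting peptide lengths. The sketch below uses only the counts and the four example Omicron peptides quoted in this article, not the full dataset. The function names are hypothetical.

```python
from collections import Counter


def fold_change(variant_count: int, reference_count: int) -> float:
    """Ratio of two C/H-CrUP counts."""
    return variant_count / reference_count


def length_distribution(peptides: list[str]) -> Counter:
    """Number of peptides at each length (asterisk annotations removed)."""
    return Counter(len(p.replace("*", "")) for p in peptides)


if __name__ == "__main__":
    # Counts from the text: Omicron 115 added / 119 lost; Alpha 51 / 56.
    print(f"added: {fold_change(115, 51):.3f}x, lost: {fold_change(119, 56):.3f}x")
    # Four example Omicron C/H-CrUPs quoted in the text (not the full set).
    print(length_distribution(["QAGN*K*P", "N*K*PCN", "LK*SYS*F", "K*SYS*FR*"]))
```

With these counts, Omicron has about 2.25 times as many added and 2.125 times as many lost C/H-CrUPs as Alpha.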

Omicron-Specific C/H-CrUPs That Belong to the Receptor Binding Domain
SARS-CoV-2 belongs to the β-coronavirus group and uses the plasma membrane receptor Angiotensin-Converting Enzyme 2 (ACE2) to recognize and bind to the target cell [4]. The viral Spike protein attaches to the ACE2 receptor through its Receptor Binding Domain (RBD), defined from amino acid position F318 to F541 [4,5]. The amino acid residues from W436 to Q506 within the RBD form the Receptor Binding Motif (RBM), which carries 11 contact positions with ACE2 [5]. The RBD region has received great attention, as it appears to be a major target of antibodies against the virus and of other therapeutic interventions [6][7][8].
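Assigning a C/H-CrUP to the RBD or RBM from its start position and length is an interval-overlap test. In this sketch, the boundaries are those given above (RBD F318 to F541, RBM W436 to Q506). Counting a peptide as belonging to a region when any of its residues overlaps that region, rather than requiring full containment, is an assumption. `regions_for` is a hypothetical name.

```python
# Region boundaries as stated in the text (1-based, inclusive):
# RBD F318-F541 and RBM W436-Q506.
REGIONS = {"RBD": (318, 541), "RBM": (436, 506)}


def regions_for(start: int, length: int,
                regions: dict[str, tuple[int, int]] = REGIONS) -> list[str]:
    """Names of regions overlapped by a peptide spanning start..start+length-1."""
    end = start + length - 1
    return [name for name, (lo, hi) in regions.items() if start <= hi and end >= lo]


if __name__ == "__main__":
    print(regions_for(474, 6))  # a 6-residue peptide starting at position 474
    print(regions_for(681, 6))  # a peptide near the furin site, outside the RBD
```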
In the RBD region, the Omicron variant carries 15 mutations, 10 of which are located in the RBM (Figure 1A). This results in the highest number of newly constructed C/H-CrUPs in the RBD/RBM region among all previous virus variants examined (Table S1). Table 2 lists all the new C/H-CrUPs of the Omicron variant identified herein in the Spike RBD region, in comparison to the Alpha and Delta variants, which represent two of the most predominant variants of the virus in human populations. In contrast to the Alpha and Delta variants, 8 novel mutations were identified in the Omicron RBM region between amino acid positions 440 and 508, resulting in the production of 28 new C/H-CrUPs. The most important finding is that, in the Omicron variant, new C/H-CrUPs including two or three mutant amino acids were generated for the first time, with the peptides "QAGN*K*P", "N*K*PCN", "LK*SYS*F" and "K*SYS*FR*" being characteristic examples, as a result of the accumulation of multiple mutations at positions 440, 446, 477, 478 and 493-505. These novel C/H-CrUPs containing several mutated amino acids had not been found in any previous virus variant. Taking into consideration recent data on virus infectivity, this new collection of multi-mutated C/H-CrUPs appears to radically change the structure and the epitope regions at the terminal positions of the RBM in the Omicron variant, seriously compromising its antigenic capacity and facilitating immune escape of the virus [9].
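The asterisk notation in these example peptides appears to mark each mutated residue by placing an asterisk after it. Assuming that convention, both the annotation and the number of mutated residues per peptide can be reproduced from aligned wild-type and variant segments. The wild-type segments in the usage block (for example QAGSTP at positions 474 to 479) are taken from the reference Spike numbering for illustration and should be checked against UniProt P0DTC2. The sketch handles substitutions only, not the deletions or the insertion that Omicron also carries. `annotate` is a hypothetical name.

```python
def annotate(wt_segment: str, variant_segment: str) -> tuple[str, int]:
    """Append '*' after each substituted residue; return (label, mutation count)."""
    if len(wt_segment) != len(variant_segment):
        raise ValueError("segments must be aligned and equal in length")
    parts, n_mut = [], 0
    for wt, var in zip(wt_segment, variant_segment):
        if wt != var:
            parts.append(var + "*")
            n_mut += 1
        else:
            parts.append(var)
    return "".join(parts), n_mut


if __name__ == "__main__":
    # Wild-type segments per reference numbering (illustrative; verify vs P0DTC2):
    # 474-479, 477-481, 492-497 and 493-498.
    examples = [("QAGSTP", "QAGNKP"), ("STPCN", "NKPCN"),
                ("LQSYGF", "LKSYSF"), ("QSYGFQ", "KSYSFR")]
    for wt, var in examples:
        print(annotate(wt, var))
```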

Remarkably, the RBM contains 11 of the 12 contact points of the viral Spike protein with the ACE2 cellular receptor. Of these, 7 contact points remained intact, while 4 mutations (Q493K, Q498R, N501Y and Y505H) were identified, resulting in the construction of 17 new C/H-CrUPs (Table 3). The N501Y mutation has been found to be a major determinant of increased viral transmission, owing to the improved binding affinity of the Spike protein for the ACE2 cellular receptor [10]. These findings indicate that virus binding to the ACE2 receptor is notably affected by C/H-CrUP-specific mutations that likely strengthen Spike-ACE2 protein-protein interaction(s). Interestingly, an important amino acid sequence in the RBM is the "NYNYLYRLF" peptide (positions 448 to 456). This tyrosine (Y)-enriched peptide carries two contact sites (Y449 and Y453) and is known as the NF9 peptide [11]. It appears to affect antigen recognition, being an immunodominant HLA-A*24:02-restricted epitope recognized by CD8+ T cells. Of note, NF9 exhibits immunostimulatory activity and increases the production of CD8+ T cell-derived cytokines, such as IFN-γ, TNF-α and IL-2 [12]. In contrast to Delta, whose L452R substitution lies within NF9, the NF9 amino acid sequence of the Omicron variant is not altered by any detected mutation. This suggests that the NF9 peptide could induce early immune system activation and efficient cytokine production, leading to a faster immune response and thus reducing SARS-CoV-2 pathogenicity.
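Whether a set of substitutions alters NF9 can be checked by applying them to the 448 to 456 window and confirming that each reference residue matches. In this sketch, the Omicron list is a subset of RBM substitutions named or implied in this section: S477N, T478K and G496S are inferred from the Table 2 example peptides. The Delta list contains its well-documented RBD substitutions L452R and T478K, which are not listed in this article. The helper names are hypothetical.

```python
NF9 = "NYNYLYRLF"  # Spike residues 448-456
NF9_START = 448

# Subset of Omicron RBM substitutions named or implied in this section.
OMICRON_RBM_SUBSET = ["S477N", "T478K", "Q493K", "G496S", "Q498R", "N501Y", "Y505H"]
# Well-documented Delta RBD substitutions (assumption: not listed in this text).
DELTA_RBD = ["L452R", "T478K"]


def parse_substitution(code: str) -> tuple[str, int, str]:
    """Split a code such as 'L452R' into ('L', 452, 'R')."""
    return code[0], int(code[1:-1]), code[-1]


def apply_to_nf9(substitutions: list[str]) -> str:
    """Return the NF9 sequence after applying any substitutions inside 448-456."""
    residues = list(NF9)
    for code in substitutions:
        wt, pos, mut = parse_substitution(code)
        idx = pos - NF9_START
        if 0 <= idx < len(residues):
            if residues[idx] != wt:
                raise ValueError(f"{code}: reference has {residues[idx]} at {pos}")
            residues[idx] = mut
    return "".join(residues)


if __name__ == "__main__":
    print("Omicron NF9:", apply_to_nf9(OMICRON_RBM_SUBSET))
    print("Delta NF9:  ", apply_to_nf9(DELTA_RBD))
```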

Altered C/H-CrUP Architecture around the Spike Cleavage Site(s) of the Omicron Variant
The molecular mechanism of the Spike protein's proteolytic activation has been shown to play a crucial role in host species selection, virus-cell fusion, and viral infection of human lung cells [13][14][15]. The Spike protein [SPIKE_SARS2 (P0DTC2)] contains three cleavage sites (known as S-cleavage sites) crucial for virus fusion with the host cell: the R685↓S and R815↓S positions, which serve as direct targets of the Furin protease, and the T696↓M position, which can be recognized by the TMPRSS2 protease [16][17][18].
At these cleavage sites, the Omicron variant carries only the critical mutation P681H, which also appears in the Alpha variant (Figure 1B). Strikingly, in contrast to the P681R mutation of the Delta variant, the P681H mutation constructs several new C/H-CrUPs in the Alpha and Omicron variants, suggesting that these peptides make an important contribution to virus fusion with the host cell (Table 4).
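A single substitution can only create or remove C/H-CrUPs among peptides that overlap the substituted residue, which is why P681H and P681R can yield different C/H-CrUPs around the furin site. The sketch below enumerates those overlapping peptides before and after a substitution. Two assumptions apply. First, the segment (positions 675 to 691) is taken from the reference Spike sequence for illustration and should be verified against UniProt P0DTC2. Second, the 4 to 6 residue range is shortened from the tool's 4 to 100 for brevity. Each mutated peptide would then be searched against the human proteome. `spanning_peptides` is a hypothetical name.

```python
# Reference Spike segment, positions 675-691 (illustrative; verify vs P0DTC2).
SEGMENT = "QTQTNSPRRARSVASQS"
SEGMENT_START = 675


def spanning_peptides(segment: str, seg_start: int, pos: int, new_residue: str,
                      min_len: int = 4, max_len: int = 6) -> list[tuple[str, str]]:
    """(wild-type, mutated) pairs for every peptide that covers position pos."""
    idx = pos - seg_start
    if not 0 <= idx < len(segment):
        raise ValueError("position outside segment")
    mutated = segment[:idx] + new_residue + segment[idx + 1:]
    pairs = []
    for k in range(min_len, max_len + 1):
        # Window starts s with s <= idx <= s + k - 1 and the window inside the segment.
        for s in range(max(0, idx - k + 1), min(idx, len(segment) - k) + 1):
            pairs.append((segment[s:s + k], mutated[s:s + k]))
    return pairs


if __name__ == "__main__":
    for wt, mut in spanning_peptides(SEGMENT, SEGMENT_START, 681, "H", 6, 6):
        print(wt, "->", mut)
```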

Conclusions
Core Unique Peptides constitute a distinct and important group of peptides within a proteome. The identification of the CrUPs of an organism (e.g., virus, microbe, or mutant protein) against the proteome of another organism is a completely novel approach, which could prove useful for understanding the action of microorganisms, associating novel pharmacological targets with therapies, and designing novel vaccines. It could be employed in many different kinds of diseases, such as cancer and anthropozoonoses, in the design of vaccines against pathogenic viruses, and in the identification of new antigenic epitopes suitable for the development of new diagnostic or therapeutic antibodies. We first applied this strategy to identify the CrUPs of SARS-CoV-2 against the human proteome [1]. In that study, we analyzed the CrUPs of all SARS-CoV-2 variants against the proteome of the host organism, which in our case was Homo sapiens. Remarkably, this approach clearly revealed the immune escape capacity, the contagiousness and the high pathogenicity of the Delta variant, in contrast to other variants. Notably, these findings have been corroborated by epidemiological data on the course of the disease.
In the present study, we applied this approach to the SARS-CoV-2 Omicron variant. Analysis of the C/H-CrUP landscape of the heavily mutated Omicron Spike protein revealed that the Omicron variant, through the generation of novel multi-mutated C/H-CrUPs, could escape the immune system's defense mechanisms, while these C/H-CrUP-specific mutations could facilitate more efficient virus binding to the ACE2 cellular receptor and more productive fusion of the virus with the host cell. Most importantly, in contrast to the Delta variant, the intact NF9 peptide of the Omicron variant, which has a known immunostimulatory effect, suggests that Omicron exhibits reduced pathogenicity compared with Delta.