Immunogenic SARS-CoV-2 Epitopes: In Silico Study Towards Better Understanding of COVID-19 Disease—Paving the Way for Vaccine Development

The emergence of the COVID-19 outbreak at the end of 2019, caused by the novel coronavirus SARS-CoV-2, has, to date, led to over 13.6 million infections and nearly 600,000 deaths. Consequently, there is an urgent need to better understand the molecular factors triggering immune defense against the virus and to develop countermeasures to hinder its spread. Using in silico analyses, we showed that human major histocompatibility complex (MHC) class I cell-surface molecules vary in their capacity for binding different SARS-CoV-2-derived epitopes, i.e., short sequences of 8-11 amino acids, and pinpointed five specific SARS-CoV-2 epitopes that are likely to be presented to cytotoxic T-cells and hence activate immune responses. The identified epitopes, each one of nine amino acids, have high sequence similarity to the equivalent epitopes of SARS-CoV virus, which are known to elicit an effective T cell response in vitro. Moreover, we give a structural explanation for the binding of SARS-CoV-2-epitopes to MHC molecules. Our data can help us to better understand the differences in outcomes of COVID-19 patients and may aid the development of vaccines against SARS-CoV-2 and possible future outbreaks of novel coronaviruses.


Introduction
The ongoing pandemic outbreak of COVID-19 has resulted in the declaration of a global health emergency around the world on 30 January 2020 by the World Health Organization (WHO) [1]. The first reported case was on 31 December 2019 from the Chinese city of Wuhan, from which the virus quickly spread to 213 other countries and territories and, as of 17 July 2020, resulted in at least 13.6 million infections and over 585,000 deaths [2]. Based on the early epidemiology of the COVID-19 disease statistics, the WHO estimated that the fatality rate of the novel coronavirus is around 4%, significantly higher than the mortality rate caused by common human coronaviruses [3]. There are currently numerous scientific, clinical, and socio-economical efforts aimed at combating the COVID-19 disease and its ramifications around the world [4,5].
peptides are loaded on MHC receptors using a peptide loading complex consisting of TAP and several other proteins [23].
In order to narrow down the specific epitopes that could elicit an effective MHC class-I-mediated T cell response, we predicted linear 9-mer immunogenic SARS-CoV-2 peptides and their prominent interacting HLA allotypes using the Immune Epitope Database and Analysis Resource (IEDB) and NetCTL1.2 web servers. The identified peptides were then analyzed in conjunction with the available experimental data for SARS-CoV-derived linear T-cell epitopes. The three-dimensional structural models of selected ternary complexes of SARS-CoV-2 epitope-HLA allotype-T cell receptor were created to assess interactions at the structural level. Our results can at least partially explain individual differences in the COVID-19 severity and could potentially be used for vaccine development against SARS-CoV-2.

Source of Sequences
All 26 protein sequences encoded by the most up-to-date SARS-CoV-2 genomic sequence (RefSeq: NC_045512.2) were retrieved from NCBI RefSeq database [24] on 27 March 2020 (full accession identifiers in Table 1).

MHC Class I Epitope Prediction
The IEDB (http://tools.iedb.org/mhci/) [25] and NetCTL1.2 (http://www.cbs.dtu.dk/services/ NetCTL/) [26] web servers were used with default parameters to predict SARS-CoV-2 epitopes and their binding affinities (expressed as IC 50 ) to different HLA allotypes. The IEDB server sorts the predicted MHC-I-binding viral epitopes based on the percentile score, which is calculated by comparing the predicted binding affinities of SARS-CoV-2 peptides and affinities calculated for a large set of peptides, randomly selected from the SWISSPROT database [25]; the IEDB server integrates an artificial neural network (ANN), stabilized matrix method (SMM) and Combinatorial library (CombLib). The IEDB method is highly accurate in classifying MHC class I epitopes, having an AUC for the ROC curve greater than 0.9 [27]. The NetCTL1.2 server integrates prediction of proteasomal cleavage, TAP transport and peptide-binding to 12 MHC-I supertypes (see Table S1A for a list of HLA-A and HLA-B allotypes that belong to these 12 supertypes, with data extracted from the published scientific literature [28,29]). The NetCTL1.2 method allows for the identification of class I MHC epitopes with a sensitivity of 0.80 and specificity of 0.97 based on the default filtering threshold score of 0.75 [26]. Since the NetCTL1.2 and IEDB servers use different multistep approaches to predict the binding of SARS-CoV-2 peptides to HLAs, we used both servers.
MHC-I allotypes are known to bind epitopes with lengths of 8 to 11 amino acids. The optimal epitope length was determined by potting the IC 50 values of the predicted (IEDB) top 1 percentile epitopes against 8-to 11-mers. Moreover, the immunogenicity of the top 1 percentile epitopes were predicted using the MHC-I immunogenicity server of IEDB (http://tools.iedb.org/immunogenicity/) [25]. The epitopes with an immunogenicity score greater than 0.25 were considered for comparison with epitopes predicted using the NetCTL1.2 server.

Comparison of Predicted and Experimentally Known Epitopes
In order to identify experimental epitopes matching the predicted SARS-CoV-2 epitopes, the data of experimentally known SARS-CoV epitopes and their interacting MHC allotypes (validated using T-cell assays and MHC ligand assays) were downloaded from the IEDB database on 15 April 2020 (https://www.iedb.org/database_export_v3.php) [25]. The characteristics of each "match" (protein name, sequence, mapped start-end, MHC-I allotype, etc.) were tabulated and are presented in Table S8.

Epitopes Physicochemical Properties and eMHC-I Complex Stability
The grand average of hydropathicity index (GRAVY) was calculated using the Kyte-Doolittle hydropathy index scale [30]. TMHMM2.0 was used to search potential transmembrane helices [31]. The Jpred4 web server was used to predict secondary structures in non-transmembrane proteins [32]. The half-lives of the predicted eMHC-I complexes was predicted using the NetMHCstabpan1.0 web server [33]. Data were plotted using GraphPad Prism (GraphPad software, version 8.0.0; https://www.graphpad.com).

Docking and Structural Analyses
The crystal structures of an influenza A virus epitope in complex with the HLA-A*02:01 allotype (PDB ID: 5TEZ, chain A and C; 1.7 Å) [34] and hepatitis B core antigen with HLA-A*02:06 (PDB ID: 3OXR, chain A and C; 1.7 Å) [35] were downloaded from the Protein Data Bank (PDB) [36] (see Table S1B for the PDB codes of MHC-I allotype structures indicated in Table S1A). For docking to HLA allotypes, the Rosetta FlexPepDock web server [37] was used, after the 9-mer influenza A epitope was mutated to match selected SARS-CoV-2 epitopes using PyMOL (The PyMOL Molecular Graphics System, Schrödinger, LLC). For ternary complex analysis, the T-cell receptor (TCR)-HLA-influenza A epitope complex (PDB ID:5TEZ) was used as a template. Coordinates of the TCR α and β chains (PDB ID: 5TEZ; chain I and J, respectively) and the docked epitope-HLA-A*02:01 complex were saved using PyMOL, and interacting residues visually inspected at the interface.

Molecular Dynamics Simulations
Molecular dynamics (MD) simulations followed the general protocol detailed in [38]. Briefly, the ternary composite structure of HLA-A*02:01 in complex with the SARS-CoV-2 S protein epitope 1220 FIAGLIAIV 1228 and TCR was used as a starting structure to perform molecular dynamics simulation. The structure was initially prepared using Maestro (Schrödinger Release 2019-1: Maestro, version, Schrödinger, LLC) by adding hydrogen atoms, determining the protonation states of ionizable side-chains and energy minimizing the structure to remove bad contacts. Three independent simulations were carried out using the Amber program (version 2018) [39] and the ff14SB protein forcefield [40]. The ternary structure was solvated with TIP3P water molecules [41] in an octahedral box, keeping a 12 Å distance between solute atoms and the surface of the box. The simulation system was neutralized by adding Na + ions, with additional Na + /Cl − ions incorporated to bring the system salt concentration to 0.15 M. The system was then energy-minimized for 5000 cycles using the steepest descent and conjugate gradient methods. The minimization was carried out in six-stages, where a restraint on solute atoms was gradually lowered from 25 to 0 kcal mol −1 Å −2 . Subsequently, the system was heated to 300 K during 100 ps with a 10 kcal mol −1 Å −2 restraint on solute atoms. Next, equilibration was performed for 6 ns in four steps by systematically reducing the restraint force to 0 kcal mol −1 Å −2 . Finally, a restraint-free 100 ns production simulation was carried out at constant temperature (300 K) and pressure (1 bar). Coordinates were saved every 20 ps and the sampled conformations were analyzed using VMD [42], Cpptraj [43] and Chimera [44] programs. Root-mean-square fluctuation (RMSF) calculation was computed using the Cα atoms of the initial structure as a reference. Hydrogen bonds were defined with a bond length ≤3.5 Å and a bond angle ≥135 • .

Prediction of Binding of SARS-CoV-2-Derived Peptides to MHC Class I Receptors
In order to estimate the potential antiviral cytotoxic T-cell response linked to specific HLA allotypes, we predicted the binding affinity of all possible linear 8-to 11-mer peptides derived from the 26 proteins (Table 1) of the SARS-CoV-2 proteome (N 8 = 375, N 9 = 2105, N 10 = 1556 and N 11 = 2377) to HLA-A and HLA-B supertypes using the IEDB web server [25]. The HLA-C supertype-an extremely good ligand for killer-cell immunoglobulin-like receptor (KIR) on natural killer (NK) cells-was not selected for this analysis because it is known to be less effective in presenting antigens to cytotoxic T-cells than either HLA-A or HLA-B [45]. The class I MHC-epitope (eMHC-I) complexes were classified into three different groups based on the predicted epitope-to-MHC binding affinity scores [46]: strong binders (IC 50 ≤ 50 nM), weak binders (50 nM < IC 50 ≤ 500 nM) and non-binders (IC 50 > 500 nM).
Out of the highest scoring SARS-CoV-2 epitopes (top 1 percentile), the 9-and 10-mers had, on average, a higher binding affinity to MHC class I supertypes than either the 8-or 11-mer epitopes ( Figure 1A). Moreover, there were 52% more 9-mer peptides (3187) predicted to bind to class I MHC receptors (IC 50 ≤ 500 nM) in comparison with 10-mer peptides (2096) ( Figure 1B). Consequently, the top one percentile 9-mer peptides were selected for further analysis.
The predicted (IEDB) SARS-CoV-2 9-mer epitopes (top one percentile) for 50 different HLA allotypes (Table S1A) with an immunogenicity score ≥ 0.25 were compared to those predicted using the NetCTL1.2 server [26]. The MHC class I epitopes identified from this consensus/combined approach were classified using the binding affinity values from IEDB prediction method as strong MHC binders (Table S2), weak binders (Table S3) or non-binders (Table S4). Based on this analysis, many peptides derived from nonstructural proteins (nsp), surface glycoproteins (S) and membrane proteins (M) of SARS-CoV-2 are likely to be presented by MHC class I receptors ( Figure 1C, Table S5), and hence have a high potential to activate an immune response or the destruction of infected host cells, with many epitopes being derived from the S, E, 3C-like proteinase (3CLpro) and 3 -to-5 exonuclease (35EXO) proteins. In order to predict the most potent epitopes, including those without experimental data, we screened in silico the proteins of SARS-CoV-2. We ranked the epitopes based on predicted binding to MHC-I molecules (affinity values; IEDB) and "combined score" (NetCTL1.2), predicting the potency of an epitope to be presented by MHC-I (Tables S6 and S7). The top-ranked, common epitopes from both prediction methods (Table 2) were selected for further analysis; all five of these predicted epitopes are unique to SARS-CoV-2 and not identical in SARS-CoV. this comparison, we identified 29 common epitopes of SARS-CoV/SARS-CoV-2 (Table S8) and HLA allotypes HLA-A*02:01 and HLA-A*02:06 molecules as the top antigen presenters; both allotypes having strong binding affinities (IC50 ≤ 50 nM) to six peptide epitopes (Table 3): residues 1220 FIAGLIAIV 1228 from the S protein, 17 VLLFLAFVV 25 and 20 FLAFVVFLL 28 to the E protein, 204 VLAWLYAAV 212 to 3CLpro, 330 LLSAGIFGA 338 to nsp3 and 184 VLWAHGFEL 192 to 35EXO of SARS-CoV-2.

Analysis of Correlation between in Silico Identified SARS-CoV-2 (This Study) and Experimentally Validated SARS-CoV (from IEDB) Epitopes
In order to obtain experimental proof that the MHC-I-Binding SARS-CoV-2 epitopes predicted in this study are presented by the MHC class I antigen processing pathway in vivo, we compared the in-silico-identified 9-mer epitopes (n = 166, Table S5) to the equivalent, experimentally identified epitopes of SARS-CoV strains (n = 3760; MHC ligand assays data from the IEDB database) [25]. In this comparison, we identified 29 common epitopes of SARS-CoV/SARS-CoV-2 (Table S8) and HLA allotypes HLA-A*02:01 and HLA-A*02:06 molecules as the top antigen presenters; both allotypes having strong binding affinities (IC 50 ≤ 50 nM) to six peptide epitopes (Table 3): residues 1220 FIAGLIAIV 1228 from the S protein, 17 VLLFLAFVV 25 and 20 FLAFVVFLL 28 to the E protein, 204 VLAWLYAAV 212 to 3CLpro, 330 LLSAGIFGA 338 to nsp3 and 184 VLWAHGFEL 192 to 35EXO of SARS-CoV-2. Table 3. SARS-CoV-2-derived HLA-A*02 supertype-binding epitopes that are identical to the epitopes of SARS-CoV strains experimentally known to activate cytotoxic T-cells. To examine the evolutionary conservation or "sequence stability" of the six common epitopes of SARS-CoV/SARS-CoV-2 binding strongly to HLA-A*02:01 and HLA-A*02:06, we performed a blastp (NCBI) search against the non-redundant database of SARS-CoV-2 [47]. Only one epitope-330 LLSAGIFGA 338 from nsp3-was found to have heterogeneity in its sequence (Table 3), whereas the other five epitopes were fully conserved, i.e., not yet having changed during the evolution of SARS-CoV-2, suggesting an important role for these peptide sequences for the virus and, at the same time, making these peptides top-candidate antigens for activating the cytotoxic T-cell response against SARS-CoV-2 itself. The five conserved and experimentally proven epitopes we selected for further analyses.

Efficiency of Epitope Presentation to Stimulate an Immune Response
Since MHC class I molecules must retain bound epitopes long enough at the cell surface to successfully induce T-cell-specific immune responses [48], we estimated the half-life in hours for each of the experimentally identified epitope-HLA complexes (Table 3) using the NetMHCstabpan1.0 web server [33]. This analysis revealed that the HLA-A*02:01 and HLA-A*02:06 allotypes have longer predicted half-lives than HLA-A*68:02 when in complex with the immunogenic epitopes (Table 4): all five HLA-A*02:01-epitope complexes and three out of five HLA-A*02:06-epitope complexes had a predicted half-life longer than 4 h, whereas the single HLA-A*68:02-epitope complex had the lowest half-life of less than one hour. Table 4. Predicted half-lives of complexes of the conserved SARS-CoV-2-derived most immunogenic experimentally identified epitopes and HLA-A*02 allotypes shown in Table 3. Secondary structures, localization within SARS-CoV-2 and GRAVY (grand average of hydropathicity index) scores of the epitopes.

Epitopes
Allotypes In order to examine the distribution of the HLA-A*02-antigen complex half-lives in general, we analyzed more than 6500 experimentally known complexes of HLA-A*02 supertypes (see Table S1A for a list of allotypes) bound to bacterial-and viral-pathogen-derived epitopes, which were extracted from the IEDB database [25]. This revealed that immunogenic (IC 50 ≤ 50 nM) HLA-A*02:01-epitope complexes with predicted half-lives of more than three hours were almost double in number in comparison to the HLA-A*02:06-epitope complexes, whereas no immunogenic HLA-A*68:02-epitope complexes were found that would have a similar half-life ( Figure S1A). Interestingly, the amino acid sequences of the α chains of HLA-A*02:01 and HLA-A*02:06 are 99.6% identical; the only difference in the sequences-a phenylalanine to tyrosine substitution-is located at the epitope-binding site and is the likely reason for the difference in the epitope binding affinities and half-lives.
Out of the five most stable epitopes (Table 4), three-1220 FIAGLIAIV 1228 (S protein), 17 VLLFLAFVV 25 (E protein) and 20 FLAFVVFLL 28 (E protein)-were derived from transmembrane proteins and were, as expected, more hydrophobic (GRAVY score > 3) than the 204 VLAWLYAAV 212 (3CLpro) and 184 VLWAHGFEL 192 (35EXO) epitopes originating from intravirion proteins ( Table 4). The epitopes derived from the S and E proteins respectively map to transmembrane helical segments 1214 WYIWLGFIAGLIAIVMVTIMLCC 1236 and 12 LIVNSVLLFLAFVVFLLVTLAIL 34 that, based on analysis using the TMHMM2.0 web server [31], are bitopic in nature, i.e., the predicted transmembrane helices span the lipid bilayer only once. In fact, our predicted epitopes share features, such as being membrane associated, with the well-studied and clinically important epitopes in HIV [49] and tuberculosis [50]. This supports the idea that the transmembrane helical epitopes of SARS-CoV-2 could potentially stimulate cytotoxic T-cell-mediated immune responses.
In order to assess whether hydrophobic residues are enriched in the immunogenic epitopes, we compared the GRAVY score distribution of the immunogenic (n = 1678, IC 50 ≤ 50 nM) and non-immunogenic (n = 2228; IC 50 > 500 nM) HLA-A*02-bound epitopes of bacterial and viral pathogens (retrieved from the IEDB database) ( Figure S1B). We found that, for epitopes with a GRAVY score greater than one (having at least seven non-charged residues in a 9-mer epitope), the immunogenic epitopes were more enriched in hydrophobic residues (55%) in comparison to the non-immunogenic epitopes (19%). Thus, our analysis suggests that HLA-A*02 supertype molecules prefer binding to hydrophobic epitopes (IC 50 ≤ 50 nM), and this agrees with published results [51]. Moreover, we obtained similar results from our in-silico-identified SARS-CoV-2-derived novel epitope-MHC-I complexes ( Table 2): the most potent identified epitopes had long half-lives and were derived from either hydrophobic, transmembrane regions of SARS-CoV-2 or from intravirion proteins that were also found to be mutated among SARS-CoV-2 sequences (Table 5). Table 5. Predicted half-lives of the novel SARS-CoV-2-derived, most immunogenic in-silico-identified epitopes in complex with the allotypes shown in Table 2. Secondary structures, localization within SARS-CoV-2, GRAVY scores and known mutations in the epitopes.

Structural Properties of the Peptide-HLA-A*02:01-Complexes Defining T Cell Receptor (TCR) Recognition
In order to compare the interaction patterns adopted by the predicted top five immunogenic epitopes (Table 4) of MHC-I molecules, we docked the epitopes to the cleft between the α1 and α2 helices of HLA-A*02:01 (PDB ID: 5TEZ, chain A) and HLA-A*02:06 (PDB ID: 3OXR, chain A). This docking analysis agrees with our other prediction data and suggests that the immunogenic epitopes bind to both the HLA-A*02:01 and HLA-A*02:06 allotypes by adopting a similar backbone conformation, as has been observed for the canonical epitope 1 GILGFVFTL 9 of the influenza A virus (PDB ID: 5TEZ, chain C) ( Figure S1C,D). In more detail, we observed that the residues at position (Pos) 1, 2, 3 and 9 are fully buried within the antigen-binding cleft of HLAs and act as anchoring residues, providing steric constraints to the Nand C-terminus of the epitopes (Figure 2A,B). Comparison of the root-mean-square deviation (rmsd) of the superposed Cα atoms of the epitopes revealed a maximum deviation for the 184 VLWAHGFEL 192 epitope (35EXO), possibly due to the bulky, aromatic side chain of W186 buried at Pos3 and two charged residues: H188 at Pos5 and E191 at Pos8 (Table 6).
Interactions between the partially solvent-exposed hydrophobic residues at Pos4-Pos8 of the docked epitope 1220 FIAGLIAIV 1228 (S protein) and the solvent-exposed residues A69, K66, V76, T80, K146, V152 and Q155 of the α1 and α2 helices of HLA-A*02:01, complement what is seen in the X-ray crystal structure of the influenza A virus epitope-HLA-A*02:01 complex (PDB ID: 5TEZ, chain A and C), and suggest that the SARS-CoV-2 epitope-HLA complex interacts with TCR ( Figure 2C). and CDR3β of the TCR-β chain, recognize the HLA-A*02:01-SARS-CoV-2 FIAGLIAIV epitope ( Figure 2D); the cooperative interacting nature of residues in these loops could provide specificity towards class I MHC molecules. Moreover, the residues L96 and W99 within the CDR3β loop and I96 within the CDR3α loop could be important for antigen-HLA complex recognition due to their likely direct contacts with residues L1224, I1225 and I1227 of the 1220 FIAGLIAIV 1228 epitope and residues A69, V152 and Q155 of HLA-A*02:01 ( Figure 2E).    To understand the molecular basis of TCR binding to the SARS-CoV-2 antigen-loaded MHC-I cell surface molecules, HLA-A*02:01, in complex with the S protein epitope 1220 FIAGLIAIV 1228 , was superimposed with the HLA-A*02:01 allotype of the influenza A virus ternary complex structure (PDB ID: 5TEZ, chain A), and the atomic coordinates of TCR (TCR-α and TCR-β chain; PDB ID: 5TEZ, chain I and J, respectively) were then utilized for visual analysis of binding. This visualization suggests that the loops CDR2α, CDR1α and CDR3α of the TCR-α chain, and loops CDR2β, CDR1β and CDR3β of the TCR-β chain, recognize the HLA-A*02:01-SARS-CoV-2 1220 FIAGLIAIV 1228 epitope ( Figure 2D); the cooperative interacting nature of residues in these loops could provide specificity towards class I MHC molecules. Moreover, the residues L96 and W99 within the CDR3β loop and I96 within the CDR3α loop could be important for antigen-HLA complex recognition due to their likely direct contacts with residues L1224, I1225 and I1227 of the 1220 FIAGLIAIV 1228 epitope and residues A69, V152 and Q155 of HLA-A*02:01 ( Figure 2E).
To assess the conformational and intermolecular interaction dynamics of the HLA-A*02:01-1220 FIAGLIAIV 1228 S protein epitope-TCR complex, a 100 ns simulation was carried out on the ternary structure in triplicate. The global conformational dynamics was assessed by computing the RMSF over the Cα atoms ( Figure 3A), which shows a stable TCR and epitope, with an average RMSF of 1.86 ± 0.47 Å and 2.36 ± 0.18 Å, respectively. The α1 and α2 domains of HLA-A*02:01 were also stable (3.16 ± 0.48 Å, 3.19 ± 0.46 Å), unlike the α3 domain, which exhibits a higher fluctuation (6.4 ± 2.57 Å) likely arising from the flexibility of the loop between domains α2 and α3 and the lack of stabilizing β2-microglobulin. Figure 3B compares the complex at 0, 50 and 100 ns during the simulation. These observations were consistent among the three independent simulations ( Figure S2).
The intermolecular interactions taking place in the complex structure were also examined by visually inspecting the trajectory and calculating the number of hydrogen bonds formed during the simulation (Table S9). For example, in simulation 1, the highest number of hydrogen bond interactions recorded were between HLA-A*02:01 and backbone atoms of the epitope peptide, with T143-V1228 and W147-I1227 interactions topping the list, as they were respectively formed during 98% and 97% of the simulation time. On the other hand, hydrogen bond interactions between the TCR and the epitope were almost exclusively from interactions between W99 (β chain)-I1225 (63%) and Q101 (α chain)-I1225 (10%). The HLA-A*02:01-TCR hydrogen-bonding interactions were mostly between residues from the CDR1α, CDR2α, CDR3α, CDR1β and CDR2β loops of the TCR and the α1 and α2 helices of HLA-A*02:01. Visual analysis also revealed that hydrophobic interactions are integral to the interaction the epitope is making, both with HLA-A*02:01 and the TCR, as the hydrophobic epitope peptide was tightly enclosed by hydrophobic clusters from the proteins throughout the simulation, reflecting observations made on the basis of the original docked complex. Thus, our structural analysis suggests that the S protein epitope 1220 FIAGLIAIV 1228 of SARS-CoV-2 (and SARS-CoV), and the other epitopes listed in Table 4, could form strong complexes with HLA-A*02:01 and HLA-A*02:06 allotypes, and that the epitope-HLA complexes can also be recognized by TCRs to initiate cytotoxic T-cell-mediated immune responses. The intermolecular interactions taking place in the complex structure were also examined by visually inspecting the trajectory and calculating the number of hydrogen bonds formed during the simulation (Table S9). For example, in simulation 1, the highest number of hydrogen bond interactions recorded were between HLA-A*02:01 and backbone atoms of the epitope peptide, with T143-V1228 and W147-I1227 interactions topping the list, as they were respectively formed during 98% and 97% of the simulation time. On the other hand, hydrogen bond interactions between the TCR and the epitope were almost exclusively from interactions between W99 (β chain)-I1225 (63%) and Q101 (α chain)-I1225 (10%). The HLA-A*02:01-TCR hydrogen-bonding interactions were mostly between residues from the CDR1α, CDR2α, CDR3α, CDR1β and CDR2β loops of the TCR and the α1 Similar to the docking results obtained from the experimentally known epitopes with the HLA-A*02 supertype, we observed that residues at Pos1-3 and Pos9 (A1507, E1508, W1509 and L1515) of the novel epitope 1507 AEWFLAYIL 1515 -derived from a transmembrane segment of the nsp3 protein of SARS-CoV-2-are fully buried within the antigen-binding cleft of HLA-B*40:01 (PDB ID: 6IEX, chain A). Moreover, the location of the partially solvent-exposed residues at Pos4-Pos8 (L1511, A1512, Y1513 and I1514) of the docked epitope along with solvent-exposed residues R62, T69, T73, E76, Q155, Y159, E163 and W167 of the α1 and α2 helices suggest an interaction of the 1507 AEWFLAYIL 1515 -HLA-B*40:01 complex with TCR ( Figure 2F).

Discussion
In order to tackle the current COVID-19 pandemic, it is critically important to better understand the underlying mechanism that gives rise to the individual differences in disease severity as well as to aid the vaccine development against the causative virus, SARS-CoV-2. Effective vaccinations are needed to eradicate the virus from populations all over the world and knowledge regarding the immunological response should have a significant impact on understanding disease progression. However, due to the limited experimental and clinical data currently available on the specific immune responses against SARS-CoV-2, the development of an effective vaccine against COVID-19 will be a challenge. This study sought to better understand the individual differences in the viral antigen presentation pathway and to aid the development of vaccines against COVID-19 by predicting in silico SARS-CoV-2 immunogenic epitopes.
Based solely on in silico predictions, the most potent SARS-CoV-2-derived MHC class I binding epitopes are 1507 AEWFLAYIL 1515 and 1505 LVAEWFLAY 1513 in terms of binding affinity, hydrophobicity and stability. However, for these "in silico epitopes", only limited experimental data are available to correlate with. Therefore, in this study we mainly focused on potential SARS-CoV-2 epitopes that were conserved with SARS-CoV epitopes experimentally known to activate cytotoxic T-cells, and hence could be used in vaccine development.
The S glycoprotein-derived epitope 1220 FIAGLIAIV 1228 binds to the HLA-A*02:01 and HLA-A*02:06 allotypes with experimental IC 50 values lower than 50 nM (Table 3). Our docking analysis supports these predictions, i.e., that epitope 1220 FIAGLIAIV 1228 could bind tightly to these allotypes and that ternary complexes with TCRs could form. Moreover, recent data demonstrate that patients with a severe form of COVID-19 have a stronger T-cell response after stimulation with the SARS-CoV-2 S-protein peptide pool compared to those with a mild manifestation of the disease [22,52]. The disease progression of COVID-19 is also associated with a higher magnitude of inflammatory cytokine-producing CD8+ T cells [52]. Whether these immune responses are due to strong binding of SARS-CoV-2 epitopes, including 1220 FIAGLIAIV 1228 , to certain HLA allotypes, such as HLA-A*02:01 and HLA-A*02:06, or whether tight virus epitope-HLA interaction in general can actually be harmful for COVID-19 patients by causing, e.g., an immunological over-reaction, is not yet fully understood [53,54]. Furthermore, both CD4+ and CD8+ T-cells have been shown to be stimulated by overlapping peptides (15-mers overlapping by 10 amino acids) of the entire S glycoprotein sequence [55]. Does this mean that the S protein might function as a double-edged sword-that is, being crucial for viral entry into the host cell, but also important for overstimulating the immune responses, causing severe inflammation that aids the spread of the virus to surrounding cells? This still needs to be answered. The latter "sword" is known to be avoided at least by HIV, which has a sophisticated mechanism to limit the infection rate in order to better avoid immune surveillance [56,57].
A recent in vivo study shows that epitopes derived from the C-terminus of the S protein had a significantly stronger CD4+ T helper cell response in healthy donors in comparison to those infected with SARS-CoV-2 [22]. The CD4+ T cells' cross-reactivity to the S protein might represent the key for understanding the different disease manifestations of COVID-19, particularly in the asymptomatic infections in children and adolescents. Our predicted epitope 1220 FIAGLIAIV 1228 overlaps with the C-terminal sequence of the S protein containing the S2 subunit, which is internalized after TMPRSS2 cleavage. Therefore, this particular amino acid sequence may also be important for inducing a protective immunological response towards immunity in COVID-19. However, canonically T helper cells recognize MHC class II molecules, whereas our prediction is based on the MHC class I molecules' ability to present viral antigens for a possible cytotoxic T-cell response [58]. Indeed, we found four 15-mer epitopes that include the intact 9-mer 1220 FIAGLIAIV 1228 epitope sequence and showed binding affinities < 500 nM with DRB1*01:01 allotype of MHC class II (http://tools.iedb.org/mhcii/) [25], an allotype that is common in Caucasoid and Oriental ethnic backgrounds (https://www.ebi.ac.uk/ipd/imgt/hla/ethnicity.html) [59]. Nevertheless, there are a few reports regarding MHC class I-reactive CD4+ T helper cells, including the study where co-cultures of highly purified CD4+ T cells, together with a stimulatory MHC class II-negative cell line transfected with MHC class I molecules, were used to show the direct interaction of T helper cells with MHC class I molecules [60]. However, whether or not the 1220 FIAGLIAIV 1228 epitope is presented to both cytotoxic and helper T cells needs to be experimentally verified; a recent study that appeared while the current work was under review suggested that cross-reactivity could affect disease progression in COVID-19 [54]. Furthermore, the latest experimental reports suggest that the S glycoprotein of SARS-CoV-2 is both O-and N-glycosylated, especially on the RDB domain, which could mask immunogenic epitopes and may play an important role in SARS-CoV-2 immune evasion [61][62][63]. Fortunately, the predicted epitope 1220 FIAGLIAIV 1228 is part of a transmembrane helix, and consequently is neither O-nor N-glycosylated; the closest glycosylation site is at Pos1194, rendering this particular amino acid sequence potentially suitable for vaccine development.
Intriguingly, the SARS-CoV-2-derived, membrane glycoprotein epitope 148 HLRIAGHHL 156 , which has low binding affinity (IC 50 = 1693.63 nM; Table S4) towards HLA-B*15:02, is 78% identical in sequence with the intravirion SARS-CoV epitope HLRMAGHSL also from a membrane glycoprotein and shown to elicit a strong T-cell response in patients with the HLA-B*15:02 allotype [19]. Furthermore, HLA-B*15:02 has been shown to have a protective role against the severe forms of SARS-CoV [64]. This inspired us to do a separate in silico binding affinity analysis of the SARS-CoV HLRMAGHSL epitope to the HLA-B*15:02 allotype: a high binding affinity (IC 50 = 232.85 nM) of the HLRMAGHSL epitope with the HLA-B*15:02 was predicted and agrees with the reported [19] protective immune response against SARS-CoV. Thus, there seems to be a difference between the highly conserved SARS-CoV-derived HLRMAGHSL and SARS-CoV-2-derived 148 HLRIAGHHL 156 epitopes in their potency towards the HLA-B*15:02 allotype. Moreover, the predicted low binding affinity of the SARS-CoV-2 epitopes with the HLA-B*15:02 allotype (Table S4) might not be sufficient to induce immune responses, thus rendering the potential of this specific epitope unfavorable for vaccination.
Based on our predictions, HLA-B*46:01 is one of the worst allotypes for presenting SARS-CoV-2-derived epitopes with an average binding score (IC 50 ) of 2264 nM for the four epitopes predicted to bind (Table S4). This is in line with similar results from the predictions for SARS-CoV-2 and previous clinical data from SARS-CoV patients, demonstrating that this particular HLA allotype gives susceptibility to a more severe form of the viral disease [65]. Furthermore, our prediction shows that HLA-B*46:01 is not binding to either the S or M protein-derived peptides, strengthening the reported general view that HLA-B*46:01 is not optimal for eliciting an immune response in COVID-19 patients [58].
Taken together, identification of the predicted most immunogenic epitopes of SARS-CoV-2 could aid vaccine development. Since the sequences of the "top" epitopes of SARS-CoV-2 and SARS-CoV are highly conserved and the SARS epitopes are known to elicit an immunological response based on a previous study [66], a common vaccine protecting against both viruses and potential future strains is possible. Moreover, the selected experimentally known epitopes have been shown not to evoke an unwanted T cell cross-reactivity in vitro, further validating the potential use of the conserved SARS-CoV/CoV-2 peptides in vaccine development without disrupting self-tolerance [67]. However, there are many hurdles that need to be addressed; for example, due to the hydrophobic nature of these epitopes, they probably would need to be loaded in a liposome or nanocarrier for efficient vaccine delivery [68]. Moreover, as these conserved epitopes are presented by only a few HLA allotypes, i.e., A*02:01, A*02:06, A*68:02, which are common in American Indian, Caucasoid, Hispanic and Oriental ethnic backgrounds [59], the estimated world population coverage using these epitopes would only be around 42.1% (http://tools.iedb.org/population/) [25]. Therefore, developing a globally effective SARS-CoV-2 vaccine will probably require a pool of both novel and conserved epitopes, making a globally effective SARS-CoV-2 vaccine development a challenging task.

Conclusions
In the present study, we identified SARS-CoV-2 epitopes that were predicted to be presented by the MHC class I antigen processing pathway to the cytotoxic T cells. We report five purely in-silico-predicted most potent epitopes unique to SARS-CoV-2 and five potent SARS-CoV-2 epitope peptides identical to and experimentally determined for SARS-CoV. The novel SARS-CoV-2 epitopes were analyzed for their interaction with HLA allotypes using the IEDB and NetCTL1.2 web servers and three-dimensional structural models of selected, molecular dynamics simulation proven ternary complexes of SARS-CoV-2 pHLA-TCRs were created to assess interactions at structural level. HLA-A*02:01 and HLA-A*02:06 were found to have the greatest potential to present the selected epitopes, which are hydrophobic in nature and originated mainly from the transmembrane region of SARS-CoV-2 proteins. Our results could assist in the understanding of the individual and varying disease progression of COVID-19, as well as paving the way towards vaccine development against SARS-CoV-2.