SARS-CoV-2 Spike-Derived Peptides Presented by HLA Molecules

The SARS-CoV-2 virus responsible for the COVID-19 pandemic has caused significant morbidity and mortality worldwide. With the remarkable advances in medical research, vaccines were developed to prime the human immune system and decrease disease severity. Despite these achievements, the fundamental basis of immunity to the SARS-CoV-2 virus is still largely undefined. Here, we solved the crystal structures of three spike-derived peptides presented by three different HLA molecules, and determined the stability of the overall peptide–HLA complexes formed. The presentation of spike-derived peptides can influence the way in which CD8+ T cells recognise infected cells, clear infection, and therefore control the outcome of the disease.


Introduction
The SARS-CoV-2 virus is responsible for the ongoing COVID-19 pandemic declared by the World Health Organisation (WHO) in March 2020. After one year of intensive research and clinical trials, some vaccines are currently available and being administered. In addition, we have started to gain an understanding of the immune response towards this emerging virus. However, we still have a lot to discover to understand SARS-CoV-2 infection and the immune response associated with it. Studies reporting some strong T cell and B cell epitopes are emerging [1–4], and this work is paramount to gain an appreciation of the strength and level of immune response that different individuals can produce towards this novel virus.
T cells, and especially CD8+ or cytotoxic T cells, are critical in the protection against viral infections as they have the capability to recognize and eliminate infected cells in order to clear the infection [5]. CD8+ T cells recognize peptides derived from the virus that are presented by highly polymorphic human leukocyte antigen (HLA) molecules. In order to understand the CD8+ T cell response towards the SARS-CoV-2 virus, we need to determine which viral peptides activate T cells, as well as which HLA molecules can stably present them [6,7]. This information, in the context of SARS-CoV-2, is currently limited [8].
Initially, algorithms were used to predict SARS-CoV-2 peptides and their potential HLA restriction. Unfortunately, these predictions are not always accurate, and errors may involve either the wrong HLA molecule or the wrong peptide. Therefore, there is a need to further investigate which peptides are able to bind their predicted HLA molecule, activate a T cell response [9], and thus provide protective immunity. Our previous work showed that not all predicted peptides are able to bind HLA molecules, while others formed unstable complexes [7]. Thus, it is critical to have a better understanding of the peptides' ability to bind and effectively stabilize the peptide-HLA (pHLA) complex, as this determines how the peptide will be presented, how long a pHLA can be displayed on the surface of cells, and how likely a T cell is to bind to the pHLA complex. In addition, structural characterization of peptide presentation by HLA complexes also reveals which peptide residues will be accessible for binding to the T cell receptor (TCR) [6]. This information may help predict and understand which viral mutations within a peptide could be tolerated by T cells or otherwise lead to viral escape, and therefore be of concern [10].
Here, we present the crystal structures of three spike-derived peptides presented by three HLA molecules that are frequently expressed worldwide, namely HLA-A*02:01, HLA-A*11:01, and HLA-B*35:01. Our work provides a snapshot of the parts of the spike protein that are presented by HLA molecules to the immune system, especially to T cells. In addition, the structures reveal solvent-exposed residues in each peptide, which are available for interaction with the TCR. This information could help map potential mutations on these peptides that might be tolerable or detrimental to the immune system.

Protein Expression, Refold, and Purification
DNA plasmids encoding HLA-A*02:01, HLA-A*11:01, HLA-B*35:01, and human β-2-microglobulin were each separately transformed into a BL21 strain of Escherichia coli (E. coli). The respective sequences were obtained from the IMGT/HLA database (doi:10.1093/nar/gkz950). The soluble part of the HLA heavy chain (residues 1-275) was ordered from GenScript, sub-cloned into the pET30 vector for bacterial expression using the NdeI/HindIII restriction sites. The presence of the insert was confirmed by sequencing by GenScript for each construct. Recombinant proteins were individually expressed, and inclusion bodies were extracted and purified from the transformed E. coli cells. Thirty milligrams of each of the HLA inclusion bodies was refolded with 10 mg of β-2-microglobulin inclusion bodies and 5 mg of peptide (GenScript, Piscataway, NJ, USA) in a buffer containing 3 M urea, 0.5 M L-arginine, 0.1 M Tris-HCl pH 8.0, 2.5 mM EDTA pH 8.0, 5 mM glutathione (reduced), and 1.25 mM glutathione (oxidised). The peptide sequences are summarized in Table 1. This mixture was dialysed against 10 mM Tris-HCl pH 8.0 and purified by anion exchange chromatography using a Hi-TrapQ column (GE Healthcare, Chicago, IL, USA).

Differential Scanning Fluorimetry
Differential scanning fluorimetry (DSF) was performed to determine the stability of each pHLA using the fluorescent dye SYPRO orange, and fluorescence was measured in a Qiagen RG6 real-time PCR machine. Each of the pHLA complexes was in a solution of 10 mM Tris-HCl pH 8 and 150 mM NaCl, and was measured at two different concentrations (5 and 10 µM) in duplicate, where samples were heated from 30 to 95 °C at a rate of 0.5 °C/min. Fluorescence intensity was detected using a default excitation and emission channel set to yellow (excitation of approximately 530 nm and detection at approximately 557 nm). Fluorescence intensity data were normalised and plotted using GraphPad Prism 8 (version 8.4.2). The Tm, or thermal midpoint, represents the temperature at 50% of maximal fluorescence. The results are reported in Table 1.
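The Tm definition above (temperature at 50% of maximal fluorescence) can be illustrated with a minimal Python sketch. It is not the GraphPad Prism procedure used in this study: the min-max normalisation, the linear interpolation between readings, the function names, and the melt-curve values are all assumptions made for illustration.

```python
def normalise(values):
    """Rescale fluorescence readings to the 0-1 range (min-max; an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("flat curve: cannot normalise")
    return [(v - lo) / (hi - lo) for v in values]


def thermal_midpoint(temps, fluorescence):
    """Temperature at which normalised fluorescence first reaches 0.5,
    estimated by linear interpolation between neighbouring readings."""
    norm = normalise(fluorescence)
    for i in range(1, len(norm)):
        f0, f1 = norm[i - 1], norm[i]
        if f0 < 0.5 <= f1:
            t0, t1 = temps[i - 1], temps[i]
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("curve never crosses 50% of maximal fluorescence")


# Hypothetical melt curve (made-up numbers, not data from this study).
temps = [30, 40, 50, 60, 70, 80]
signal = [100, 100, 200, 800, 1100, 1000]
tm = thermal_midpoint(temps, signal)
print(f"Estimated Tm: {tm:.1f} degrees C")
```

With real data, the heating ramp of 0.5 °C/min gives far denser readings than this toy curve, so the interpolation step matters much less.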

Peptide Conservation within SARS-CoV-2 Isolates
Complete spike protein sequences from SARS-CoV-2 isolates (taxonomy ID 2697049) were obtained from the NCBI virus database http://www.ncbi.nlm.nih.gov/labs/virus (accessed on 22 March 2021). Sequences were aligned using https://www.fludb.org/brc/home.spg?decorator=influenza (accessed on 22 March 2021). Sequences with an unknown amino acid (X) within the peptide of interest were removed from the analysis. The sequence alignment results are summarised in Table 2.
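The filtering and counting steps described above can be sketched in Python as follows. This is an illustrative reconstruction, not the pipeline used for Table 2: the function name, the assumption that the alignment is gap-free over the peptide window, and the toy sequences are all hypothetical.

```python
def peptide_conservation(aligned_spikes, start, end, reference_peptide):
    """Return (fraction identical, number of usable isolates) for the
    residues start..end (1-based, inclusive) of each aligned sequence.

    Isolates with an unknown residue (X) inside the window are excluded,
    as described in the text. Assumes the alignment has no gaps that
    shift the coordinates of the window.
    """
    kept = 0
    identical = 0
    for seq in aligned_spikes:
        window = seq[start - 1:end]
        if "X" in window:
            continue  # unknown amino acid within the peptide of interest
        kept += 1
        if window == reference_peptide:
            identical += 1
    if kept == 0:
        raise ValueError("no usable sequences")
    return identical / kept, kept


# Toy alignment (made-up sequences, not real spike data).
toy = ["AAKLNDGG", "AAKLNDGG", "AAKXNDGG", "AAKLSDGG"]
fraction, usable = peptide_conservation(toy, 3, 6, "KLND")
print(f"{fraction:.1%} conserved across {usable} usable sequences")
```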
approximately 5% which were used for R free calculation. Values in parentheses are for the highest-resolution shell.

The Spike-Derived Peptides Were Able to Form a Stable Complex with Their Respective HLA Molecules
We selected three peptides derived from the SARS-CoV-2 spike protein that were originally predicted to bind to HLA molecules (Table 1) [19–21]. Subsequently, the S386-395 peptide was shown to be recognised by CD8+ T cells using tetramer staining [22], and S896-904 was shown to activate CD8+ T cells in a T cell activation assay [23]. Therefore, they are good potential targets as T cell antigens, and warrant more investigation.
These three spike-derived peptides have been conserved in the sequenced SARS-CoV-2 isolates reported so far [24]. Indeed, our sequence analysis of spike proteins sequenced from >60,000 SARS-CoV-2 isolates revealed that all three peptides were >99% conserved in all geographic locations (Table 2). Therefore, they could represent good targets for therapeutic and vaccine design.
Our first aim was to determine if each peptide was able to form a stable complex with its specific HLA molecule. To this end, we refolded each of the three peptides with its respective HLA molecule, purified the pHLA complexes, and determined their thermal stability (Table 1). The data showed that each of the pHLA complexes had a thermal midpoint temperature, or Tm, well above the human body temperature of 37 °C. Therefore, it is expected that these pHLA complexes will remain stable on the cell surface. Interestingly, in addition to binding HLA-A*11:01, the S370-378 peptide has also been predicted to bind to the HLA-A*03:01 molecule [21], and is described as immunogenic in HLA-A68+ donors [23]. As these three HLA molecules all belong to the HLA-A3 superfamily [25], it is likely that the S370-378 peptide is able to be presented by all three HLA molecules [26].
These data confirm that the three spike-derived peptides can form stable complexes with their respective HLA molecules, and are therefore interesting targets for T cells.

Structure of HLA-A*02:01-S386-395 Reveals a Flat Peptide Conformation
To understand how SARS-CoV-2 spike-derived peptides are presented to T cells, we solved the structure of each peptide in complex with its respective HLA molecule using X-ray crystallography (Table 3).
We solved the structure of S386-395 in complex with HLA-A*02:01 at a resolution of 2.35 Å, and the electron density was clear for the peptide (Figure 1A,B). The structure shows that the S386-395 peptide binds into the peptide-binding cleft of HLA-A*02:01 in a canonical conformation, with anchor residues at position 2 (P2-Leu) and position 10 (P10-Val) docking deep into the B and F pockets, respectively (Figure 1C). In addition, P3-Asn, P5-Leu, and P7-Phe were also buried, acting as secondary anchors, and interacted with each other.
The S386-395 peptide is a 10mer peptide, longer than the classical 9mer that is highly characteristic of HLA class I molecules. While 9mer peptides fit perfectly in an extended conformation in the antigen-binding cleft of HLA class I, longer peptides often adopt conformations in which some residues bulge out of the cleft [27]. The 10mer peptide S386-395 does not bulge out of the HLA-A*02:01 antigen-binding cleft, but adopts a rather flat conformation, likely because half of its residues are buried in the cleft. This peptide was predicted to bind HLA-A*02:01 [19], and was recognised by T cells from unexposed donors [22]. The molecular docking prediction from Can et al. indicated that P1-Lys, P3-Asn, P4-Asp, and P6-Cys would be solvent exposed. Comparison with our crystal structure revealed that the solvent-exposed residues were P1-Lys, P4-Asp, P6-Cys, P8-Thr, and P9-Asn instead (Figure 1D). Interestingly, the P6-Cys was solvent exposed and therefore available to form a disulfide bond. Indeed, we observed a disulfide bond between the peptides of two pHLA complexes contained in the crystal asymmetric unit.
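The comparison between the docking prediction and the crystal structure can be made explicit with a short Python sketch. The residue positions are taken from the text above; the variable names and the use of a Jaccard index as an agreement score are illustrative choices, not part of the original analysis.

```python
# Solvent-exposed positions of S386-395, as stated in the text.
predicted = {"P1", "P3", "P4", "P6"}        # molecular docking prediction
observed = {"P1", "P4", "P6", "P8", "P9"}   # crystal structure

agreed = sorted(predicted & observed)          # exposed in both
only_predicted = sorted(predicted - observed)  # predicted exposed, found buried
only_observed = sorted(observed - predicted)   # exposed but not predicted

# Jaccard index: shared positions divided by all positions named by either source.
jaccard = len(predicted & observed) / len(predicted | observed)

print("Agreed:", agreed)
print("Predicted only:", only_predicted)
print("Observed only:", only_observed)
print(f"Jaccard index: {jaccard:.2f}")
```

The single position predicted as exposed but found buried is P3-Asn, which the structure identifies as a secondary anchor.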

HLA-A*11:01-S370-378 Presents Aromatic Residues to T Cells
HLA molecules are classified into different superfamilies based on the binding properties of their peptide-binding groove. The HLA-A3 superfamily favours a small aliphatic amino acid at position 2 (P2) and a positively charged amino acid at the C-terminus of the peptide (PΩ), which are the primary anchors of the binding groove. There are three main members in the HLA-A3 superfamily [25]: HLA-A*03:01, HLA-A*11:01, and HLA-A*68:01. The S370-378 peptide has been predicted to bind HLA-A*03:01 and HLA-A*11:01 [21], and can activate CD8+ T cells in an HLA-A*68:01 donor [23]. It has previously been reported that certain peptides can be presented by multiple HLA molecules [26], and that they can also be immunogenic [28]. Interestingly, the immunogenicity of the S370-378 peptide is still debatable, and might depend on HLA restriction, as it has been predicted to be both immunogenic [21] and non-immunogenic [29]. Further immunogenicity studies need to be undertaken to determine if this peptide presented by HLA-A*03:01 can activate CD8+ T cells. However, this peptide has been described as non-immunogenic in a small cohort of healthy and COVID-19-recovered individuals expressing HLA-A*11:01 [30], whereas it is immunogenic when presented by HLA-A*68:01 [23]. It is therefore possible that S370-378 is either not immunogenic in HLA-A*11:01+ donors at all, or is non-immunogenic in only some HLA-A*11:01+ donors, and more investigation will be required to determine which of these possibilities is the case.
To gain a further understanding of how the S370-378 peptide might be seen by CD8+ T cells, we solved the structure of the HLA-A*11:01 molecule presenting this peptide (Table 3). The structure of the HLA-A*11:01-S370-378 complex was solved at high resolution (1.50 Å), and the electron density was clear for the peptide (Figure 2A,B).
Biophysica 2021, 1, FOR PEER REVIEW 6
The S370-378 peptide adopted a canonical extended conformation in the cleft of the HLA-A*11:01 molecule, with P2-Ser and P9-Lys acting as primary anchor residues (Figure 2C) without additional secondary anchor residues. The backbone of the peptide's central part (P4-P8) is flat and solvent exposed in the cleft of HLA-A*11:01. The surface exposed to the solvent (Figure 2D), and therefore available for potential T cell receptor contact, is hydrophobic with three small side chains (P4-Ser, P6-Ser, and P7-Thr) and two large aromatic residues (P5-Phe and P8-Phe). The S370-378 peptide therefore presents many exposed residues that could be contacted by TCRs.

The S896-904 Peptide Adopted a Flat Conformation in the Cleft of HLA-B*35:01
The S896-904 peptide was also predicted to bind several HLA molecules by Al-Khafaji et al. [20]. The strongest predicted IC50 was for HLA-B*35:01, for which the primary anchor residues of the S896-904 peptide would be favourable (P2-Pro and P9-Tyr, Table 1). In addition, the S896-904 peptide has been described as immunogenic in both HLA-B*51:01+ and HLA-B*35:01+ COVID-19-recovered donors [23]. In line with these studies, the Tm value of the HLA-B*35:01-S896-904 complex indicated a stable complex (Table 1). We solved, at high resolution (1.44 Å), the structure of the S896-904 peptide presented by the HLA-B*35:01 molecule (Table 3), with clear electron density showing a stable conformation of the peptide (Figure 3A,B).
As predicted, the P2-Pro binds to the HLA B pocket and the P9-Tyr to the HLA F pocket, both acting as primary anchor residues, with the addition of the P3-Phe that acts as a secondary anchor (Figure 3C). Despite the presence of residues with large side chains in the central region of the peptide (namely P5-Met, P6-Gln, and P7-Met), the central part of the peptide was relatively flat in the antigen-binding cleft (Figure 3D). The two methionines at positions 5 and 7 of the peptide were half buried against the α2-helix of the HLA, while the P6-Gln buried its amide group between the peptide backbone and the α1-helix to form hydrogen bonds with the Asn70 and Thr73 of the HLA-B*35:01 molecule. Although the residues at positions 4 and 8 are solvent exposed, they are alanines and therefore only expose a methyl group, limiting their potential contact with TCRs.

Discussion
The immune response to SARS-CoV-2 infection is still an intense area of research and requires a better understanding of the differences in disease progression between individuals, as well as better identification of immunogenic antigens that can provide protective immunity. CD8+ T cells have a critical role in viral infection, and while their part in COVID-19 is not fully understood, they are able to recognise epitopes from SARS-CoV-2 and play a role in the overall immune response [3,4,8,10,30–36]. HLA molecules are the targets of CD8+ T cells as they present viral peptides to signal infection. As T cells recognize a peptide bound to an HLA molecule, it is important to understand which peptides from SARS-CoV-2 will be presented to T cells, as well as their HLA restriction. Since HLA molecules are extremely polymorphic, we report here the analysis of three HLA alleles that are frequently expressed within the population.
Here, we confirmed the restriction of three spike-derived peptides to their predicted HLA molecules by refolding each HLA with a peptide and assessing the overall stability of each pHLA complex formed. All three pHLA complexes were stable and had a thermal midpoint well over physiological body temperature, suggesting that these pHLAs can remain stable on the cell surface, and would have the potential to be contacted by TCRs. In addition, we solved the crystal structure of these three pHLA complexes, showing how each peptide is presented by its specific HLA molecule. This information is important as the spike protein is prone to mutations [24,37–39]. The peptides under investigation in our study are not located within the regions of spike that are mutated in the new variants such as the ones from the UK (B.1.1.7), South Africa (B.1.351), or Brazil (P.1). However, as more mutations are likely to arise, it is important to understand which mutations could represent an escape from the immune system or from the currently available vaccines. For example, mutation of the residue located at the second or last position of the peptide could have a devastating impact on the ability of a peptide to bind a designated HLA molecule, which would lead to viral escape due to the lack of presentation. Mutations of solvent-exposed residues could instead directly impact T cell recognition. Therefore, we could predict the impact a mutation might have on T cell recognition, and anticipate its effects on the immune response. In addition, the spike-derived peptides studied here are able to be presented by multiple HLA molecules, which in turn could be an advantage at a population level, as some HLA molecules may be able to bind certain variants while others may not.
The structure of each peptide reveals which residues might be important for T cell recognition, which could in turn provide information about the mutations within the spike protein that might impact T cell binding or HLA binding, and whether they are likely to escape T cell surveillance.
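The reasoning above (anchor mutations may affect presentation, exposed mutations may affect TCR recognition) can be summarised as a small lookup in Python. The position sets are taken from the structural descriptions in this article; the dictionary layout, the function name, and the category labels are illustrative assumptions, and the output is a coarse classification rather than a prediction of escape.

```python
# Peptide positions (1-based) as described in the structural results.
PEPTIDE_MAPS = {
    "S386-395/HLA-A*02:01": {
        "length": 10,
        "primary_anchor": {2, 10},
        "secondary_anchor": {3, 5, 7},
        "exposed": {1, 4, 6, 8, 9},
    },
    "S370-378/HLA-A*11:01": {
        "length": 9,
        "primary_anchor": {2, 9},
        "secondary_anchor": set(),
        "exposed": {4, 5, 6, 7, 8},
    },
    "S896-904/HLA-B*35:01": {
        "length": 9,
        "primary_anchor": {2, 9},
        "secondary_anchor": {3},
        "exposed": {4, 8},
    },
}


def likely_effect(complex_name, position):
    """Coarse, hypothetical label for a mutation at a given peptide position."""
    info = PEPTIDE_MAPS[complex_name]
    if not 1 <= position <= info["length"]:
        raise ValueError("position outside the peptide")
    if position in info["primary_anchor"]:
        return "may impair HLA binding"
    if position in info["secondary_anchor"]:
        return "may weaken HLA binding"
    if position in info["exposed"]:
        return "may alter TCR recognition"
    return "not classified"  # e.g. buried or half-buried, non-anchor residues


print(likely_effect("S386-395/HLA-A*02:01", 2))
print(likely_effect("S370-378/HLA-A*11:01", 5))
print(likely_effect("S896-904/HLA-B*35:01", 6))
```

A mutation falling in the "not classified" category, such as the buried P6-Gln of S896-904, would need to be assessed case by case, since it contacts the HLA without being a classical anchor.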
Altogether, our work provides insight into the spike protein-derived SARS-CoV-2 peptide presentation by HLA molecules, which could help provide a better understanding of the T cell response to the virus.