1. Introduction
Studies on transplantation in the 20th century led to the discovery of the antigens that determine the compatibility of tissues during transplantation [
1]. These antigens were found to be presented by special transmembrane protein complexes called major histocompatibility complexes (MHCs). In humans, the products of this gene family were first found on leukocytes. Hence, the genes were called human leukocyte antigen (HLA) genes [
2]. There are four groups of HLA genes (classes I, II, III, and IV), which are all located on chromosome 6. The products of these genes are proteins that differ in structure and function [
3]. The HLA I and HLA II genes are among the most polymorphic human genes, and as of October 2020, 28,786 different alleles have been described for them (
https://www.ebi.ac.uk/ipd/imgt/hla/stats.html). HLA I genes include the most common HLA-A, HLA-B, and HLA-C, and rare HLA-E, HLA-F, and HLA-G genes. HLA II genes incorporate HLA-DRA, HLA-DRB, HLA-DQA, HLA-DQB, HLA-DPA, HLA-DPB, HLA-DMA, HLA-DMB, HLA-DOA, and HLA-DOB [
4]. The products of expression of these genes are transmembrane glycoproteins that present peptide antigens on the cell surface. The exceptions are HLA-DMA, HLA-DMB, HLA-DOA, and HLA-DOB, which regulate the loading of peptides onto MHC II molecules [
5]. The main function of these molecules is to participate in the T cell-mediated immune response.
Genome-wide association studies (GWAS) have demonstrated a strong association between the presence of certain diseases and a specific HLA genotype [
6,
7,
8,
9,
10]. Moreover, in several cases, the cause is a single nucleotide polymorphism in an HLA gene, which affects the binding of the peptide antigen to HLA and thereby alters the repertoire of antigens presented to T cells.
Table 1 lists selected studies linking HLA genotype and diseases. For example, in a GWAS of 2000 Parkinson’s disease (PD) cases and 1986 healthy donors, a strong association was found between the risk of PD and the expression of the HLA-DRB5*01 and HLA-DRB1*15:01 alleles [
9]. These alleles exist in about one-third of PD patients. At least one epitope obtained during the degradation of α-synuclein, which forms insoluble fibrils in filamentous inclusions of Lewy bodies in PD, is specifically presented by these forms of HLA II [
11]. Similar studies made it possible to associate systemic sclerosis with the expression of HLA-DRB1*15:02 and HLA-DRB1*16:02 (585 cases and 458 controls) [
12] and psoriasis with HLA-C*06:02 (461 psoriatic patients and 454 healthy controls) [
13]. According to an in silico analysis of the binding affinity of each possible fragment of the SARS-CoV-2 proteins for the products of 145 HLA-A, HLA-B, and HLA-C alleles, the product of the HLA-B*46:01 allele had the fewest predicted SARS-CoV-2 binding peptides, which suggests that carriers of this allele may be prone to a more severe course of coronavirus infection. In contrast, the product of the HLA-B*15:03 allele was predicted to present highly conserved SARS-CoV-2 peptides more effectively, which may enhance T cell-based immunity [
14]. Interestingly, the expression level of a particular HLA gene can simultaneously lower the risk of developing one disease and increase the risk of developing another. For example, low expression of HLA-C makes the body more susceptible to HIV, and the virus itself suppresses the expression of this gene using the Vpu protein [
15,
16]. On the other hand, with increased expression of HLA-C, the occurrence of Crohn’s disease [
17] and psoriatic arthritis [
13,
18] becomes more likely. Multiple examples of this kind (
Table 1), as well as larger-scale meta-analyses [
19,
20,
21], indicate the importance of studying MHC and associated peptide antigens as a promising diagnostic tool to evaluate susceptibility to various diseases and for the development of personalized immunotherapy.
The development of mass spectrometry and peptidomic approaches to the isolation and identification of low-abundance native peptides made it possible to determine MHC ligands directly. As part of personalized cancer therapy development, mass spectrometry-based immunopeptidomics has attracted the interest of biotechnological and pharmaceutical companies as a way to determine peptide antigens for clinical application [
34]. The goal of cancer immunotherapy is to activate the patient’s immune system and recruit their T cells, especially the CD8+ T cells, to fight the tumor. Complexes of HLA I molecules with antigenic peptides are key to activating killer T cells. There are many oncoimmunotherapy approaches: checkpoint blockade [
35], chimeric antigen receptor (CAR) T-cell therapy [
36], T-cell receptor (TCR)-engineered cells [
37], T cell adoptive cell transfer (ACT) [
38], and oncolytic viruses (OV)-based immunotherapy [
39]. Identification of the tumor-specific immunopeptidome, as well as strategies for the isolation and genetic modification of T cells, are essential in the development of personalized cancer immunotherapy [
40,
41]. The diverse repertoire of HLA I presented on tumor cells is a good source of potential tumor antigens [
42]. In 2018, Hilf et al. published a trial of novel personalized therapeutic vaccines (APVAC1 and APVAC2) for glioblastoma as part of the glioma actively personalized vaccine consortium (GAPVAC) [
43]. The creation of these vaccines utilized published technology that includes the search for immunogenic neoantigens based on transcriptome and immunopeptidome analysis of the patient’s tumor tissue [
44]. The immunogenicity of the identified peptides was verified using CD8+ T cells isolated from the patient’s blood. This highly personalized form of immunotherapy was first implemented in a global project involving a large number of research studies from various scientific centers.
In this review, we give general information about the immunopeptidome and HLA and discuss the main methods of immunopeptidome isolation: mild acid elution and immunoaffinity chromatography. The main part of the review is devoted to the stages of immunopeptidome isolation by immunoaffinity chromatography: the choice of biological material, detergents for the isolation of membrane-bound MHC, selection of specific antibodies, solid supports and methods for antibody immobilization, immunopeptidome post-fractionation and purification techniques, approaches to the identification of isolated MHC ligands from LC-MS/MS data, and methods to confirm the immunogenicity of MHC I ligands.
2. General Information on Immunopeptidome and HLA
A living cell is a complex dynamic system. It has to renew its components constantly to function correctly. Therefore, in addition to a high-precision apparatus for protein synthesis [
45], a cell requires systems to effectively remove incorrectly folded, obsolete, or unnecessary proteins. One of the main pathways for cytosolic protein degradation is the ubiquitin–proteasome system [
46]. Proteins destined for degradation are labeled with polyubiquitin, a chain of linked ubiquitin monomers. An enzyme cascade comprising ubiquitin-activating enzyme (E1), ubiquitin-conjugating enzyme (E2), and ubiquitin ligase (E3) carries out protein ubiquitination [
46]. The proteasome, a protein machine of “creative destruction”, recognizes ubiquitinated proteins [
47,
48]. The proteasome contains the 19S regulatory particle, which recognizes substrates labeled with a polyubiquitin chain, and the 20S core particle, which cleaves the substrate. The core consists of 14 different subunits arranged in four stacked rings with α7β7β7α7 stoichiometry. The two outer α-rings each contain seven different α-subunits (α1–α7), and the two inner β-rings each consist of seven different β-subunits (β1–β7). Three β-subunits (β1(Y), β2(Z), and β5(MB1)) have proteolytic activities: peptidylglutamyl peptide-hydrolyzing, trypsin-like, and chymotrypsin-like, respectively [
49]. Several types of proteasomes are currently known, whose cores differ in proteolytic properties. In addition to the constitutive proteasome described above, these are the immunoproteasome [
50] and the thymoproteasome [
51]. Proteasome-mediated degradation cleaves the target protein into relatively short peptide fragments. The length of these fragments is then adjusted by endoplasmic reticulum-associated aminopeptidases (ERAP1 and ERAP2), which trim the proteolytic peptides at the N-terminus down to the size required for loading into newly synthesized MHC I molecules [
52].
Biosynthesis of the MHC class I molecule occurs in the endoplasmic reticulum (ER) of the cell and depends mainly on the availability of a peptide suitable for presentation (
Figure 1). The newly synthesized MHC I heavy chain initially binds to the chaperones calnexin and immunoglobulin-binding protein (BiP). After the non-covalent association of β2 microglobulin (light chain) with the heavy chain, calreticulin displaces calnexin. Calreticulin escorts the empty MHC I heavy chain-β2m heterodimer to the chaperone adapter tapasin, which is conjugated with the ER-resident disulfide isomerase ERp57, an oxidoreductase that forms disulfide bonds in the heavy chain of MHC I [
53,
54]. The lectin-like domain of calreticulin interacts with the glycan of MHC I, while its other domain (
P-domain) mediates the interaction of the peptide-binding groove of MHC I with the ERp57 enzyme. The MHC I heavy chain, β2 microglobulin, tapasin, ERp57, and the lectin-like chaperone calreticulin together make up the peptide loading complex (PLC) [
55,
56]. Tapasin interacts with the heterodimeric peptide transporter TAP (transporter associated with antigen processing), which delivers proteasome-cut peptides from the cytosol to the ER lumen in an ATP-dependent manner. Peptides transported by TAP are trimmed by endoplasmic reticulum aminopeptidase associated with antigen processing (ERAAP) and loaded into an MHC I molecule, which is part of the PLC. After the structure of the peptide–MHC I complex is stabilized, the nearly mature molecule follows the classical secretory pathway from the endoplasmic reticulum to the Golgi apparatus [
57] and is then delivered to the cell surface in a vesicle (
Figure 1).
Class I MHCs present peptides derived from proteins synthesized inside cells, including viral and cancer-specific proteins, on the surface of nucleated cells. The interaction of MHC I molecules with the T-cell receptors on CD8+ T lymphocytes mediates the detection of virus- and cancer-specific peptides and the activation of killer T cells. Activated killer T cells can destroy the antigen-presenting target cells using perforins, which are similar in structure and function to the complement C9 protein [
58], and granzymes [
59]. Since the number of various antigens associated with danger for the homeostatic state of the body is almost infinite, the immune system must have a huge potential for distinguishing non-self. The recognition is regulated by the binding affinity of an MHC-associated antigen to a T-cell receptor. A broad repertoire of T-cell receptors [
60] and a large variety of antigens associated with MHC (due to the high polymorphism of MHC within the population) are the two main mechanisms that increase the likelihood that a suitable MHC allele and T-cell clone will be present in at least some individuals within the population. This enhances the ability of adaptive immunity to fight the pathogen [
61]. When MHC I is absent from a cell surface, natural killer (NK) or NK T cells detect and kill the cell, which allows the immune system to respond to the absence of the “self” marker. NK cells use special killer immunoglobulin-like receptors (KIR) to recognize MHC I molecules. The interaction of MHC I molecules with the T-cell receptors of immature T lymphocytes plays an important role in the positive selection of T lymphocytes in the thymus [
62].
In 1969, Mann et al. pioneered the isolation of MHC class I from mouse tissues [
63]. The first detailed study of the structure of human MHC I was carried out on the product of the HLA gene allele A2 [
64]. The MHC class I heavy chain consists of
an N-terminal signal peptide, which is typical of secreted proteins, three extracellular domains called α1, α2, and α3, a transmembrane domain, and a cytoplasmic domain. The light chain of MHC I is not encoded by an HLA gene. It is a small 12 kDa protein called β2 microglobulin. The α1, α2, α3 domains, and β2 microglobulin are structural homologs. The α3 domain and β2 microglobulin have a similar β-sandwich fold organized into two opposing antiparallel β-sheets. The α1 and α2 domains lie above the α3 domain and β2 microglobulin and form a platform consisting of eight β-strands (four strands in each domain) organized into a β-sheet and two antiparallel α-helices, which together form the peptide-binding groove, the site of antigen binding to the MHC I molecule (
Figure 2). The peptide-binding groove is around 25 Å long and 10 Å wide. It contains polymorphic amino acid residues, which allow binding to a wide range of antigens. Importantly, the peptide-binding groove of MHC I is closed at both ends. This limits the size of the presented peptide, usually to 8–12 amino acid residues, depending on the HLA I allele [
65]. Certain (anchor) amino acid residues of the peptide bind to pockets of the groove. Primary anchors are usually located at the second position and
at the C-terminus of the peptide, while the position of the secondary anchors is less restricted and depends on the HLA I allele. For example, peptides with leucine in the second position and valine or leucine in the ninth position have a high affinity for HLA-A2. HLA-B7 typically binds peptides with proline and arginine in the second and third positions and alanine or leucine in the ninth position [
66]. However, a hydrophobic C-terminal region is present in all MHC I peptides [
65]. MHC I molecules can also present longer peptides (up to 25 aa) due to the protrusion of weak affinity regions of the peptide chain and preservation of the positions of anchor amino acid residues [
67].
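The length limit and anchor preferences described above can be illustrated with a simple motif check. The following Python snippet is only a minimal sketch, not a binding predictor: the `MOTIFS` table and the function names are hypothetical, the table encodes only the HLA-A2 and HLA-B7 preferences quoted in the text, and the example peptides are artificial sequences. In practice, allele-specific scoring matrices or trained models are used instead of exact motif matching.

```python
# Anchor preferences as quoted in the text; keys are 1-based peptide positions.
# This table is an illustrative assumption, not a complete binding motif.
MOTIFS = {
    "HLA-A2": {2: "L", 9: "VL"},
    "HLA-B7": {2: "P", 3: "R", 9: "AL"},
}


def fits_length(peptide, min_len=8, max_len=12):
    """Check the typical MHC I ligand length range (8-12 residues)."""
    return min_len <= len(peptide) <= max_len


def matches_motif(peptide, allele):
    """Return True if a nonamer carries the listed anchor residues."""
    if len(peptide) != 9:
        return False  # the quoted motifs are stated for 9-residue peptides
    motif = MOTIFS[allele]
    return all(peptide[pos - 1] in allowed for pos, allowed in motif.items())


# Artificial example sequences, chosen only to exercise the rules.
for pep in ("ALAAAAAAV", "APRAAAAAL"):
    hits = [allele for allele in MOTIFS if matches_motif(pep, allele)]
    print(pep, fits_length(pep), hits)
```

Exact matching of this kind ignores secondary anchors and longer, bulging peptides, so it would miss many true ligands; it only makes the notion of primary anchors concrete.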
3. Methods for Immunopeptidome Analysis
An important milestone in studies of the immunopeptidome of various animal cells was the development of a method for isolating MHC I ligands by mild acid elution (MAE), proposed by Sugawara et al. in 1987 [
68]. The essence of this easy-to-implement method is the short-term treatment of living cells with citrate buffer (pH 3.0). As a result of such treatment, the β2 microglobulin molecule non-covalently bound to the MHC I heavy chain dissociates, destabilizing the structure of the entire complex. This reduces the peptide-binding capacity of the HLA-A, HLA-B, and HLA-C complexes, i.e., it leads to the loss of peptides associated with the MHC class I molecules [
68]. It was hypothesized that MHC class II molecules do not lose their antigens during MAE, which would increase the specificity of the technique. This assumption was confirmed shortly afterwards [
69]. Importantly, cells treated by the MAE method remain viable and able to regenerate MHC I complexes with antigens, which facilitates the accumulation of a significant amount of MHC I ligands. When it was proposed, MAE required no more than 100 million cells and was therefore an extremely effective technique compared to the other methods used for the isolation of the MHC I peptidome (trifluoroacetic acid extraction [
70] and immunoaffinity isolation using specific antibodies [
65]), requiring 1–10 billion cells. The growing interest in immunopeptidomics and a significant amount of accumulated experimental data have stimulated the emergence of several detailed reviews and comparative works related to the MAE method [
71,
72,
73,
74,
75]. Undoubtedly, the simplicity and efficiency of MAE [
68], including a small number of purification steps, the absence of detergents [
72], the possibility of multiple processing of living cells [
76], and the reduction of losses in the case of working with low-affinity peptides [
72] made the MAE method one of the main tools of immunopeptidomics. On the other hand, the need to work with living cells is one of the most significant weaknesses of the MAE method, which is highlighted by many researchers. In addition, elution should take place in a cell suspension; that is, cells should circulate freely in solution [
73]. Hence, it is not possible to use MAE on tissues and cell lines requiring special conditions for growth. Even more problematic is the simultaneous elution of peptides present in large amounts on the cell surface and not related to the MHC I ligandome. According to Fortier et al., only about 40% of all peptides isolated by the MAE method are associated with MHC class I, while the rest are contaminants [
68,
72,
77].
Immunoaffinity chromatography is a method for the isolation and purification of a target substance from a multicomponent mixture based on a specific non-covalent interaction of an antibody immobilized on a solid support and an antigenic epitope of the target substance [
78]. Unlike MAE, immunoaffinity chromatography finds applications in various fields of biomedicine, including clinical diagnostics, detection of substances hazardous to the environment, and pharmacological research [
79]. The basic principle of immunoaffinity chromatography has remained the same, despite the constant improvement of the methodology [
74,
80,
81,
82]. A multicomponent mixture, such as a cell line lysate, homogenized tissue, or biological fluid sample, is incubated with MHC-specific antibodies pre-immobilized on magnetic particles or agarose-based polymeric resins as solid support (
Figure 3) [
79]. The murine monoclonal antibody clone W6/32, which specifically binds to the α2–α3 heavy chain region of the products of all classical genes HLA-A, HLA-B, and HLA-C, is commonly used [
74,
82]. After non-specifically bound substances are washed away, MHC molecules together with associated peptides are eluted. Currently, immunoaffinity purification is the most commonly used method for isolating the immunopeptidome. There are two reasons for this: (1) most of the peptides isolated by this method are likely to be true MHC ligands; several studies bioinformatically confirm a high predicted affinity for MHC in about 90% of identifications [
83,
84,
85], and (2) this method is less demanding on the biomaterial; it is possible to use cell lines, tissues, and biological fluids, including frozen samples.
Notably, the labor and time costs of this method are higher compared to MAE. Immunoaffinity chromatography for the isolation of MHC requires a significant amount of specific antibodies; therefore, there is a need to maintain an in-house hybridoma producing the required antibodies [
86,
87]. On average, about 1 mg of antibodies per sample is required [
88]. It is not surprising that, to our knowledge, the largest published work to date is devoted to the study of the immunopeptidome of only 10 biological samples of postoperative material and 142 samples of blood plasma [
89]. Using isotopically labeled peptides, Hassan and co-authors found that losses during immunoprecipitation of the MHC ligandome reached 90–99% [
90]. Due to the large number of washes required to get rid of non-specific peptides, there is a high risk of losing low-affinity MHC ligands [
71]. In addition, it is still not precisely established how universal the antibodies are—that is, whether there are such MHC variants that bind antibodies with low affinity and, as a result, some of the MHC-ligand complexes are lost [
91]. Taking into account all sources of loss, it is not surprising that the number of cells required for successful LC-MS/MS identification of the MHC ligandome varies from 100 million to 10 billion [
92]. However, attempts are being made to improve methods of immunoaffinity purification [
93]. Chong et al. propose to accelerate and automate the protocol by carrying out immunoprecipitation in 96-well plates. The researchers isolated 42,556 unique MHC class I associated peptides belonging to 8975 precursor proteins, using 21 wells containing 100 million cells each [
93]. From 10 million cells, they identified only 1846 peptides, but these largely coincided with the most abundant peptides isolated from 100 million cells. Lanoix and co-authors published a comparison of the quality of the B-cell lymphoblast immunopeptidome isolation by MAE and immunoprecipitation [
73]. From 2, 20, and 100 million cells, the authors identified 2016, 3931, and 5093 unique peptides by immunoaffinity chromatography and 314, 2081, and 2996 unique peptides by MAE with MS detection. Thus, more peptides associated with HLA I were obtained by immunoaffinity purification. However, the relative difference between the two methods in the number of identified peptides decreases as the initial number of cells increases.
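The comparison reported by Lanoix et al. can be made explicit by computing the ratio of unique peptides identified by the two methods at each input. The snippet below is a small illustrative calculation in Python that uses only the numbers quoted above; the variable and function names are our own.

```python
# Unique peptide counts quoted in the text for 2, 20, and 100 million cells.
cells_million = [2, 20, 100]
ip_peptides = [2016, 3931, 5093]   # immunoaffinity chromatography
mae_peptides = [314, 2081, 2996]   # mild acid elution


def yield_ratios(ip, mae):
    """Ratio of peptides identified by immunoaffinity purification to MAE."""
    return [round(a / b, 2) for a, b in zip(ip, mae)]


ratios = yield_ratios(ip_peptides, mae_peptides)
for n, r in zip(cells_million, ratios):
    print(f"{n} million cells: IP/MAE = {r}")
```

The ratio falls from more than sixfold at 2 million cells to less than twofold at 100 million cells, so the advantage of immunoaffinity purification is most pronounced when the input material is scarce.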
It is the isolation of the immunopeptidome that some authors aptly call an Achilles’ heel, hinting at an inhibitory effect on the development of the research area as a whole [
88]. Indeed, back in 1992, Hunt et al. showed that most peptides presented via MHC I are present at 100 to 1000 copies per cell, and only a few are present at 1000 to 3000 copies per cell [
80]. In some cases, the representation of a single peptide can reach 10,000 copies per cell [
94]. Moreover, according to the data of Schuster et al., the average number of HLA I molecules per cell varies from 5000 to 150,000 [
95], and according to Lanoix et al., the total number of MHC I per cell can reach 0.5–3 million [
73], which theoretically allows the cell to present 10,000–30,000 different peptides. If we take into account that losses during immunoprecipitation of the MHC I ligandome can reach 90–99% [
90], we can isolate 1 to 300 million molecules of each peptide from 1 million cells, which approximately corresponds to amounts from 2 amol to 0.5 fmol. As the limiting sensitivity of LC-MS/MS, one can take the result obtained by Matthias Mann’s group in 2010 on Orbitrap Exactive [
96]. Using the Universal Proteomics Standard (UPS1), they identified 348 different peptides, in triplicate, from 45 of 48 UPS1 proteins using 140 fmol of the corresponding tryptic peptides. Although the identification was performed against a database of all human proteins, the sensitivity would be lower under the high dynamic range of real biological samples. If we take 500 fmol of a peptide as a sufficient amount, then at least 1 billion cells are needed for successful identification of the peptide in the immunopeptidome, which is roughly consistent with the scale of current works on immunoprecipitation [
88,
90,
92,
95].
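The estimate above can be reproduced with a short calculation. The following Python sketch assumes the boundary values quoted in the text (100 to 3000 copies of a peptide per cell and 90–99% loss during immunoprecipitation); the function names are hypothetical and serve only to spell out the arithmetic.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def recovered_amol(copies_per_cell, n_cells, loss_fraction):
    """Attomoles of one peptide remaining after immunoprecipitation losses."""
    molecules = copies_per_cell * n_cells * (1.0 - loss_fraction)
    return molecules / AVOGADRO * 1e18


def cells_needed(target_fmol, copies_per_cell, loss_fraction):
    """Number of cells required to recover target_fmol of one peptide."""
    target_molecules = target_fmol * 1e-15 * AVOGADRO
    return target_molecules / (copies_per_cell * (1.0 - loss_fraction))


# Worst case: 100 copies per cell, 99% loss, 1 million cells (about 1.7 amol).
worst = recovered_amol(100, 1e6, 0.99)
# Best case: 3000 copies per cell, 90% loss, 1 million cells (about 0.5 fmol).
best = recovered_amol(3000, 1e6, 0.90)
# Cells required for 500 fmol in the best case (about 1 billion).
cells_for_500_fmol = cells_needed(500, 3000, 0.90)

print(f"{worst:.2f} amol, {best:.0f} amol, {cells_for_500_fmol:.2e} cells")
```

Even under the most favorable assumptions, about 1 billion cells are needed to reach 500 fmol of a single peptide, and less abundant peptides would require proportionally more material.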
Studying the regulation of HLA I peptide repertoire presentation is an important task [
50,
97,
98,
99]. Identifying factors capable of increasing the amount of MHC presented by a cell can reduce the required volume of biological material and/or increase the number of different detectable MHC ligands. Javitt and co-authors showed that the pro-inflammatory cytokines tumor necrosis factor alpha (TNFα) and interferon gamma (IFNγ) increase the number of identifiable HLA I ligands in the lung epithelial cell line A549 from 3444 unique peptides without cytokine treatment to 6582 unique peptides after the treatment [
99]. About 500 million cells were used in a single experiment. The authors showed that the pro-inflammatory molecules TNFα and IFNγ increased the diversity of the immunopeptidome, which was due to the functioning of a special immunoproteasome synthesized in cells under the effect of these cytokines [
49].
Another method for isolation of HLA I molecules and their ligandome is the transfection of a cell line with an expression vector encoding a soluble secreted form of MHC I, without a transmembrane domain, and the further immunoprecipitation of secreted MHCs with peptides attached. The MHC I delivery methods include DNA transfection [
100,
101], transduction with retroviruses [
102], and mRNA transfection [
103]. At the same time, this method allows culturing cells for long periods, similar to MAE, which facilitates the accumulation of a significant amount of MHC ligands and gives the most specific result due to the immunoprecipitation. However, various genetic engineering procedures can cause an appreciable rearrangement of the protein composition of the cell, together with the MHC ligandome. In addition, similar to MAE, this method does not work with tissues due to the complexity of the use of genetic engineering techniques [
74].