The Class I Major Histocompatibility Complex (MHC) is a protein that plays a central role in our adaptive immune system [1]. MHCs bind to intracellular peptides, about 8–11 amino acids in length, and the combined peptide-MHC (pMHC) complex is transported to the cell surface. Surveilling T-cells then inspect the pMHCs to determine whether a given cell is diseased or healthy. Diseased cells tend to display a set of peptides that differs from the peptides presented by healthy cells, and an immune response is triggered if a T-cell is able to recognize such a differing peptide. Studying pMHCs has potential applications for immunotherapy, which leverages this mechanism to deliver precise treatments against certain diseases, such as cancer [2].
One direction in studying pMHCs is binding prediction, since not every peptide binds to a given MHC. There are thousands of different MHC allotypes found in the human population, each with its own preference for the kinds of peptides that will bind, which is in turn determined by the MHC sequence. Experimental methods alone cannot cover the sheer number of possible pMHC combinations (i.e., all possible peptides presented by all MHC allotypes in the population), making computational methods an attractive complementary approach. The leading approaches for computational binding prediction use sequence as the basis for prediction, typically through neural networks, and are trained on a dataset of experimentally determined binding affinities [3]. While sequence-based methods allow for rapid prediction of pMHC binding, their performance for allotypes not included in training sets is difficult to quantify [6].
An alternative computational approach is based on analyzing the structure of pMHCs. Structure-based methods have the potential to be more interpretable and general across pMHCs, since binding predictions are based on the existence of structural features, such as atomic interactions. The structural analysis can be based, for example, on structures derived from X-ray crystallography experiments, and there are about 600 pMHC crystal structures available in the Protein Data Bank (PDB) at the time of this writing. However, given that the available pMHC crystal structures cover only a small fraction of pMHC combinations, computational methods have been developed to model conformations of peptides bound to MHCs [7].
Toward this end, molecular docking tools can be used to generate bound pMHC structures [8]. Molecular docking aims to predict the most likely conformation a given ligand (e.g., peptide) will take in the binding site of a receptor (e.g., MHC). The main challenge that molecular docking methods face with pMHCs is handling the high-dimensional conformational space of peptides in a computationally efficient manner. The receptor conformation must also be considered, as the sidechains in the binding site of the MHC can rearrange depending on the peptide conformation. Popular molecular docking software, such as AutoDock Vina [9], searches the conformational space with a stochastic optimization procedure (an iterated local search in the case of AutoDock Vina), and candidate conformations (peptide plus MHC) are evaluated with the help of a scoring function [10]. The output is a single conformation or a select few conformations that are considered high quality by the scoring function. The accuracy of such methods is then assessed by comparing the returned conformations with the one found in the reference crystal structure, typically with the root mean square deviation (RMSD), a metric that computes a distance between two conformations. In the context of pMHCs, general molecular docking methods [11] as well as methods built specifically for pMHCs have been applied to model bound pMHC conformations. Examples of docking methods built specifically for pMHCs include using a priori knowledge of bound pMHC conformations to limit the conformational search [12] or incorporating a pMHC-specific scoring function [15]. For a more comprehensive discussion of molecular docking for pMHCs, or more generally of how structure-based methods have been applied to pMHCs, we refer the interested reader to a recently published review [7].
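To make the RMSD metric mentioned above concrete, the following is a minimal sketch in Python. It assumes that both conformations list the same atoms in the same order and are already expressed in a common reference frame (for instance, after aligning the MHC), so no superposition is performed. The function name `rmsd` and the toy coordinates are illustrative and are not taken from any particular docking package.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two conformations.

    Each conformation is a list of (x, y, z) tuples with atoms in the
    same order. Assumption: both are already in the same reference
    frame, so no superposition is done here.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("conformations must have the same number of atoms")
    if not coords_a:
        raise ValueError("conformations must contain at least one atom")
    # Sum of squared distances between corresponding atoms.
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy example with made-up coordinates (in angstroms).
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (1.5, 1.0, 0.0), (3.0, 0.0, 2.0)]
print(rmsd(reference, model))
```

A full-atom RMSD uses every peptide atom in the sum, whereas a backbone RMSD would restrict the two lists to backbone atoms only.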
However, a largely ignored aspect of these structural analyses is that biomolecules such as pMHCs are not static in solution. The pMHC system may adopt multiple conformations, and thus subsequent analyses involving only a single conformation per pMHC could lead to misleading conclusions. In [16], the authors used a technique known as ensemble refinement to generate alternative conformations of pMHCs that are still consistent with the X-ray crystallography experiment. They found that when structural analyses are instead done with conformations produced from ensemble refinement, alternative conclusions can be formed due to the existence of different interactions between peptide and MHC.
Therefore, in this work, we are interested in developing a method that can generate an ensemble of conformations, as opposed to simply producing the most probable one as done with docking-based methods. Structural analysis of pMHCs can then be done on the ensemble, which takes into account the previously neglected flexibility of the peptide within the MHC binding site. Having access to such an ensemble could allow one to explore alternative bound conformations, which the pMHC may adopt naturally in solution or in response to interacting T-cells. Currently there is a lack of computationally efficient methods that can produce such an ensemble of plausible (clash-free) pMHC conformations. A naive way of generating an ensemble would be to rerun docking tools to generate multiple bound pMHC conformations. However, molecular docking methods were not built to perform this task: given the size and flexibility of peptide ligands, they are relatively slow to rerun many times, and they do not aim to produce diverse bound conformations. Additional work would be needed for molecular docking tools to keep track of which conformations have already been produced at a particular point. Another method that could be used is molecular dynamics, which simulates the interactions between atoms through time [17]. However, besides the fact that this method requires a bound pMHC conformation to begin with, molecular dynamics is computationally demanding in that it requires massive amounts of computational resources to explore physiologically relevant timescales [20].
To develop a method that is both computationally efficient and can produce diverse bound pMHC conformations, we drew on two previously noted observations. The first is that the ends of the peptide are known to be anchored at particular pockets within the MHC binding site. Therefore, if the ends of the peptide are more or less in fixed positions, the majority of the conformational search can focus on finding conformations for the middle of the peptide. This insight turns the problem into a loop modelling problem, for which there are methods already developed [21], and indeed this insight has also been used by other methods for modelling pMHCs [12]. Focusing on only the middle portion of the peptide makes a method more efficient, as it limits the conformational search. However, loop modelling software typically works by fixing the surrounding conformation, meaning that the peptide conformations are generated with a fixed receptor conformation. Thus, the peptide conformations that are sampled by loop modelling are biased by the receptor conformation. The second observation allows our method to overcome this bias. In [25], DOCKTOPE overcomes the limitation of docking with a rigid receptor conformation by alternating docking with energy minimization. Since loop modelling is done with respect to a given fixed conformation, our method can similarly alternate loop modelling with energy minimization. Multiple rounds of loop modelling followed by energy minimization then ensure a more diverse sampling of peptide conformations, since a different receptor conformation can be used in each round.
These insights allowed us to develop APE-Gen (Anchored Peptide-MHC Ensemble Generator), a fast method for generating bound pMHC conformations. APE-Gen generates an ensemble of bound conformations by iterated rounds of loop modelling followed by energy minimization, and only requires the sequence of the peptide and MHC as input. A single round consists of i) anchoring the ends of a given peptide near known pockets in the binding site of the MHC, ii) sampling peptide backbone conformations with loop modelling, and then iii) performing energy minimization to fix steric clashes. The energy minimization is done with a scoring function typically used for docking [26] that models electrostatic, hydrogen bonding, solvation, and hydrophobic effects. The energy minimization is done over the peptide conformation as well as the receptor sidechains in the binding site. At the end of a round, the sampled conformations are pooled together with the conformations sampled from previous rounds, and the conformation with the lowest energy is used as input to the next round. The combination of loop modelling followed by energy minimization allows APE-Gen to generate a diverse ensemble of bound pMHC conformations that can be used for further structural analysis. APE-Gen is fast and naturally takes into account receptor flexibility through the energy minimization. We validate APE-Gen by assessing its ability to sample the conformation found in the corresponding crystal structures, even when only sequence information is used as input. We also discuss a few application scenarios that showcase the scalability and flexibility of APE-Gen. APE-Gen is open-source and freely available at https://github.com/KavrakiLab/APE-Gen
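The round structure described above can be illustrated with a short Python sketch. This is a toy model under strong assumptions and not the APE-Gen implementation: a peptide conformation is reduced to a list of numbers, the receptor to a single number, and the hypothetical functions `toy_energy`, `sample_loops`, and `minimize` stand in for the docking scoring function, the loop modelling step, and the energy minimization, respectively. Only the control flow (sample with a fixed receptor, minimize peptide and receptor together, pool, and reseed from the lowest-energy conformation) mirrors the description in the text.

```python
import random


def toy_energy(peptide, receptor):
    # Hypothetical stand-in for a docking scoring function (lower is better).
    return sum((p - receptor) ** 2 for p in peptide)


def sample_loops(anchored, n_samples, rng):
    # Stand-in for loop modelling: the two anchor ends stay fixed and only
    # the middle positions are perturbed. A real loop modeller would sample
    # around a fixed receptor conformation.
    samples = []
    for _ in range(n_samples):
        middle = [x + rng.gauss(0.0, 1.0) for x in anchored[1:-1]]
        samples.append([anchored[0]] + middle + [anchored[-1]])
    return samples


def minimize(peptide, receptor, steps=20, rate=0.1):
    # Stand-in for energy minimization over the peptide AND the receptor
    # binding site: plain gradient descent on the toy energy.
    peptide = list(peptide)
    for _ in range(steps):
        grad_receptor = sum(-2.0 * (p - receptor) for p in peptide)
        peptide = [p - rate * 2.0 * (p - receptor) for p in peptide]
        receptor -= rate * grad_receptor / len(peptide)
    return peptide, receptor


def generate_ensemble(anchored, receptor, n_rounds=3, n_samples=10, seed=0):
    rng = random.Random(seed)
    pool = []  # (energy, peptide, receptor) tuples pooled across all rounds
    for _ in range(n_rounds):
        for candidate in sample_loops(anchored, n_samples, rng):
            pep, rec = minimize(candidate, receptor)
            pool.append((toy_energy(pep, rec), pep, rec))
        # The lowest-energy member of the pooled ensemble seeds the next round,
        # so each round can start from a different receptor state.
        _, anchored, receptor = min(pool, key=lambda item: item[0])
    return pool


ensemble = generate_ensemble([0.0, 1.0, 2.0, 1.0, 0.0], receptor=0.5)
print(len(ensemble))
```

Because the receptor state is updated during minimization and carried into the next round, later rounds sample around a different receptor than earlier ones, which is the mechanism the text credits for the increased diversity.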
4. Discussion and Conclusions
APE-Gen stands out as a method that can quickly generate bound conformations of pMHCs given only sequence information. As alluded to in the Results section, APE-Gen has a wide array of potential applications. As a tool, APE-Gen can rapidly sample native-like conformations. The scalability of APE-Gen allows the modelling of thousands of different pMHCs within a reasonable timeframe. Additionally, non-canonical longer peptides (up to 15-mers) can also be modelled by APE-Gen, which is an extremely difficult task for traditional molecular docking approaches due to the additional degrees of freedom. The modelled ensemble of bound conformations can then be used as datasets for further structural analyses.
First, we evaluated APE-Gen on its ability to sample a conformation that is similar to a reference crystal structure. Current general-purpose molecular docking tools, such as AutoDock Vina [9], simply cannot reach the scalability of APE-Gen without invoking some domain-specific knowledge, and so the same task cannot be performed with them. As shown in the Results, APE-Gen can sample a conformation that is similar to the one found in the crystal structure across all the pMHC crystal structures available in the PDB. As a point of comparison, the performance of APE-Gen is comparable to GradDock, a docking tool developed specifically for pMHCs that also features a pMHC-specific scoring function [15], in its ability to generate conformations that have low full-atom RMSDs to a reference crystal structure. While our results show that the produced ensembles include conformations similar to those found in crystal structures, selecting such conformations out of this ensemble is a non-trivial task. For instance, the highest quality conformations in the ensembles produced by APE-Gen (according to the SMINA scoring function) are not necessarily the conformations that are nearest to the one found in the corresponding crystal structure. This is not a problem specific to APE-Gen, and the top scoring conformations derived from other methods may also be dissimilar to the reference crystal conformation [41]. Future work could investigate the use of other scoring functions, particularly pMHC-specific functions that may rank crystal-like conformations more highly.
However, as mentioned in the Introduction, structural analyses of pMHCs that use a single conformation can be misleading. Interactions between peptide and MHC may be missed when only a single conformation is considered, and entropic effects are ignored in general. APE-Gen is a step toward the structural analysis of pMHCs in an ensemble fashion. That said, the generated ensemble is by no means an optimal one. In fact, the conformations contained within can be viewed as rather coarse, given the nature of the sampling process and the use of the SMINA scoring function. As a result, it is difficult to assign a “weight” to each conformation.
Nevertheless, APE-Gen now provides a rapid way to generate an ensemble of plausible pMHC conformations and enables new kinds of analysis. For instance, in Section 2.3, we investigated the use of the scoring function as a possible way to distinguish binders from non-binders, where the predicted binding affinity of a particular pMHC is taken as the score of the highest quality conformation in the ensemble. While our results show that APE-Gen was unable to reliably do binding prediction, the ensembles produced by APE-Gen may lead to future work in improving structure-based binding prediction methods. The models produced by APE-Gen can be used as training sets for future scoring functions that aim to distinguish binders from non-binders or even predict binding affinities. Future work could also focus on how the ensemble as a whole could be used in predicting binding affinities, as the use of the ensemble could be a way to include previously neglected effects of peptide flexibility.
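The best-score rule described above is simple enough to sketch in Python. The example below assumes docking-style scores where more negative means better. The peptide names, the scores, and the classification threshold are all made up for illustration and do not come from our experiments. The helpers `best_score` and `classify_binder` are hypothetical names.

```python
def best_score(scores):
    # Assumption: more negative scores indicate higher quality conformations,
    # so the best conformation in the ensemble has the minimum score.
    if not scores:
        raise ValueError("ensemble must contain at least one score")
    return min(scores)


def classify_binder(scores, threshold):
    # A pMHC is called a binder if its best conformation scores at or
    # below the (hypothetical) threshold.
    return best_score(scores) <= threshold


# Made-up ensembles of scores for two hypothetical peptides.
ensembles = {
    "peptide_A": [-9.1, -8.4, -7.9],
    "peptide_B": [-5.2, -6.0, -4.8],
}
threshold = -7.0  # illustrative cutoff only

for name, scores in ensembles.items():
    print(name, best_score(scores), classify_binder(scores, threshold))
```

This rule discards all but one conformation per pMHC. A method that uses the ensemble as a whole would replace `best_score` with a summary over all scores, which is the direction suggested for future work.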
We have also shown a use case of APE-Gen as a conformational search tool. In the context of cross-reactivity, we have shown how APE-Gen can be used in principle to search for conformations that produce similar “looking” interfaces to a reference pMHC. Other analyses could be possible as one could simply filter through the conformations generated by APE-Gen to fit within some pipeline or run the APE-Gen method until some desirable conformation has been found. The search aspect of APE-Gen could be improved in future work by either making sampling more directed or incorporating some notion of memory to prevent re-sampling similar conformations.
Finally, another exciting application of APE-Gen is the ability to initialize molecular dynamics simulations from multiple diverse starting conformations, as it only requires sequence information to produce models. The results of a given simulation may be heavily biased by the starting conformation, and it is becoming more apparent that molecular dynamics simulations should instead be run in an ensemble fashion [42]. A new class of methods known as adaptive sampling is gaining popularity, where many short parallel simulations are iteratively restarted in a principled way to achieve some goal [20]. APE-Gen has the potential to be an ideal companion for adaptive sampling methods that will enable the study of any pMHC system with molecular dynamics.