To eliminate pathogens without damaging healthy cells, the immune system must discriminate between self and foreign (nonself). The innate arm of the immune system does so to some extent using a limited number of germline-encoded receptors that recognize pathogen-associated molecular patterns. By contrast, the adaptive arm of the immune system, which is found in all jawed vertebrates and is mediated by T and B lymphocytes, uses a vastly diverse repertoire of receptors to generate specific protective responses against any pathogen it encounters [1]. For example, humans have a repertoire of at least 10^7 different T cells [3], each expressing one or two of the >10^15 unique receptor sequences that can arise from the stochastic recombination of V(D)J gene segments and addition of non-templated nucleotides [4]. These T cell receptors (TCRs) recognize short foreign peptides presented on major histocompatibility complex (MHC) molecules on the surface of infected or cancerous cells.
The random TCR generation process is required to achieve this diversity, but it inevitably also produces TCRs that recognize self peptides presented by healthy cells. It was long thought that these self-reactive receptors are effectively eliminated during T cell development in the thymus through a process termed negative selection [6]. However, current estimates of how many self peptides each T cell encounters in the thymus range from
], at least one order of magnitude lower than the total number of possible self peptides. Indeed, recent studies have found that self-reactive T cells are abundant in the periphery after all, especially in humans [10].
This confirmation that negative selection is far from complete has important implications for the relationship between self tolerance and self-foreign discrimination. When negative selection is “complete” and removes all self-reactive T cells, self-foreign discrimination is simply a consequence of achieving tolerance (Figure 1, case 1). There is one exception to this rule [10]: when the selection process removes so many T cells that “holes” arise in the repertoire, some pathogens are no longer detected either and we cannot speak of discrimination anymore, even if there is tolerance (Figure 1, case 2). Incomplete negative selection means that the relationship between tolerance and discrimination becomes less straightforward: selection on a subset of self peptides will likely achieve only low tolerance in itself, but the resulting discrimination can range from very low to very high values (Figure 1, cases 3 and 4). Which of these scenarios applies to our immune system then depends on the question: can negative selection give our T cell repertoire the ability to differentiate between foreign peptides and self peptides it has not seen in the thymus?
Many learning systems tasked with inferring a concept can do so based on a set of examples. For example, children infer the concept of English grammar from example sentences they hear and can then construct other sentences they have not heard before. This effect is called generalization
], and it does not require the set of examples to cover the complete concept. Here, we hypothesize that a similar generalization effect might occur as a result of T cell negative selection. If this were the case, it could compensate for the incomplete set of self peptides in the thymus. Negatively selected T cell repertoires could then respond differently to self peptides not encountered in the thymus than to foreign peptides, even when selection has little impact on tolerance (Figure 1, case 3). In summary, we ask: can the T cell repertoire “learn by example” during negative selection?
We approach this central question in two steps. First, we ask: can the process of negative selection cause learning by example in principle, and if so, under which conditions can this occur? To answer this question, we investigate how a computer algorithm based on a negative selection procedure [16] solves a basic, well-interpretable classification problem outside of immunology: distinguishing English from other languages based on short strings (letter sequences) of text. This problem mimics the task of self-foreign discrimination because, in both cases, classes (languages or proteomes) are to be distinguished based on a limited amount of information (short strings or peptides) from only the “self” class. In addition to this analogy, the language classification problem has several useful properties: (1) it is intuitive to understand; (2) it can take on a range of difficulties depending on the languages to be compared [17]; and (3) since we already know this problem can be solved through generalization by other algorithms [17], it is well-suited for a proof of concept that negative selection can do the same. Using a computational model of negative selection on strings from different languages, we will show that negative selection can indeed allow language discrimination as long as certain conditions are met.
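The negative selection procedure on strings described above can be illustrated with a minimal sketch. It is not the model used in this work: the contiguous matching rule, the threshold `t`, the two-letter alphabet, and all function names are assumptions chosen only to keep the example small.

```python
from itertools import product


def matches(detector: str, s: str, t: int) -> bool:
    """Assumed matching rule: detector and s agree in at least t adjacent positions."""
    run = 0
    for a, b in zip(detector, s):
        run = run + 1 if a == b else 0
        if run >= t:
            return True
    return False


def negative_selection(detectors, self_train, t):
    """Delete every detector that matches at least one training (self) string."""
    return [d for d in detectors
            if not any(matches(d, s, t) for s in self_train)]


def reactivity(repertoire, s, t):
    """Number of surviving detectors that match string s."""
    return sum(matches(d, s, t) for d in repertoire)


# Toy universe: all 16 strings of length 4 over a two-letter alphabet.
ALPHABET = "ab"
detectors = ["".join(p) for p in product(ALPHABET, repeat=4)]

# Train on a single "self" string, then probe seen self, unseen self-like,
# and dissimilar "foreign" strings.
repertoire = negative_selection(detectors, self_train=["aaaa"], t=3)
for s in ["aaaa", "aaab", "bbbb"]:
    print(s, reactivity(repertoire, s, t=3))
```

In this toy setting, each of the three probe strings is matched by three detectors before selection. After selection on "aaaa" alone, the string "aaab", which was never used for training but resembles it, is matched by one detector, whereas the dissimilar string "bbbb" is still matched by three. This is the kind of generalization effect the text refers to, shown here only for an artificial example.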
Second, based on the insights gained in this first part, we ask: are these conditions fulfilled when we consider self-foreign discrimination by T cells? By modifying our model such that it recognizes real peptide sequences from the human proteome and various pathogens, we show that the task faced by our immune system is relatively difficult because self and foreign peptides can be very similar to each other. However, we also show that this difficulty can be overcome if the peptides used for negative selection are chosen in a “smart” way that reduces redundancy.
In our AIS model, we found that negative selection on an incomplete set of self peptides can bias a T cell repertoire towards foreign recognition. This provides a proof of the principle that, under the right circumstances, negative selection can behave like a learning algorithm: it can let T cell repertoires “learn by example” through generalization. We show that this learning function hinges on two conditions: (1) an appropriate level of cross-reactivity, and (2) sufficient dissimilarity between self and foreign peptides. The basic idea that the immune system acts like a learning system has been pursued within the AIS field for decades [20], but, to our knowledge, our model is the first that investigates such learning using the actual “data” seen by the real immune system: the peptides presented on MHC complexes.
Our results highlight a novel role for T cell cross-reactivity. While it has long been recognized that T cells must be cross-reactive to provide sufficient coverage for the vast number of pathogenic peptides they might encounter [38], our results suggest a second advantage of cross-reactive repertoires: they allow for generalization. On the other hand, cross-reactivity should not be too high either: if T cells cannot sufficiently discriminate between peptides, the negatively selected repertoire will overgeneralize because (nearly) all T cells will recognize both self and foreign peptides.
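To make the notion of a cross-reactivity level concrete, the following sketch counts what fraction of all possible strings a single detector recognizes as a matching threshold `t` is varied. The contiguous matching rule, the threshold parameter, and the two-letter alphabet are illustrative assumptions, not a description of the model used here.

```python
from itertools import product


def matches(detector: str, s: str, t: int) -> bool:
    """Assumed matching rule: detector and s agree in at least t adjacent positions."""
    run = 0
    for a, b in zip(detector, s):
        run = run + 1 if a == b else 0
        if run >= t:
            return True
    return False


def cross_reactivity(detector: str, alphabet: str, t: int) -> float:
    """Fraction of all strings of the detector's length that the detector matches."""
    n = len(detector)
    universe = ("".join(p) for p in product(alphabet, repeat=n))
    hits = sum(matches(detector, s, t) for s in universe)
    return hits / len(alphabet) ** n


# A lower threshold means a more cross-reactive detector.
for t in (4, 3, 2, 1):
    print(t, cross_reactivity("abab", "ab", t))
```

For this toy detector the recognized fraction grows from 1/16 at `t = 4` to 15/16 at `t = 1`. At the permissive end, a detector matches almost every string, so it can no longer separate one class from another, which corresponds to the overgeneralization described above.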
This risk of overgeneralization is especially high when self and foreign are highly similar [13]. We demonstrate that a non-random subset of self peptides enriched for rare AAs can mitigate this danger by balancing the removal of self-reactive TCRs with the preservation of foreign-reactive receptors. This strategy works even when self and foreign peptides are not inherently different. In fact, for the pathogens we considered, the similarity to self was so high that it is hard to conceive how negative selection on random peptides could achieve any discrimination between foreign and unseen self peptides. By contrast, a “smart” peptide presentation strategy could still ensure that the peptides best recognized by the immune system are predominantly foreign, even in this difficult scenario. This notion would reconcile textbook negative selection theory with recent observations that T cells see only a fraction of all self peptides during thymic selection, and that even healthy individuals have many self-reactive T cells [10].
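One simple way to build a training subset enriched for rare amino acids is sketched below. The scoring rule (mean negative log frequency of the residues, with frequencies estimated from the peptide pool itself), the function names, and the toy peptides are all hypothetical; the sketch only illustrates the idea of a non-random selection and is not the selection procedure used in this work.

```python
from collections import Counter
from math import log


def aa_frequencies(peptides):
    """Relative frequency of each amino acid letter across all peptides."""
    counts = Counter("".join(peptides))
    total = sum(counts.values())
    return {aa: c / total for aa, c in counts.items()}


def rarity_score(peptide, freqs):
    """Hypothetical score: mean surprisal (-log frequency) of the residues."""
    return sum(-log(freqs[aa]) for aa in peptide) / len(peptide)


def select_training_set(peptides, k):
    """Pick the k peptides with the highest rarity score."""
    freqs = aa_frequencies(peptides)
    ranked = sorted(peptides, key=lambda p: rarity_score(p, freqs), reverse=True)
    return ranked[:k]


# Toy pool (not real peptides): L is common, W and C are rare.
pool = ["LLLL", "LLLA", "LLAW", "WCWC"]
print(select_training_set(pool, k=2))
```

On this toy pool the two selected strings are those richest in the rare letters. Whether such a ranking actually reduces redundancy for real peptides depends on the recognition rules of the receptors, which this sketch does not model.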
Although we demonstrate here how negative selection can skew a developing repertoire away from recognition of self, our results also strongly suggest that “central tolerance” by itself cannot achieve reliable self-foreign discrimination. This is in line with the consensus that peripheral tolerance mechanisms are crucial to prevent and dampen immune responses by those self-reactive cells surviving negative selection. Nevertheless, under the right conditions, negative selection can at least provide a basis for such other mechanisms to build on. The idea of a “leaky” central tolerance strengthened by peripheral mechanisms is not new [10], and is supported for example by studies showing that more nuanced discrimination becomes possible when T cells make decisions cooperatively [40]. However, our results clearly show that it is difficult for negative selection to provide even a starting point because it must somehow overcome the fundamental problem of similarity between self and foreign peptides.
Our finding that non-random peptide presentation improves self-foreign discrimination raises the question of how the thymus might obtain a preference for presenting low-exchangeability peptides. Although it remains unclear exactly which and how many peptides a T cell sees during selection, the importance of the thymic peptidome in shaping the TCR repertoire is evident from the existence of specialized antigen presenting cells, transcription factors such as AIRE, and even special proteasomes controlling thymic peptide presentation [42]. We suggest that the biased presentation of low-exchangeability peptides required for self-foreign discrimination might arise from special binding preferences of thymic antigen presentation proteins. As has already been shown for the thymoproteasome during thymic positive selection [43], such binding preferences can enrich for specific subsets of self peptides and thereby impact the ability of a TCR repertoire to recognize self and foreign. While a bias for specific AAs such as described in this paper would be one way to enrich for low-exchangeability peptides, we do not exclude that other binding preferences could have a similar impact on self-foreign discrimination.
How could our theory be tested? A first step would be to characterize the peptides present in the thymus during negative selection and to compare these to a hypothetical “random” sample from the proteome. Adamopoulou et al. [45] used peptide elution from dendritic cells in the thymus to identify 842 peptides presented by these cells. It is, however, likely that this dataset is enriched for highly abundant peptides and severely undersamples peptides presented on thymic epithelial cells. These epithelial cells are thought to be the major driver of negative selection, but made up only a small percentage of the cells that were analyzed. More recently, Schuster et al. [46] compiled a dataset consisting of MHC class I bound peptides across different organs. While this dataset is also expected to contain only a few peptides from epithelial cells, it could perhaps be used for an initial check of whether amino acid distributions of presented peptides differ between the thymus and other organs. However, a key issue with datasets based on mass spectrometry is that this technique itself is biased in the peptides it detects. As such, it currently remains difficult to compare the distribution of eluted peptides to a theoretically predicted reference distribution, which our test would require.
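An initial check of whether amino acid distributions differ between two peptide sets could look roughly like the sketch below. The choice of total variation distance, the function names, and the toy peptide sets are assumptions made for illustration; the sketch does not correct for the detection bias of mass spectrometry mentioned above.

```python
from collections import Counter


def aa_distribution(peptides):
    """Relative frequency of each amino acid letter across a peptide set."""
    counts = Counter("".join(peptides))
    total = sum(counts.values())
    return {aa: c / total for aa, c in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Toy placeholder sets, not real eluted peptides.
thymus_set = ["AAWW"]
other_set = ["AAAA"]
distance = total_variation(aa_distribution(thymus_set), aa_distribution(other_set))
print(distance)
```

With real data, an observed distance would still have to be compared against the variation expected between random samples of the same size, for example with a permutation test, before it could be read as evidence of biased presentation.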
While the discovery of non-random peptide presentation in the thymus would be a first step towards validating our theory, this would still only be indirect evidence based on observational data. A direct proof of our theory would require experimental manipulation of the peptides presented in the thymus. Indeed, the best possible test would perhaps be to choose two different peptide sets with differing amounts of redundancy, and test whether, as predicted by our model, the peptide set with lower redundancy leads to better discrimination of unseen self peptides from foreign peptides. This theoretically ideal test is not yet feasible with currently available experimental techniques. Mouse models with only a single peptide present in the thymus have been available for some time [47], and we hope that further development of such experimental models will allow a manipulation-based test of our theory in the future.
At present, however, the absence of a direct experimental test of our theory remains a major limitation of our work. The exact composition of an “optimal” peptide subset depends on the rules dictating which peptides are recognized by specific T cell receptor sequences, which are still being discovered [29], and more knowledge in this area would be required for a firmly testable prediction. However, even though our simple model cannot predict exactly what the optimal set of training peptides would be, the finding that T cell repertoires can generalize, and that this depends quite strongly on how training peptides are chosen, is independent of the exact model used.
If thymic selection indeed helps self-foreign discrimination by also reducing the recognition of peptides the T cell repertoire has not seen during selection, then this would establish an interesting connection to “slow learning” systems as described in psychology and neuroscience [14]. This would show that generalization and “learning by example” in biological systems do not necessarily need to involve neural networks.