Immunopeptidomic Analysis of BoLA-I and BoLA-DR Presented Peptides from Theileria parva Infected Cells

The apicomplexan parasite Theileria parva is the causative agent of East Coast fever, usually a fatal disease for cattle, which is prevalent in large areas of eastern, central, and southern Africa. Protective immunity against T. parva is mediated by CD8+ T cells, with CD4+ T-cells thought to be important in facilitating the full maturation and development of the CD8+ T-cell response. T. parva has a large proteome, with >4000 protein-coding genes, making T-cell antigen identification using conventional screening approaches laborious and expensive. To date, only a limited number of T-cell antigens have been described. Novel approaches for identifying candidate antigens for T. parva are required to replace and/or complement those currently employed. In this study, we report on the use of immunopeptidomics to study the repertoire of T. parva peptides presented by both BoLA-I and BoLA-DR molecules on infected cells. The study reports on peptides identified from the analysis of 13 BoLA-I and 6 BoLA-DR datasets covering a range of different BoLA genotypes. This represents the most comprehensive immunopeptidomic dataset available for any eukaryotic pathogen to date. Examination of the immunopeptidome data suggested the presence of a large number of coprecipitated and non-MHC-binding peptides. As part of the work, a pipeline to curate the datasets to remove these peptides was developed and used to generate a final list of 74 BoLA-I and 15 BoLA-DR-presented peptides. Together, the data demonstrated the utility of immunopeptidomics as a method to identify novel T-cell antigens for T. parva and the importance of careful curation and the application of high-quality immunoinformatics to parse the data generated.


Introduction
A major challenge to the development of novel vaccines for complex intracellular pathogens is the identification of relevant T-cell antigens. One example of this is Theileria parva, the causative agent of East Coast fever (ECF), a highly pathogenic disease of cattle that is prevalent in large areas of eastern, central, and southern Africa. ECF is estimated to kill ~1 million cattle a year and inflict an annual economic cost of up to 600 million USD [1]; as a major proportion of this burden is borne by smallholder farmers, ECF poses a major threat to the livelihoods and food security of some of the poorest communities in the world. T. parva is transmitted by the brown ear tick (Rhipicephalus appendiculatus), which deposits sporozoites into the skin of the cattle host while taking a blood meal. The sporozoites rapidly invade host lymphocytes and, once within cells, transition to a schizont form.
Cells were harvested whilst in log growth phase, and trypan blue staining was used to verify that >95% of the cells were viable at the point of harvest. Cells were washed twice with ice-cold PBS and then lysed in buffer (1% IGEPAL, 15 mM Tris pH 8.0, 300 mM NaCl, and a complete protease inhibitor (Roche, Welwyn Garden City, UK)) at a density of 2 × 10⁸ cells/mL for 1 min, diluted 1:1 with PBS, and solubilized for 45 min at 4 °C. Lysates were cleared by two-step centrifugation at 500× g for 15 min, followed by 15,000× g for 45 min at 4 °C. pBoLA-I complexes were captured directly from the lysate using a pan-specific anti-BoLA-I antibody (ILA-88) covalently conjugated to protein A sepharose immunoresin (Amintra, Expedeon, Cambridge, UK) at a concentration of 5 mg/mL. pBoLA-DR complexes were captured from the lysate, following a preliminary removal of pBoLA-I complexes (using BoLA-I capture as described above), using a pan-specific anti-BoLA-DR antibody (ILA-21) conjugated to protein A sepharose immunoresin at a concentration of 5 mg/mL. Captured pBoLA-I and pBoLA-DR complexes were washed sequentially with 50 mM Tris buffer, pH 8.0, containing 150 mM NaCl, then 400 mM NaCl, and finally 0 mM NaCl, prior to elution of the BoLA-bound peptides, BoLA protein chains, and β2M in 10% acetic acid; eluates were stored as described previously [30].

High Performance Liquid Chromatography (HPLC) Fractionation
Affinity column-eluted material was resuspended in 120 µL loading buffer (0.1% formic acid, 1% acetonitrile in water) and loaded onto a 4.6 × 50 mm ProSwift™ RP-1S column (Thermo Scientific, Waltham, MA, USA) for reverse-phase chromatography on an Ultimate 3000 HPLC system (Thermo Scientific). Elution was performed using a 0.5 mL/min flow rate for over 5 min on a gradient of 2-35% buffer B (0.1% formic acid in acetonitrile) in buffer A (0.1% formic acid). Eluted fractions were collected from 1 to 8.5 min, for 30 s each. Protein detection was performed at 280 nm. Even and odd eluted fractions were pooled together, vacuum-dried, and stored at −80 °C until use.

LC-MS² Analysis
Samples were suspended in 20 µL of loading buffer and analysed on an Ultimate 3000 nano UPLC system coupled online to either an Orbitrap Fusion™ Tribrid™ Mass Spectrometer or a Q Exactive™ HF-X Hybrid Quadrupole-Orbitrap™ Mass Spectrometer (Thermo Scientific). Peptides were separated on a 75 µm × 50 cm PepMap C18 column using a 1 or 2 h linear gradient from 2-5% buffer A to 35% buffer B at a flow rate of 250 nL/min (approx. 600 bar at 40 °C). Peptides were introduced into the mass spectrometer using a nano Easy Spray source (Thermo Scientific) at 2000 V. The ion transfer tube temperature was set to either 305 °C (Fusion Lumos) or 250 °C (HF-X). Subsequent isolation and higher-energy C-trap dissociation (HCD) were induced on the 20 most abundant ions per full MS scan with an accumulation time of 128 ms and an isolation width of 1.2 Da (Fusion Lumos) or 1.6 Da (HF-X). All fragmented precursor ions were actively excluded from repeated selection for 8 s (Fusion) or 15 s (HF-X). The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository [31] with the dataset identifiers PXD008151 and PXD024053.

MS Data Analysis
Mass spectra were interpreted using a database containing all bovine UniProt entries (41,610 entries in total) and 4084 entries for the T. parva Muguga proteome [14]. Spectral interpretation was performed using the de novo-assisted database search in PEAKS v8.5 or 10 (Bioinformatics Solutions), in 'no enzyme' mode, with mass tolerances of 5 ppm for precursor ions and 0.03 Da for fragment ions. The data were further searched against 313 built-in peptide modifications. To be included in the downstream analyses, T. parva peptides had to meet the following criteria: (1) a peptide-spectrum matching score (−10 lgP) of >20, (2) no predicted posttranslational modifications, (3) a minimum of 2 amino acids difference from any bovine peptide sequence, and (4) for BoLA-DR peptides, not predicted to be a BoLA-I-binding peptide (the latter due to evidence of coprecipitation of BoLA-I peptides in the BoLA-DR eluted datasets [32]). Prediction of the MHC binding capacity of T. parva peptides was conducted using NetMHCpan-4.1 [33] for BoLA-I eluted peptides and NetBoLA-IIpan [32] for BoLA-DR eluted peptides. Following the default parameters, peptides with a predicted percent rank score of >20% were considered to be nonbinders, whilst peptides with scores of <2% and <5% were considered to be binders for BoLA-I and BoLA-DR, respectively. Coprecipitants were identified as overlapping peptides present in multiple cell lines carrying nonhomologous BoLA-I/BoLA-DR molecules, where the lowest percent rank prediction score in at least one sample was >20% (i.e., predicted to be a nonbinder).
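The inclusion criteria and rank thresholds above can be expressed as a short filter. The following Python is a minimal sketch, not the authors' pipeline: the `Peptide` record, the function names, the "intermediate" label for scores between the binder and nonbinder cut-offs, and the window-by-window substitution count used for criterion (3) are all illustrative assumptions (the study itself used a BLAST search against the bovine proteome).

```python
from dataclasses import dataclass


@dataclass
class Peptide:
    """Hypothetical record for one identified T. parva peptide."""
    sequence: str
    score: float          # -10lgP peptide-spectrum matching score
    has_ptm: bool         # True if a posttranslational modification was assigned
    rank_bola_i: float    # predicted percent rank against BoLA-I


def min_mismatches(peptide, proteins):
    """Fewest substitutions between the peptide and any same-length window."""
    n = len(peptide)
    best = n
    for protein in proteins:
        for i in range(len(protein) - n + 1):
            window = protein[i:i + n]
            best = min(best, sum(a != b for a, b in zip(peptide, window)))
    return best


def classify(percent_rank, binder_cutoff):
    """Binder below the cut-off (2% BoLA-I, 5% BoLA-DR), nonbinder above 20%."""
    if percent_rank < binder_cutoff:
        return "binder"
    if percent_rank > 20.0:
        return "nonbinder"
    return "intermediate"  # label assumed here; the text leaves this range unnamed


def passes_inclusion(p, bovine_proteins, dataset):
    """Apply criteria (1)-(4); dataset is "I" or "DR"."""
    if p.score <= 20:                                        # criterion 1
        return False
    if p.has_ptm:                                            # criterion 2
        return False
    if min_mismatches(p.sequence, bovine_proteins) < 2:      # criterion 3
        return False
    if dataset == "DR" and classify(p.rank_bola_i, 2.0) == "binder":  # criterion 4
        return False
    return True
```

Criterion (4) only applies to the BoLA-DR datasets, which is why the dataset label is passed explicitly rather than inferred from the peptide.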

IFNG ELISPOT
IFNG ELISPOT was conducted using a standard format. In brief, a capture anti-IFNG monoclonal antibody (CC330; Bio-Rad, Watford, UK) was bound to a pre-wetted PVDF membrane MultiScreen plate (Millipore, Watford, UK). Autologous T. annulata cells (used as antigen-presenting cells; 2 × 10⁴ cells per well) were loaded with peptides at a concentration of 5 µg/mL for 2 h at 37 °C before CD8+ T-cells were added at a density of 1 × 10⁴ cells/well. Plates were incubated at 37 °C for 20 h before washing and identification of IFNG-producing cells by the addition of a biotinylated detection anti-IFNG antibody (CC302b; produced in-house) and development by sequential use of Vectastain (Vector Laboratories, Burlingame, CA, USA) and AEC substrate (Calbiochem, Watford, UK) solutions. Analysis of spot-forming units (SFU) was completed using an AID automated ELISpot reader (AID, Strassberg, Germany).

In Vitro Measurement of Peptide-BoLA-I Binding
The extracellular domains (positions 1-275, i.e., truncated at the transmembrane region) of BoLA-I heavy chain molecules had previously been produced as recombinant proteins and used to measure the affinity of peptide-BoLA-I interactions using human beta-2-microglobulin as the light chain component [34,35]. A previously described assay measuring the peptide-human MHCI dissociation rate at 37 °C was adapted to these BoLA-I molecules [36]. Briefly, this assay used the dissociation of the invariant beta-2-microglobulin as a proxy to measure the dissociation of the peptides offered to the BoLA-I molecules. The dissociation at 37 °C of ¹²⁵I-radiolabeled beta-2-microglobulin was monitored in real time by a high-throughput scintillation proximity assay, and the half-life of the dissociation was determined. Peptides that failed to bind to BoLA-I molecules did not register a half-life; the length of the half-life was used to infer the stability and binding strength of individual peptide-BoLA-I complexes.
Vaccines 2022, 10, 1907
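As an illustration of how a half-life can be derived from such a dissociation time course, the following Python sketch assumes single-exponential decay of the signal, S(t) = S0·exp(−kt), so that the half-life is ln(2)/k. The source does not state the fitting model used, so the decay model, the log-linear least-squares fit, and the function name are assumptions made here for illustration only.

```python
import math


def half_life_hours(times_h, signals):
    """Estimate a half-life (hours) from a dissociation time course.

    Fits ln(signal) = ln(S0) - k * t by ordinary least squares and
    returns ln(2) / k. All signals must be positive.
    """
    logs = [math.log(s) for s in signals]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    k = -sxy / sxx  # decay constant is the negative of the fitted slope
    if k <= 0:
        raise ValueError("signal does not decay; no half-life can be estimated")
    return math.log(2) / k


# Synthetic example: a complex whose signal halves every 2 h.
times = [0.0, 1.0, 2.0, 4.0, 8.0]
signals = [1000.0 * math.exp(-math.log(2) / 2.0 * t) for t in times]
estimated = half_life_hours(times, signals)
```

A longer half-life corresponds to a more stable peptide-BoLA-I complex, which is how the assay readout is interpreted in the text.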

Identification of BoLA-I Associated Peptides Derived from T. parva
To assess the ability to detect T. parva-derived BoLA-I-associated peptides using an immunopeptidomics approach, BoLA-I-associated peptides were purified from three T. parva-infected cell lines (TP), and the peptide fractions were analysed by LC-MS. Each TP was homozygous for a different MHC haplotype, which expressed 1, 2, or 4 BoLA-I genes (in contrast to humans and mice, there is a variable number of MHCI loci expressed in different BoLA-I haplotypes, ranging from 1 to 4 [5,37]): 641TP (A18: 6*01301), 1011TP (A10: 2*01201, 3*00201), and 2229TP (A14: 2*02501, 4*02401, 1*02301, 6*04001). From these cell lines, a total of 7672, 6961, and 6871 peptide sequences were identified, respectively (Figure 1A). Due to the high degree of homology between sections of the bovine and T. parva genomes, a BLAST search of all peptide sequences putatively derived from T. parva against the bovine proteome was completed. To remove peptides potentially derived from bovine protein variants from further analysis, sequences that differed by fewer than two amino acids (a.a.) from a matching sequence in the bovine proteome were excluded. After filtering, 25 (0.32%), 18 (0.25%), and 25 (0.36%) peptide sequences were identified as being unambiguously derived from T. parva proteins in the 1011TP, 2229TP, and 641TP cell lines, respectively (total number and number of unique T. parva peptides identified were 68 and 62, respectively; Table 1, Figure 1A).
To verify the accuracy of the MS spectral sequence annotation of the identified T. parva peptides, we employed a spectral matching approach, performing analysis of synthetic peptides under identical LC-MS conditions for a subset of the peptides (n = 33). The spectra obtained from the synthetic peptides matched those measured for the majority of the peptides analysed (n = 31, 93.9%), confirming their correct identification (Table 1, Figure 1B).
The capacity of the identified peptides to bind to the BoLA-I molecules expressed in the cell lines was predicted using the NetMHCpan4.1 algorithm [38], with peptides achieving a predicted percent rank binding score of <2% considered to be BoLA-I binders (Table 1). Only 33.8% of the T. parva peptides were predicted to be BoLA-I binders, in contrast to the bovine-derived peptides from the same samples, where ~90% of peptides were predicted to be capable of binding to a BoLA-I molecule expressed in the respective samples (Figure 1C).
Notably, unlike the bovine-derived peptides, the T. parva peptides did not exhibit a Gaussian ('normal') length distribution, and the low percentage of T. parva peptides predicted to be BoLA-I binders appeared to be due, in part, to the presence of a substantial fraction of peptides (n = 32, 47%) that were longer than the canonical 8-12 a.a. length of MHCI-binding peptides (Figure 1D). Only 9.4% of ≥13 mer peptides (n = 3/32) were predicted to be BoLA-I binders, whereas 55.6% (n = 20/36) of the 8-12 mer peptides had a predicted rank binding score of <2% (Table 1). This overrepresentation of longer peptides in the T. parva-derived peptidome suggested either a specific property of parasite peptides associated with MHC-I complexes, as has previously been observed for Toxoplasma gondii peptides [20], or a high proportion of coprecipitating peptides in the pathogen-derived fraction of the peptidome. The derivation of 13/32 (~40%) of the ≥13 mer T. parva peptides from a single 28 a.a. region of one protein (hypothetical protein TpM_02g00758, residues 549-577), some of which were present in multiple samples despite the disparity in the BoLA-I molecules expressed by the three cell lines, suggested that a substantial proportion of the longer peptides were coprecipitants rather than peptides eluted from the peptide-binding groove of the purified BoLA-I molecules.
Figure 1. (C) Binding predictions of B. taurus and T. parva 8-12 mer peptides stratified by their predicted BoLA-I allele of origin (peptides with a rank predicted binding score of <2% to a BoLA-I allele were considered to be binders). Peptides predicted to be MHC-binders are represented by coloured blocks, and peptides predicted to be non-MHC binders are represented by grey blocks; the size of the blocks is proportional to the number of peptides in the respective datasets. (D) Length distributions of B. taurus and T. parva peptides identified in each sample. The horizontal axis shows the length of peptides, and the vertical axis shows the proportion of the peptides identified for each length. (E) A summary of results from a subset of peptides assayed in an in vitro BoLA-I binding assay, as presented in Table 2. For each BoLA allele and for all of the BoLA alleles combined (Total), the number of peptides for which the in vitro assay corroborated the in silico predicted capacity to bind to BoLA-I is shown as filled bars, whilst peptides for which the results from the in vitro binding did not support the in silico prediction are shown as hatched bars, as described in the legend.
Table 1. T. parva peptides derived from 1011TP, 2229TP, and 641TP cell lines. For each peptide, the amino acid sequence (column A), −10 lgP score (B), peptide length (C), sample (D), number of spectra (E), BoLA-I allele to which the peptide has the highest percent rank binding score (F), core peptide sequence (G), percent rank binding prediction score (H), accession number of the source protein (I), description of the source protein (J), the peptide location within the source protein (K and L), and the results of spectral matching of synthetic peptides (M; identified as either positive (matching), negative (not matching), or ND (not tested)) are shown.
Table 2. Summary of in vitro determined peptide-MHCI dissociation rates (expressed as half-life of peptide-BoLA-I complexes at 37 °C in hours) for a subset of peptides identified from 1011TP, 2229TP, and 641TP cell lines. The dissociation rate of each peptide binding to BoLA-I alleles 1*02301, 2*01201, 3*00201, and 6*01301 was assayed by a scintillation proximity assay, essentially as previously described (see Materials and Methods). Peptides that formed a complex with a BoLA-I molecule and for which a half-life could be measured were considered to be BoLA-I binders. Scores from the in vitro assay that corroborated the in silico predicted ability of peptides to bind to BoLA-I molecules expressed in the cell lines from which they were identified are highlighted with a dark grey background. Scores that are discrepant with the in silico predictions are highlighted with a black background in white script. Three peptides that bound to 6*01301 (BoLA-A18) but were identified from 1011TP (BoLA-A10) are shown in light grey script. The scores of negative controls (no peptide) and positive control peptides are shown.
To confirm the capacity of the T. parva peptides to bind to BoLA-I molecules in vitro, binding assays were performed on a subset (n = 19) of peptides (Table 2, Figure 1E). This included representative peptides from across a range of NetMHCpan4.1 predictive scores (0.01-22.8%) and at least one allele from each MHC haplotype. For 12/13 of the peptides predicted to be BoLA-I binders, the binding assay confirmed binding; the exception was NSFVTDTFEKL, which had the poorest ranking score of the predicted binders (rank 1.68%, 3*00201). Conversely, three peptides with a rank of >2% bound to the relevant MHCI alleles in vitro: RLFNFATKRI (rank 2.32%) and SLKSALIDTLI (rank 2.47%) bound to 6*01301, and YGDYGEFDRKTK (rank 13.4%) bound to 2*01201; however, all three exhibited weaker binding than peptides that were predicted binders. The three peptides with the poorest rank scores (all rank >14%) failed to exhibit any binding in the assay. A notable feature of the results was the high level of correlation between the predicted percent rank binding scores and the quantitative results observed in the in vitro assay; the in vitro data therefore corroborate the BoLA-I binding predictions from NetMHCpan4.1. Interestingly, the in vitro binding assays demonstrated that three peptides identified as 2*01201-binders from the BoLA-A10 sample (1011TP) had the capacity to bind to 6*01301 (A18); however, the level of binding was very low, being generally >10-fold lower than that of the weakest predicted 6*01301-binding peptide. This may reflect the similarity of the peptide binding motifs of the 2*01201 and 6*01301 alleles [38].
Thus, from this primary set of samples, a total of 68 T. parva peptides were identified, of which 23 were predicted to be BoLA-I binders. Data from spectral matching, BoLA-I binding prediction, and in vitro binding assays confirmed the identity of the pMHCI-eluted peptides and their capacity to bind to the relevant BoLA-I molecules; however, ELISPOT data indicated that only 1/33 of the assayed peptides was recognised by T. parva-specific CD8+ T-cells derived from an ITM-immunised donor.
The data generated from the second sample set had a similar profile to that obtained from the preliminary set of samples. Between 5333 and 12,119 total peptides (average = 8487) and 6-107 T. parva peptides (average = 46) were identified in each sample (Figure 2A); thus, the average percentage of T. parva peptides in the data was 0.53% (a summary of all of the data is provided in Supplementary Data 1.1). The total number of T. parva peptides identified and the number of unique sequences identified in all 10 samples combined were 456 and 294, respectively. Details of all T. parva peptides identified in the second sample set are provided in Supplementary Data 2.1. As in the preliminary dataset, the T. parva peptides exhibited an anomalous length distribution, with only 53% being 8-12 mers and 44.9% being ≥13 mers (1.8% of T. parva peptides were 7 mers, n = 8), whilst >90% of the bovine peptides were of the canonical 8-12 a.a. length, suggesting the anomalous profile was parasite-specific (Figure 2B). The percent rank binding predictions were also similar to those of the first dataset, with a high proportion of bovine-derived 8-12-mer peptides predicted to be binders (for all samples combined = 91%, range in samples = 86-96%), whilst only 57% (range in samples = 20-100%) of the T. parva-derived 8-12-mer peptides were predicted to be BoLA-I binders (Figure 2C).
Together, the data indicate that BoLA-I immunopeptidomic analysis of Theileria parva-infected cell lines generated a consistent data profile that comprised subsets of peptides of canonical length, of which ~50% were predicted to be MHCI-binders, and peptides of anomalous lengths that contained few MHCI-binders.
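The stratification underlying this summary (canonical 8-12 mers versus longer peptides, each with its fraction of predicted binders) can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' code: the function names and the input format (pairs of sequence and predicted percent rank) are assumptions, and peptides without a prediction are simply counted as non-binders here.

```python
from collections import defaultdict


def length_class(length):
    """Assign a peptide length to one of three length classes."""
    if length < 8:
        return "<8"
    if length <= 12:
        return "8-12"
    return ">=13"


def binder_fraction_by_length(peptides, cutoff=2.0):
    """Summarise predicted binders per length class.

    peptides: iterable of (sequence, percent_rank) pairs; percent_rank may be
    None when no prediction is available (counted as a non-binder).
    Returns {length_class: (n_binders, n_total, fraction_binders)}.
    """
    counts = defaultdict(lambda: [0, 0])
    for sequence, percent_rank in peptides:
        entry = counts[length_class(len(sequence))]
        entry[1] += 1
        if percent_rank is not None and percent_rank < cutoff:
            entry[0] += 1
    return {cls: (b, n, b / n) for cls, (b, n) in counts.items()}


# Toy input with placeholder sequences (not peptides from the study).
toy = [("A" * 9, 0.5), ("A" * 10, 5.0), ("A" * 15, 60.0), ("A" * 14, 1.0), ("A" * 7, None)]
summary = binder_fraction_by_length(toy)
```

The default cut-off of 2% matches the BoLA-I binder threshold used in the text; a value of 5% would be passed for BoLA-DR predictions.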

Exclusion of Putative Coprecipitating Parasite Proteins and Application of Immunoinformatics Provides a Refined List of Putative BoLA-I-eluted T. parva Peptides
Examination of the collated data derived from the 13 samples demonstrated that ribosomal proteins, histones, and TpM_02g00758 were dominant sources for the 524 T. parva peptides identified, accounting for 27.1%, 19.3%, and 16.2% of the peptide repertoire, respectively. The majority of peptides derived from TpM_02g00758 were of an anomalous length (average length = 17.5 a.a., with 67/85 of the peptides being ≥13 a.a. in length), were predominantly overlapping peptides originating from a small 30 a.a. region (79/85 peptides (92.9%) derived from residues 547-577 of TpM_02g00758), and had a poor percent rank binding prediction score (median = 93.1%; only one peptide had a predicted percent rank binding score below the 2% threshold).
Peptides from other proteins exhibited similar, but less pronounced, characteristics; for example, ribosomal protein S28-B 40S (TpMuguga_03g00428, residues 1-14) and histone H2A variant 1 (TpMuguga_02g00611), as shown in Figure 3. The recurrence of peptides from localised regions of a small subset of proteins in multiple samples of nonsimilar BoLA-I haplotypes, which generally exhibited poor percent rank binding scores and anomalous peptide lengths, supports the designation of these peptides as coprecipitants rather than genuine MHC-binders.
Based on this, the dataset was refined by removing overlapping peptides identified in multiple samples expressing disparate BoLA-I haplotypes (for this purpose, BoLA-A14 and BoLA-A15, which express common BoLA alleles, were grouped together) where one or more of the peptides was not predicted to be a BoLA-I binder (defined as a rank-predicted binding score >20%). When applied to the combined dataset, this refining process removed 55.2% of the T. parva peptides (n = 289/524; Supplementary Data 3). The removed peptides had an average length of 14.7 amino acids and a median rank prediction score of 66.3%, with only 20 (6%) having a rank prediction score of <2%. In contrast, the remaining 235 peptides had an average length of 11.6 amino acids and a median rank prediction score of 1.43%, and the number of peptides with a percent rank prediction score of <2% was 128 (55.4%). Thus, the removal of the coprecipitants had a profound effect on the dataset, leading to a substantial improvement of the predicted percent rank score and making the enrichment for genuine BoLA-I binders in the dataset evident (Figure 4).
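The coprecipitant rule described above can be read as a clustering step followed by two tests. The Python below is a minimal sketch of one such reading, not the authors' implementation: peptides are grouped by source protein, merged into clusters when their positions overlap, and a cluster is flagged when it spans at least two haplotype groups and contains at least one peptide with a percent rank >20%. The `Hit` record and function name are hypothetical, and the grouping of BoLA-A14 with BoLA-A15 is assumed to be encoded by the caller in `haplotype_group`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one peptide identification."""
    sequence: str
    protein: str
    start: int            # position of first residue in the source protein
    end: int              # position of last residue in the source protein
    haplotype_group: str  # e.g. "A10", "A18", or "A14/A15" when grouped
    percent_rank: float


def flag_coprecipitants(hits, nonbinder_rank=20.0):
    """Return the set of hits in overlapping, cross-haplotype, nonbinder-containing clusters."""
    by_protein = {}
    for hit in hits:
        by_protein.setdefault(hit.protein, []).append(hit)

    flagged = set()
    for protein_hits in by_protein.values():
        protein_hits.sort(key=lambda h: (h.start, h.end))
        clusters = []
        current = [protein_hits[0]]
        current_end = protein_hits[0].end
        for hit in protein_hits[1:]:
            if hit.start <= current_end:       # overlaps the running cluster
                current.append(hit)
                current_end = max(current_end, hit.end)
            else:                              # gap: close the cluster, start a new one
                clusters.append(current)
                current = [hit]
                current_end = hit.end
        clusters.append(current)

        for cluster in clusters:
            groups = {h.haplotype_group for h in cluster}
            has_nonbinder = any(h.percent_rank > nonbinder_rank for h in cluster)
            if len(groups) >= 2 and has_nonbinder:
                flagged.update(cluster)
    return flagged
```

Flagging the whole cluster, rather than only its nonbinding members, mirrors the text's removal of overlapping peptides "where one or more of the peptides was not predicted to be a BoLA-I binder".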
These putative coprecipitant peptides were derived from 25 proteins. These included TpM_02g00758, 12 ribosomal proteins, and 3 histones, which together were the source of 88.9% of the coprecipitant peptides (n = 257/289). Based on the high representation of these proteins in the coprecipitation pool, it was decided to remove all peptides derived from these classes of proteins. This resulted in the removal of an additional 75 peptides, so that a total of 364 peptides were excluded; this peptide set had an average length of 14 a.a. and a median predicted rank score of 22.8%, and 68 (18.7%) of the peptides had a rank prediction score of <2%. The retained peptide dataset consisted of 160 peptides, with an average peptide length of 11.6 a.a., a median rank predicted binding score of 2.03%, and 80 peptides (50%) that were predicted to be binders. Thus, the removal of all peptides derived from ribosomes, histones, and TpM_02g00758 caused a slight deterioration in the statistics of the retained peptide set but was considered a good compromise to decrease the retention of possible coprecipitant artefacts. As a final step to refine the peptide dataset, an immunoinformatics filter was used, and all remaining peptides that had a predicted percent rank binding score of >2% (i.e., not predicted MHCI-binders) were removed. This left a final dataset of 80 peptides, which, after consolidation of overlapping, nesting, and duplicate identifications, resulted in 74 unique peptides from 68 proteins (Table 3).
Figure 4. The distribution of predicted percent rank binding scores for peptides considered to be coprecipitants and peptides retained in the combined BoLA-I dataset after removal of the coprecipitating peptides. Peptides with a predicted percent rank binding score of <2% are considered to be binders. A small number of short peptides (<8 amino acids long, n = 8) that did not receive a predicted percent rank binding score were ascribed a default value of 100%.
Table 3. Predicted BoLA-I presented T. parva peptides identified in this study. For each peptide, the accession number and description of the protein from which it is derived are shown in columns A and B, and specific comments about particular proteins are given in column C. Columns D-H provide details about the individual peptides, including their sequence, length, the sample(s) they were identified in, the BoLA-MHCI allele predicted to present the peptide, and the predicted percent rank binding score.

Analysis of the Reproducibility of the Identified T. parva BoLA-I Immunopeptidomes
In the final dataset, the average number of T. parva peptides identified per sample was approximately six, suggesting that only a small subset of the BoLA-I-presented T. parva peptides had been identified. To evaluate what effect this had on the reproducibility of the T. parva peptidomes described, we examined the overlap of peptides identified in cell lines that had been subjected to duplicate analysis of independent samples (technical duplicates for 641TP, 2824TP, and 5350TP) and in the triplicate datasets from TP cell lines expressing BoLA-A10 and BoLA-A15 haplotypes (comprising the 1011TP/5072TP and 2123TP/2408TP samples, respectively).
For the BoLA-A18, A10, and A15 groups, there was partial, but limited, overlap between replicate samples; in contrast, for the BoLA-A19 and BoLA-A20 groups, there was no overlap between the samples (Figure 5). As a summary statistic, the percentage of T. parva peptides identified in replicate samples was 7.3%; in comparison, the overlap between the bovine peptidomes from the same samples was greater, with 47.1% of bovine peptides identified in replicate samples. The low level of overlap observed between replicate T. parva immunopeptidomes is most likely a consequence of the low number of T. parva peptides identified (notably in the BoLA-A19 and BoLA-A20 groups, only one peptide was identified in one of the replicate samples); however, the identification of a subset of T. parva peptides in replicate samples indicates that the immunopeptidomes described in this study are at least partially reproducible, and higher resolution studies, yielding greater depth of peptide repertoire coverage, would likely produce datasets exhibiting greater reproducibility.
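A replicate-overlap percentage of this kind can be computed with simple set operations. The text does not state the exact definition behind the 7.3% and 47.1% figures, so the sketch below assumes one plausible choice: peptides found in both replicates as a percentage of all distinct peptides found in either replicate. The function name and the placeholder peptide labels are illustrative only.

```python
def replicate_overlap_percent(replicate_a, replicate_b):
    """Shared peptides as a percentage of the union of two replicate peptidomes."""
    a, b = set(replicate_a), set(replicate_b)
    union = a | b
    if not union:
        return 0.0  # no peptides in either replicate
    return 100.0 * len(a & b) / len(union)


# Placeholder labels standing in for peptide sequences.
rep1 = {"pep1", "pep2", "pep3"}
rep2 = {"pep2", "pep3", "pep4"}
overlap = replicate_overlap_percent(rep1, rep2)  # 2 shared of 4 distinct
```

With very small peptide sets, as in the BoLA-A19 and BoLA-A20 groups, a single missed identification changes this percentage substantially, which is consistent with the authors' interpretation that low overlap reflects low peptide numbers.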

Analysis of T. parva Peptides Presented by BoLA-DR
We sought to expand the immunopeptidomic analysis to bovine MHCII molecules. Cattle express two BoLA-II isotypes, DR and DQ. The peptide-binding groove of MHCII molecules is formed by a combination of the coexpressed α and β chains that form the MHCII heterodimer. Both BoLA-DQA and DQB loci exhibit polymorphism and are duplicated in some BoLA haplotypes [39], whereas the BoLA-DRA locus is monomorphic and there is only a single functional and expressed BoLA-DRB locus [40,41]. Consequently, immunopeptidomic analysis of BoLA-DR was considered less complex, and we undertook an analysis of the peptides eluted from BoLA-DR molecules of six T. parva-infected cell lines (Figure 6A). After filtering out sequences with close homology to the bovine proteome, a range of 58-151 peptides (average = 101) were identified as being unambiguously derived from T. parva, representing 1.5% of the total peptides identified (total number and number of unique T. parva peptides identified were 607 and 326, respectively; Supplementary Data 1 and 2). The average length of T. parva peptides was slightly shorter (15.0 a.a.) than that of the bovine peptides (15.7 a.a.) (Figure 6B), and their length distribution adhered less closely to a classic Gaussian distribution. The proportion of 13-21-mer bovine peptides that were predicted to be binders (i.e., had a predicted percent rank binding score of <5% when using NetBoLA-IIpan, the threshold used for BoLA-DR binding) was consistently high, ranging from 82 to 84% (average = 83%). In contrast, the proportion of T. parva-derived 13-21-mer peptides that were predicted to be BoLA-DR binders was much lower, ranging from 5 to 22% (average = 12%; Figure 6C).
Figure 5. Overlap between the peptidomes identified from duplicate analysis of BoLA-A18, BoLA-A19, and BoLA-A20 samples and from samples sharing the BoLA-A10 and BoLA-A15 haplotypes. Euler diagrams displaying the overlap in the T. parva (left) and total (right) peptidomes of duplicated samples or samples sharing BoLA-I haplotypes. The number of peptides that are unique to each sample and shared between samples is indicated. The peptides identified in replicate samples were: BoLA-A18: RMDDKSGGLL from TpMuguga_01g00736 (hypothetical protein) and GEFEKKYIPTL from TpMuguga_01g00757 (Ras family protein); BoLA-A10: AGVELDTQKKFL from TpMuguga_03g00858 (multiprotein-bridging factor 1c); and BoLA-A15: FEYEFPINH from TpMuguga_01g00471 (bifunctional thioredoxin reductase/thioredoxin) and EEIAHVLHY from TpMuguga_01g02030 (hypothetical protein). Note that in the BoLA-A15 group, there were no T. parva peptides identified in sample 2123TP and so this sample is not represented in the T. parva Euler diagram.
Similar to the BoLA-I data, a notable feature of the T. parva peptides in the BoLA-DR dataset was the dominant representation of peptides from a small subset of proteins. This included TpMuguga_02g00758, from which peptides were identified in all six samples and which accounted for a total of 118 peptides (19.4% of all T. parva peptides in the BoLA-DR dataset). As with the BoLA-I data, the peptides from these proteins identified in different samples were often clustered in specific regions, were overlapping, and predominantly had poor predicted percent rank binding scores, indicative of coprecipitating peptides. Application of the same process as described for the BoLA-I data to identify putative coprecipitants suggested that a substantial majority of the T. parva peptides (82.9%, n = 503/607; Supplementary Data 3) were coprecipitants. These peptides were derived from 36 individual proteins, of which 25 were either ribosomal or histone proteins. A comparison of those proteins that were identified as the sources of coprecipitant peptides in the BoLA-I and BoLA-DR datasets showed a high level of convergence (15 proteins common to both) and a correlation in the number of peptides that individual proteins contributed to the BoLA-I and BoLA-DR datasets (Figure 7A).
Figure 7. (A) The level of correlation between the proteins from which coprecipitating peptides were derived in the BoLA-I and BoLA-DR datasets. Each point represents a single protein, and the number of coprecipitating peptides from individual proteins in the BoLA-I and BoLA-DR datasets is shown on the horizontal and vertical axis, respectively. The R^2 value calculated from Pearson's coefficient of correlation is shown. (B) The distribution of percent rank prediction scores for peptides considered to be coprecipitants and peptides retained after removal of the coprecipitant peptides in the combined BoLA-DR dataset. Peptides with a percent rank prediction score of <5% are considered to be binders. A small number of short peptides (<9 amino acids long, n = 26) that did not receive a percent rank prediction score were ascribed a default value of 100%.
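The correlation between the per-protein coprecipitant counts in the two datasets can be sketched as follows. This is a minimal, dependency-free illustration of computing Pearson's r and R^2 for paired counts; the function name and the toy counts are hypothetical and do not reproduce the values plotted in Figure 7A.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Toy counts of coprecipitating peptides per protein (one entry per protein
# found in both datasets); these numbers are invented for illustration.
bola_i_counts = [40, 12, 9, 5, 3, 2]
bola_dr_counts = [110, 30, 28, 10, 9, 4]

r = pearson_r(bola_i_counts, bola_dr_counts)
print(f"r = {r:.3f}, R^2 = {r * r:.3f}")
```

Because a few proteins contribute most of the peptides, a correlation of this kind can be driven by a small number of high-count points, so inspecting the scatter alongside the R^2 value is advisable.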
The removal of coprecipitated peptides had a limited impact on the average predicted percent rank binding score of the combined BoLA-DR peptide dataset (46.3% vs. 46.9%); however, the distribution of the retained peptides showed a clear bimodal pattern, with peaks of peptides with a percent rank prediction score of <5% and >95%. In contrast, the profile of the coprecipitated T. parva peptides showed no evidence of selection of predicted BoLA-DR binders, with only a dominant peak for peptides with a predicted percent rank binding score of >95% (Figure 7B). Although removal of the coprecipitants provided an enhanced dataset, the majority of the peptides retained in the dataset were not predicted to be BoLA-DR binders; 79/104 (76%) of the peptides had a predicted percent rank binding score of >5%. As with the BoLA-I data, all peptides derived from TpMuguga_02g00758, ribosomal proteins, and histones were removed (34 peptides with a median predicted percent rank binding score of 46.95, of which only two were predicted to be BoLA-DR binders), and the default threshold (i.e., a predicted percent rank binding score of <5%) used to predict MHC binding was applied to generate a final list of BoLA-DR-presented T. parva peptides. After consolidation of the nested peptides, this list included 15 peptides, each derived from a different T. parva protein (Table 4).
Table 4. Predicted BoLA-DR-presented T. parva peptides identified in this study. For each peptide, the accession number and description of the protein from which it is derived are shown in columns A and B, and specific comments about particular proteins are given in column C. Columns D-H provide details about the individual peptides, including their sequence, length, the sample in which they were identified, the BoLA-DR allele predicted to present the peptide, and the predicted percent rank binding score (%RPS).
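The curation steps described above can be sketched as a simple filter. This is an illustrative outline under stated assumptions, not the pipeline used in the study: coprecipitant status is taken as a precomputed flag, protein classes are matched by keywords in the description, and nested peptides are consolidated by keeping the longest peptide that contains the others. All function names, field names, accessions other than TpMuguga_02g00758, and toy records are hypothetical.

```python
EXCLUDED_ACCESSIONS = {"TpMuguga_02g00758"}
EXCLUDED_KEYWORDS = ("ribosomal", "histone")  # assumed keyword match on descriptions


def consolidate_nested(peptides):
    """Drop peptides fully contained in a longer retained peptide of the same protein."""
    ordered = sorted(peptides, key=lambda p: len(p["sequence"]), reverse=True)
    kept = []
    for pep in ordered:
        nested = any(pep["accession"] == other["accession"]
                     and pep["sequence"] in other["sequence"]
                     for other in kept)
        if not nested:
            kept.append(pep)
    return kept


def curate(peptides, rank_threshold):
    """Apply the coprecipitant, protein-class, and percent rank filters in turn."""
    retained = []
    for pep in peptides:
        if pep["coprecipitant"]:
            continue  # flagged upstream as a putative coprecipitant
        if pep["accession"] in EXCLUDED_ACCESSIONS:
            continue
        if any(k in pep["description"].lower() for k in EXCLUDED_KEYWORDS):
            continue
        if pep["percent_rank"] >= rank_threshold:
            continue  # not a predicted binder
        retained.append(pep)
    return consolidate_nested(retained)


# Toy records with placeholder sequences, accessions, and scores.
toy = [
    {"sequence": "ACDEFGHIKLMNPQR", "accession": "prot_1", "description": "hypothetical protein",
     "percent_rank": 1.0, "coprecipitant": False},
    {"sequence": "DEFGHIKLMNPQR", "accession": "prot_1", "description": "hypothetical protein",
     "percent_rank": 2.0, "coprecipitant": False},   # nested in the 15-mer above
    {"sequence": "STVWYACDEFGHIK", "accession": "prot_2", "description": "Histone H2B",
     "percent_rank": 0.5, "coprecipitant": False},   # excluded protein class
    {"sequence": "KLMNPQRSTVWYAC", "accession": "prot_3", "description": "hypothetical protein",
     "percent_rank": 60.0, "coprecipitant": False},  # fails the rank threshold
    {"sequence": "QRSTVWYACDEFG", "accession": "prot_4", "description": "hypothetical protein",
     "percent_rank": 3.0, "coprecipitant": True},    # flagged coprecipitant
]

final = curate(toy, rank_threshold=5.0)  # 5.0 for BoLA-DR; 2.0 would apply to BoLA-I
print([p["sequence"] for p in final])
```

Overlapping peptides that are not strictly nested are left as separate entries in this sketch; a different consolidation rule would be needed if those should also be merged.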

Comparison of T. parva Peptidome Data with Previously Identified T. parva Antigens
Recent work applying conventional antigen-screening techniques to identify CD4+ and CD8+ T-cell epitopes with a peptide library covering 502 T. parva proteins from the reference Muguga strain [13] has expanded the number of known T-cell antigens to 36: 20 CD4+ T-cell antigens, 10 CD8+ T-cell antigens, and 6 antigens containing epitopes for both CD4+ and CD8+ T-cells [12,13]. The library included peptides covering 19 of the 105 proteins that were identified as sources of BoLA-I and/or BoLA-DR-presented peptides in this study. Although none of the peptides from these 19 proteins matched experimentally mapped CD8+ or CD4+ T-cell epitopes, five of the proteins have been identified as T-cell antigens (Table 5). These include TpMuguga_02g00123 (Tp32, a DEAD/DEAH box helicase) and TpMuguga_02g00895 (Tp9), which have been shown to contain epitopes for both CD8+ and CD4+ T-cells. Thus, 26.3% of the proteins that were identified by immunopeptidomics as sources of MHCI/MHCII-presented peptides and that were included in the recent conventional antigen-screening study [13] have been demonstrated to contain recognised epitopes. In contrast, only 7.5% of the proteins selected for inclusion in that study were shown to contain epitopes. This suggests that, although only one of the peptides identified in this study has been validated as containing an epitope, immunopeptidomics could be used to preferentially select proteins that are sources of CD4+/CD8+ T-cell antigens.
Table 5. T. parva proteins that have been shown to contain epitopes recognised by CD4+ and/or CD8+ T-cells and that were sources of peptides identified in T. parva samples during pMHC elution of either BoLA-I or BoLA-DR. The identities of confirmed CD4+ and CD8+ T-cell epitopes come from Graham et al. (2006) and Morrison et al. (2021).
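The comparison of hit rates above reduces to simple arithmetic, shown here as a short sketch. The 5/19 figure is recomputed from the counts given in the text, whereas the 7.5% background rate is taken as reported rather than recalculated; the variable and function names are illustrative only.

```python
def percent(part, whole):
    """Express part/whole as a percentage."""
    return 100.0 * part / whole


# 5 of the 19 immunopeptidome source proteins covered by the screening library
# are known T-cell antigens (counts from the text).
immunopeptidome_rate = percent(5, 19)

# Background rate for proteins in the conventional screen, as reported in the text.
screen_rate = 7.5

fold_enrichment = immunopeptidome_rate / screen_rate
print(f"{immunopeptidome_rate:.1f}% vs {screen_rate}% (~{fold_enrichment:.1f}-fold)")
```

With only 19 proteins in the overlapping set, this ratio should be read as indicative rather than as a precise estimate of enrichment.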

Accession
Protein Name

Discussion
Identification of T-cell antigens for inclusion in vaccines against complex pathogens remains challenging. The potential for immunopeptidomics to address this critical obstacle in the development of novel subunit vaccines against a wide range of pathogens, especially nonviral pathogens with complex proteomes [42], is receiving greater attention. This is of especial relevance to intracellular eukaryotic pathogens, such as Plasmodium, where integration of data from immunopeptidomics with data from other antigen-identification approaches has been advocated as the most efficient way in which to identify antigens with potential for vaccination for pre-erythrocytic malaria vaccines [43].
Antigen identification persists as a constraint on the development of novel T. parva subunit vaccines with the capacity to induce T-cell responses. Conventional screening approaches have yielded a number of T. parva antigens for both CD4+ and CD8+ T-cells [12,13,44] on some MHC backgrounds, and a recent study attempted a purely 'immunoinformatic' approach to identifying T-cell antigens from Theileria [45]. In this study, we provide the first description of MHC-peptide elution studies being used to investigate the immunopeptidome of T. parva-infected cells to define peptides presented by BoLA-I and BoLA-DR molecules.
Although the large proteome and high parasite diversity are disadvantages in conventional antigen-screening approaches, the biology of T. parva has inherent advantages with regard to the application of immunopeptidomics. Firstly, it is easy to generate and maintain rapidly and indefinitely proliferating T. parva-infected cells in vitro, enabling the accumulation of sufficient cells (>1 × 10^9 infected cells), with an infection rate of ~100%, from the target host species. In malaria, by comparison, the application of immunopeptidomics has been hindered by the low level of hepatocyte infection (<10%) that can be achieved [43]. Secondly, the in vitro infected cells express high levels of both BoLA-I and BoLA-II and are of the same phenotype as the cells infected in vivo (i.e., T and B lymphocytes). The capacity of these T. parva-infected cells to recall antigen-specific T-cells from recovered animals without the need for supplementary APCs confirms that the peptides presented in the context of BoLA-I/BoLA-II by T. parva-infected cells are those that stimulate T-cell responses [3,29]. As such, cells infected in vitro with T. parva are highly appropriate for immunopeptidomic studies and avoid the potential complication of cell-type selection that was observed in recent Chlamydia studies, where dendritic and epithelial cells were found to have discordant and nonoverlapping immunopeptidomes [46].
In this study, we exploited the features of T. parva to analyse the immunopeptidomes from a total of nine different cell lines representing a range of BoLA-I haplotypes and BoLA-DR molecules. This gave our study a unique structure compared to immunopeptidomic studies that have been reported for other eukaryotic parasites (Leishmania, Plasmodium, and Toxoplasma), where data have been generated for either MHCI or MHCII and from only single samples [19-21]. This approach was primarily driven by two factors. Firstly, as cattle are not a 'model' species, there was, at the outset of these studies, very limited information on the peptide-binding motifs of bovine MHC molecules; the inclusion of multiple BoLA-I/BoLA-DR genotypes allowed us to define their peptide-binding motifs [32,38,47], which will find applications in enhancing 'immunoinformatic'-based studies in cattle (and was also pivotal in subsequently refining the data presented herein). Secondly, the ultimate aim of the study was to explore the feasibility of using immunopeptidome analysis as an alternative and/or complementary approach to CD4+/CD8+ T-cell candidate antigen identification for a T. parva vaccine that will be used in outbred cattle populations. As such, there was an interest in generating data for both BoLA-I and BoLA-II molecules for a range of different genotypes.
In this study, we purposely focused on MHC haplotypes present in Holstein-Friesian cattle, for three reasons. Firstly, these high-yielding dairy cattle and their crosses with indigenous cattle continue to increase in numbers in regions of Africa where ECF occurs. These high-value animals are considered critical to improving agricultural production and meeting the growing local demand for dairy products. As an 'exotic' breed that shows minimal tolerance to T. parva infections, these animals are particularly susceptible to ECF and are, therefore, the primary target for any vaccine against T. parva. Secondly, most previous antigen-identification work has focused on animals expressing MHC haplotypes present in Holstein-Friesian animals. Thirdly, there are currently insufficient data on the diversity and frequency of MHC genotypes in cattle breeds indigenous to T. parva-endemic regions on which to base the selection of samples for immunopeptidomic analysis in these populations. This fundamental gap in our knowledge is currently being addressed by several groups through the development and application of high-throughput sequencing approaches for exploring MHC diversity in a range of cattle populations [28,48,49].
A consistent feature of the BoLA-I and BoLA-DR T. parva-derived peptide datasets obtained during this study was the predominance of peptides that were of noncanonical length and/or predicted to be non-MHC binders. Equivalent anomalous peptides were not a feature of the bovine peptidomes, suggesting that these anomalous peptides were specific to the pathogen rather than a technical fault in the protocol. Non-genuine MHC binders that co-precipitate with pMHC molecules in elution studies using equivalent protocols have been documented before (e.g., in HIV-1 [50]). However, the extent to which putative co-precipitants dominated the T. parva datasets was remarkable; a total of 55.2% and 82.9% of the peptides identified from the BoLA-I and BoLA-DR datasets were defined as co-precipitants in this study. The co-precipitant peptides came from a small subset of proteins, which showed a high level of overlap between the BoLA-I and BoLA-DR datasets, suggesting that co-precipitating peptides did not represent random 'noise' but rather a feature of specific T. parva proteins. TpMuguga_02g00758, a hypothetical protein of 640 amino acids, was the largest contributor of co-precipitating peptides to both datasets (29.5% and 23.3% in the BoLA-I and BoLA-DR datasets, respectively). The second and third most common proteins acting as sources of co-precipitants in both datasets were histones (contributing a total of 28.5% and 24.5% of co-precipitants in MHCI and DR, respectively), whilst the most commonly represented family of proteins were ribosomal proteins (11 and 21 proteins contributing 25% and 39.4% of coprecipitating peptides in the BoLA-I and BoLA-DR datasets, respectively). Together, TpMuguga_02g00758, histones, and ribosomes constituted >80% of the co-precipitants identified in both datasets.
Relatively few immunopeptidomic studies have been conducted on parasite-infected cells [19-21], so there is little comparative data to assess whether this magnitude of coprecipitating peptides is a feature common to intracellular parasites or specific to Theileria (only eukaryotes express both ribosomes and histones, preventing direct comparison with data from viral or bacterial studies). Coprecipitants were not specifically examined in any of the other parasite immunopeptidomic studies. However, direct comparison between the studies would perhaps be confounded by technical differences; in the Toxoplasma gondii study [20], a monoallelic secreted MHCI model was used, and in the Plasmodium study [21], DCs were loaded by incubation with parasitized RBCs, which may, through different mechanisms, change the quantity and/or source of any coprecipitants observed. The high level of coprecipitants observed for T. parva may reflect fundamental biological differences between it and the other parasites for which immunopeptidomic studies have been completed. Foremost of these is the location of T. parva schizonts free within the host cell cytoplasm (Plasmodium, Toxoplasma, and Leishmania reside within vacuoles). This feature, combined with the active secretion of Theileria proteins into the host cell cytoplasm [51-53] and the close interaction between the parasite and various components of the host cell's architecture (e.g., attachment to the host cell's mitotic spindle during cell division [54] and the formation of various host-pathogen complexes on the schizont surface [53]), may afford greater opportunities for T. parva proteins to non-specifically bind to MHC molecules and/or integrate into elements of the host cell's structures that are subsequently co-precipitated during the elution protocol. Notably, the TpMuguga_02g00758 gene product has a signal peptide, indicating that it is likely to be secreted into the host-cell cytoplasm.
However, the absence of any other annotated features of this protein precludes any inferences about its biological function and how this might relate to it being the major co-precipitant in both the BoLA-I and BoLA-DR datasets. A second pertinent feature of T. parva biology is the transformation of the host cell and rapid proliferation of both the host cell and parasite. To sustain the DNA and protein production that proliferation requires, histones and ribosomes are likely to be subject to high levels of expression and turnover, which may contribute to their overrepresentation as coprecipitants. At present, there is no proteome of the schizont stage of T. parva; however, transcriptomic analyses of T. parva schizonts show that several histone and ribosomal protein genes are amongst the most abundantly expressed [55]. Comparison of these data with proteomic analysis of schizonts of the closely related T. annulata has shown a high level of concordance, suggesting that the transcriptome is quantitatively representative of the T. parva schizont proteome [56]. After removal of co-precipitants, both the BoLA-I and BoLA-DR datasets showed a bias for peptides that were predicted to be strong MHC binders (Figures 4 and 7B), demonstrating preferential selection of MHC-binding peptides. Within these datasets, there remained a high number of peptides derived from TpMuguga_02g00758, ribosomes, and histones.
Based on the high frequency at which these proteins were the source of co-precipitating peptides, we decided to exclude all peptides from these proteins. Whilst this did not enhance the parameters by which the quality of the datasets was measured, it removed classes of proteins that were evidently sources of co-precipitating peptides and so compromised as potential candidate vaccine antigens. Following removal of these peptides, the BoLA-I and BoLA-DR datasets were reduced substantially, from 524 and 607 to 160 and 70 peptides, respectively; however, within these there remained a substantial proportion of peptides that were predicted to be non-MHC binders. It is possible that these peptides may also be co-precipitants; however, there was no clear rationale in the current dataset for defining them as such, and so they were excluded from the final peptide selections solely on the basis of the default thresholds of predicted percent rank scores used to define BoLA-I (<2%) and BoLA-DR (<5%) binding. The high correlation seen between the predicted percent rank binding scores and the results from the in vitro binding assay (Table 2), the very high proportion of bovine-derived ligands with low rank scores (indicating BoLA binding) [38], and the observation that most known CD4+ and CD8+ T-cell epitopes from T. parva proteins have predicted percent rank binding scores within the respective 5% and 2% thresholds [32,38] support this approach to rationalising the final peptide lists. The availability of well-defined peptide-binding motifs for these bovine MHC molecules was critical in the curation of the datasets and attests to the importance of high-quality immunoinformatic data to support future immunopeptidomic studies.
Recognition by T-cells from immune animals has been used as a standard component of assays attempting to 'validate' peptides identified from immunopeptidomic studies. On assaying a subset of the T. parva peptides identified from the first 3 BoLA-I samples, only 1 out of 33 of the peptides (3.3%) was recognised. Prima facie, this lack of recognition was disappointing. However, it is similar to results from other immunopeptidomic studies (e.g., [57,58]) and, in the context of immunodominance, which is a characteristic feature of many CD8+ T-cell responses [59], is to be anticipated. Due to factors such as TCR repertoire and APC competition [60], although a broad range of peptides may be presented by MHC molecules (and so reported in the immunopeptidome), immunodominance causes a detectable T-cell response against only a small subset of the presented peptides. Previous work by our group has confirmed that T. parva-specific CD8+ T-cell responses are subject to immunodominance, with up to 78% of the responses in BoLA-A10 and A18 homozygous animals being directed against single peptides [15]. The failure of the peptides identified in the immunopeptidome to be recognised by T-cells from immune animals does not discredit them as vaccine candidates. Studies in a range of pathogens, including Trypanosoma cruzi [61], Mycobacterium tuberculosis [62,63], and various viruses [64-66], have shown that subdominant/cryptic epitopes can confer protection, which in some cases is better than that afforded by immunodominant epitopes. Furthermore, it has been shown in numerous studies [61,67-71], across a similar range of pathogens, that induction of T-cell responses in the absence of immunodominant antigens (as can be engineered in subunit vaccines) can successfully elicit responses against epitopes that are normally 'cryptic'/subdominant following natural infection (which, in the context of T. parva, ITM immunisation essentially is).
Thus, with regard to using immunopeptidomics to identify novel vaccine candidate antigens, the use of recognition by T-cells from immune animals to 'validate' peptides is conceptually flawed. To discount peptides not recognised by T-cells of immune cattle would militate against a major benefit of immunopeptidomics: the capacity to identify a large repertoire of MHC-presented peptides, amongst which may be potential epitopes that cannot be identified from T-cell screening but which potentially have the capacity to confer protection when utilised in vaccines. Data from recent studies of human melanoma [57] and African swine fever virus [58] have confirmed that T-cells recognising peptides that were identified through immunopeptidomics but were 'cryptic' (i.e., not recognised by T-cells from the affected individuals/infected pigs) can be successfully induced by vaccination with these peptides. In future studies, we plan to generate a multi-epitope vaccine construct including a selection of the peptides identified herein to directly assess their potential as vaccine candidates (see below). Encouragingly, evidence from a recent, unrelated study implies that it will be possible to induce T-cell responses against T. parva epitopes that are not recognised following natural infection. Following adenovirus/MVA heterologous prime-boost immunisation of 4 calves with T. parva antigens (Tp2, Tp9, and Tp10), all 4 animals, which expressed BoLA-DRB3*11:01 and/or 10:01, generated CD4+ T-cell responses specific for the Tp10 (TpMuguga_04g00772) antigen, which had not previously been observed as a CD4+ T-cell antigen in ITM-immunised animals bearing these or other BoLA-DR alleles [13]. These Tp10-specific CD4+ T-cells were capable of recognising T. parva-infected cells and, as far as could be determined, were functionally similar to CD4+ T-cells against immunodominant epitopes induced by ITM.
The T. parva cell lines chosen for the initial immunopeptidomic analysis were selected partly because immunodominant CD8+ T-cell epitopes restricted by MHCI proteins expressed by these cells (BoLA-A10+, BoLA-A18+, and BoLA-A14+) had already been defined. However, only the BoLA-A14-restricted TpMuguga_02g00895 epitope (residues 67-75) was identified in the immunopeptidome (as a nested peptide within a 15-mer). This most likely reflects the fact that the immunopeptidomes characterised in this study were only partial. In the final dataset, the average number of unique T. parva peptides identified per BoLA-I and BoLA-DR sample was only ~6 and ~2.5, respectively. Examination of replicate samples demonstrated very limited overlap in the BoLA-I immunopeptidomes described (Figure 5; only 7.3% of the peptides were identified in multiple samples), confirming that only a fraction of the T. parva peptidome was being identified from any individual sample at the depth of data generated in this study. A more intensive immunopeptidome analysis would be expected to enable a more complete characterisation of T. parva immunopeptidomes, which would potentially include the identification of the known T. parva CD4+ and CD8+ epitopes. For the reasons given above, we elected in this study to prioritise the inclusion of multiple BoLA genotypes. However, future work complementing this with 'deep' immunopeptidomic profiling of a limited number of samples would be beneficial.
The ultimate aim of the study was to assess the feasibility of immunopeptidomics to identify candidate antigens to include in novel subunit vaccines. At the end of the analysis, a total of 74 and 15 unique peptides with predicted high binding capacities for BoLA-I and BoLA-DR molecules, respectively, were identified. This information can be used in multiple ways to inform candidate antigen selection. For example, as described in the final results section, immunopeptidomic data could increase the efficiency of conventional antigen screening by focusing on T. parva proteins that are evidently contributing to the immunopeptidome. This may be of particular value as other parameters that have been evaluated (e.g., transcript abundance, presence of signal peptides) have not proved reliable for targeting antigen screening [13]. The simplest and most direct application of the data would be to include the identified peptides in a vaccine delivery platform (e.g., an adenoviral-vectored vaccine) and administer this to animals of the appropriate BoLA genotypes to assess immunogenicity and/or protective efficacy. However, this approach would only partially utilise the information made available from the immunopeptidomic data. The well-documented antigenic diversity of T. parva [8-11] and the polymorphism of bovine MHC in cattle in T. parva-endemic areas (Vasoya et al. submitted) preclude the direct application of immunopeptidomic studies to identify specific epitopes for all the potential permutations of MHC genotype and parasite strain. A more practicable approach would be to integrate this and future immunopeptidome data identifying proteins preferentially accessing the MHC processing pathways with data analysing the host immunogenetic and pathogen genetic diversity. The repertoire of BoLA genotypes in cattle populations in T. parva-endemic areas is now being analysed using high-throughput sequencing approaches (Vasoya et al. submitted), and work has begun to develop improved immunoinformatic algorithms for cattle [32,38,47], enabling accurate prediction of binding motifs of BoLA-I and BoLA-DR molecules. Simultaneously, transcriptomic and genomic sequencing of multiple T. parva strains [8,14] has recently been undertaken, enabling the strain diversity of T. parva proteins to be assessed. Together, the outputs of these 'omics' technologies could be utilised to identify candidate T. parva antigens presented on BoLA-I/BoLA-II (immunopeptidomics) and evaluate them for their degree of conservation (genomics/transcriptomics of T. parva strains) and their content of peptides (immunoinformatics) that can be presented on multiple BoLA-I and BoLA-DR genotypes (MHC repertoire analyses). The complexity of both the host MHC and pathogen strain diversity will undoubtedly make identification of a comprehensive set of candidate antigens immensely challenging, but this approach will make the best use of the data being generated from a suite of 'omics' technologies and provide a rational approach to identifying a 'minimal' set of antigens that would enable the induction of CD8+ and CD4+ T-cells from animals bearing a range of different MHC genotypes against diverse T. parva strains.
In this study, we have provided the first exploration of the immunopeptidome of T. parva-infected cells. The study encompasses data from multiple BoLA-I and BoLA-DR genotypes, identifying 74 and 15 unique BoLA-I and BoLA-DR binding peptides, respectively, and as such probably forms the most comprehensive immunopeptidomic analysis of any eukaryotic pathogen to date. However, there is clear evidence that the characterisation of the immunopeptidome completed herein is only partial, and further studies are required to gain a more comprehensive understanding of the T. parva BoLA-I and BoLA-II immunopeptidomes. Based on the dataset obtained in this study, we have developed a simplified curation system that can be applied in future T. parva immunopeptidomic studies to effectively remove the coprecipitating peptides that can dominate eluted peptide datasets and to restrict the data to peptides that are predicted to be genuine MHC binders. Due to the high levels of diversity in both T. parva and bovine MHC genotypes, we propose that the information derived from immunopeptidomics can most productively be used in the development of novel vaccines, not as a means to identify individual epitopes that are specific to a particular combination of T. parva strain and bovine MHC allele, but rather as part of an integrated strategy that aims to identify a repertoire of proteins that can potentially elicit CD4+ and CD8+ T-cell responses across a breadth of cattle MHC genotypes and that are able to provide protection against the spectrum of T. parva strains present in endemic areas. Such an approach will build on the synergy of the multiple new technologies that are being used to study T. parva and bovine immunogenetics and offer new opportunities to tackle the conundrum of T-cell antigen identification, which remains an obstacle to developing novel vaccines against this important disease.