Celiac disease (CD) is one of the most frequent food hypersensitivities, with a global seroprevalence of 1.4% and a biopsy-confirmed prevalence of 0.7% [1]. This chronic immune-mediated enteropathy of the small intestine is triggered by the ingestion of storage proteins (gluten) from wheat, rye, and barley in genetically predisposed individuals [2]. Due to the high amounts of glutamine (19.7–37.1 mol%) and proline (9.4–23.0 mol%) in the amino acid sequences of gluten proteins [3], human gastrointestinal enzymes are unable to digest them completely. Thus, peptides with a length of more than nine amino acids reach the small intestinal epithelium [4]. The main genetic factors of CD are the human leukocyte antigen (HLA) class II alleles HLA-DQ2 and HLA-DQ8 of the major histocompatibility complex. Most CD patients (≈90%) carry the HLA-DQ2.5 allele and the remaining patients carry the HLA-DQ8 or HLA-DQ2.2 alleles. These class II molecules are expressed on the surface of B cells and other antigen-presenting cells and specifically bind gluten peptides. These peptides are then recognized by CD4+ T cells, which in turn become activated and assist in immunologic processes such as antibody production [4
Human tissue transglutaminase (TG2), a calcium-dependent protein-glutamine γ-glutamyltransferase (EC 2.3.2.13) localized in the cytoplasm, is responsible for protein crosslinking, e.g., of fibronectin during wound healing [6], and the construction and stabilization of different high-molecular-weight protein structures [7], including crosslinking to collagen [8] and other extracellular matrix components [9]. Extracellular TG2 is implicated in the pathogenesis of a variety of diseases, but due to the complexity of its interactions with other matrix, receptor, cytosolic, and nuclear proteins, the specific contribution of TG2 remains elusive, as does the mechanism by which it is initially secreted from the cell [10]. TG2 catalyzes the deamidation of gluten peptides and converts certain glutamine residues (e.g., in the motifs QXP or QXXF, where X designates any amino acid) into negatively charged glutamic acid residues, which are a better binding motif for HLA-DQ2.5, leading to enhanced immunogenicity in CD [8]. In addition, TG2 is also responsible for the covalent crosslinking reaction between glutamine and lysine and the resulting formation of Nε-(γ-glutamyl)-lysine isopeptide bonds (Figure 1
Crosslinking between gluten peptides and TG2 itself as lysine donor is of particular importance, because TG2-gluten peptide complexes are then formed. CD patients’ sera contain anti-TG2 IgA (and IgG or IgM) antibodies [12] and TG2 was identified as the predominant autoantigen of CD [13]. The current models to explain the formation of autoantibodies assume that TG2-specific B cells receive help from CD4+ T cells specific for gluten peptides presented in the context of HLA-DQ2.5 or -DQ8 [14]. Several routes are then possible: (A) according to the original hapten-carrier model [15], the complexes are taken up by B cell receptors (BCR), the gluten peptide is recognized by gluten-specific CD4+ T cells and these provide help to B cells to secrete anti-TG2 antibodies. (B) Additionally, TG2 may form crosslinks between neighboring BCRs and this could contribute to B cell reactivity. (C) Alternatively, gluten peptides might be crosslinked to the BCRs on the B cell surface by TG2 and thus be directly involved in the uptake and presentation to CD4+ T cells either in the same TG2-BCR complex (D) or with a neighboring BCR [16]. After the BCR-mediated endocytosis of TG2 and the BCR-gluten peptide complexes, TG2 hydrolyzes the isopeptide bond between BCR and gluten peptide, and the deamidated peptide is bound immediately to HLA-DQ and presented to CD4+ T cells.
Following the discovery of covalent TG2-gluten peptide complexes, the formation of these complexes was shown in a model system with human TG2 and two model peptides (QLQPFPQPQ
LPY, binding Q are underlined) derived from α-gliadins. Six lysine residues were shown to be involved in isopeptide bonds with glutamine residues of the model peptides by matrix-assisted laser desorption time-of-flight mass spectrometry (MALDI-TOF MS) and nano-electrospray ionization (ESI)-MS/MS [17]. In addition, TG2 also creates multimers with itself, which can readily incorporate gluten peptides and these complexes might present an antigenic structure in the pathogenesis of CD that eventually triggers autoimmunity [18]. To gain more insights into the molecular structures of TG2-gluten peptide complexes, the identification of isopeptides and especially their crosslinking sites is necessary. Therefore, the overall aim of this work was to identify isopeptides between TG2 and synthetic CD-active gluten peptides in different model systems.
Here we present the identification of 34 isopeptides between human recombinant TG2 and a model peptide derived from α-gliadins and 36 TG2-TG2 isopeptides in TG2 multimers. After identification of the TG2 lysine residues that are involved in crosslinking to the model peptide, the reaction was expanded to three model peptides with more glutamines in their sequences. These synthetic peptides were derived from different gliadin proteins and are known to be immunoreactive in CD with potential crosslinking sites [19]. With this extended model system, we were not only able to identify the isopeptides, but it was also possible to obtain information about the localization of the crosslinking sites within the peptides.
2. Materials and Methods
All chemicals and solvents were at least HPLC or LC-MS grade. The CD-active model peptide (known immunogenic T-cell epitopes are given in bold [21
), derived from α-gliadins, and the model isopeptide standard PFPQPQ
, (with an isopeptide bond at the amino acids Q and K underlined), were purchased from peptides&elephants (Potsdam, Germany) with a purity of >95% and amidated C-termini. The peptides PQPQLPYPQPQLPY
QPL (P2; C89
) and VQGQGIIQPQQPAQL
) were obtained from GenScript (Hong Kong, PR China) with a purity of >95%. Recombinant human TG2 was purchased from Zedira (Darmstadt, Germany) as a purified and lyophilized protein produced in Sf9 insect cells. Trypsin (from bovine pancreas, TPCK-treated, ≥10,000 BAEE U/mg protein) was from Sigma-Aldrich (Steinheim, Germany).
2.2. Enzyme Activity Test of TG2
TG2 enzyme activity was determined with the Tissue Transglutaminase Assay kit (Zedira) [22]. TG2 was diluted 1:10 with deionized water. Analyses of the TG2 sample and the positive control of the test kit were carried out in triplicate against deionized water as blank. The procedure was performed strictly as described by the manufacturer. The absorbances were read at 525 nm with an Infinite M200 microplate reader (Tecan, Salzburg, Austria).
2.3. Isopeptide Standard
The isopeptide standard PFPQPQLPY-NH2/NTPSFKER-NH2 (crosslinked sites are underlined) was dissolved in acetonitrile/water/formic acid (FA) (2:98:0.1) to a concentration of 0.5 ng/µL and directly used for the nLC-MS/MS analysis.
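As a rough orientation for what to expect in the nLC-MS/MS data, the monoisotopic mass of the crosslinked standard can be estimated from its two chains. The sketch below is not part of the original workflow: it assumes standard monoisotopic residue masses, amidated C-termini on both chains (as stated for the purchased peptides), and the loss of one NH3 on formation of the Q-K isopeptide bond. The function names are illustrative only.

```python
# Monoisotopic residue masses (Da) for the amino acids occurring in the standard.
RESIDUE_MASS = {
    "P": 97.052764, "F": 147.068414, "Q": 128.058578, "L": 113.084064,
    "Y": 163.063329, "N": 114.042927, "T": 101.047679, "S": 87.032028,
    "K": 128.094963, "E": 129.042593, "R": 156.101111,
}
WATER = 18.010565
AMIDATION_SHIFT = -0.984016  # C-terminal OH replaced by NH2
AMMONIA = 17.026549          # NH3 released when the isopeptide bond forms
PROTON = 1.007276


def peptide_mass(sequence, amidated=True):
    """Monoisotopic mass of a linear peptide, optionally C-terminally amidated."""
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return mass + AMIDATION_SHIFT if amidated else mass


def isopeptide_mass(alpha, beta):
    """Mass of two chains joined by one Q-K isopeptide bond (loss of NH3)."""
    return peptide_mass(alpha) + peptide_mass(beta) - AMMONIA


def mz(mass, charge):
    """m/z of the protonated species at the given positive charge state."""
    return (mass + charge * PROTON) / charge


standard = isopeptide_mass("PFPQPQLPY", "NTPSFKER")
for z in (2, 3, 4):
    print(f"[M+{z}H]{z}+ m/z = {mz(standard, z):.4f}")
```

The same helper can be reused for any other Q-K crosslinked pair, provided the residue table is extended to cover its amino acids.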
2.4. Model Reaction of TG2 and PepQ
The model reaction of TG2 (0.32 nmol/L) with PepQ was performed in Tris-HCl buffer (0.1 mol/L, pH 7.4, 10 mmol/L CaCl2) at a molar ratio of TG2:PepQ of 1:150 at 37 °C for 120 min [17]. For inactivation of TG2, all samples were heated at 95 °C for 10 min. The negative controls were prepared by adding PepQ after heat inactivation of TG2. A concentration of 10 mmol/L CaCl2 was used because, according to the manufacturer, this concentration is needed to activate human TG2.
2.5. Model Reaction at Different Molar Ratios
To study which lysine residues are preferred binding sites, TG2 was incubated with different ratios of PepQ (1:50; 1:10; 1:1; 10:1) in Tris-HCl buffer at 37 °C for 120 min as described above.
2.6. Model Reaction with Three Different Model Peptides
The model reaction of TG2 was repeated with the simultaneous addition of the three different peptides P1, P2, and P3. In line with the first model reaction, the molar ratio of TG2 to each of the peptides P1, P2, and P3 was 1:50 (1:150 for the three peptides combined), so that the molar ratio of P1:P2:P3 was 1:1:1. All model reactions were performed in triplicate.
2.7. Tryptic Digestion and Clean-Up by Solid Phase Extraction
A trypsin stock solution was added at a trypsin:substrate ratio of 1:100 (w/w) in 50 mmol/L (NH4)2CO3 to all samples. The solution was incubated at 37 °C for 24 h and the hydrolysis stopped with 3 µL FA to reach a pH value below 2. All samples were purified by solid phase extraction (SPE) using 50 mg Sep-Pak tC18 cc cartridges (Waters, Eschborn, Germany). The C18-cartridges were activated with methanol (1 mL), equilibrated with acetonitrile/water/FA (80:20:0.1; 1 mL), and washed with acetonitrile/water/FA (2:98:0.1; 5 × 1 mL). After loading the samples, the cartridges were washed again, and the isopeptides and peptides were eluted with acetonitrile/water/FA (40:60:0.1; 1 mL). The solvent was removed using a vacuum centrifuge (37 °C, 4 h, 800 Pa) and the samples were reconstituted in FA (0.1%, v/v). Prior to nLC-MS/MS analysis, the peptide concentrations of the reconstituted samples were determined with a NanoDrop Micro-UV–Vis spectrophotometer (NanoDrop One, Thermo Scientific, Madison, WI, USA) at 280 nm. The samples were diluted in the 96-well plates to a concentration of 200 ng/µL with acetonitrile/water/FA (2:98:0.1).
2.8. Nanoscale Liquid Chromatography-Tandem Mass Spectrometry
nLC-MS/MS analysis was carried out on an Ultimate 3000 nanoHPLC system (Dionex, Idstein, Germany) coupled to a Q Exactive HF mass spectrometer (Thermo Fisher Scientific, Dreieich, Germany). The nanoscale LC system consisted of a trap column (75 µm × 2 cm, self-packed with Reprosil-Pur C18 ODS-3 5 µm resin, Dr. Maisch, Ammerbuch, Germany) and an analytical column (75 µm × 40 cm, self-packed with Reprosil-Gold, C18, 3 µm resin, Dr. Maisch). After an injection of 5 µL, the peptides were delivered to the trap column using solvent A0 (0.1% FA in water) at a flow rate of 5 µL/min and separated on the analytical column using a 60 min linear gradient from 4% to 32% solvent B at a flow rate of 300 nL/min (solvent A1, 5% DMSO, 0.1% FA in water; solvent B, 5% DMSO, 0.1% FA in acetonitrile) [23]. The MS was operated in data-dependent acquisition mode, automatically switching between MS1 and MS2 spectra. MS1 spectra were acquired over a mass-to-charge (m/z) range of 360–1300 in an Orbitrap full MS scan (60,000 resolution, automatic gain control (AGC) target value of 3 × 10⁶, 50 ms maximum injection time). In MS2, peptide precursors were selected for fragmentation by higher-energy collision-induced dissociation (isolation width of 1.7 Th, maximum injection time of 50 ms, AGC value of 2 × 10⁵). Analysis was performed using 25% normalized collision energy at a resolution of 30,000. For the analysis of the isopeptide standard, a maximum injection time of 25 ms, an AGC value of 1 × 10⁵ and a resolution of 15,000 were used.
2.9. Isopeptide Identification Using MaxQuant
For data analysis, a reciprocal search workflow was developed using MaxQuant (version 18.104.22.168), one of the most commonly used proteomics software tools (Supplemental Figure S1). The Thermo Xcalibur raw files were directly used as input in the MaxQuant software and searched against a human transglutaminase protein database containing 110 entries (UniProtKB, status January 2019) with a peptide-spectrum match (PSM)- and protein-level false discovery rate (FDR) of 1% [24]. All identified tryptic TG2 peptides (Supplemental Figure S2) were filtered for the presence of at least one lysine residue, which resulted in 87 detectable, lysine-containing TG2 peptides. The chemical formulas of these 87 TG2 peptides were calculated (UniProtKB accession no. P21980). Next, we configured these peptides as variable modifications in MaxQuant (TG2-modifications, β-side of the isopeptide, Supplemental Table S1). The settings used were “anywhere” for position, “standard” for type and “Q” for modified amino acid. Theoretical proteases were configured to cleave the model peptides from existing gluten protein sequences with the following cleavage specificities: for PepQ: QP, YP; for P1: FP, YP; for P2: FL, LI; for P3: LV, LE. The parameters were set as follows for the individual search runs: digestion mode: specific; maximum missed cleavage sites: 2; variable modifications: each TG2-modification in one single search run; deamidation at Q; fasta files: UniProtKB accession no. P18573 for PepQ and P1, B6UKP4 for P2, P08453 for P3; contaminant fasta files included; fixed modifications: amidated C-term (only for PepQ); minimum score for modified peptides: 10; main search peptide tolerance: 4.5 ppm; mass tolerance for fragment ions: 20 ppm; all other parameters were used as default settings. To verify the identified isopeptides by reversed search, PepQ (and its deamidated form PepE) were also configured as modifications in MaxQuant (α-side of the isopeptide, PepQ: C54, PepE: C54) and the raw files were searched against the TG2 sequence with the following parameters: enzyme: trypsin/P; digestion mode: specific; maximum missed cleavage sites: 2; variable modifications: PepQ, PepE; fasta file: UniProtKB accession no. P21980; minimum score for modified peptides: 10; main search peptide tolerance: 4.5 ppm; mass tolerance for fragment ions: 20 ppm; all other parameters were used as default settings. The threshold for unambiguous localization was set to a localization probability of >75%. To confirm the identities of the isopeptides and the identification of the binding site within the isopeptides, the b- and y-fragments of both sides (TG2, β-side and gluten peptides, α-side) were assigned to the respective MS/MS spectra using the software tool MaxQuant Viewer [25
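The lysine filter applied to the tryptic TG2 peptides can be illustrated with a short sketch. Note the differences from the actual workflow: here the peptide list comes from a simple in-silico trypsin/P digest (cleavage C-terminal to K or R, up to two missed cleavages) of a made-up toy sequence, whereas in the study the peptides were those identified by MaxQuant. `TOY_SEQUENCE` and both function names are hypothetical.

```python
def tryptic_peptides(sequence, max_missed=2):
    """In-silico trypsin/P digest: cleave after every K or R, allow missed cleavages."""
    pieces, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in "KR":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])  # C-terminal piece without K/R

    peptides = []
    for i in range(len(pieces)):
        # Join up to max_missed additional neighbouring pieces.
        last = min(i + max_missed, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def lysine_containing(peptides):
    """Keep only peptides that offer at least one lysine as potential crosslinking site."""
    return [p for p in peptides if "K" in p]


TOY_SEQUENCE = "AKGLRMKDEF"  # hypothetical sequence, not taken from TG2
all_peptides = tryptic_peptides(TOY_SEQUENCE)
candidates = lysine_containing(all_peptides)
print(len(all_peptides), "peptides,", len(candidates), "with lysine")
```

Each retained peptide would then be converted to an elemental composition and registered as one variable modification on Q, which is why the number of search runs scales with the number of lysine-containing peptides.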
2.10. Assignment of MS/MS Fragments of the Isopeptide Sequences Using ProteinProspector
To further verify the identification of the isopeptides and of the crosslinking sites within the isopeptides, the b-, y- and internal fragments of both sides were calculated with the MS-Product feature of the ProteinProspector webpage (v.5.22.1, University of California, San Francisco, CA, USA) [26]. The sequences of PepQ and the TG2-modifications were entered and the binding Q or K was replaced by “u” for the user-specified amino acid elemental composition of the other isopeptide side, respectively. These “u” compositions for the TG2-modifications were calculated by the formal addition of C5H5NO2 (peptide-bound glutamine minus NH3) to the TG2-peptide formulas. For the PepQ modification, the “u” composition was calculated by the formal addition of C6H9NO (peptide-bound lysine minus NH3) to the PepQ formula. ProteinProspector parameters were then set to calculate b-, y- and internal fragments and associated fragments due to water- and ammonia-loss. The charge states were calculated up to 5+ for the precursors and up to 3+ for the fragments.
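The bookkeeping behind these “u” compositions can be sketched as follows. This is an illustration under stated assumptions rather than the authors' script: residue formulas are the standard ones for peptide-bound amino acids, the amidated PepQ sequence PFPQPQLPY is used as the example, and the helper names are made up for this sketch.

```python
from collections import Counter

# Elemental compositions of peptide-bound (residue) amino acids.
RESIDUE_FORMULA = {
    "G": dict(C=2, H=3, N=1, O=1),   "A": dict(C=3, H=5, N=1, O=1),
    "S": dict(C=3, H=5, N=1, O=2),   "P": dict(C=5, H=7, N=1, O=1),
    "V": dict(C=5, H=9, N=1, O=1),   "T": dict(C=4, H=7, N=1, O=2),
    "C": dict(C=3, H=5, N=1, O=1, S=1), "L": dict(C=6, H=11, N=1, O=1),
    "I": dict(C=6, H=11, N=1, O=1),  "N": dict(C=4, H=6, N=2, O=2),
    "D": dict(C=4, H=5, N=1, O=3),   "Q": dict(C=5, H=8, N=2, O=2),
    "K": dict(C=6, H=12, N=2, O=1),  "E": dict(C=5, H=7, N=1, O=3),
    "M": dict(C=5, H=9, N=1, O=1, S=1), "H": dict(C=6, H=7, N=3, O=1),
    "F": dict(C=9, H=9, N=1, O=1),   "R": dict(C=6, H=12, N=4, O=1),
    "Y": dict(C=9, H=9, N=1, O=2),   "W": dict(C=11, H=10, N=2, O=1),
}
AMMONIA = dict(N=1, H=3)


def peptide_formula(sequence, amidated=False):
    """Elemental composition of a free peptide (residues + H2O), optionally amidated."""
    total = Counter(H=2, O=1)
    for aa in sequence:
        total.update(RESIDUE_FORMULA[aa])
    if amidated:  # C-terminal OH -> NH2
        total.update(dict(N=1, H=1))
        total.subtract(dict(O=1))
    return total


def linked_residue(aa):
    """Residue formula of the crosslinked Q or K after formal loss of NH3."""
    formula = Counter(RESIDUE_FORMULA[aa])
    formula.subtract(AMMONIA)
    return formula


def to_string(formula):
    """Hill-like string restricted to C, H, N, O, S; counts of 1 are omitted."""
    return "".join(f"{el}{formula[el] if formula[el] > 1 else ''}"
                   for el in "CHNOS" if formula[el] > 0)


pepq = peptide_formula("PFPQPQLPY", amidated=True)
u_for_lysine_site = pepq + linked_residue("K")  # "u" entered in place of the binding K

print(to_string(linked_residue("Q")), to_string(linked_residue("K")))
print(to_string(pepq), to_string(u_for_lysine_site))
```

The carbon count of 54 obtained for amidated PepQ agrees with the C54 prefix of the PepQ formula quoted in the MaxQuant section, which is a useful sanity check on the residue table.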
2.11. Isopeptide Confirmation Using Skyline
Skyline (version 22.214.171.12496) was used for confirmation of the identified isopeptides and visualization of label-free peptide precursor chromatograms. PepQ (PFPQ4PQ6LPY) was modified with an amidated C-terminus at Y and at Q4 or Q6 either with the TG2-modifications, a deamidation or both to generate the targets, followed by generation of the appropriate precursors by Skyline. Each PepQ/TG2-modification/deamidation combination was verified according to the following criteria to reject false-positive isopeptides and confirm confident peak picking: (1) the retention time had to match the retention time identified in the MaxQuant search, (2) the isotopic dot product score (idotp, generated by comparing the expected precursor isotopic distribution to the observed distribution and scored from 0 to 1, where 1 is the highest) had to be >0.9, and (3) retention time and idotp had to be consistent among the triplicates according to the graphical tools, and no signals were to be detected in the negative controls [27
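The three acceptance criteria can be written down as a small filter. The sketch below is an approximation: it uses a plain cosine similarity between expected and observed isotope intensities as a stand-in for Skyline's idotp (Skyline's exact scoring may differ), and the retention-time tolerance of 0.5 min is an assumed value that the text does not specify. All names are hypothetical.

```python
import math


def isotope_dot_product(expected, observed):
    """Cosine similarity between two isotope intensity vectors (0 to 1 for non-negative input)."""
    dot = sum(e * o for e, o in zip(expected, observed))
    norm = math.sqrt(sum(e * e for e in expected)) * math.sqrt(sum(o * o for o in observed))
    return dot / norm if norm else 0.0


def confirm_isopeptide(expected, replicates, rt_maxquant,
                       negative_control_detected,
                       min_idotp=0.9, rt_tolerance_min=0.5):
    """Apply criteria (1) to (3): RT match, idotp > threshold in every replicate,
    and no signal in the negative control.

    replicates: list of (observed_isotope_intensities, retention_time_min) tuples.
    rt_tolerance_min is an assumption made for this sketch.
    """
    if negative_control_detected:
        return False
    for observed, rt in replicates:
        if abs(rt - rt_maxquant) > rt_tolerance_min:
            return False
        if isotope_dot_product(expected, observed) <= min_idotp:
            return False
    return True


expected = [0.45, 0.33, 0.15, 0.07]           # illustrative theoretical pattern
triplicate = [([0.44, 0.34, 0.15, 0.07], 31.2),
              ([0.46, 0.32, 0.15, 0.07], 31.3),
              ([0.45, 0.33, 0.16, 0.06], 31.1)]
print(confirm_isopeptide(expected, triplicate, rt_maxquant=31.2,
                         negative_control_detected=False))
```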
2.12. Isopeptide Identification Using pLink
For comparative data analysis, the Thermo Xcalibur raw files were directly used as input in the pLink2 software (version 2.3) [28] and searched against a user-curated database including the fasta files of human tissue transglutaminase (UniProtKB accession no. P21980), the sequence of PepQ for the model system and the fasta files of three gluten proteins (UniProtKB accession no. P18573 for P1, B6UKP4 for P2, P08453 for P3) for the extended model system. The pLink2 search parameters were: precursor mass tolerance 20 ppm, fragment mass tolerance 20 ppm, cross-linker isopeptide (cross-linking sites K and Q, linker mass −17.031, linker composition N(−1)H(−3)), fixed modification amidated C-term for PepQ, peptide length minimum 6 amino acids and maximum 60 amino acids per chain, peptide mass minimum 600 and maximum 6000 Da per chain, enzyme trypsin, three missed cleavages, FDR ≤ 1% at PSM level.
The raw files of the model system with the three different peptides were analyzed with MaxQuant, Skyline, and pLink2, as described above.
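The zero-length isopeptide "linker" used in the pLink2 search corresponds to the formal loss of NH3. A short calculation, assuming standard atomic masses, shows how the linker composition N(−1)H(−3) translates into mass shifts. The reported value of −17.031 agrees to three decimals with the average mass of NH3, while the monoisotopic shift is about −17.0265; which of the two values pLink2 used for precursor matching is not stated in the text.

```python
# Standard atomic masses (Da); monoisotopic = most abundant isotope, average = natural mix.
MONOISOTOPIC = {"N": 14.0030740, "H": 1.00782503}
AVERAGE = {"N": 14.0067, "H": 1.00794}


def composition_mass(composition, mass_table):
    """Mass shift of an elemental composition; negative counts denote lost atoms."""
    return sum(mass_table[element] * count for element, count in composition.items())


linker = {"N": -1, "H": -3}  # composition N(-1)H(-3) of the isopeptide crosslink
mono_shift = composition_mass(linker, MONOISOTOPIC)
avg_shift = composition_mass(linker, AVERAGE)

print(f"monoisotopic shift: {mono_shift:.4f} Da")
print(f"average shift:      {avg_shift:.3f} Da")
```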
2.13. 3D-Structure Model of TG2
To visualize the 3D structure of TG2 and assign the locations of the crosslinking sites, the structural models of TG2 in the open conformation (PDB ID code 4PYG) and in the closed conformation (PDB ID code 3S3P) were imported into the 3D graphics software PyMOL (The PyMOL Molecular Graphics System, version 2.0, Schrödinger, LLC, New York, NY, USA).
In this study we used a workflow with the proteomics tool MaxQuant, its integrated search engine Andromeda and Skyline as well as the crosslinking software tool pLink2 to identify isopeptides between TG2 and gluten-derived model peptides. With these tools, the whole computational part runs locally on the user’s computer without a client-server setup [34]. We have demonstrated a workflow to identify enzymatically formed isopeptides as well as the localization of the crosslinking site within these peptides. In some cases, an unambiguous identification was not possible because specific fragment ion information was missing, but the crosslinking site could still be narrowed down to a short part of the peptide sequence. In total, we identified 34 isopeptides with 20 different lysine residues as crosslinking sites. Six of these crosslinking sites were already known as TG2-gluten peptide binding sites [17] and eleven as lysine residues involved in TG2 multimer self-crosslinking [18]. In the model system, 36 TG2-TG2 isopeptides were additionally detected with their crosslinking glutamine and lysine residues. Nine of these TG2-TG2 isopeptide crosslinking combinations were already known [18], as well as six of the nine identified glutamine and nine of the eleven identified lysine residues involved in TG2-multimerization. Furthermore, the four most preferred binding lysine residues (K-425, K-590, K-600 and K-649) in the model system were identified by analyzing different TG2:PepQ ratios. K-590, K-600 and K-649 were already known as crosslinking sites [17] and are located in the C-terminal domain of TG2. K-425 was shown to be involved in TG2 self-multimerization [18] and is part of the core region next to the catalytic core. All four lysine residues occupy exposed positions according to the published structures in the PDB database (Figure 4
TG2 performs both crosslinking and deamidation of glutamine residues. The model peptide PepQ (PFPQ4PQ6LPY; the numbers denote the positions of the glutamine residues) comprises two possible targets for TG2, Q4 and Q6. We identified isopeptides without deamidation of the second glutamine in PepQ and some with deamidated PepQ (Table 1) at either Q4 or Q6. Vader et al. [30] investigated the TG2 deamidation pattern depending on the neighboring C-terminal amino acid. Applied to PepQ, the motif PQ6L is a good target and the motif PQ4P a weak target for deamidation by TG2. In the isopeptides without deamidation, the crosslink formation took place at the expected target Q6. In earlier studies of Dorum et al. [21], Q6 was identified as a crosslinking target for TG2. When looking at the PepE sequences within isopeptides, the crosslink was almost always at Q4 and the deamidation at Q6. In this case, and in keeping with previous findings [21], these data may indicate first deamidation at the preferred Q6 followed by crosslinking at the less preferred Q4, both reactions catalyzed by TG2. In the case of deamidated Q4, the results indicate crosslinking by TG2 followed by a non-enzymatic deamidation due to the alkaline pH conditions during tryptic digestion [31]. In model peptides that were deamidated but not crosslinked, the deamidation may be caused by TG2 and, additionally, by the pH conditions. One limitation of the current experimental design is that it does not allow a clear differentiation between enzymatic and non-enzymatic deamidation, because the original intent was to focus on the identification of crosslinking sites rather than deamidation sites. Further experiments would be necessary to look more closely into the specific mechanisms of crosslinking versus deamidation. The localization probabilities of the modifications depend on the detection of specific fragments flanking the targets. This leads to probabilities <75% in some cases, when some of these fragments are missing. In these cases, only the part of the sequence in which the possibly crosslinked glutamine residues are located can be identified.
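The sequence rule behind the expected targets can be checked with a few lines of code. The sketch below scans a peptide for the two example motifs named in the Introduction (QXP and QXXF, X being any amino acid) and reports the 1-based positions of the matching glutamines. It is deliberately simplified: the full specificity described by Vader et al. [30] involves more than these two patterns, and the function name is made up for this illustration.

```python
import re

# Lookaheads give zero-width matches, so overlapping motifs are all reported.
MOTIFS = {
    "QXP": re.compile(r"(?=Q.P)"),
    "QXXF": re.compile(r"(?=Q..F)"),
}


def motif_glutamines(sequence):
    """Map 1-based glutamine positions to the example motifs they satisfy."""
    hits = {}
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(sequence):
            hits.setdefault(match.start() + 1, []).append(name)
    return hits


print(motif_glutamines("PFPQPQLPY"))  # PepQ
```

For PepQ only Q6 (in the context QLP) satisfies one of the motifs, whereas Q4 is directly followed by proline and matches neither, which is in line with Q6 being described as the preferred target and Q4 as the weak one.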
It is well established that TG2 is very specific in its deamidation pattern [30], which can be explained by strong effects of the neighboring C-terminal amino acids. In our expanded model system with three gluten model peptides, we demonstrated that TG2 follows the known selective deamidation pattern in almost all deamidated isopeptides. Only in a few cases, where the identification scores were low or the specific fragments were absent, was it not feasible to localize the deamidation unambiguously. The crosslinking reaction also depends on this selectivity of TG2, but with more exceptions. For the shorter model peptide P1 with fewer glutamine residues, most of the crosslinking sites within the isopeptides were identified clearly due to the presence of the specific fragments. For the longest peptide P2 with nine glutamine residues, the identification of one specific crosslinking site was more difficult, especially when the isopeptides were identified with a low score. In these cases, it was nevertheless possible to identify the part of the peptide sequence that most likely carries the modification. These findings partly underscore the known TG2 selectivity by showing a clear preference for the deamidation pattern. Our data also demonstrate a difference in the crosslinking selectivity or at least a dependence on the previously deamidated glutamine residues.
The alignment of crosslinking sites of TG2 (Supplemental Table S2) including the four surrounding amino acids in both C- and N-terminal direction did not reveal an obvious pattern regarding preferred chemical environments around the reactive sites in the primary structure. Therefore, it seems likely that the secondary structure of TG2 is more important to determine which lysine residues are preferred crosslinking sites. Further experiments, e.g., using amino acid substitution analysis on recombinant TG2 in combination with computational modelling, would be useful to get more detailed insights into secondary structural elements that predict which lysine residues are reactive crosslinking sites and which ones are not.
In summary, a reciprocal search workflow with commonly used proteomics tools combined with a recently developed crosslinking tool identified many isopeptides with high certainty, while a few isopeptides were identified with only one of the strategies. These novel insights into the molecular structures of TG2-gluten peptide complexes may help clarify the function of extracellular TG2 in the initiation of CD autoimmunity and the role of anti-TG2 autoantibodies. To shed more light on the immunological and physiological relevance of these complexes, in vivo experiments on the extent and the activation of B cells are necessary. Further experiments together with partners bringing in complementary expertise, especially in immunology, would be needed to address the most relevant point, namely the link between our findings and TG2-mediated gluten peptide presentation in CD. Crosslinking reactions are implicated in a number of inflammatory diseases, degenerative disorders, and even cancer, so this strategy may open up multiple opportunities for further research. Future efforts will aim to determine isopeptides of TG2 with physiologically relevant gluten hydrolysates from wheat, rye, and barley.