Comprehensive Detection of Isopeptides between Human Tissue Transglutaminase and Gluten Peptides

Celiac disease (CD) is a chronic inflammation of the small intestine triggered by the ingestion of gluten in genetically predisposed individuals. Tissue transglutaminase (TG2) is a key factor in CD pathogenesis, because it catalyzes both the deamidation of specific glutamine residues and the formation of covalent Nε-(γ-glutamyl)-lysine isopeptide crosslinks resulting in TG2–gluten peptide complexes. These complexes are thought to activate B cells causing the secretion of anti-TG2 autoantibodies that serve as diagnostic markers for CD, although their pathogenic role remains unclear. To gain more insight into the molecular structures of TG2-gluten peptide complexes, we used different proteomics software tools that enable the comprehensive identification of isopeptides. Thus, 34 different isopeptides involving 20 TG2 lysine residues were identified in a model system, only six of which were previously known. Additionally, 36 isopeptides of TG2-TG2 multimers were detected. Experiments with different TG2-gluten peptide molar ratios revealed the most preferred lysine residues involved in isopeptide crosslinking. Expanding the model system to three gluten peptides with more glutamine residues allowed the localization of the preferred glutamine crosslinking sites. These new insights into the structure of TG2-gluten peptide complexes may help clarify the role of extracellular TG2 in CD autoimmunity and in other inflammatory diseases.


Introduction
Celiac disease (CD) is one of the most frequent food hypersensitivities with a global seroprevalence of 1.4% and a biopsy-confirmed prevalence of 0.7% [1]. This chronic immune-mediated enteropathy of the small intestine is triggered by the ingestion of storage proteins (gluten) from wheat, rye, and barley in genetically predisposed individuals [2]. Due to the high amounts of glutamine (19.7-37.1 mol%) and proline (9.4-23.0 mol%) in the amino acid sequences of gluten proteins [3], the human gastrointestinal enzymes are unable to digest them completely. Thus, peptides with a length of more than nine amino acids reach the small intestinal epithelium [4]. The main genetic factors of CD are the human leukocyte antigen (HLA) class II alleles HLA-DQ2 and HLA-DQ8 of the major histocompatibility complex. Most CD patients (≈90%) carry the HLA-DQ2.5 allele and the remaining patients carry the HLA-DQ8 or HLA-DQ2.2 alleles. These class II molecules are expressed on the surface of B cells and antigen-presenting cells and specifically bind gluten peptides. These peptides are then recognized Crosslinking between gluten peptides and TG2 itself as lysine donor is of particular importance, because then TG2-gluten peptide complexes are formed. CD patients' sera contain anti-TG2 IgA (and IgG or IgM) antibodies [12] and TG2 was identified as the predominant autoantigen of CD [13]. The current models to explain the formation of autoantibodies assume that TG2-specific B cells receive help from gluten-specific CD4+ T cells presented in the context of HLA-DQ2.5 or -DQ8 [14]. Then several routes are possible: (A) according to the original hapten-carrier-model [15], the complexes are taken up by B cell receptors (BCR), the gluten peptide is recognized by gluten-specific CD4+ T cells and these provide help to B cells to secrete anti-TG2 antibodies. (B) Additionally, TG2 may form crosslinks between neighboring BCRs and this could contribute to B cell reactivity. (C) Alternatively, gluten peptides might be crosslinked to the BCRs on the B cell surface by TG2 and thus be directly involved in the uptake and presentation to CD4+ T cells either in the same TG2-BCR complex (D) or with a neighboring BCR [16]. After the BCR-mediated endocytosis of TG2 and the BCR-gluten Crosslinking between gluten peptides and TG2 itself as lysine donor is of particular importance, because then TG2-gluten peptide complexes are formed. CD patients' sera contain anti-TG2 IgA (and IgG or IgM) antibodies [12] and TG2 was identified as the predominant autoantigen of CD [13]. The current models to explain the formation of autoantibodies assume that TG2-specific B cells receive help from gluten-specific CD4+ T cells presented in the context of HLA-DQ2.5 or -DQ8 [14]. Then several routes are possible: (A) according to the original hapten-carrier-model [15], the complexes are taken up by B cell receptors (BCR), the gluten peptide is recognized by gluten-specific CD4+ T cells and these provide help to B cells to secrete anti-TG2 antibodies. (B) Additionally, TG2 may form crosslinks between neighboring BCRs and this could contribute to B cell reactivity. (C) Alternatively, gluten peptides might be crosslinked to the BCRs on the B cell surface by TG2 and thus be directly involved in the uptake and presentation to CD4+ T cells either in the same TG2-BCR complex (D) or with a neighboring BCR [16]. After the BCR-mediated endocytosis of TG2 and the BCR-gluten peptide complexes, TG2 hydrolyzes the isopeptide bond of BCR and gluten peptide and the deamidated peptide is bound immediately to HLA-DQ and presented to CD4+ T cells.
Following the discovery of covalent TG2-gluten peptide complexes, the formation of these complexes was shown in a model system with human TG2 and two model peptides (QLQPFPQPQLPY, PQPQLPYPQPQLPY, binding Q are underlined) derived from α-gliadins. Six lysine residues were shown to be involved in isopeptide bonds with glutamine residues of the model peptides by matrix-assisted laser desorption time-of-flight mass spectrometry (MALDI-TOF MS) and nano-electrospray ionization (ESI)-MS/MS [17]. In addition, TG2 also creates multimers with itself, which can readily incorporate gluten peptides and these complexes might present an antigenic structure in the pathogenesis of CD that eventually triggers autoimmunity [18]. To gain more insights into the molecular structures of TG2-gluten peptide complexes, the identification of isopeptides and especially their crosslinking sites is necessary. Therefore, the overall aim of this work was to identify isopeptides between TG2 and synthetic CD-active gluten peptides in different model systems.
Here we present the identification of 34 isopeptides between human recombinant TG2 and a model peptide derived from α-gliadins and 36 TG2-TG2 isopeptides in TG2 multimers. After identification of the TG2 lysine residues that are involved in crosslinking to the model peptide, the reaction was expanded to three model peptides with more glutamines in their sequences. These synthetic peptides were derived from different gliadin proteins and are known to be immunoreactive in CD with potential crosslinking sites [19,20]. With this extended model system, we were not only able to identify the isopeptides, but it was also possible to obtain information about the localization of the crosslinking sites within the peptides.

Material
All chemicals and solvents were at least HPLC or LC-MS grade. The CD-active model peptide (known immunogenic T-cell epitopes are given in bold [21]) PFPQPQLPY-NH 2 (PepQ; C 54 H 76 N 12 O 12 ), derived from α-gliadins, and the model isopeptide standard PFPQPQLPY-NH 2 /NTPSFKER-NH 2 , (with an isopeptide bond at the amino acids Q and K underlined), were purchased from peptides&elephants (Potsdam, Germany) with a purity of >95% and amidated C-termini. Recombinant human TG2 was purchased from Zedira (Darmstadt, Germany) as a purified and lyophilized protein produced in sf9 insect cells. Trypsin (from bovine pancreas, TPCK-treated, ≥10,000 BAEE U/mg protein) was from Sigma-Aldrich (Steinheim, Germany).

Enzyme Activity Test of TG2
The determination of TG2 enzyme activity was performed with the Tissue Transglutaminase Assay kit (Zedira) [22]. TG2 was diluted 1:10 with deionized water. Analyses of the TG2 sample and the positive control of the test kit were carried out in triplicates against deionized water as blank. The procedure was performed strictly as described by the manufacturer. The absorbances were read at 525 nm with an Infinite M200 microplate reader (Tecan, Salzburg, Austria).

Model Reaction of TG2 and PepQ
The model reaction of TG2 (0.32 nmol/L) with PepQ was performed in Tris-HCl buffer (0.1 mol/L, pH 7.4, 10 mmol/L CaCl 2 ) at a molar ratio of TG2:PepQ of 1:150 at 37 • C for 120 min [17]. For inactivation of TG2, all samples were heated at 95 • C for 10 min. The negative controls were prepared by adding PepQ after heat inactivation of TG2. The concentration of 10 mmol/L CaCl 2 was used, because the information provided by the manufacturer indicated that this concentration is needed to activate human TG2.

Model Reaction with Three Different Model Peptides
The model reaction of TG2 was repeated with the simultaneous addition of the three different peptides P1, P2, and P3. According to the first model reaction, the molar ratios were TG2:P1/P2/P3 of 1:50, respectively. The molar ratios of P1:P2:P3 were 1:1:1. All model reactions were done in triplicates, respectively.

Tryptic Digestion and Clean-Up by Solid Phase Extraction
A trypsin stock solution was added at a trypsin:substrate ratio of 1:100 (w/w) in 50 mmol/L (NH 4 ) 2 CO 3 to all samples. The solution was incubated at 37 • C for 24 h and the hydrolysis stopped with 3 µL FA to reach a pH value below 2. All samples were purified by solid phase extraction (SPE) using 50 mg Sep-Pak tC18 cc cartridges (Waters, Eschborn, Germany). The C 18 -cartridges were activated with methanol (1 mL), equilibrated with acetonitrile/water/FA (80:20:0.1; 1 mL), and washed with acetonitrile/water/FA (2:98:0.1; 5 × 1 mL). After loading the samples, the cartridges were washed again, and the isopeptides and peptides were eluted with acetonitrile/water/FA (40:60:0.1; 1 mL). The solvent was removed using a vacuum centrifuge (37 • C, 4 h, 800 Pa) and the samples were reconstituted in FA (0.1%, v/v). Prior to nLC-MS/MS analysis, the peptide concentrations of the reconstituted samples were determined with a NanoDrop Micro-UV-Vis spectrophotometer (NanoDrop One, Thermo Scientific, Madison, WI, USA) at 280 nm. The samples were diluted in the 96-well plates to a concentration of 200 ng/µL with acetonitrile/water/FA (2:98:0.1).

Nanoscale Liquid Chromatography-Tandem Mass Spectrometry
nLC-MS/MS analysis was carried out on an Ultimate 3000 nanoHLPC system (Dionex, Idstein, Germany) coupled to a Q Exactive HF mass spectrometer (Thermo Fisher Scientific, Dreieich, Germany). The nanoscale LC system consisted of a trap column (75 µm × 2 cm, self-packed with Reprosil-Pur C18 ODS-3 5 µm resin, Dr. Maisch, Ammerbuch, Germany) and an analytical column (75 µm × 40 cm, self-packed with Reprosil-Gold, C18, 3 µm resin, Dr. Maisch). After an injection of 5 µL, the peptides were delivered to the trap column using solvent A0 (0.1% FA in water) at a flow rate of 5 µL/min and separated on the analytical column using a 60 min linear gradient from 4% to 32% solvent B at a flow rate of 300 nL/min (solvent A1, 5% DMSO, 0.1% FA in water; solvent B, 5% DMSO, 0.1% FA in acetonitrile) [23]. The MS was operated in data-dependent acquisition mode, automatically switching between MS1 and MS2 spectra. The mass-to-charge (m/z) range of the acquisition of the MS1 spectra was 360-1300 m/z at an Orbitrap full MS scan (60,000 resolution, 3 × 10 6 automatic gain control (AGC) target value, 50 ms maximum injection time). In MS2, peptide precursors were selected for fragmentation by higher energy collision-induced dissociation (isolation width of 1.7 Th, maximum injection time of 50 ms, AGC value of 2 × 10 5 ). Analysis was performed using 25% normalized collision energy at a resolution of 30,000. For the analysis of the isopeptide standard a maximum injection time of 25 ms, AGC value of 1 × 10 5 and a resolution of 15,000 was used.

Isopeptide Identification Using MaxQuant
For data analysis, a reciprocal search workflow using one of the most commonly used proteomics software tools MaxQuant (version 1.6.0.1) was developed (Supplemental Figure S1). The Thermo Xcalibur raw files were directly used as input in the MaxQuant software and searched against a human transglutaminase protein database containing 110 entries (UniProtKB, status January 2019) with a peptide-spectrum match (PSM)-and protein-level false discovery rate (FDR) of 1% [24]. All identified tryptic TG2 peptides (Supplemental Figure S2) were filtered for the presence of at least one lysine residue, which resulted in 87 detectable, lysine-containing TG2 peptides. The chemical formulas of these 87 TG2 peptides were calculated (UniProtKB accession no. P21980). Next, we configured these peptides as variable modifications in MaxQuant (TG2-modifications, β-side of the isopeptide, Supplemental Table S1). The used settings were "anywhere" for position, "standard" for type and "Q" for modified amino acid. Theoretical proteases were configured to cleave the model peptides from existing gluten protein sequences with the following cleavage specificities: for PepQ: QP, YP; for P1: FP, YP; for P2: FL, LI; for P3: LV, LE. The parameters were set as follows for the individual search runs: Digestion mode-specific; maximum missed cleavage sites-2; variable modifications-each TG2-modification in one single search run; deamidation at Q; fasta files-UniProtKB accession no. P18573 for PepQ and P1, B6UKP4 for P2, P08453 for P3; contaminant fasta files included; fixed modifications-amidated C-term (only for PepQ); minimum score for modified peptides-10; main search peptide tolerance-4.5 ppm; mass tolerance for fragment ions-20 ppm; all other parameters were used as default settings. To verify the identified isopeptides by reversed search, PepQ (and its deamidated form PepE) were also configured as modifications in MaxQuant (α-side of the isopeptide, PepQ: C 54 H 73 N 11 O 12 , PepE: C 54 H 72 N 10 O 13 ) and the raw files were searched against the TG2 sequence with the following parameters: Enzyme-trypsin/P; digestion mode-specific; maximum missed cleavage sites-2; variable modifications-PepQ, PepE; fasta file-UniProtKB accession no. P21980; minimum score for modified peptides-10; main search peptide tolerance-4.5 ppm; mass tolerance for fragment ions-20 ppm; all other parameters were used as default settings. The threshold for unambiguous localization was set to a localization probability of >75%. To confirm the identities of the isopeptides and the identification of the binding site within the isopeptides, the b-and y-fragments of both sides (TG2, β-side and gluten peptides, α-side) were assigned to the respective MS/MS spectra using the software tool MaxQuant Viewer [25].

Assignment of MS/MS Fragments of the Isopeptide Sequences Using ProteinProspector
To further verify the identification of the isopeptides and of the crosslinking sites within the isopeptides, the b-, y-and internal fragments of both sides were calculated with the MS-Product feature of the ProteinProspector webpage (v.5.22.1, University of California, San Francisco, CA, USA) [26]. The sequences of PepQ and the TG2-modifications were entered and the binding Q or K was replaced by "u" for the user-specified amino acid elemental composition of the other isopeptide side, respectively. These "u" compositions for the TG2-modifications were calculated by the formal addition of C 5 H 5 NO 2 (peptide-bound glutamine minus NH 3 ) to the TG2-peptide formulas. For the PepQ modification, the "u" composition was calculated by the formal addition of C 6 H 9 NO (peptide-bound lysine minus NH 3 ) to the PepQ formula. ProteinProspector parameters were then set to calculate b-, y-and internal fragments and associated fragments due to water-and ammonia-loss. The charge states were calculated up to 5+ for the precursors and up to 3+ for the fragments.

Isopeptide Confirmation Using Skyline
Skyline (version 4.1.0.11796) was used for confirmation of the identified isopeptides and visualization of label-free peptide precursor chromatograms. PepQ (PFPQ 4 PQ 6 LPY) was modified with an amidated C-terminus at Y and at Q 4 and Q 6 either with the TG2-modifications, a deamidation or both to generate the targets, followed by subsequent generation of the appropriate precursors by Skyline. Each PepQ/TG2-modification/deamidation combination was verified according to the following parameters to reject false positively identified isopeptides and confirm confident peak picking: (1) The retention time had to match with the identified retention time of the MaxQuant search, (2) the isotopic dot product score had to be >0.9 (idotp-generated from comparing the expected precursor isotopic distribution to the observed distribution; scored from 0-1, where 1 is the highest) and (3) the comparison of retention time and idotp among the triplicates using the graphical tools had to fit and no detection of the signals in the negative controls had to be observed [27].

Isopeptide Identification Using pLink
For comparative data analysis, the Thermo Xcalibur raw files were directly used as input in the pLink2 software (version 2.3) [28,29] and searched against a user-curated database including the fasta files human tissue transglutaminase (UniProtKB accession no. P21980), the sequence of PepQ for the model system and the fasta files of three gluten proteins (UniProtKB accession no. P18573 for P1, B6UKP4 for P2, P08453 for P3) for the extended model system. The pLink2 search parameters were: Precursor mass tolerance 20 ppm, fragment mass tolerance 20 ppm, cross-linker isopeptide (cross-linking sites K and Q, linker mass −17.031, linker composition N(−1)H(−3)), fixed modification amidated C-term for PepQ, peptide length mininum 6 amino acids and maximum 60 amino acids per chain, peptide mass minimum 600 and maximum 6000 Da per chain, enzyme trypsin, three missed cleavages, FDR ≤ 1% at PSM level.
The raw files of the model system with the three different peptides were analyzed with MaxQuant, Skyline, and pLink2, as described above.

3D-Structure Model of TG2
To visualize the 3D-structure of TG2 and assign the locations of the crosslinking sites, the sequence models of TG2 in the open conformation (PDB ID code 4PYG) and in the closed conformation (PDB ID code 3S3P) were imported to the 3D graphic software PyMol (The PyMOL Molecular Graphics System, version 2.0 Schrödinger, LLC, New York, NY, USA).

Determination of TG2 Enzyme Activity
TG2 enzyme activity was analyzed based on the chromogenic hydroxamate detection principle using Z-QQPF as the amine acceptor substrate and hydroxylamine as amine donor [22]. TG2 incorporates hydroxylamine into Z-QQPF to form Z-glutamyl-hydroxamate-QPF that develops a colored complex with iron (III) detectable at 525 nm. The activity of one unit is defined as the amount of enzyme, which causes the formation of 1.0 µmole of Z-glutamyl-hydroxamate-QPF per minute. The activity of TG2 was 2160 units/mg (manufacturer's certified value: 2554 units/mg). Thus, the TG2 used was confirmed to be active and suitable for all further experiments.

Identification of TG2-Peptides Involved in Isopeptide Formation
In order to identify the lysine residues involved in crosslinking to a glutamine residue of PepQ, MaxQuant searches including variable modifications for all lysine-containing and tryptic TG2 peptides were performed. Altogether, 25 TG2-derived peptides with 20 different lysine residues that are involved in isopeptide formation were identified in the TG2-PepQ model system. To verify the identified isopeptides, the data was also searched with the crosslinking software tool pLink2. Table 1 shows the identified isopeptides with the lysine positions in the amino acid sequence of TG2, the sequences of the identified TG2 peptides (β-side of the isopeptide) with the crosslinking lysine residues, the sequence of PepQ and the deamidated form of PepQ (called PepE in the following, α-side of the isopeptide) with the crosslinking glutamine residues and additional deamidation sites, as well as the m/z values of the precursor ions and their charge states and the pLink2 identification E-value.
Since two reciprocal data analysis steps were performed, two scores were received for each isopeptide that differed from one another in most cases. The α-score was calculated from the search against the α-gliadin fasta for PepQ carrying either TG2-modification, whereas the β-score was calculated from the reversed search against the TG2 fasta for TG2 peptides carrying PepQ or PepE as modification ( Table 1). One of the crosslinked peptides in the isopeptide (in most cases PepQ/PepE) often fragmented better than the other one thus resulting in different scores [26]. We defined on the basis of the data that one of the two isopeptide scores should be >100 and the other one >40 (default setting for modified peptides in MaxQuant) as a threshold for confident identification. The isopeptides DLYLENPEIKIR/PepQ with the scores 164.89 (α-side) and 183.03 (β-side) and EDITHTYKYPEGSSEER/PepQ with the scores 133.60 (α-side) and 141.60 (β-side) were the highest scoring isopeptides in our analysis. Further, we manually curated the MS/MS spectra of all isopeptides using the MaxQuant Viewer. For that, the highest scoring MS/MS scan number per identified isopeptide was loaded into the Viewer tool. The signals were annotated with the b-and y-fragments. The isopeptide PepQ/LAEKEETGMAMR is highlighted as an example in Figure 2. First, the MaxQuant search result of PepQ carrying the TG2 isopeptide modification "le" (=LAEKEETGMAMR) at Q6 was loaded and all annotated y-and b-ion fragments were highlighted ( Figure 2A). In Figure 2B the identified reverse isopeptide was loaded into the MaxQuant Viewer, so here the b-and y-fragments of the LAEKEETGMAMR peptide carrying the PepQ modification at K is shown ( Figure 2B). For confirmation, a further annotation was done manually by combining the information from both spectral annotations ( Figure 2C). Here not only the fragment ions annotated by MaxQuant (msms.txt output file) [24] were indicated, but also internal fragment ions (double fragmentation on both crosslinked peptide sequences) calculated with the MS-Product feature of ProteinProspector [26].

Identification of TG2-TG2 Crosslinks
Previous studies showed that multiple glutamine and lysine residues of TG2 were involved in TG2-TG2 self-crosslinking [18]. Using the pLink2 software for the verification of the TG2-PepQ isopeptides, the TG2 multimers were identified parallel with an E-value <0.01 and at least detected in two or more MS 2 scans. In the model system 36 different TG2-TG2 isopeptides were identified (Table 2), whereas 8 combinations of the crosslinked sites were known [18].

Identification of Crosslinks and Deamidation Sites Within PepQ
TG2 performs both crosslinking and deamidation reactions at glutamine residues. PepQ has two glutamine residues in its sequence, which can be either crosslinked or deamidated. Vader et al. [30] showed that the Q is no target for TG2 in a QP sequence, but that the sequence QXP is a very good target for TG2 due to the neighboring C-terminal amino acids. The model peptide with only deamidation on one or two Qs and no crosslinking modification were also identified with the default MaxQuant search settings. The localization probabilities for the crosslinking and deamidation were obtained with MaxQuant and verified with pLink2. The isopeptide crosslinking site in PepQ was located at Q 6 in all identified isopeptides with probabilities for the correct identification ranging from 97.2-100% ( Table 1). To verify the identified isopeptides, the data were additionally searched with pLink2 and the resulting E-values are also given in Table 1. Crosslinking sites were unambiguously identified for almost all isopeptides with PepQ (Supplemental Figure S3). One example is shown in Figure 2A for the isopeptide PepQ/LAEKEETGMAMR, for which the specific fragments b 5 , b 6 , y 5 , and y 4 as the relevant fragment ions for unambiguous site determination were all confidently detected.
Considering the deamidated form PepE, the localization of the crosslinking site varies between Q 4 and Q 6 , but the probabilities for correct site localization were 95.8-100%. For example, in the isopeptide FLKNAGR/PepE, the isopeptide probability was 99.7% for Q 6 , and the deamidation probability for Q 4 was 99.7% as well, because there are only two Q's present in the peptide sequence. These findings indicate that in the isopeptides with a deamidated Q 6 (7 isopeptides detected, see Table 1), TG2 most likely first deamidated the preferred Q and subsequently built the isopeptide bond with Q 4 . In the isopeptides with a deamidated Q 4 (4 isopeptides detected, see Table 1), probably the isopeptide bond was first formed by TG2 on Q 6 , and the additional deamidation may have been caused by a non-enzymatic process due to the basic pH conditions of the tryptic digestion protocol [31]. Spontaneous deamidation of PepQ at either Q 4 or Q 6 and less frequently at both sites was detected in the control experiments with inactivated TG2, indicating that the non-enzymatic process did occur under the reaction conditions used in this study. Table 1B shows the two isopeptides with ambiguous crosslinking sites of PepQ and PepE. ANHLNKLEAK/PepE has a localization probability of 50% for Q 4 or Q 6 within PepE, because all specific fragments (b 5 , b 6 , y 5 , and y 4 ) were absent (Supplemental Figure S3R). For SVPLCILYEKYR/PepQ no determination of localization probabilities was possible, because this isopeptide was only identified from its β-side (Supplemental Figure S3S).

Visualization of Isopeptides with Skyline
In order to confirm and visualize the MS1 precursor chromatograms of all isopeptides, we used the software tool Skyline [32] to confirm both sides of the isopeptides. TG2-lysine peptides carrying crosslinked PepQ/PepE as well as PepQ/PepE carrying crosslinked TG2-lysine peptides were likewise investigated with Skyline. Each isopeptide was confirmed by retention time, idotp value, and the comparison between samples and negative controls, i.e., we obtained a Gaussian peak shape in the samples for all isopeptides, but no signal (intensity <1 × 10 2 ) in the negative controls.

Estimation of Preferred Lysine Residues
To study the preferred binding sites within the TG2-PepQ complexes, the model system was expanded to a total of five different TG2:PepQ molar ratios (1:150; 1:50; 1:10; 1:1; 10:1). By comparing the intensities of the signals of the five ratios for every isopeptide individually, the most preferred lysine residues were identified. For the isopeptides SLIVGLKISTK/PepQ, QKR/PepQ, TVEIPDPVEAGEEVKVR/PepQ, and DLYLENPEIKIR/PepQ, a signal with a peak area >1 × 10 6 and a Gaussian peak shape was already monitored at the ratio 10:1 and with increasing TG2-PepQ ratios the peak areas also increased ( Figure 3). lysine residues were identified. For the isopeptides SLIVGLKISTK/PepQ, QKR/ PepQ, TVE Thus, the four lysine residues K-425, K-590, K-600, and K-649 were the most preferred crosslinking sites in the model reaction. For the less preferred lysine residues (K-【5, K-464, K-562, K-598, K-672, K-677) within the identified isopeptides, signals were observed first in the 1:1 or 1:10 ratio samples with a peak area >1 × 10 6 (data not shown). Isopeptides with least preferred lysine residues only showed signals in the model system with the highest amount of PepQ (1:150) (data not shown, for an overview of crosslinking sites in the TG2 sequence, see Supplemental Figure S4).

Location of the Complex-Forming Lysine Residues in the 3D-Model of TG2
As previously demonstrated by Stamnaes et al. [18], the 3D-structure of TG2 shows an accumulation of the binding lysine residues in the C-terminal domain and in the catalytic core with the active site C-277, H-335 and D-358 ( Figure 4). Most of these lysine residues are located in close proximity to the active site or at exposed sites, especially in the active open conformation. The Nterminal domain of the enzyme seems to remain without modified lysine residues in the most cases, which is important, because the epitopes that are recognized by the anti-TG2 antibodies are located in this region and are not blocked [33]. Thus, the four lysine residues K-425, K-590, K-600, and K-649 were the most preferred crosslinking sites in the model reaction. For the less preferred lysine residues (K-205, K-464, K-562, K-598, K-672, K-677) within the identified isopeptides, signals were observed first in the 1:1 or 1:10 ratio samples with a peak area >1 × 10 6 (data not shown). Isopeptides with least preferred lysine residues only showed signals in the model system with the highest amount of PepQ (1:150) (data not shown, for an overview of crosslinking sites in the TG2 sequence, see Supplemental Figure S4).

Location of the Complex-Forming Lysine Residues in the 3D-Model of TG2
As previously demonstrated by Stamnaes et al. [18], the 3D-structure of TG2 shows an accumulation of the binding lysine residues in the C-terminal domain and in the catalytic core with the active site C-277, H-335 and D-358 ( Figure 4). Most of these lysine residues are located in close proximity to the active site or at exposed sites, especially in the active open conformation. The N-terminal domain of the enzyme seems to remain without modified lysine residues in the most cases, which is important, because the epitopes that are recognized by the anti-TG2 antibodies are located in this region and are not blocked [33]. ; the lysine residues K464, K468, K598, K600, and K602 are not visible in this scheme, because these parts are not resolved in the crystal structure. The identified lysine residues are marked in orange, the four preferred lysine residues are highlighted, the catalytic triad is colored in blue, the C-terminal domain in grey, the core region in dark green, and the N-terminal region in light green.

Identification of Isopeptides in the Extended Model System
The model system was extended to three different gluten peptides and the samples were analyzed with the described workflow with MaxQuant and Skyline as well as with pLink2. The deamidated forms of the model peptides were also identified without crosslinking modifications. The observed isopeptides within the extended model system were used to identify the preferred The identified lysine residues are marked in orange, the four preferred lysine residues are highlighted, the catalytic triad is colored in blue, the C-terminal domain in grey, the core region in dark green, and the N-terminal region in light green.

Identification of Isopeptides in the Extended Model System
The model system was extended to three different gluten peptides and the samples were analyzed with the described workflow with MaxQuant and Skyline as well as with pLink2. The deamidated forms of the model peptides were also identified without crosslinking modifications. The observed isopeptides within the extended model system were used to identify the preferred glutamine crosslinking sites in the gluten peptides. Tables 3-5 show all identified isopeptides with their MaxQuant scores and crosslinking probabilities for every glutamine within the gluten peptide sequence as well as with the pLink2 E-values and the number of MS 2 scans they are identified with. Parts A of the tables report the isopeptides with unambiguous localization of the crosslinking and deamidation sites, Parts B show the ambiguous ones, and Parts C the isopeptides that were only identified with pLink2.
For P1, the most preferred glutamine was Q 11 for the isopeptide bond, and Q 4 for the deamidation. For the exceptional cases when the crosslinking site was located at Q 4 (3×) always one deamidation was situated at Q 11 . When the crosslinking site was located at the unusual target Q 9 P, the most preferred deamidated sites were Q 11 and Q 4 . For identifications with a higher score, the probabilities for one specific position are also higher. These probabilities depend on whether the specific fragment ions around the crosslinking sites were identified or not. In peptide sequences with glutamine residues located close together, the fragments between them were often not identified. Some isopeptides were identified only by the workflow with MaxQuant (6×). Others were only found with pLink2 (4×) from MS 2 scans with a low intensity. The isopeptide bond in P2 tended to be at Q 2 , Q 10 or Q 14 and the deamidated glutamine residues were mostly at Q 4 , Q 10 and Q 14 . Q 10 represents the motif Q 10 XP, which is known as a preferred glutamine residue for a modification by TG2. In all isopeptides involving P2 this preferred Q 10 was either crosslinked (10×) or at least deamidated (21×). The unusual TG2 target Q 8 P was involved in isopeptide formation in just a few cases (2×). The localization probabilities for the crosslinking and deamidation sites for the isopeptides with P2 were almost all unambiguously identified. The position of the isopeptide bond within P3 was observed more in the front part of the peptide at Q 4 (10×) or Q 7 (9×), whereas Q 4 is known as a good target for TG2 [30]. For these isopeptides in which the crosslinking site was located in the rear part, it was always at Q 12 (7×), which is also known as a good TG2 motif. In turn, the deamidation took place more in the rear part of the sequence at Q 12 , Q 13 , or Q 14 , whereby Q 12 and Q 14 are known as motif with an increasing effect on TG2 modification activity, but Q 13 with an decreasing effect [30].
These results were further confirmed by annotation of MS/MS spectra with the series of b-and y-fragments from the MaxQuant output tables (msms.txt files). One isopeptide each was chosen as an example for each model peptide ( Figure 5). Fragmentation within the model peptides resulted in almost the whole b-and y-series. In the MS/MS spectrum of P1/DLYLENPEIKIR involving K-590 ( Figure 5A) the most intense signals were annotated with the smaller b-and y-fragments. The MS/MS spectrum for P2/FLKNAGR involving K-205 ( Figure 5B) showed most signals in the higher m/z range and they were annotated with the b-fragments. The fragments in the MS/MS spectrum for P3/QKR involving K-600 ( Figure 5C) were distributed over the entire m/z range with equally intensive signals for the b-and y-series. Due to the identification of specific fragments around the crosslinking site, the crosslinking glutamine residues were unambiguously confirmed for this isopeptide. Table 3. Glutamine binding and deamidation sites in P1 (PQ 2 PQ 4 LPYPQ 9 PQ 11 LPY) involved in the formation of 22 isopeptides with different lysine residues of TG2. Modified sites identified with pLink2 are given in bold.

Discussion
In this study we used a workflow with the proteomics tool MaxQuant, its integrated search engine Andromeda and Skyline as well as the crosslinking software tool pLink2 to identify isopeptides between TG2 and gluten-derived model peptides. When using these tools, the whole computational part is run without a client-server on the user's computer [34]. We have demonstrated a workflow to identify enzymatically built isopeptides as well as the localization of the crosslinking site within these peptides. In some cases, an unambiguous identification was not possible, because of missing specific fragment ion information, but the localization probability could still be limited to a short part of the peptide sequence. In total, we identified 34 isopeptides with 20 different lysine residues as crosslinking sites. Six of these crosslinking sites were already known as TG2-gluten peptide binding sites [17] and eleven as lysine residues involved in TG2 multimer self-crosslinking [18]. In the model system, 36 TG2-TG2 isopeptides were additionally detected with their crosslinking glutamine and lysine residues. Nine of these TG2-TG2 isopeptide crosslinking combinations were already known [18], as well as six of the nine identified glutamine and nine of the eleven identified lysine residues involved in TG2-multimerization. Furthermore, the four most preferred binding lysine residues (K-425, K-590, K-600 and K-649) in the model system were identified by analyzing different TG2:PepQ ratios. K-590, K-600, K649 were already known as crosslinking sites [17] and are located in the C-terminal domain of TG2. K-425 was shown to be involved in TG2 self-multimerization [18] and is part of the core region next to the catalytic core. All four lysine residues are exposed positions according to the published structures in the PDB database ( Figure 4).
TG2 performs both crosslinking and deamidation of glutamine residues. The model peptide PepQ (PFPQ 4 PQ 6 LPY) comprises two possible targets, Q 4 and Q 6 , for TG2. We identified isopeptides without deamidation of the second glutamine in PepQ and some with deamidated PepQ (Table 1) at either Q 4 or Q 6 . Vader et al. [30] investigated the TG2 deamidation pattern depending on the neighboring C-terminal amino acid. Applied to PepQ, the motif PQ 6 L is a good target and the motif PQ 4 P a weak target for deamidation by TG2. The crosslink formation took place at the expected target Q 6 in the isopeptides without deamidation. In earlier studies of Dorum et al. [21], Q 6 was identified as a crosslinking target for TG2. When looking at the PepE sequences within isopeptides, the crosslink was almost always at Q 4 and the deamidation at Q 6 . In this case and in keeping with previous findings [21,30], these data may indicate first deamidation at the preferred Q 6 followed by crosslinking at the less preferable Q 4 , both reactions implemented by TG2. In case of deamidated Q 4 , the results indicate crosslinking by TG2 followed by a non-enzymatic deamidation due to the alkaline pH conditions during tryptic digestion [31]. The deamidation of the glutamine residues in only deamidated model peptides may be caused by TG2 and additionally the pH conditions. One limitation of the current experimental design is that it does not allow a clear differentiation between enzymatic and non-enzymatic deamidation, because the original intent was to focus on the identification of crosslinking sites, rather than deamidation sites. Further experiments would be necessary to look more closely into the specific mechanisms of crosslinking versus deamidation. The localization probabilities of the modifications are given due to the measurement of specific fragments situated around the targets. This leads to probabilities <75% in some cases, when some of these fragments are missing. In these cases, only the subpart of the sequence can be identified where the possibly crosslinked glutamine residues are located.
It is well established that TG2 is very specific in its deamidation pattern [30], which can be explained by strong effects of the neighboring C-terminal amino acids. In our expanded model system with three gluten model peptides, we demonstrated that TG2 follows the known selective deamidation pattern in almost all deamidated isopeptides. Only in a few cases, where the identification scores were low or the specific fragments were absent, it was not feasible to determine the unambiguous localization of the deamidation. The crosslinking reaction also depends on this selectivity of TG2, but with more exceptions. For the shorter model peptide P1 with less glutamine residues, most of the crosslinking sites within the isopeptides were identified clearly due to the presence of the specific fragments. For the longest peptide P2 with nine glutamine residues, the identification of one specific crosslinking site was more difficult, especially when the isopeptides were identified with a low score. In these cases, it was nevertheless possible to identify the subpart within the peptide sequence that most likely carries the modification. These findings underscore partly the known TG2 selectivity by showing a clear preference for the deamidation pattern. Our data also demonstrate a difference in the crosslinking selectivity or at least a dependence on the previously deamidated glutamine residues.
The alignment of crosslinking sites of TG2 (Supplemental Table S2) including the four surrounding amino acids in both C-and N-terminal direction did not reveal an obvious pattern regarding preferred chemical environments around the reactive sites in the primary structure. Therefore, it seems likely that the secondary structure of TG2 is more important to determine which lysine residues are preferred crosslinking sites. Further experiments, e.g., using amino acid substitution analysis on recombinant TG2 in combination with computational modelling, would be useful to get more detailed insights into secondary structural elements that predict which lysine residues are reactive crosslinking sites and which ones are not.
In summary, using a reciprocal search workflow with commonly used proteomics tools and a recently developed crosslinking tool helps to identify many isopeptides with a high certainty and a few isopeptides just with one of the strategies. These novel insights into the molecular structures of TG2-gluten peptide complexes may help clarify the function of extracellular TG2 in the initiation of CD autoimmunity and the role of anti-TG2 autoantibodies. To shed more light on the immunological and physiological relevance of these complexes, in vivo experiments on the extent and the activation of B cells are necessary. Further experiments together with partners bringing in complementary expertise, especially in immunology, would be needed to address the most relevant point regarding the link between our findings and TG2-mediated gluten peptide presentation in CD. Crosslinking reactions are implicated in a number of inflammatory diseases, degenerative disorders, and even cancer, so this strategy may open up multiple opportunities for further research. Future efforts will aim to determine isopeptides of TG2 with physiologically relevant gluten hydrolysates from wheat, rye, and barley.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-6643/11/10/2263/s1, Figure S1: Reciprocal search workflow to identify isopeptides with MaxQuant and Skyline, Figure S2: Sequence coverage of TG2, Figure S3: MS/MS spectra of isopeptides between human tissue transglutaminase (TG2) and PepQ (PFPQPQLPY) or its deamidated form PepE, Figure S4: Differentiation of the reactive sites of TG2, Table S1: TG2 peptides containing lysine residues identified as isopeptide crosslinking sites, Table S2: Alignment of the crosslinking sites of TG2 sorted by most preferred to least preferred in the model system.