Triglycine-Based Approach for Identifying the Substrate Recognition Site of an Enzyme

Various peptides and non-structural amino acids are recognized by specific target proteins and play biological roles in various pathways in vivo. Understanding the interactions between target proteins and peptides (or non-structural amino acids) provides key information on molecular recognition, which could potentially be translated into the development of novel drugs. However, determining the crystal structures of protein–peptide complexes is experimentally challenging. To obtain structural information on substrate recognition by a peptide-recognizing enzyme, X-ray crystallographic studies were performed using triglycine (Gly-Gly-Gly) as a mimic of the peptide main chain. The crystal structure of Parengyodontium album Proteinase K in complex with triglycine was determined at 1.4 Å resolution. Two different bound conformations of triglycine were observed at the substrate recognition site. The triglycine backbone forms stable interactions with the main chains of the β5-α4 and α5-β6 loops. One of the triglycine binding conformations was almost identical to the binding mode of a peptide-based inhibitor in a previously reported crystal structure of Proteinase K. Triglycine may therefore be useful in X-ray crystallography for identifying substrate recognition sites in peptide-binding enzymes.


Introduction
Protein-protein interactions (PPIs) regulate most cellular processes, such as enzyme activities, intracellular localization, and physical interactions [1,2]. PPIs can be broadly classified into two types: protein-protein interactions (between domains of two proteins) and protein-peptide interactions (between a protein and a linear sequence of another protein) [3]. In PPIs, proteins recognize short peptide sequences in either a sequence-dependent or a sequence-independent manner [4]. Peptides bind to cavities, grooves, or pockets in the active site or in substrate recognition sites of proteins; alternatively, they interact with β-strands on the protein surface [4]. Investigating PPIs is necessary to understand the mechanistic details of biological processes and to develop novel drugs [5][6][7]. Among the various methods for PPI-based drug development, structure-based approaches such as fragment-based drug discovery (FBDD) [8] have been widely employed to identify potential lead compounds. Although FBDD usually yields less initial success than high-throughput screening (HTS), it is considered more efficient in the optimization phase [9,10].
Peptide-based fragments can be used to understand the structural and mechanistic features of PPIs and to obtain information useful for FBDD. I hypothesized that a crystallographic study using triglycine (Gly-Gly-Gly, or glycyl-glycyl-glycine) could provide information on the main-chain interactions of PPIs. Because triglycine, unlike other amino acid fragments, has a high degree of torsional freedom in its main chain, I expected that it could be used to identify the peptide binding site in peptide-recognizing proteins. To test this hypothesis, proteinase K (EC 3.4.21.64) from the fungus Parengyodontium album (formerly Tritirachium album), hereafter PaProK, was used as a model. This enzyme is a broad-spectrum serine protease that predominantly cleaves internal peptide bonds adjacent to the carboxyl group of aliphatic and aromatic hydrophobic amino acids [11]. The active site contains the catalytic triad Asp39-His69-Ser224, and the substrate recognition site is formed by Gly100-Tyr104 and Ser132-Gly136, which form a triple antiparallel β-strand with the substrate [12].
To investigate the structural features of PPIs, I performed an X-ray crystallographic study using triglycine and determined the crystal structure of the PaProK-triglycine complex at 1.4 Å resolution. Triglycine bound in two conformations to the substrate recognition site between the β5-α4 and α5-β6 loops of PaProK. The conformations of bound triglycine were compared with previously reported crystal structures of PaProK complexed with peptide-based inhibitors. This structural investigation of the triglycine-PaProK interaction could provide useful information for understanding PPIs.

Sample Preparation
Proteinase K from Parengyodontium album was purchased from Geogiachem (PR3050). The protein was dissolved in 50 mM HEPES-NaOH (pH 7.0) and crystallized at 22 °C by the hanging-drop vapor diffusion method. Crystallization drops were prepared by mixing 1 µl of PaProK solution (30 mg/ml) with 1 µl of reservoir solution (0.1 M Tris-HCl, pH 8.0, and 2.0 M ammonium sulfate) and were equilibrated against 0.5 ml of reservoir solution. Crystals suitable for X-ray diffraction (0.15 × 0.10 × 0.10 mm) were obtained within two weeks. To obtain PaProK-triglycine complex crystals, 2 µl of 0.3 M triglycine solution was added to the crystallization drop, which was then incubated for 30 min.
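The nominal concentrations in the drop follow from simple dilution. The sketch below is only an illustration: it assumes ideal mixing with additive volumes and ignores water loss during vapor-diffusion equilibration, and the helper name `mixed_concentration` is hypothetical.

```python
def mixed_concentration(components):
    """Concentration of one solute after mixing several solutions.

    components: iterable of (volume_ul, concentration) pairs; use a
    concentration of 0 for solutions that lack the solute. Assumes ideal
    mixing with additive volumes and no evaporation.
    """
    components = list(components)
    total_volume = sum(volume for volume, _ in components)
    amount = sum(volume * conc for volume, conc in components)
    return amount / total_volume

# Drop setup: 1 ul PaProK (30 mg/ml) + 1 ul reservoir (no protein)
protein_mg_per_ml = mixed_concentration([(1.0, 30.0), (1.0, 0.0)])
# Nominal ammonium sulfate in the freshly set drop (reservoir is 2.0 M)
ammonium_sulfate_m = mixed_concentration([(1.0, 0.0), (1.0, 2.0)])
# Soak: 2 ul of 0.3 M triglycine added to the nominal 2 ul drop
triglycine_m = mixed_concentration([(2.0, 0.0), (2.0, 0.3)])

print(f"PaProK in drop: {protein_mg_per_ml:.1f} mg/ml")   # 15.0 mg/ml
print(f"(NH4)2SO4 in drop: {ammonium_sulfate_m:.2f} M")   # 1.00 M
print(f"triglycine after soak: {triglycine_m:.2f} M")     # 0.15 M
```

In practice, vapor diffusion draws water out of the drop during the two weeks of crystal growth. The actual concentrations at the time of soaking are therefore likely higher than these starting values.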

X-Ray Diffraction and Data Processing
The crystals were transferred to reservoir solution containing 25% (v/v) glycerol as a cryoprotectant. Diffraction data were collected at 100 K on beamline 11C at the Pohang Light Source II (PLS-II, Pohang, South Korea) using a Pilatus 6M detector [13]. The data sets were processed and scaled using HKL2000 [14]. Data collection statistics are presented in Table 1.

Structure Determination
The PaProK structure was solved by molecular replacement using MOLREP [15] from the CCP4 program suite [16], with the coordinates of Proteinase K (PDB: 1IC6) [17] as the search model. Manual model building was performed with COOT [18], and refinement was performed with phenix.refine in PHENIX [19]. The refinement statistics are presented in Table 1. The figures were generated with PyMOL [20]. The final structure was validated with MolProbity [21]. The coordinates and structure factors have been deposited in the Protein Data Bank under accession code 6KKF.

Overall Structure
The crystals of the PaProK-triglycine complex belonged to the monoclinic space group P2₁, with one molecule in the asymmetric unit. The electron density map was interpretable from Ala106 to Gln383. Bragg peaks were observed to approximately 1.1 Å resolution, but data completeness at that resolution was insufficient, so the diffraction data were processed to 1.4 Å resolution. PaProK displays a typical αβ-hydrolase fold with seven α-helices and ten β-strands, in which the six central β-strands (β3-β8) are surrounded by five peripheral α-helices (α3-α7; Figure 1a). This structure is similar to the previously reported crystal structure of PaProK (PDB code: 1IC6) [17], with a root-mean-square deviation (RMSD) of 0.2892 Å. The Ramachandran plot revealed that Asp144 on the β3-α2 loop has distorted geometry, with backbone torsion angles phi (Φ) and psi (Ψ) of −172° and −142°, respectively. Asp144 forms the catalytic triad with His174 and Ser329, and such distorted geometry of catalytic-triad residues is often observed in serine proteases and esterases [22][23][24][25]. The Oδ1 and Oδ2 atoms of Asp144 (charge relay) are stabilized by hydrogen bonds with the Nδ atom of His174 (proton carrier), at distances of 2.70 and 3.03 Å, respectively.
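The φ and ψ values quoted for Asp144 are backbone dihedral angles. φ is defined by the atoms C(i−1), N, Cα, C, and ψ is defined by N, Cα, C, N(i+1). The pure-Python sketch below shows the standard signed-dihedral calculation. The helper names are hypothetical, and the points in the example are toy coordinates, not atoms from 6KKF.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four atom positions.

    For phi pass C(i-1), N, CA, C; for psi pass N, CA, C, N(i+1).
    Positive values mean clockwise rotation when viewed along p1 -> p2.
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Toy points (not coordinates from 6KKF)
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)))   # -90.0
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))  # 180.0
```

Using `atan2` rather than `acos` keeps the sign of the angle. The sign is needed to tell apart, for example, ψ = −142° from ψ = +142° on a Ramachandran plot.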

The electron density map clearly shows that PaProK contains two Ca²⁺ ions (named Ca1 and Ca2; Figure 1a). These Ca²⁺ ions contribute to thermal stability but not to proteolytic activity [12]. The B-factors for the entire protein, Ca1, and Ca2 are 11.89 Å², 7.4 Å², and 28.0 Å², respectively, indicating that Ca1 is more tightly bound to the protein than Ca2. Ca1 was octahedrally coordinated by the carbonyl groups of Pro280 (2.42 Å) and Val282 (2.39 Å), the two carboxylate oxygens of Asp305 (2.65 Å and 2.40 Å for Oδ1 and Oδ2, respectively), and four water molecules (2.33-2.51 Å). Ca2 was coordinated by the carbonyl group of Thr121 (2.49 Å), the carboxylate oxygens of Asp365 (2.44 Å and 2.36 Å for Oδ1 and Oδ2, respectively), and one water molecule (2.46 Å). The triglycine is located in the negatively charged substrate binding pocket around the catalytic triad (Figure 1b).

Triglycine Binding Site on PaProK
The electron density for triglycine was found in the substrate binding pocket between the β5-α4 and α5-β6 loops around the active site (Figure 2a and Supplementary Figure S1) [17]. No electron density for triglycine was observed at any other position, indicating that, under these conditions, triglycine binds only to the peptide binding site of PaProK. Interestingly, the electron density map of triglycine reveals two binding conformations, GGG-A and GGG-B (Figure 2b). In GGG-A, the amino terminus of triglycine is directed toward the PaProK active site (Figure 2c) … (Figure 2f). The O3 atom of triglycine is 4.2 Å away from the hydroxyl group of Ser329. Overall, the triglycine backbone in both conformations is stably held by hydrogen bonds at the substrate binding site. In particular, GGG-B forms a triple antiparallel β-strand with the β5-α4 and α5-β6 loops. Because proteins involved in PPIs generally recognize specific amino acid sequences in their substrates, triglycine, which lacks side chains, is likely to bind the substrate binding site with lower affinity than the natural substrates of PaProK.
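Interatomic distances in the refined model underlie statements such as "the O3 atom of triglycine is 4.2 Å from the Ser329 hydroxyl" and the hydrogen-bond assignments. The sketch below computes such distances and flags pairs as possible hydrogen bonds using a distance cutoff alone. The 3.5 Å cutoff is a commonly used convention assumed here, not a value from the paper. A distance-only test also ignores donor/acceptor identity and angles. The coordinates are hypothetical.

```python
import math

HBOND_CUTOFF = 3.5  # assumed donor-acceptor cutoff in angstroms (distance only)

def classify_contacts(pairs, cutoff=HBOND_CUTOFF):
    """Return (label, distance, verdict) for each (label, xyz_a, xyz_b) pair."""
    results = []
    for label, a, b in pairs:
        d = math.dist(a, b)  # Euclidean distance (Python 3.8+)
        verdict = "possible H-bond" if d <= cutoff else "no H-bond"
        results.append((label, round(d, 2), verdict))
    return results

# Hypothetical coordinates chosen only to illustrate the two outcomes
pairs = [
    ("pair 1 (hypothetical)", (0.0, 0.0, 0.0), (2.9, 0.0, 0.0)),
    ("pair 2 (hypothetical, 4.2 A apart)", (0.0, 0.0, 0.0), (0.0, 4.2, 0.0)),
]
for row in classify_contacts(pairs):
    print(row)
# ('pair 1 (hypothetical)', 2.9, 'possible H-bond')
# ('pair 2 (hypothetical, 4.2 A apart)', 4.2, 'no H-bond')
```

At 4.2 Å, a contact such as O3 to the Ser329 hydroxyl would fall outside this assumed cutoff.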

Structural Comparison of the PaProK-Triglycine Complex with ProK-Peptide Inhibitor Complexes
The X-ray crystal structure confirmed that triglycine binds at the PaProK substrate recognition site in two different conformations. To determine whether the main-chain interactions between PaProK and triglycine provide information useful for FBDD, the PaProK-triglycine structure was compared with previously reported crystal structures of PaProK complexed with peptide-based inhibitors (Figure 3 and Supplementary Figure S2). Five proteinase K structures in complex with peptide-based inhibitors were found in the PDB (PDB codes 1P7V, 1P7W, 1PEK, 1PFG, and 3PRK [26][27][28]). These structures superimpose on PaProK-triglycine with RMSDs of 0.192-0.437 Å. In PaProK-1P7V (Pro-Ala-Pro-Phe-Ala-Ala-Ala) and PaProK-1P7W (Pro-Ala-Pro-Phe-Ala-Ser-Ala), the heptapeptide inhibitors are located in the cleft around the active site, that is, the substrate binding pocket (Supplementary Figure S2). The N-terminal Pro-Ala-Pro residues of these peptide inhibitors lie in the same substrate-recognition cleft as triglycine in PaProK-triglycine, and their C-terminal residues are close to the β5-α4 loop (Supplementary Figure S2). The peptide inhibitors in ProK-1PEK (N-Ac-Pro-Ala-Pro-Phe-D-Ala-Ala-NH2; hexapeptide inhibitor) and ProK-1PFG (N-Ac-Pro-Ala-Pro-Phe-D-Ala-Ala-Ala-Ala-NH2; octapeptide inhibitor) carry an N-terminal modification and contain D-amino acids. In these structures, the inhibitors are observed in cleaved form. The N-terminal Pro-Ala-Pro residues of 1PEK and 1PFG are placed in the substrate recognition site, and the cleaved C-terminal fragments are located near His174 (Supplementary Figure S2). Collectively, the N-terminal Pro-Ala-Pro residues of the inhibitors in all four of these PaProK complex structures (1P7V, 1P7W, 1PEK, and 1PFG) were located at the same substrate binding position between the β5-α4 and α5-β6 loops, similar to that observed for triglycine.
However, the Pro-Ala-Pro segments of the four peptide inhibitors not only showed different backbone geometries but also shared no similarity with the backbone interactions of triglycine. These differences in backbone geometry are likely determined by the residues C-terminal to Pro-Ala-Pro or by the L/D configuration of the amino acids (Figure 3a).

In PaProK-3PRK (methoxysuccinyl-Ala-Ala-Pro-Ala-chloromethyl ketone), the peptide inhibitor is covalently bound and is captured in a transition-state-like form in the crystal structure (Figure 3b).
In this crystal structure, the N atom of Ala281 (residue numbering as in PDB entry 3PRK), the O atom of Ala281, the N atom (N3) of Ala282, and the O atom of Ala282 of the peptide inhibitor interact with the O atom of Gly207 (2.72 Å), the N atom of Gly207 (3.47 Å), the O atom of Gly239 (3.14 Å), and the N atom of Gly239 (3.07 Å), respectively. Notably, in ProK-3PRK, the Ala-Ala-Pro residues located at the substrate binding site between the β5-α4 and α5-β6 loops exhibited almost the same backbone geometry as the GGG-B form of triglycine (Figure 3c). In contrast, GGG-A did not resemble the backbone of any of the peptide-based inhibitors.
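An RMSD over matched atoms can quantify how closely the Ala-Ala-Pro segment of the chloromethyl ketone inhibitor resembles GGG-B, just as the 0.192-0.437 Å values quantify whole-structure similarity. The side chains differ, so only backbone atoms (N, CA, C, O) can be paired. The sketch below assumes the two fragments are already superposed in a common frame, for example after superposing the enzyme chains; it does not perform the superposition. It also assumes ligand atom names have already been mapped onto N/CA/C/O, which is not shown. Function names and coordinates are hypothetical.

```python
import math

BACKBONE = ("N", "CA", "C", "O")

def backbone_atoms(fragment):
    """Yield ((residue_index, atom_name), xyz) for backbone atoms only.

    fragment: list of residues, each a dict mapping atom name -> (x, y, z).
    Residues are paired by their position in the list, not by residue type.
    """
    for i, residue in enumerate(fragment):
        for name in BACKBONE:
            if name in residue:
                yield (i, name), residue[name]

def backbone_rmsd(frag_a, frag_b):
    """RMSD over backbone atoms present in both (already superposed) fragments."""
    atoms_b = dict(backbone_atoms(frag_b))
    sq_sum, n = 0.0, 0
    for key, xyz_a in backbone_atoms(frag_a):
        if key in atoms_b:
            sq_sum += math.dist(xyz_a, atoms_b[key]) ** 2
            n += 1
    if n == 0:
        raise ValueError("no matching backbone atoms")
    return math.sqrt(sq_sum / n)

# Hypothetical Ala-like residue (with CB) and a Gly-like copy whose backbone
# is shifted 0.3 A along z; CB has no partner and is ignored.
ala = {"N": (0.0, 0.0, 0.0), "CA": (1.46, 0.0, 0.0),
       "C": (2.0, 1.4, 0.0), "O": (1.3, 2.4, 0.0), "CB": (2.0, -0.8, 1.2)}
gly = {name: (x, y, z + 0.3) for name, (x, y, z) in ala.items() if name != "CB"}
print(round(backbone_rmsd([ala], [gly]), 3))  # 0.3
```

Pairing atoms by position and name, rather than by residue type, lets a side-chain-free glycine fragment be compared with Ala or Pro residues.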

Discussion
Understanding the mechanistic and structural characteristics of substrate recognition in PPIs could provide useful insights for FBDD. To understand the mechanism underlying substrate recognition by a target protein, X-ray crystallographic studies are generally performed using peptide fragments of the substrate or of similar sequences. Here, triglycine, which corresponds to a peptide backbone, was used to identify the substrate binding site of PaProK. Triglycine bound to the substrate binding site of PaProK in two conformations. This approach can provide structural information on how a peptide-recognizing protein recognizes the main chain of its substrates and could provide useful starting information for FBDD. Comparison with the PaProK-peptide-based inhibitor structures revealed that one of the triglycine conformations has almost the same backbone binding configuration as the peptide inhibitor in PaProK-3PRK. This suggests that substrate recognition information obtained with triglycine could be used to design novel peptide inhibitors. Furthermore, the existence of two different triglycine conformations in the substrate binding site of PaProK was confirmed, suggesting that peptide-based inhibitors could be designed in either of two backbone conformations. However, this information may also reflect nonspecific binding. Therefore, further studies on peptide interactions are needed, which could lead to improved inhibitors. In conclusion, triglycine provided information on the substrate recognition mechanism in protein-protein or protein-peptide interactions that can be employed in FBDD.