Key Amino Acids for Transferase Activity of GDSL Lipases

Gly-Asp-Ser-Leu (GDSL) motif esterase/lipase family proteins (GELPs) generally exhibit esterase activity, whereas transferase activity is markedly preferred in several GELPs, including the Tanacetum cinerariifolium GDSL lipase TciGLIP, which is responsible for the biosynthesis of the natural insecticide pyrethrin I. This transferase activity is attributed to substrate affinity governed by the protein structure, and these features are expected to be conserved in transferase activity-exhibiting GELPs (tr-GELPs). In this study, we identified two amino acid residues, [N/R]208 and D484, in GELP sequence alignments as candidate key residues for the transferase activity of tr-GELPs by two-entropy analysis. Molecular phylogenetic analysis demonstrated that each tr-GELP is located in the clusters for non-tr-GELPs, and most GELPs conserve at least one of the two residues. These results suggest that the two conserved residues are required for the acquisition of transferase activity in the GELP family. Furthermore, substrate docking analyses using ColabFold-generated structure models of native TciGLIP and of TciGLIP mutated at each of the two residues yielded more docking models with proper substrate access to the active site for the native enzyme than for either mutant, indicating crucial roles of these residues in the transferase activity of TciGLIP. This is the first report on residues essential for the transferase activity of tr-GELPs.


Introduction
The Gly-Asp-Ser-Leu (GDSL) motif esterase/lipase family proteins (GELPs) are lipases that feature a Gly-Asp-Ser-X (GDSX) consensus motif and are involved in a wide variety of biological functions [1], including seed germination [2], pollen interaction [3], lipid metabolism [4], and secondary metabolism [5]. Canonical GELPs exhibit esterase (hydrolysis) activity, but several GELPs preferentially function as transferases rather than esterases. In Tanacetum cinerariifolium, the species-specific insecticidal secondary metabolite pyrethrin I is biosynthesized via the esterification of chrysanthemoyl-CoA and pyrethrolone by the T. cinerariifolium GDSL lipase, TciGLIP [6]. Although TciGLIP also exhibits esterase activity against its transferase product pyrethrin I, this esterase activity is much lower than the transferase activity [6]. Tanacetum coccineum, a species phylogenetically close to T. cinerariifolium, also produces pyrethrins, and TcoGLIP (T. coccineum GDSL lipase) has been found in the genome of T. coccineum [7,8]. In other genera, TaXAT (Triticum aestivum xanthophyll acyltransferase) catalyzes the esterification of xanthophyll and triacylglycerides to xanthophyll esters and, like TciGLIP, exhibits much higher transferase activity than esterase activity [9]. Moreover, SlCGT (Solanum lycopersicum chlorogenate:glucarate caffeoyltransferase) has been reported as the first GELP family enzyme that has acquired transferase activity and lost its reverse esterase activity [10].
Int. J. Mol. Sci. 2022, 23, 15141
Such transferase activity of GELP family proteins is likely due to substrate affinity determined by the protein structure, and these features are expected to be shared among transferase activity-exhibiting GELPs (tr-GELPs). Many studies on GELPs have thus far shown that the general active site for esterase activity is formed by three amino acid residues, Ser-Asp-His, designated as a "catalytic triad" [11]. Point mutations in the catalytic triad of TciGLIP (S40A, D318A, or H321A) result in a loss of its transferase activity [6,12], whereas a similar mutation in SlCGT (H331A) does not affect its transferase activity [10], indicating that the residues crucial for transferase activity do not exactly coincide with those for esterase activity. These findings suggest the existence of other crucial residues for the transferase activity of tr-GELPs aside from the catalytic triad. In this study, we identified common key residues for transferase activity in tr-GELPs using a two-entropy analysis of amino acids, and the effects of these residues on TciGLIP were estimated using structure model prediction and docking simulations. To the best of our knowledge, this is the first report on amino acids responsible for the transferase activity of tr-GELPs, including TciGLIP.

Sequence Alignment and Two-Entropy Analysis of GELPs
For the GELPs with known substrates, the protein sequences of four tr-GELPs, namely TciGLIP [6], TcoGLIP [7], TaXAT [9], and SlCGT [10], and six esterase activity-exhibiting GELPs (est-GELPs), namely AtCDEF1 (Arabidopsis thaliana cuticle destructing factor 1 [3]), BnSCE3 (Brassica napus sinapine esterase [2]), CpEST (Carica papaya esterase [13]), FvGELP1 (Fragaria vesca GDSL esterase/lipase [14]), OsGLIP1 (Oryza sativa GDSL lipase [4]), and RsAAE (Rauvolfia serpentina acetylajmalan acetylesterase [5]), were collected from the NCBI database. In addition to these GELPs, putative GELPs were detected by BLASTP searches of the NCBI NR database with each of the GELPs as a query. All collected GELP sequences were aligned using CLUSTAL W-mpi [15] (Figure 1).
To detect amino acid residues conserved preferentially in tr-GELPs, two-entropy values of amino acids at each position in the sequence alignments were calculated according to previous studies [16,17]. For instance, the two-entropy analysis has been employed to determine the ligands of adenosine receptors [18] and the ligand recognition mechanism of cannabinoid receptors [19]. The difference in the entropy value between protein groups is zero at positions where there is no correlation between function and amino acid type, whereas positions with a high correlation between function and amino acid type show a large difference. Accordingly, positions with a low entropy in tr-GELPs and a high entropy in other GELPs (est-GELPs and putative GELPs) are the candidate residue positions crucial for transferase activity.
Figure 2 shows scatter plots of the difference in entropy values between tr-GELPs and other GELPs against the position in the alignment and against the distance from the catalytic triad in TciGLIP, respectively. No region-specific difference was detected in entropy values, except for a higher entropy for tr-GELPs at the N-terminal signal sequence. Among the top ten residues that had lower entropy values in all four tr-GELPs compared to other GELPs, Asn or Arg at position 208 ([N/R]208) and Asp at position 484 (D484) in the alignment are not present in the six est-GELPs (Supplementary Information S1). In contrast, residues that were preferentially conserved in est-GELPs were not detected. These data indicate that [N/R]208 and D484 are crucial for the acquisition of tr-GELP activity.
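The group-wise entropy comparison described above can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the procedure of [16,17]: it uses plain Shannon entropy with a natural logarithm, omits the BLOSUM62-based pseudo-counts, treats every character (including gaps) as a residue, and runs on a hypothetical toy alignment rather than real GELP sequences. The function names are illustrative only.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (natural log) of the residues in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def rank_positions(group_a, group_b):
    """Rank positions by entropy(group_b) - entropy(group_a), largest first.

    A large positive difference marks a position that is conserved in
    group_a but variable in group_b.
    """
    length = len(group_a[0])
    scored = []
    for p in range(length):
        e_a = column_entropy([seq[p] for seq in group_a])
        e_b = column_entropy([seq[p] for seq in group_b])
        scored.append((p, e_a, e_b, e_b - e_a))
    return sorted(scored, key=lambda row: row[3], reverse=True)


# Hypothetical toy alignment (NOT real GELP sequences).
tr_like = ["GDSRA", "GDSRC", "GDSRA"]
others = ["GDSKA", "GDSTC", "GDSLA", "GDSEA"]

ranking = rank_positions(tr_like, others)
top_position = ranking[0][0]  # 0-based index of the best candidate column
print(ranking[0])
```

In this toy case the fourth column (index 3) is invariant in the first group and fully variable in the second, so it ranks first; columns that are invariant in both groups score zero.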

Molecular Phylogenetic Analysis of GELPs
A molecular phylogenetic tree of tr-GELPs, est-GELPs, and putative GELPs is shown in Figure 3. In the Asteraceae family (including T. cinerariifolium and T. coccineum) protein cluster (Figure 3 […] suggesting that the six GELPs with both [N/R]208 and D484 exhibit transferase activity similar to that of TaXAT. In addition to the four known tr-GELPs, nine GELPs with both [N/R]208 and D484 formed a cluster consisting of Brassicaceae proteins (Figure 3, clade D). Interestingly, this cluster is located distantly from other Brassicaceae protein clusters, including the one containing the est-GELP BnSCE3 (Figure 3, clade D-II), suggesting that these GELPs might have expanded separately within the same plant family.

Prediction of Protein Structures and Substrate Docking Simulations of TciGLIP
Access of a substrate to an active site is a prerequisite for enzymatic activity. Structural models showing proper access of a substrate to an active site are defined as "reasonable models", and the correlation between the number of reasonable models and the affinity is employed to predict the structure-function correlations of enzymes [20,21]. To examine the contribution of R153 and D336 of TciGLIP (corresponding to [N/R]208 and D484 in the sequence alignment, respectively) to its transferase activity, we assessed the number of models that allowed access of the substrates for pyrethrin I biosynthesis (chrysanthemoyl-CoA and pyrethrolone) to the catalytic triad of native TciGLIP [12], virtual mutants (R153A-TciGLIP and D336A-TciGLIP), and experimentally validated mutants (S339A-TciGLIP and G64A-TciGLIP). The latter two were included because the substitution of S339 with A does not affect the transferase activity of TciGLIP [6], whereas the substitution of G64 with A results in complete loss of the activity [12]. The amino acid sequences of the native TciGLIP and the four TciGLIP mutants (S339A, G64A, D336A, and R153A) were subjected to ColabFold analyses [22] to predict their protein structures. For each structure model, chrysanthemoyl-CoA and pyrethrolone were docked using AutoDock Vina (Figure 4), and the number of reasonable models was counted.

Figure 4. Scheme of selecting reasonable models. The protein sequences of TciGLIPs (native and mutant) were subjected to ColabFold to generate structure models. Chrysanthemoyl-CoA (substrate 1) was docked using AutoDock Vina for each model. Moreover, the simulated models in which substrate 1 was close to the catalytic triad of TciGLIP were subjected to AutoDock Vina with pyrethrolone (substrate 2) as with substrate 1. Simulated models in which both substrates 1 and 2 are close to the catalytic triad of TciGLIP are regarded as "reasonable models". All pieces of software used are underlined.
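The two-stage selection in Figure 4 can be sketched in Python as a simple distance filter. This is only an illustration under assumptions: the distance cutoff, the coordinates, and the function name are hypothetical (the text does not state the cutoff), and each pose is reduced to one reference atom per substrate measured against a single catalytic-triad atom.

```python
import math


def count_reasonable_models(poses, triad_atom, cutoff):
    """Count poses in which both substrates lie within `cutoff` (Å) of the triad atom.

    Each pose is a (substrate1_xyz, substrate2_xyz) pair. substrate2_xyz is
    None when substrate 1 failed the first filter, so substrate 2 was never docked.
    """
    count = 0
    for s1, s2 in poses:
        # Stage 1: substrate 1 must be close to the catalytic triad.
        if math.dist(s1, triad_atom) > cutoff:
            continue
        # Stage 2: substrate 2, docked onto the stage-1 model, must also be close.
        if s2 is not None and math.dist(s2, triad_atom) <= cutoff:
            count += 1
    return count


# Hypothetical coordinates in Å (NOT taken from the docking results).
triad_atom = (0.0, 0.0, 0.0)
poses = [
    ((3.0, 0.0, 0.0), (0.0, 4.0, 0.0)),  # both close: reasonable
    ((3.0, 0.0, 0.0), (0.0, 9.0, 0.0)),  # substrate 2 too far
    ((8.0, 0.0, 0.0), None),             # substrate 1 too far, stage 2 skipped
    ((1.0, 1.0, 1.0), (2.0, 2.0, 2.0)),  # both close: reasonable
]

n_reasonable = count_reasonable_models(poses, triad_atom, cutoff=5.0)
print(n_reasonable)
```

Applying the filter in two stages mirrors the scheme: pyrethrolone is only docked onto models that already place chrysanthemoyl-CoA near the catalytic triad.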
These analyses generated 45.3 ± 7.2 reasonable models for native TciGLIP (Figure 5A). Likewise, 35.0 ± 4.5 reasonable models were generated for S339A-TciGLIP (Figure 5A), and no significant difference was detected between the two transferase-positive TciGLIP proteins (Figure 5A, p ≥ 0.05, Dunnett's test). In contrast, the number of reasonable models for G64A-TciGLIP (25.0 ± 10.2) was significantly lower (Figure 5A, p < 0.05, Dunnett's test). These results were in good agreement with previous experimental results demonstrating that S339A-TciGLIP has transferase activity equivalent to that of native TciGLIP [6], whereas G64A-TciGLIP is devoid of transferase activity [12]. Of particular interest is that the numbers of reasonable models of both the D336A mutant (25.3 ± 6.3) and the R153A mutant (21.0 ± 5.4) were significantly smaller than that of native TciGLIP (Figure 5A, p < 0.05, Dunnett's test) and comparable to that of G64A-TciGLIP. These results indicate that both D336 and R153 are crucial for the transferase activity of TciGLIP. Notably, both D336 and R153 are located distantly from the catalytic triad of TciGLIP (Figures 2B and 5B) and not in the conserved regions, Blocks I, II, III, or V, among GELPs [6] (Supplementary Information S1).
Collectively, these structural analyses suggest that R153 and D336 participate in distal regulation of the active conformation responsible for the transferase activity of TciGLIP.

Two-Entropy Analysis of GELPs
To find amino acid positions in a multiple sequence alignment that correlate with transferase activity, we calculated two Shannon's entropy values according to a previous study [16]. In short, an entropy value for position p (E_p) was computed separately for tr-GELPs and for other GELPs from the amino acid counts at that position, where N_a,p is the number of sequences with amino acid a at alignment position p, corrected using a BLOSUM62-based pseudo-count strategy [17], and N_all represents the number of sequences within the alignment. The pseudo-count was set to 2.00.
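For reference, Shannon's entropy written with the symbols defined above would take the following form. This is a sketch under assumptions: the amino acid frequency is taken as N_a,p / N_all, and the natural logarithm is used; the logarithm base and any normalization applied in [16] are not stated here and may differ.

```latex
E_p = -\sum_{a} \frac{N_{a,p}}{N_{\mathrm{all}}}
      \ln\!\left( \frac{N_{a,p}}{N_{\mathrm{all}}} \right)
```

Under this form, E_p is zero when a single amino acid occupies position p in every sequence of the group and increases as the residues at that position become more evenly distributed.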

Protein Structure Modeling and Substrate-Binding Simulations of TciGLIP
The protein sequences of native TciGLIP and four TciGLIP mutants (S339A, G64A, D336A, and R153A) were subjected to structure model prediction using ColabFold (AlphaFold2 with MMseqs2) [22] with default parameters, and we extracted the five models with the lowest energy for subsequent analyses. The structures of the pyrethrin I substrates, chrysanthemoyl-CoA (CHEBI: 143950) and pyrethrolone (CHEBI: 39111), were downloaded from the ChEBI database (https://www.ebi.ac.uk/chebi/init.do (accessed on 16 August 2022)) and minimized with CHARMM force fields using Spartan'18 v1.4.5 (Wavefunction, Inc., Irvine, CA, USA).
The top five models of each TciGLIP (native and mutant) were subjected to docking modeling with chrysanthemoyl-CoA using AutoDock Vina 1.1.2 [25]. The numbers of grid points in the x-, y-, and z-axes for AutoDock were 36 × 42 × 44, with grid points separated by 0.375 Å; we set exhaustiveness to 100, num_modes to 20 (max), and other parameters to default settings. The chrysanthemoyl-CoA docked models were selected based on the distance between the sulfur atom of the chrysanthemoyl-CoA thiol ester region and the C-2 atom of the imidazole group of His321 in the catalytic triad of TciGLIP. Furthermore, pyrethrolone was docked to the chrysanthemoyl-CoA docked models using AutoDock Vina 1.1.2. The numbers of grid points in the x-, y-, and z-axes for AutoDock were 18 × 24 × 22, with grid points separated by 0.375 Å; we set exhaustiveness to 100, num_modes to 20 (max), and other parameters to default settings. The numbers of reasonable models in which both substrates (chrysanthemoyl-CoA and pyrethrolone) were close to the catalytic triad were counted for every protein sequence model, and these numbers were examined for statistical significance with Dunnett's test, with p < 0.05 indicating significance. Visualization of the protein models was performed using UCSF Chimera 1.16 [26].
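To give a sense of the search volumes implied by these settings, the following Python sketch converts the stated grid dimensions into box edge lengths in Å. It assumes that an edge length equals the number of grid points multiplied by the spacing; whether the tool counts points or intervals is not stated in the text, so the values may be off by one spacing unit. The helper name is hypothetical.

```python
def box_size_angstrom(grid_points, spacing):
    """Edge lengths (Å) of a search box, assuming edge = grid points x spacing."""
    return tuple(n * spacing for n in grid_points)


SPACING = 0.375  # Å between grid points, as stated in the text

# Grid dimensions stated for each substrate.
coa_box = box_size_angstrom((36, 42, 44), SPACING)           # chrysanthemoyl-CoA
pyrethrolone_box = box_size_angstrom((18, 24, 22), SPACING)  # pyrethrolone

# Upper bound on chrysanthemoyl-CoA poses per sequence:
# five structure models, each docked with num_modes = 20.
max_coa_poses = 5 * 20

print(coa_box, pyrethrolone_box, max_coa_poses)
```

Under this assumption the chrysanthemoyl-CoA box spans 13.5 × 15.75 × 16.5 Å and the pyrethrolone box 6.75 × 9.0 × 8.25 Å, consistent with the smaller ligand being docked into a tighter region near the already placed first substrate.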

Conclusions
In this study, we identified, for the first time, two candidate key residues for the transferase activity of tr-GELPs by a combination of two-entropy analysis, predictive structure modeling, and docking simulations. The present study paves the way for investigating the evolutionary molecular mechanisms underlying the acquisition of transferase activity. Experimental validation of the functional roles of R153 and D336 in the transferase activity of TciGLIP is underway.