Effective Glycosylation of Cucurbitacin Mediated by UDP-Glycosyltransferase UGT74AC1 and Molecular Dynamics Exploration of Its Substrate Binding Conformations

Abstract: Cucurbitacins, a group of diverse tetracyclic triterpenes, display a variety of biological effects. Glycosylation mediated by UDP-glycosyltransferases (UGTs) plays a vital role in the structural and functional diversity of natural products and influences their biological activities. In this study, GT-SM, a mutant of UGT74AC1 from Siraitia grosvenorii, was chosen as a potential catalyst for the glycosylation of cucurbitacins, and its optimal pH and temperature and the effects of divalent metal ions were determined. This enzyme showed high activity (kcat/Km, 120 s⁻¹ mM⁻¹) toward cucurbitacin F 25-O-acetate (CA-F25) and produced only CA-F25 2-O-β-D-glucose, which was isolated and confirmed by 1D and 2D nuclear magnetic resonance. A pathway for uridine diphosphate glucose (UDP-Glc) regeneration and cucurbitacin glycoside synthesis was constructed by combining GT-SM and sucrose synthase to reduce the consumption of costly UDP-Glc. The molar conversion of CA-F25 was 80.4% in the cascade reaction. Molecular docking and dynamics simulations showed that CA-F25 was stabilized by hydrophobic interactions, and that the C2-OH of CA-F25 adopted a more favorable catalytic conformation than the C3-OH, explaining the high regioselectivity toward the C2-OH rather than the ortho-C3-OH of CA-F25. This work demonstrates the potential of UGT74AC1 for the glycosylation of cucurbitacins and provides an understanding of the molecular basis of this reaction.


Introduction
Cucurbitacins, a group of tetracyclic triterpenoids found in Cucurbitaceae, are active compounds that exhibit extensive pharmacological functions, such as anticancer and anti-inflammatory effects [1][2][3]. These compounds are bitter and show innate cytotoxicity, which impedes their medical application [4]. Their structure consists of a tetracyclic cucurbitane nucleus skeleton, namely 19-(10→9β)-abeo-10α-lanost-5-ene (also known as 9β-methyl-19-nor-lanosta-5-ene, Figure 1), with a variety of oxygenation functionalities at different positions. According to their structures, cucurbitacins are divided into twelve categories. In the structure of cucurbitacins, modifications (e.g., glycosylation and esterification) usually occur at the C2, C3, C16, C24, and C25-OH [4]. Integrated strategies, such as genome sequencing, RNA sequencing, and metabolomics, have been applied to uncover the biosynthesis pathway of cucurbitacins in plants. The key enzymes (e.g., 2,3-oxidosqualene cyclase, cytochromes P450, and acetyltransferase) involved in the biosynthesis of cucurbitacin C in cucumber and mogrosides in Siraitia grosvenorii (Cucurbitaceae) have been identified [5,6]. However, although glycosylation is a vital modification that affects a variety of physiological activities of the compound, the glycosylation of cucurbitacins catalyzed by glycosyltransferases is complicated by the multiple potential hydroxyl sites [7], and the function of the enzymes involved remains to be characterized.
Uridine diphosphate (UDP)-glycosyltransferases (UGTs), which catalyze the transfer of sugar moieties from activated sugar donors (e.g., UDP-sugars) to acceptors (e.g., natural products), usually play a vital role in the glycosylation of natural products in plants and thereby influence their bioactivity [8], water solubility [9], taste properties [10], and color [11]. To date, several UGTs involved in the glycosylation of cucurbitacins have been reported. UGT74AC1 [12] and UGT720-269-1 [5] from S. grosvenorii were found to glycosylate the C3-OH and C24-OH of mogrol, respectively. UGT73AM3 from cucumber showed activity toward cucurbitacin C and produced cucurbitacin C 3-O-β-glucopyranoside. However, this enzyme failed to glycosylate cucurbitacin B and cucurbitacin E, which are structurally similar to cucurbitacin C and possess a hydroxyl group at the C2 position [7]. An increasing number of UGTs have been identified and characterized from plants, and most of them catalyze sugar transfer at positions C3 and C28/24 of saponins [13]. No UGTs have been reported to glycosylate the C2-OH of cucurbitacins.
Crystal structures of proteins in complex with substrates or products play an important role in understanding enzyme-substrate interactions and the molecular basis of the catalytic mechanism. Three main kinds of structural topologies of GTs (namely, GT-A, GT-B, and GT-C folds) have been identified to date [14]. In the GT1 family, an increasing number of UGT crystal structures have been solved, including the multifunctional triterpene/flavonoid UGT74AC1 (PDB ID: 6L90) from S. grosvenorii [15], UGT76G1 (PDB ID: 6INI) from Stevia rebaudiana [16], and UGT71G1 (PDB ID: 2ACW) from Medicago truncatula [17].
Like other members of GT1, these UGTs contain a GT-B fold (Figure S1), consisting of two β/α/β Rossmann-like domains. Two key residues (e.g., His22 and Asp121 in UGT71G1 [17] and His18 and Asp111 in UGT74AC1 [15]) are irreplaceable for catalysis. A histidine in the N-terminal domain acts as the general base for deprotonation of the acceptor molecule, and an aspartate stabilizes the catalytic conformation and plays an essential role in enzyme activity by interacting with the histidine. With the availability of high-resolution structures of enzyme-substrate/product complexes, structure-based molecular docking and molecular dynamics (MD) simulations can be conducted to uncover the molecular basis of substrate binding and regioselectivity of the enzyme.
In a recent study, UGT74AC1 was successfully engineered to increase its activity toward triterpenoids, providing a series of mutants, including M5 (GT-SM, T79Y/L48M/R28H/L109I/S15A), M6 (T79Y/L48M/R28H/L109I/S15A/M76L), and M7 (T79Y/L48M/R28H/L109I/S15A/M76L/H47R), which showed a 1 × 10² to 1 × 10⁴-fold enhancement in catalytic efficiency and new activity toward different tetracyclic triterpenoids [15]. In the present study, glycosylation of cucurbitacin F 25-acetate (CA-F25) with these mutants as biocatalysts was performed. GT-SM, one of the mutants of UGT74AC1, showed the highest activity toward the C2-OH of CA-F25. The enzymatic properties of GT-SM were characterized at different pH values and temperatures. A UDP-glucose (UDP-Glc) regeneration system was developed by coupling sucrose synthase and GT-SM to overcome the need to supply costly UDP-Glc, which hampers the application of UGTs. The binding mode of the acceptor in the enzyme was investigated through MD simulation to explain the glycosylation at C2-OH rather than C3-OH.

Characterization of GT-SM
In our previous study, UGT74AC1 from S. grosvenorii showed activity toward mogrol (a derivative of cucurbitacin), and its activity was significantly enhanced through a variety of protein engineering strategies, producing a series of mutants [15]. In this study, UGT74AC1 and its mutants, including M5 (GT-SM), M6, and M7, were heterologously expressed in Escherichia coli BL21(DE3) (Figure S2). The activity of these enzymes toward cucurbitacin F 25-acetate (CA-F25), which is structurally similar to mogrol, was determined with UDP-Glc as the sugar donor. As shown in Figure S3, GT-SM showed the highest activity toward the triterpenoid.
GT-SM with an N-terminal His-tag and Trx•Tag was expressed in E. coli BL21(DE3) and purified on a Ni-NTA agarose affinity column. SDS-PAGE analysis showed that the molecular weight of the purified recombinant GT-SM was approximately 66 kDa (Figure S4). To characterize GT-SM biochemically, the reaction temperature was varied from 25 to 60 °C and the pH of the reaction buffer from 5.0 to 10.0, with CA-F25 and UDP-Glc as the acceptor and sugar donor, respectively. As shown in Figure 2a, GT-SM reached its maximum activity (100%) at 50 °C and retained more than 50% of this activity in the temperature range of 30-55 °C. Across the pH range of 5.0 to 10.0, the maximum activity was achieved at pH 8.0 (Figure 2b). The addition of Co²⁺, Fe²⁺, Zn²⁺, Ni²⁺, Cu²⁺, and Mn²⁺ ions markedly reduced the activity of GT-SM (Figure 2c), whereas Mg²⁺, Ba²⁺, and Ca²⁺ ions had a positive effect on the enzymatic activity, which is consistent with previous reports on UGTs [18].

Kinetic Analysis of GT-SM toward Cucurbitacin F 25-Acetate
The kinetic parameters of GT-SM were determined using CA-F25 and UDP-Glc as the sugar acceptor and donor, respectively (Figure S5). As shown in Table 1, the Km of GT-SM for CA-F25 was 39.34 µM, and kcat was 4.66 s⁻¹. Compared with the kinetic parameters of S. grosvenorii UGT74AC1 toward mogrol and UGT73AM3 from cucumber toward cucurbitacin C [19,20], the catalytic efficiency of GT-SM toward CA-F25 was considerably higher. This finding indicated that GT-SM was an effective biocatalyst for the glycosylation of triterpenoids.

Table 1. Kinetic parameters of GT-SM toward cucurbitacin F 25-acetate.

Enzyme    Substrate    Km (µM)    kcat (s⁻¹)
GT-SM     CA-F25       39.34      4.66
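As a quick check on the reported kinetics, the catalytic efficiency can be recomputed from the Km and kcat values given above. The Python sketch below only performs the unit conversion from µM to mM; the function and variable names are illustrative and do not come from the original work.

```python
# Reported kinetic parameters of GT-SM toward CA-F25 (values from the text).
KM_UM = 39.34       # Michaelis constant, in µM
KCAT_PER_S = 4.66   # turnover number, in s^-1


def catalytic_efficiency_per_mm(kcat_per_s: float, km_um: float) -> float:
    """Return kcat/Km in s^-1 mM^-1, given kcat in s^-1 and Km in µM."""
    km_mm = km_um / 1000.0  # convert µM to mM
    return kcat_per_s / km_mm


efficiency = catalytic_efficiency_per_mm(KCAT_PER_S, KM_UM)
print(f"kcat/Km = {efficiency:.1f} s^-1 mM^-1")
```

The result is close to 118 s⁻¹ mM⁻¹, which agrees with the rounded value of 120 s⁻¹ mM⁻¹ quoted for the catalytic efficiency.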

Glycosylation of Cucurbitacin F 25-Acetate by GT-SM
Many Cucurbitaceae species used as traditional Chinese medicines or folk herbal medicines are pharmacologically well documented and contain phytochemical constituents with potent activities, such as cucurbitacins. Medicinal herbs are an important source of a substantial number of new cucurbitacins. CA-F25 was obtained from Hemsleya graciliflora, which is traditionally used for the treatment of fever, pain, and inflammation [4].
As shown in Figure 3b, high-performance liquid chromatography (HPLC) analysis of the reaction mixture showed a new peak at a retention time of 12.6 min when CA-F25 was incubated with GT-SM and UDP-Glc (Figure 3c). After the product was purified by preparative HPLC (yield: 45 mg, 70.3%), the structure of the compound was confirmed by ¹H NMR, ¹³C NMR, and 2D NMR, including HMBC, HSQC, and COSY (Table 2 and Figures S6-S10). The significant downfield ¹³C shift of the C2 carbon confirmed that a glucosyl moiety was attached to the C2-OH of CA-F25 [21]. The HMBC correlations between the C2 of CA-F25 and the anomeric proton of the glucosyl moiety (δH 4.34, J = 7.80 Hz), together with the large anomeric proton coupling constant, indicated that the product was the 2-O-β-D-glucoside, suggesting an inverting mechanism of the enzyme. UGTs that glycosylate the C2-OH of cucurbitacins have not been reported before [7]. In this study, GT-SM was found to glycosylate CA-F25 at the C2-OH with strong regioselectivity, whereas the other positions of CA-F25 (including the C3, C16, and C20-OH) were not glycosylated.
The high catalytic efficiency of GT-SM toward CA-F25 indicated that the enzyme was a promising biocatalyst for the glycosylation of cucurbitacins. However, in UGT-catalyzed glycosylation, the costly sugar donor UDP-Glc is consumed and converted into UDP, a by-product that may inhibit the reaction and reduce the conversion rate. A one-pot reaction was therefore established by coupling GT-SM to sucrose synthase (Susy) to reduce the consumption of UDP-Glc. Susy efficiently converts inexpensive sucrose and UDP into UDP-Glc and fructose, thereby regenerating UDP-Glc. In the GT-SM-Susy cascade reaction, CA-F25 and UDP-Glc were converted into CA-F25 2-O-glucoside and UDP. Catalyzed by Susy, UDP and sucrose were then converted back into UDP-Glc for the glycosyltransferase (Figure 3a).
Thus, the Susy gene from Acidithiobacillus caldus (AcSusy) was amplified and heterologously expressed in E. coli BL21. The glycosylation of CA-F25 was conducted in a one-pot reaction coupling GT-SM to AcSusy. A pH of 7.0 was adopted in the GT-SM-AcSusy cascade reaction to favor the regeneration of UDP-Glc in the reversible reaction catalyzed by AcSusy while maintaining the activity of both enzymes. The ratio of GT-SM to AcSusy was 200 mU:200 mU/mL, and the conversion of CA-F25 was determined at 40 °C with 1.00 mM CA-F25 and 0.25 mM UDP-Glc. The conversion of CA-F25 reached 80.4% after 30 min, which was higher than that of the glycosylation catalyzed by GT-SM alone (38%). These results indicated that the GT-SM/AcSusy cascade was an effective approach for the glycosylation of cucurbitacins with regeneration of the costly UDP-Glc. To date, glycosylation of several natural products has been achieved by UGT-Susy cascade reactions. These products include dammarane-type tetracyclic triterpenoids (e.g., protopanaxadiol) [22], oleanane-type triterpenoid saponins (e.g., glycyrrhetinic acid) [19], diterpenes (e.g., rebaudioside) [23], flavonoids (e.g., phloretin) [20,24,25], and polyhydroxystilbene compounds (e.g., pterostilbene) [26]. Because the reactions catalyzed by UGTs and sucrose synthase in the one-pot system are reversible, complete conversion of the substrate was not reached. Increasing the concentration of substrates (e.g., sucrose and the sugar acceptor) or removing the products as they form would maximize the production of the glycoside.
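The benefit of donor regeneration can be illustrated with simple stoichiometry. The sketch below assumes that the stated 1.00 mM CA-F25 and 0.25 mM UDP-Glc are the initial concentrations in the cascade and that one glucose is transferred per product molecule; the function name is hypothetical.

```python
# Initial concentrations in the GT-SM/AcSusy cascade (values from the text).
ACCEPTOR_MM = 1.00   # CA-F25, mM
UDP_GLC_MM = 0.25    # UDP-Glc, mM
CONVERSION = 0.804   # molar conversion of CA-F25 after 30 min


def glucoside_per_donor(acceptor_mm: float, donor_mm: float, conversion: float) -> float:
    """Glucoside formed per UDP-Glc initially supplied.

    Values above 1 mean that the donor pool must have been regenerated,
    so this ratio is a lower bound on the average number of recycling rounds.
    """
    product_mm = acceptor_mm * conversion  # one glucose per product molecule
    return product_mm / donor_mm


product_mm = ACCEPTOR_MM * CONVERSION
ratio = glucoside_per_donor(ACCEPTOR_MM, UDP_GLC_MM, CONVERSION)
print(f"Glucoside formed: {product_mm:.3f} mM")
print(f"Glucoside per UDP-Glc supplied: {ratio:.2f}")
```

Under these assumptions, about 0.80 mM glucoside is formed from 0.25 mM UDP-Glc, so each donor molecule is used more than three times on average.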

Molecular Basis for Recognition of GT-SM towards Cucurbitacin F 25-Acetate
No UGTs had previously been reported to glycosylate the C2-OH of cucurbitacins, and the binding mode of cucurbitacins in such enzymes had not been studied. In our previous report [15], the crystal structures of UGT74AC1 and its mutant were solved by X-ray crystallography. Similar to other plant UGT structures, the overall structure of UGT74AC1 showed the conserved GT-B fold, which is one of the three main types of structural topologies of GTs. UDP-Glc was completely buried in a long, narrow channel within the C-terminal domain. The sugar acceptor, adjacent to the UDP-Glc binding site, interacted mainly with the N-terminal domain. The His18-Asp111 dyad was demonstrated to be crucial and irreplaceable in the SN2-like mechanism, thereby providing the structural basis for understanding the interaction between protein and substrate. To uncover the molecular basis of substrate binding and of the glycosylation of CA-F25 at C2-OH rather than C3-OH, two conformations were chosen from molecular docking according to the reaction mechanism, with either C2-OH or C3-OH near the catalytic residue His18 and the sugar ring of UDP-Glc. A 200 ns unconstrained MD simulation was performed for each conformation, and the last 100 ns of each trajectory (10,000 frames in total) was used to calculate the frequency of occurrence of catalytic conformations, that is, conformations that support catalysis. The analysis of the (GT-SM)-(UDP-Glc)-(CA-F25) complex trajectory showed that the frequency of C2-OH in the catalytic conformation was 84.42% (Figure 4). In its representative conformation, the C2-OH was in the catalytic conformation, and the distance between C2-OH and the catalytic residue His18 was 2.8 Å. At the same time, the substrate was stabilized by hydrophobic interactions (between CA-F25 and A11, F13, G66, L188, I185, R70, and W370) and hydrogen bonds (between His18, W370, and CA-F25).
For tetracyclic triterpenes that do not possess a C2-OH, such as mogrol, the C3-OH was glycosylated by the enzyme. With CA-F25, however, only CA-F25 2-O-β-D-Glc was produced, and the C3-OH was not glycosylated. To explain why the C3-OH was not used, the trajectory starting with C3-OH near the catalytic residue His18 and the sugar ring of UDP-Glc was analyzed. As shown in Figure 4b, the frequency of C3-OH in the catalytic conformation was only 6.32%, far lower than that of C2-OH (84.42%). In addition, the two methyl groups at the position adjacent (ortho) to C3-OH created significant steric hindrance, keeping the C3-OH of the acceptor away from the C1 of the sugar donor and hindering the reaction. Thus, the much more frequent formation of catalytically competent poses and the more favorable catalytic conformation of C2-OH explain the preferential glycosylation at C2-OH rather than at the ortho-C3-OH.

Activity Assay of Glycosyltransferase
Sugar acceptors were dissolved in dimethyl sulfoxide (DMSO), and UDP and UDP-Glc were dissolved in double-distilled water (ddH2O). To test the activity of the UGTs, reactions (300 µL) containing 50 mM Tris-HCl (pH 8.0), 0.2 mM substrate, 1 mM UDP-Glc, 10 mM MgCl2, and 5 µg of purified enzyme were performed at 40 °C. Methanol (300 µL) was added to quench the reactions after an incubation time of 10 min. The reactions were then analyzed by high-performance liquid chromatography (HPLC) with a reverse-phase Ultimate C18 column (4.6 mm × 250 mm, 5 µm particle size, Welch, Shanghai, China) and an ultraviolet detector at 205 nm. Mobile phase A was ddH2O with 0.1% formic acid, and mobile phase B was acetonitrile (CH3CN) with 0.1% formic acid. The gradient was 25-80% B over 0-25 min at a flow rate of 1 mL/min. Electrospray ionization-mass spectrometry parameters were as follows: the scan range was 100-1500 m/z in positive ion mode, spray voltage was 4500 V, capillary temperature was 400 °C, dry gas was 6 mL/min, dry temperature was 180 °C, and nebulizer pressure was 1 bar.
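For readers who want to relate a retention time to the programmed mobile-phase composition, the gradient above can be written as a small function. This is only a sketch: it assumes that the 25-80% B gradient is a single linear segment and ignores the system dwell volume and column dead time, so it gives the programmed composition at the pump rather than the composition at the point of elution. The function name is illustrative.

```python
def percent_b(t_min: float,
              t_start: float = 0.0, t_end: float = 25.0,
              b_start: float = 25.0, b_end: float = 80.0) -> float:
    """Programmed %B at time t_min for one linear gradient segment."""
    if not t_start <= t_min <= t_end:
        raise ValueError("time lies outside the programmed gradient")
    fraction = (t_min - t_start) / (t_end - t_start)
    return b_start + fraction * (b_end - b_start)


# Start of the gradient, the product retention time (12.6 min), and the end.
for t in (0.0, 12.6, 25.0):
    print(f"t = {t:4.1f} min -> {percent_b(t):.2f}% B")
```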

Optimization of Glycosyltransferase Activity
Glycosylation reactions were conducted at various pH values and temperatures to identify the effects of pH and temperature on the activity of the UGT. The reaction buffers ranged from pH 5.0 to 10.0 (CH3COOH-CH3COONa: 5.0 and 6.0; Tris-HCl: 6.0, 7.0, 8.0, and 9.0; glycine-NaOH: 9.0 and 10.0), and the temperature was varied from 25 to 60 °C. The effects of divalent metal ions (MnCl2, ZnCl2, CaCl2, CoCl2, CuCl2, and MgCl2) and of EDTA on UGT activity were also tested. The assays were performed with each additive individually (5 mM, final concentration), using UDP-Glc as the donor and CA-F25 as the acceptor.

Kinetic Analysis of Glycosyltransferase towards Cucurbitacin F 25-Acetate
To determine the kinetic parameters of the UGT toward CA-F25, reactions were carried out in 300 µL of reaction buffer containing 50 mM Tris-HCl (pH 8.0), 1 mM UDP-Glc, 10 mM MgCl2, and 0.5 µg of the purified enzyme. The mixtures were incubated at 40 °C for 5 min, with the concentration of CA-F25 varying from 0.01 to 0.4 mM, and then quenched by adding an equal volume of methanol. The reaction mixtures were analyzed by HPLC as described above.
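The text does not state how Km and kcat were extracted from the initial rates, so the following is only one possible approach: a Hanes-Woolf linearization of the Michaelis-Menten equation, fitted by ordinary least squares in plain Python. The rates used here are synthetic, noise-free values generated from the reported Km with an arbitrary relative Vmax of 1.0; they are not experimental data, and all names are illustrative.

```python
KM_UM = 39.34   # reported Km in µM, used only to generate synthetic rates
VMAX = 1.0      # arbitrary relative Vmax (assumption)


def michaelis_menten(s_um: float, vmax: float, km_um: float) -> float:
    """Initial rate v = Vmax * S / (Km + S)."""
    return vmax * s_um / (km_um + s_um)


def hanes_woolf_fit(substrate, rates):
    """Fit S/v = S/Vmax + Km/Vmax by least squares; return (Vmax, Km)."""
    ys = [s / v for s, v in zip(substrate, rates)]
    n = len(substrate)
    mean_x = sum(substrate) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in substrate)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(substrate, ys))
    slope = sxy / sxx                    # equals 1 / Vmax
    intercept = mean_y - slope * mean_x  # equals Km / Vmax
    vmax = 1.0 / slope
    return vmax, intercept * vmax


# Substrate concentrations in µM, spanning the 0.01-0.4 mM range of the assay.
substrate_um = [10.0, 25.0, 50.0, 100.0, 200.0, 400.0]
rates = [michaelis_menten(s, VMAX, KM_UM) for s in substrate_um]

vmax_fit, km_fit = hanes_woolf_fit(substrate_um, rates)
print(f"Vmax = {vmax_fit:.3f} (relative), Km = {km_fit:.2f} µM")
```

With real, noisy measurements a direct nonlinear fit of the Michaelis-Menten equation is usually preferred, because linearizations distort the error structure of the data.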

Molecular Dynamics Simulations
UDP-Glc and CA-F25 were docked into the structure of GT-SM using Schrödinger 2017-2 [28]. In accordance with the catalytic mechanism, two conformations, in which either C2-OH or C3-OH is near the catalytic residue His18 and the sugar ring of UDP-Glc, were chosen and subjected to MD. A constraint (3.0 Å, 50 kcal mol⁻¹) between the O2 or O3 of CA-F25 and the NE2 atom of the catalytic His18 was applied in the first 10 ns of MD to simulate the induced-fit process of substrate binding. Unconstrained MD simulation (200 ns) was conducted with the AMBER18 MD package. The complete simulation methodology used in this work was as previously reported [14] and is available in the supporting information. A total of 10,000 frames from the last 100 ns were analyzed to calculate the frequency of occurrence of catalytic conformations, defined as those in which the distance between the 2/3-hydroxyl-O of CA-F25 and the NE2 nitrogen of the catalytic residue His18 was less than 3.6 Å and the NE2/2(3)-hydroxyl-H/2(3)-hydroxyl-O angle was larger than 135°. The representative conformation was obtained by aligning and clustering the catalytic conformations based on the backbone atoms of the protein. PyMOL [29] was used to visualize the models and prepare the figures.
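The geometric criterion for a catalytic conformation can be expressed as a short per-frame test. The sketch below is not the analysis script used in this work: it assumes that the coordinates of the three atoms have already been extracted from the trajectory for each frame, it reads the angle as the one at the hydroxyl hydrogen (NE2···H-O), and it uses three made-up frames purely for illustration.

```python
import math

DIST_CUTOFF = 3.6     # Å, hydroxyl O to His18 NE2
ANGLE_CUTOFF = 135.0  # degrees, NE2 / hydroxyl H / hydroxyl O


def angle_deg(a, vertex, c):
    """Angle a-vertex-c in degrees for three 3D points."""
    v1 = [p - q for p, q in zip(a, vertex)]
    v2 = [p - q for p, q in zip(c, vertex)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    cos_theta = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cos_theta))


def is_catalytic(ne2, hydroxyl_h, hydroxyl_o):
    """True if a frame satisfies both the distance and the angle criterion."""
    close_enough = math.dist(hydroxyl_o, ne2) < DIST_CUTOFF
    well_aligned = angle_deg(ne2, hydroxyl_h, hydroxyl_o) > ANGLE_CUTOFF
    return close_enough and well_aligned


def catalytic_frequency(frames):
    """Fraction of frames (NE2, H, O coordinate triples) that are catalytic."""
    hits = sum(is_catalytic(*frame) for frame in frames)
    return hits / len(frames)


# Hypothetical frames: (NE2, hydroxyl H, hydroxyl O) coordinates in Å.
toy_frames = [
    ((2.8, 0.0, 0.0), (0.96, 0.0, 0.0), (0.0, 0.0, 0.0)),   # close and linear
    ((5.0, 0.0, 0.0), (0.96, 0.0, 0.0), (0.0, 0.0, 0.0)),   # too far away
    ((0.96, 2.0, 0.0), (0.96, 0.0, 0.0), (0.0, 0.0, 0.0)),  # close but bent
]
frequency = catalytic_frequency(toy_frames)
print(f"Catalytic frames: {frequency:.1%}")
```

Applied to the 10,000 frames of each trajectory, this kind of count yields the percentage reported as the frequency of catalytic conformations.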

Purification and Structural Analysis of Glycoside
The product was purified with a preparative HPLC system, as described previously [14]. The purified glycosylated product was redissolved in dimethyl sulfoxide-d6, analyzed on an AVANCE III 600 MHz spectrometer, and characterized by ¹H NMR, ¹³C NMR, and 2D NMR spectra, including heteronuclear single quantum coherence (HSQC), heteronuclear multiple bond correlation (HMBC), and correlation spectroscopy (COSY).

Conclusions
In summary, GT-SM, a mutant of the triterpene glycosyltransferase UGT74AC1 from S. grosvenorii, was characterized and shown to glycosylate the C2-OH of cucurbitacin F 25-acetate. The product was purified and its structure confirmed by NMR. A GT-SM-AcSusy cascade reaction was used to reduce the consumption of expensive UDP-Glc, providing a new biocatalytic route for the glycosylation of cucurbitacins. MD simulations were performed to reveal the interactions between the enzyme and substrate and to provide a better understanding of the C2-OH glycosylation of cucurbitacins by the enzyme. The molecular-level mechanism of the glycosylation of CA-F25 at C2-OH can serve as a guide for the development of efficient and regioselective UGTs for the glycosylation of natural products.

Conflicts of Interest:
The authors declare no conflict of interest.