Identification of a Novel Specific Cucurbitadienol Synthase Allele in Siraitia grosvenorii Correlates with High Catalytic Efficiency

Mogrosides, the main bioactive compounds isolated from the fruits of Siraitia grosvenorii, are a group of cucurbitane-type triterpenoid glycosides that exhibit a wide range of notable biological activities and are commercially available worldwide as natural sweeteners. However, the extraction cost is high due to their relatively low contents in plants. Therefore, molecular breeding needs to be achieved when conventional plant breeding can hardly improve the quality so far. In this study, the levels of 21 active mogrosides and two precursors in 15 S. grosvenorii varieties were determined by HPLC-MS/MS and GC-MS, respectively. The results showed that the variations in mogroside V content may be caused by the accumulation of cucurbitadienol. Furthermore, a total of four wild-type cucurbitadienol synthase protein variants (50R573L, 50C573L, 50R573Q, and 50C573Q) based on two missense mutation single nucleotide polymorphism (SNP) sites were discovered. An in vitro enzyme reaction analysis indicated that 50R573L had the highest activity, with a specific activity of 10.24 nmol min−1 mg−1. In addition, a site-directed mutant, namely, 50K573L, showed a 33% enhancement of catalytic efficiency compared to wild-type 50R573L. Our findings identify a novel cucurbitadienol synthase allele correlates with high catalytic efficiency. These results are valuable for the molecular breeding of luohanguo.


Introduction
Siraitia grosvenorii (luohanguo or monk fruit) is an herbaceous perennial of the Cucurbitaceae family. It is principally cultivated in Guilin city, Guangxi Province, China [1]. The fruit of S. grosvenorii has been used in China as a natural sweetener and as a folk remedy for the treatment of lung congestion, sore throat and constipation for hundreds of years [2]. To date, luohanguo products have been approved as dietary supplements in Japan, the US, New Zealand and Australia [3,4]. Mogrosides, the major bioactive components isolated from the fruits of S. grosvenorii, are a mixture of cucurbitane-type triterpenoid glycosides that have been proven to be powerful and zero-calorie sweeteners and can hence be used as a sucrose substitute for patients with diabetes and patients who are obese [5].
Because of their complex structures (Table A1), the chemical synthesis of these compounds is inherently difficult [6]. Currently, these valuable chemicals are mainly produced through their extraction from the fruits of S. grosvenorii. With the rapid rise in market demand, the production of extraction from the fruits of S. grosvenorii. With the rapid rise in market demand, the production of luohanguo extracts has increased rapidly from two tons in 2002 to 60 tons in 2007, becoming one of the fastest growing industries of traditional Chinese medicine extracts [7]. However, the extraction yield of these ingredients is limited by difficulties in S. grosvenorii cultivation, including a requirement for heavy artificial pollination, a scarcity of appropriate cultivatable land and a high purification cost due to the presence of seeds [8,9]. In addition, the low contents of the main active components also result in a high cost of extraction. According to the statistics of our team, the extraction cost can be reduced by 1% when the mogroside V (MV) content increases by 0.1%. Therefore, the production of high-quality S. grosvenorii is an urgent issue for research and in the production field.
In recent years, our team has selected many new S. grosvenorii varieties with better agronomic traits, such as higher yield, improved resistance to various diseases and seedless fruits [10][11][12][13]. In this study, we chose 15 S. grosvenorii varieties (E1, E2, E3, E5, E12, E23, E29, S2, S3, S10, S13, C2, C3, C6 and W4). From these varieties, elite germplasms, which constitute an important resource for S. grosvenorii breeding, may be developed. The macroscopic characteristics of the 15 S. grosvenorii varieties are very similar, especially beyond the flowering and fruiting periods. However, the quality varies dramatically. Therefore, further evaluations of S. grosvenorii germplasms based on the active component content are important to better serve molecular breeding purposes. Because all of these varieties are cultivated under the same conditions, we hypothesize that effective genetic polymorphisms in functional genes involved in MV biosynthesis are one of the most important reasons for the MV content change.
To date, key genes involved in the biosynthesis of mogrosides have been successfully cloned and characterized, including genes from five enzyme families: the squalene epoxidase (SQE) [14], cucurbitadienol synthase (CS) [15], epoxide hydrolase (EPH), cytochrome P450 (CYP450) [2,9] and UDP-glucosyltransferase (UGT) families ( Figure 1) [16]. Squalene is thought to be the initial substrate and precursor for triterpenoid and sterol biosynthesis. SQE has been generally recognized as the common rate-limiting enzyme in the common pathway from mevalonate (MVA) and methylerythritol phosphate (MEP) pathways, catalyzing squalene to 2,3-oxidosqualene [17,18]. In S. grosvenorii, the initial step in the biosynthesis of cucurbitane-type mogrosides is the cyclization of 2,3-oxidosqualene to form the triterpenoid skeleton of cucurbitadienol, which is catalyzed by CS. EPH and CYP450 further oxidize cucurbitadienols to produce mogrol, which is glycosylated by UGT to form MV. Single nucleotide polymorphism (SNP) derived markers, identified in coding sequences of different genes, have been developed to discriminate very similar cultivars [19][20][21]. Technological improvements make the use of SNP attractive for high throughput use in marker-assisted breeding, for population studies [22] and to develop high-density linkage maps for map-based gene discovery [23,24]. In the resulting 2.44 Mbp of aligned sequence of soybean, a total of 5551 SNPs were discovered, including 4712 single-base changes and 839 indels for an average nucleotide diversity of Theta = 0.000997 [25]. No SNP were analyzed in S grosvenorii so far though the genome was assembled successfully [26].
This study analyzes the variation of 23 profiles in 15 S. grosvenorii varieties aimed to identify favorite allele of SgSC gene that can be used increase the production of them in the breeding program. A holistic targeted secondary metabolomics analysis was conducted to quantitatively determine the contents of 21 mogrosides and two intermediates in 15 varieties of luohanguo samples using high-performance liquid chromatography with tandem mass spectrometry (HPLC-MS/MS) or gas chromatography mass spectrometry (GC-MS). SgCS genes from a total of 15 S. grosvenorii varieties were cloned to investigate the enzyme activity of the allele gene products. Using a combined approach that relied on SNP analyses, yeast expression, and site-directed mutagenesis, the key amino acid residues underlying triterpene product efficiency were identified, providing new insight into the molecular breeding of high-quality luohanguo.

Determination of the Levels of 21 Mogrosides in 15 S. grosvenorii Varieties
The developed HPLC-MS/MS method was applied to determine the levels of the 21 mogrosides in the fruits of different S. grosvenorii varieties. The quantitative analyses were performed by means of an external standard method [27]. The results are summarized in Table A2, and a graphical representation of the results is shown in Figure 2. The content of each targeted mogroside and the total content of each targeted analyte in the fruits of different S. grosvenorii varieties varied dramatically (7.77-19.97 mg/g). Among these, the level of M5 was much higher than that of any other mogroside. In the fruit of S. grosvenorii, the average total content of M5 was 63.86-fold, 15.86-fold, 5.05-fold and 11.26-fold higher than the contents of M2, M3, M4 and M6, respectively. According to a previous report, ripe fruits mainly contain M5, while unripe fruits have higher levels of M2 and M3. The M5 content significantly increased, while the levels of M2 and M3 dramatically decreased with increasing growing time (disappeared after 70 DAA) [28,29]. In our study, we did not find MIIA1 in most of the samples and found only a small amount of MIII. The average values of MIII, MIIIE, MIIIA1, and MIIIA2 were 0.10, 0.62, 0.06 and 0.15 mg/g, respectively.
In terms of individual constituents, MV was the predominant component in all samples. The MV content accounts for 45.23-63.58% of the total content of the 21 targeted analytes. This result is consistent with previous reports that MV was found in a sample of whole fruits of S. grosvenorii at levels of 49.29-66.96% [27]. Moreover, its content was in the range of 4.86-13.49 mg/g in 15 varieties of luohanguo samples, all of whose contents, except the S10 variety, were all higher than the 5 mg/g set in the Chinese Pharmacopeia. Then, 11-E-MV and MIVE were the second and third most abundant sweet mogrosides in the tested samples, respectively. The average values of 11-E-MV and MIVE were 2.67 and 1.12 mg/g, respectively.
MIIE, MIIIX, MIVA and SI are the biosynthetic precursor components for the synthesis of MV. MIIE was detected in E2, E3, E5 and E12 but not in the other samples. MIIIX was not detected because the standard was unavailable. The MIVA content ranged from 0.16 to 1.80 mg/g, with a mean value of 0.73 mg/g. As the sweetest among the mogrosides, SI was the fifth most abundant sweet mogroside in the tested varieties. The mean content of SI was 0.81 mg/g and ranged from 0.26 to 1.21 mg/g. The results showed that the contents of the mogrosides were significantly different between the different varieties of S. grosvenorii.
When the contents of related intermediates (squalene, cucurbitadienol) in luohanguo were analyzed by GC-MS, the results showed significant variations among the different varieties (p < 0.05 and 0.01, respectively). For example, the squalene content of C2 was 1.24 mg/g, whereas the squalene content of E3 was only 0.03 mg/g (Figure 3a). Similar to squalene, the contents of cucurbitadienol in the different varieties of luohanguo were obviously different. The mean content of cucurbitadienol was 0.50 mg/g and ranged from 0.17 to 1.80 mg/g (Figure 3b). The results revealed a wide range of variability among the 15 S. grosvenorii varieties for 23 quantitative components (Appendix Tables A2 and A3). The squalene and cucurbitadienol coefficients of variation (CVs) were higher than the mogroside CV. The CVs of squalene and cucurbitadienol were 135.03 and 86.03%, respectively. In contrast, the lowest CV belonged to the MV (21.12%). Because the M2, M3 and M4 contents were very low in ripe fruit, we speculated that catalysis by UGTs is not a rate-limiting step for MV biosynthesis in plants. Moreover, squalene and cucurbitadienol, the precursor of mogrosides, accumulates in fresh fruit. SgCS and CYP450 are the enzymes that can catalyze these two substrates, respectively. We proposed that the conversion of squalene and cucurbitadienol through improved activity of SgCS and CYP450 was proportional to the accumulation of MV. In this paper, we mainly focus on the enzyme catalytic efficiency of SgCS, and we think it is necessary to study the catalytic efficiency of CYP450 in the future. When cucurbitadienol increases under high SgCS catalytic efficiency, the content of M2, M3 and M4 may increase, and then it may lead to the increase of pharmaceutical ingredients MV via the same biosynthetic route they shared.

SNP Identification in ORF Region of the SgCS Gene
SgCS is a member of the oxidosqualene cyclase (OSC) gene family and catalyzes the cyclization of 2,3-oxidosqualene to cucurbitadienol. This step, catalyzed by OSCs, is the key branch-point leading to triterpenoid or sterol synthesis [30][31][32]. A 2800 bp full-length cDNA sequence of the SgCS gene encoding a 759-residue protein (between 200 and 2479 bp) was obtained from 15 varieties. To detect sequence polymorphisms among different cultivars, we aligned the ORF sequences, which revealed a total of 4 SNPs among 15 varieties. No InDels were distributed in the ORF region of the SgCS gene. The changes in nucleotides at the 84 and 148 sites were due to transitions (A-G or C-T), whereas transversions (A-T, A-C) existed in 618 and 1962 sites. Of the SNPs, only 148 sites were missense mutations that caused changes in amino acid sequences between Arg (R) and Cys (C). To date, a universal platform that can allow data comparisons across different laboratories has been used. The sequence alignment analysis of the 15 SgCS clones against the expressed sequence tags (ESTs) within GenBank (GenBank accession number: HQ128567) identified another SNP site in 1718 bp, and this SNP is a missense mutation ( Figure A2). Therefore, four types of SgCS protein variants The results revealed a wide range of variability among the 15 S. grosvenorii varieties for 23 quantitative components (Appendix A Tables A2 and A3). The squalene and cucurbitadienol coefficients of variation (CVs) were higher than the mogroside CV. The CVs of squalene and cucurbitadienol were 135.03 and 86.03%, respectively. In contrast, the lowest CV belonged to the MV (21.12%). Because the M2, M3 and M4 contents were very low in ripe fruit, we speculated that catalysis by UGTs is not a rate-limiting step for MV biosynthesis in plants. Moreover, squalene and cucurbitadienol, the precursor of mogrosides, accumulates in fresh fruit. SgCS and CYP450 are the enzymes that can catalyze these two substrates, respectively. We proposed that the conversion of squalene and cucurbitadienol through improved activity of SgCS and CYP450 was proportional to the accumulation of MV. In this paper, we mainly focus on the enzyme catalytic efficiency of SgCS, and we think it is necessary to study the catalytic efficiency of CYP450 in the future. When cucurbitadienol increases under high SgCS catalytic efficiency, the content of M2, M3 and M4 may increase, and then it may lead to the increase of pharmaceutical ingredients MV via the same biosynthetic route they shared.

SNP Identification in ORF Region of the SgCS Gene
SgCS is a member of the oxidosqualene cyclase (OSC) gene family and catalyzes the cyclization of 2,3-oxidosqualene to cucurbitadienol. This step, catalyzed by OSCs, is the key branch-point leading to triterpenoid or sterol synthesis [30][31][32]. A 2800 bp full-length cDNA sequence of the SgCS gene encoding a 759-residue protein (between 200 and 2479 bp) was obtained from 15 varieties. To detect sequence polymorphisms among different cultivars, we aligned the ORF sequences, which revealed a total of 4 SNPs among 15 varieties. No InDels were distributed in the ORF region of the SgCS gene. The changes in nucleotides at the 84 and 148 sites were due to transitions (A-G or C-T), whereas transversions (A-T, A-C) existed in 618 and 1962 sites. Of the SNPs, only 148 sites were missense mutations that caused changes in amino acid sequences between Arg (R) and Cys (C). To date, a universal platform that can allow data comparisons across different laboratories has been used. The sequence alignment analysis of the 15 SgCS clones against the expressed sequence tags (ESTs) within GenBank (GenBank accession number: HQ128567) identified another SNP site in 1718 bp, and this SNP is a missense mutation ( Figure A2). Therefore, four types of SgCS protein variants (50R573L, 50C573L, 50R573Q and 50C573Q) based on the missense mutation SNP sites ( Figure 4) were identified.

Activity Comparison
To measure cucurbitadienol synthetic activity, the four different types (50R573L, 50C573L, 50R573Q and 50C573Q) of CS genes from S. grosvenorii were codon-optimized, synthesized and subcloned into the yeast expression vector pCEV-G4-Km. The resulting recombinant plasmids and the empty vector pCEV-G4-Km, used as a negative control, were transformed into the BY4742 strain. Colonies were randomly picked from the plate and verified by PCR amplification. All correct colonies were cultivated in yeast extract peptone dextrose medium (YPD) medium with 2% glucose for 3 days. GC-MS analysis of the cell extract confirmed production of cucurbitadienol. Protein variants 50R573L and 50C573L generated products with similar cucurbitadienol yields (0.365 and 0.300 mg/g yeast cells), whereas 50R573Q and 50C573Q produced much smaller amounts of cucurbitadienol (0.015 and 0.022 mg/g yeast cells). At the same time, 50C573Q accumulated 3.7 times more squalene than 50R573L. These results demonstrate that SgCS variants from cultivated S. grosvenorii are highly divergent, as inferred from their product contents (Figure 5a,b).
The approach adopted here was to search for the SNP site of the gene that is known to be closely linked to a trait of interest, namely, content. Unexpectedly, SNPs were tested for significant deviation from equal distribution between the two groups of high cucurbitadienol content and low cucurbitadienol content with the χ 2 test. The results suggested that there were no significant relationships between these SNP sites and cucurbitadienol content in the different S. grosvenorii varieties. A possible explanation is that the accumulation of cucurbitadienol is a dynamic process because these classes of secondary metabolites share a common synthetic pathway [33]. The cucurbitadienol content in plants depends not only on the activity of SgCS but also on the downstream enzyme that can use cucurbitadienol as a substrate for synthase mogrosides.

Activity Comparison
To measure cucurbitadienol synthetic activity, the four different types (50R573L, 50C573L, 50R573Q and 50C573Q) of CS genes from S. grosvenorii were codon-optimized, synthesized and subcloned into the yeast expression vector pCEV-G4-Km. The resulting recombinant plasmids and the empty vector pCEV-G4-Km, used as a negative control, were transformed into the BY4742 strain. Colonies were randomly picked from the plate and verified by PCR amplification. All correct colonies were cultivated in yeast extract peptone dextrose medium (YPD) medium with 2% glucose for 3 days. GC-MS analysis of the cell extract confirmed production of cucurbitadienol. Protein variants 50R573L and 50C573L generated products with similar cucurbitadienol yields (0.365 and 0.300 mg/g yeast cells), whereas 50R573Q and 50C573Q produced much smaller amounts of cucurbitadienol (0.015 and 0.022 mg/g yeast cells). At the same time, 50C573Q accumulated 3.7 times more squalene than 50R573L. These results demonstrate that SgCS variants from cultivated S. grosvenorii are highly divergent, as inferred from their product contents (Figure 5a,b).
The approach adopted here was to search for the SNP site of the gene that is known to be closely linked to a trait of interest, namely, content. Unexpectedly, SNPs were tested for significant deviation from equal distribution between the two groups of high cucurbitadienol content and low cucurbitadienol content with the χ 2 test. The results suggested that there were no significant relationships between these SNP sites and cucurbitadienol content in the different S. grosvenorii varieties. A possible explanation is that the accumulation of cucurbitadienol is a dynamic process because these classes of secondary metabolites share a common synthetic pathway [33]. The cucurbitadienol content in plants depends not only on the activity of SgCS but also on the downstream enzyme that can use cucurbitadienol as a substrate for synthase mogrosides.
Measuring Km values in the presence of detergents is notoriously difficult because the concentrations of the substrate and enzyme are distorted by the biphasic aqueous and micellar system [34,35]. Although most of the enzyme and substrate are probably constrained to the restricted volume of the micelle, the soluble proportion is not readily determined. We consequently compared the catalytic competence of these enzymes using homogenate assays with substrate at a concentration (250 µM) well above the literature Km of 25-125 µM described for several plant cyclases. The activities of purified variants were tested, and the results are shown in Table 1. 50R573L showed the highest activity as the best variant. The specific activity of 50C573L showed a slight decline, with a loss of approximately 20% of activity compared to 50R573L. Inferences from in vitro experiments of 50R573L and 50C573L were nearly in agreement with the results of the in vivo experiments, indicating that the in vivo assay can serve as an alternative to the in vitro assay when the enzyme activities of 50R573Q and 50C573Q are too weak for measurements of Km and k cat . Measuring Km values in the presence of detergents is notoriously difficult because the concentrations of the substrate and enzyme are distorted by the biphasic aqueous and micellar system [34,35]. Although most of the enzyme and substrate are probably constrained to the restricted volume of the micelle, the soluble proportion is not readily determined. We consequently compared the catalytic competence of these enzymes using homogenate assays with substrate at a concentration (250 µM) well above the literature Km of 25-125 µM described for several plant cyclases. The activities of purified variants were tested, and the results are shown in Table 1. 50R573L showed the highest activity as the best variant. The specific activity of 50C573L showed a slight decline, with a loss of approximately 20% of activity compared to 50R573L. Inferences from in vitro experiments of 50R573L and 50C573L were nearly in agreement with the results of the in vivo experiments, indicating that the in vivo assay can serve as an alternative to the in vitro assay when the enzyme activities of 50R573Q and 50C573Q are too weak for measurements of Km and kcat.
The amino acid sequence of SgCS shares high similarity with the sequences of CSs (85% with CcCS from C. colocynthis, 89% with CpCS from C. pepo and 84% with CsCS from C. sativus). Residues 50R and 573L in SgCS correspond to 52R and 578L, respectively, in CcCS and CpCS and 70R and 599L, respectively, in CsCS. Previous study has shown that SgCS, which had the highest cucurbitadienol yield, is the wild-type CS enzyme with highest catalytic efficiency [36]. Therefore,  The amino acid sequence of SgCS shares high similarity with the sequences of CSs (85% with CcCS from C. colocynthis, 89% with CpCS from C. pepo and 84% with CsCS from C. sativus). Residues 50R and 573L in SgCS correspond to 52R and 578L, respectively, in CcCS and CpCS and 70R and 599L, respectively, in CsCS. Previous study has shown that SgCS, which had the highest cucurbitadienol yield, is the wild-type CS enzyme with highest catalytic efficiency [36]. Therefore, SgCS can be used as a guideline for other CS enzyme modification.

Improving the Activity of SgCS by Site-Directed Mutagenesis
To rationally produce a more catalytically efficient SgCS, site-directed mutagenesis of 50R573L was performed. We constructed five variants (50A573L, 50D573L, 50E573L, 50H573L and 50K573L) to verify the effects of 50 amino acid residues in SgCS. The subsequent enzyme assay by GC-MS revealed that the cucurbitadienol contents of 50A573L, 50E573L and 50H573L were reduced slightly in comparison with that of the wild-type, whereas that of 50K573L increased by approximately 33.5% ( Figure 5c). Notably, the catalytic efficiency of the 50K573L mutant was enhanced by approximately 1.62-fold compared to that of 50C573L. These results showed that lysine was the optimal amino acid for the 50 position, with higher activity. Therefore, to rationally produce more catalytically efficient SgCSs, site-saturation mutagenesis of the 50 position needs to be further performed individually.

Discussion
S. grosvenorii is an important herbal crop with multiple economic and pharmacological uses. Mogrosides, the main effective components of luohanguo, are partial substitutes for sucrose because of their extremely sweet and noncaloric characteristics, and increasing progress is being made in terms of molecular breeding and purification processes. That the production and distribution of mogrosides in monk fruits is regulated by genetic factors, growth times, environmental factors, physiological factors, and chemical factors is generally accepted. In this study, 15 varieties of S. grosvenorii were cultivated under the same conditions and harvested at the same time, suggesting that the differences in the active component contents were most likely caused by genetic differences. We hypothesize that effective genetic polymorphisms in functional genes involved in MV biosynthesis constitute one of the most important mechanisms of changes in MV content.
The biosynthesis pathway of mogrosides has been extensively studied, and several genes have been identified. The initial committed step in the biosynthesis of cucurbitane-type mogrosides is the cyclization of 2,3-oxidosqualene to form the triterpenoid skeleton of cucurbitadienol. This step is catalyzed by CS, which has been functionally characterized in several plants, including Cucurbita pepo [37], Citrullus colocynthis [38], Cucumis sativus [39] and S. grosvenorii [15]. However, no kinetic data for CS from these plants have been reported. In this study, the two key amino acid sites that determine the enzyme activity of cucurbitadienol synthesis were successfully identified through comprehensive SNP analysis among 15 varieties of S. grosvenorii. An in vitro study indicated that the wild-type 50R573L enzyme has quite good efficiency considering that many triterpene cyclases have very low efficiencies [34]. The apparent Km values for 3(S)-oxisqualene were determined to be 55µM for lanosterol cyclase in rat [40] and 33.8 µM for β-Amyrin cyclase in E. tirucalli L. [35], the reported values for pea cyclases being 25 and 50 µM, respectively [41]. The use of SNP information combined with site-directed mutation is an effective approach to enhance enzyme activity. In this study, this approach enabled us to identify a mutant 50K573L that had 133% efficiency compared to wild-type 50R573L. To rationally produce more catalytically efficient SgCSs, site-saturation mutagenesis of the 50 position needs to be further performed individually.
During the evolutionary process, plants have perfected their systems to produce diverse catalogs of compounds in adaptations to both natural and artificial selection processes. Rapid advances in genome sequencing technologies have greatly accelerated studies of the molecular basis underlying these evolutionary events [42,43], which provide deep insights into nature's strategy for survival in a specific ecological niche. Phylogenetic analysis showed that S. grosvenorii diverged from the Cucurbitaceae family approximately 40.95 million years ago [26]. We generated and analyzed a multiple sequence alignment to test whether the 50 and 573 amino sites of CS were conserved in the Cucurbitaceae plant family. The substitution of 573L with 573Q resulted in an almost complete loss of function in terms of producing cucurbitadienol, indicating that this amino acid site may serve as an active-site residue for 2,3-oxidosqualene cyclization to cucurbitadienol. Previous study showed that a point mutations Cys (C) to Tyr (Y) at position 393 of cucumber CS which disables the catalytic capacity of CcCS inhibiting bitterness biosynthesis in cucumber [39]. However, in contrast to the well-studied functionally characterized SgCS, key amino acid residues responsible for the formation of cucurbitadienol have not been identified. Therefore, further studies based on homologous modeling and site-directed mutagenesis should be carried out.
Integrating "good genes", into high-yielding, high-content cultivars continues to be one of primary objectives of many breeding programs [44]. Although substantial efforts have been made during the past twenty years to develop an optimal yield of the final product MV S. grosvenorii cultivars using conventional plant breeding methods, there has been limited success in achieving the desired goal [45]. After crossings one needs to screen thousands of individual plants for their performance, which is time consuming and costly as the plants have to be cultivated till there are fruits. With the advent of molecular biology techniques, it was presumed that developing high-quality cultivars would be convenient and relatively less time consuming. Results from our studies identified genetic sources of high SgCS efficiency that could be useful to breeders for S. grosvenorii improvement.
In S. grosvenorii, squalene is converted to cucurbitadienol by SQE, and cucurbitadienol is further converted to mogroside compounds by CYP450 and UGT. After the metabolic pathway of mogroside in S. grosvenorii was identified [2,9,16,26,46], the SNP sites of SQE, CYP450s and UGT could be discovered to reveal a new hypothesis that effective genetic polymorphisms in functional genes involved in MV biosynthesis can result in changes in MV content. Due to the large collection of high-efficiency mutants, the further step for breeding the "super" S. grosvenorii cultivar should focus on cotransformation of all of these homozygous functional enzymes with high activity into the elite germplasm S2 or E1. Additionally, these mutants can be valuable gene resources for the production of mogrosides by metabolic engineering.

Chemicals and Reagents
The following reference compounds were purchased from Chengdu Must Bio-Technology Co., Ltd. Nielsen and Claudia Vickers [47]. Other reagents were purchased from Beijing Chemical Corporation (Beijing, China) unless otherwise specified. The dry fruit samples were powdered to a homogeneous size (ca. 50 mesh) by a disintegrator (Shanghai Shuli Instrument, Shanghai, China). Approximately 0.5 g of powder from each sample was accurately weighed and introduced into a 50-mL capped conical flask with 25 mL of methanol/water (80:20, v/v). The flask was sealed and sonicated in a KQ-300 ultrasonic water bath (Kunshan Ultrasonic Instrument, Jiangsu, China) operating at 40 kHz with an output power of 300 W for 30 min at room temperature. A duplicate extract was prepared. Both extracts were mixed and transferred into a volumetric flask and then diluted to 100 mL with methanol/water (80:20, v/v) and filtered through a 0.22 µm microporous membrane.

Sample Collection and Preparation
For the analysis of the nonpolar compounds squalene and cucurbitadienol from S. grosvenorii, 50 mL of the above mentioned methanol/water extracts was extracted three times with the same volume of n-hexane. The combined extracts were dried under reduced pressure distillation and dissolved in 1 mL of n-hexane.

LC-MS/MS Analysis of 21 Mogrosides
The HPLC system consisted of an Agilent Technologies 1260 Series LC system (Agilent, USA) equipped with an automatic degasser, a quaternary pump, and an autosampler. Chromatographic separations were performed on an Agilent Poroshell 120 SB C18 column (100 mm × 2.1 mm, 2.7 µm) by gradient elution using a mobile phase consisting of (A) water (containing 0.1% formic acid) and (B) acetonitrile with the following gradient procedure: 0-8 min 25% B, 11 min 80% B, 11.01-11.50 min 80% B and 11.51-15.0 min 20% B, with a flow rate of 0.20 mL/min. The injection volume was 2.0 µL.
The column effluent was monitored using a 4000 QTRAP ® LC-MS/MS (AB Sciex, Toronto, Canada). Ionization was achieved using electrospray ionization (ESI) in the negative-ion mode with nitrogen as the nebulizer. Multiple reaction monitoring (MRM) scanning was employed for quantification. The source settings and instrument parameters for each MRM transition were optimized not only to maximize the generated deprotonated analyte molecule ([M − H] − ) of each targeted mogroside but also to efficiently produce its characteristic fragment/product ions. The electrospray voltage was set at −4500 V, and the source temperature was 500 • C. The curtain gas (CUR), nebulizer gas (GS1), and heater gas (GS2) were set at 15, 50, and 40 psi, respectively. The compound-dependent instrumental parameters of two individual precursor-to-product ion transitions specific for each analyte, including the precursor ion, two product ions, declustering potential (DP), entrance potential (EP), collision energy (CE), and collision cell exit potential (CXP), were optimized and are listed in Table 2. The dwell time was 400 ms for each MRM transition.
LC-MS/MS chromatograms of the standards of 21 mogrosides are presented in Figure A3a, and the samples are shown in Figure A3b.

SNP Analysis
The dried S. grosvenorii materials were wiped with 75% ethanol and ground into powder. Total DNA was extracted from approximately 100 mg of the powder with the plant genomic DNA kit following the manufacturer's instructions and dissolved in 30 µL of sterile water. SgCS sequences were amplified from genomic DNA by polymerase chain reaction (PCR) using the SgCS-F and SgCS-R primers (Table A4). The PCR mixture (30 µL) contained template DNA 0.8 µL, forward primer (10 µM) 0.7 µL, reverse primer (10 µM) 0.7 µL, 10× PCR Buffer 3.0 µL, KOD-Plus-Neo DNA polymerase 0.6 µL, dNTP (2 mM) 3.0 µL, MgSO 4 (25 mM) 1.2 µL and ddH2O 20.0 µL. The PCR conditions were 94 • C for 2 min, followed by 30 cycles at 98 • C for 10 s, 59 • C for 30 s and 68 • C for 2 min 30 s, with a final incubation at 68 • C for 7 min. PCR products were examined by 1.5% agarose gel electrophoresis before bidirectional DNA sequencing on a 3730XL sequencer (Applied Biosystems, Foster City, CA, USA). Sequences were aligned using DNAman (version 8.0, Lynnon Biosoft, Quebec, Canada), and SNPs were identified by visual inspection of the alignments.

Cloning 4 Different Copies of SgCS Genes in Yeast
Four different copies of SgCS genes were codon-optimized for synthesis according to the codon bias of yeast and cloned into the BamHI/EcoRI sites of the pCEV-G4-Km yeast expression vector under the control of the TEF1 promoter to construct pCEV-50R573L, pCEV-50C573L, pCEV-50R573Q and pCEV-50C573Q, respectively. The sequences that showed in National Center for Biotechnology Information (NCBI) database was obtained and codon-optimized as described by Qiao previously [36].

SgCS Mutagenesis Experiments
Mutagenesis of the 50 sites was performed using a Site-directed Mutagenesis Kit (Biomed, Beijing, China), and the corresponding degenerate primers are presented in Table A4 with the substitutions underlined. The PCR products were then purified by agarose gel electrophoresis and transformed into Trans1-T1 E. coli. The sequences of the mutant genes in the resulting plasmid (pCEV-G4-Km) were confirmed by Sanger sequencing using the oligonucleotide primers pCEV-Seq-F and pCEV-Seq-R (Table A4).

Yeast Transformation and Cell Cultivation
The plasmids were transfected into S. cerevisiae strain BY4742 using the Frozen-EZ yeast transformation II kit purchased from Zymo Research (CA, USA) and selected for growth on YPD plates with 200 mg/L G418. The empty pCEV-G4-Km vector was also introduced into BY4742 as a control.
The recombinant cells were first inoculated into 15 mL culture tubes containing 2 mL of YPD medium with 200 mg/L G418 and grown at 30 • C and 250 rpm to an OD600 of approximately 1.0. Flasks (250 mL) containing 100 mL of medium were then inoculated to an OD600 of 0.05 with the seed cultures. Strains were grown at 30 • C and 250 rpm for 3 days, and all optical densities at 600 nm (OD600) were measured using a Shimadzu UV-2550 spectrophotometer.

In Vitro Activity
The host strain S. cerevisiae BY4742 barboring different gene types of SgCS were grown under identical conditions to those described above and collected by centrifugation. Each yeast strain was suspended in 2 volumes of 100 mM sodium phosphate buffer, pH 7, and lysed using an Emulsiflex-C5 homogenizer. After lysis, 100 mM sodium phosphate buffer, pH 7, was added to generate a 20% slurry. A solution of 3(S)-oxidosqualene and Triton X-100 was added to the homogenate aliquots (350 µL) to a final concentration of 1 mg/mL substrate and 0.1% Triton X-100. After 0.5, 1, 3, 5, and 10 h, the reactions were terminated by adding two volumes of ethanol. The denatured protein was removed by centrifugation, and the supernatant was concentrated to dryness under a nitrogen stream. The residue was resuspended in n-hexane and analyzed by GC-MS.

GC-MS Analysis of Yeast and Plant Extracts
Cells were collected by centrifugation at 10,000× g for 5 min, refluxed with 5 mL of 20% KOH/50% ethanol and extracted three times with the same volume of n-hexane. The combined extracts were dried under reduced pressure distillation and dissolved in 1 mL of n-hexane.
GC-MS was performed on a Thermo Scientific ISQ Single Quadrupole GC-MS (Thermo Scientific, Waltham, MA USA). Separation of the components of the nonpolar active ingredients was carried out on an HP-5 MS capillary column (5% phenyl and 95% dimethylpolysiloxane, 30 m × 0.25 mm × 0.25 µm). An electron ionization system with an ionization energy of 70 eV was used in the mass spectrometer. Helium gas was used as the carrier gas, and the carrier flow rate was 1.5 mL/min. The temperatures of both the injector and MS transfer line were 250 • C, and the ion source temperature was 220 • C. The initial oven temperature was 70 • C, which was held for 2 min, and the temperature was then programmed to linearly increase at a rate of 20 • C/min up to 260 • C and finally increase by 10 • C/min up to 300 • C, where it was held for 10 min. A 1 µL sample was injected automatically into the monitoring system in 1:10 split mode. Under these conditions, squalene and cucurbitadienol eluted at 14.73 and 18.85 min, respectively.

Data Processing and Statistics
Data were expressed as the mean ± standard deviation. IBM SPSS statistics software 22.0 (SPSS Inc., Chicago, IL, USA) was used for data analysis, including descriptive statistics of the data, correlation analysis of the factors, regression analysis and the chi-square test. The boxplots were generated with Origin 8 (OriginLab Co., Northampton, MA, USA).

Conclusions
In this study, we conducted a biosynthesis-based secondary metabolomics analysis of 15 S. grosvenorii varieties. A high catalytic efficiency SgCS enzyme was obtained by SNP analysis and site-directed mutation. The genotypes and chemical differences of 15 S. grosvenorii varieties were revealed by SNP analysis of cucurbitadienol and quantitative analysis of secondary metabolomics, respectively. The 15 varieties showed significant differences in their secondary metabolite profiles. A total of four wild-type SgCS protein variants based on two missense mutation SNP sites were discovered. Moreover, a site-directed mutant, namely, 50K573L, produced a 33% enhancement in efficiency compared to wild-type 50R573L. Our findings thus identify a novel cucurbitadienol synthase allele correlates with high efficiency and provide new insight into molecular breeding of S. grosvenorii. Future studies are planned to include more varieties and functional genes, such as CYP450 and UGT, to produce high-quality S. grosvenorii. Gene editing technology can be used to knock out the "bad gene" and transfer the "better gene" into the elite S. grosvenorii germ. The "high catalytic efficiency" plant can be obtained and verified by PCR, Western Blot analysis and so on, which will lay the foundation for molecular design breeding. Acknowledgments: We thank Yuan Zhou (Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences) for providing the cucurbitadienol standard.

Conflicts of Interest:
The authors declare no conflicts of interest.  Figure A1. Hierarchical clustering analysis of S. grosvenorii varieties. Figure A2. Alignment of PCR fragment showing the SNP sites of SgCS. Figure A3. Typical LC-MS/MS total ion chromatograms.  Figure A1. Hierarchical clustering analysis of S. grosvenorii varieties. Species were grouped into four main clusters. Species S10 and S13 were categorized into Cluster I. S2, E23 and E5 were grouped into Cluster II. E1 was defined into third distinct cluster (Cluster III). The remaining species including C2, C3, C6, W4, S3, E29, E3, E12 and E2 were put into Cluster IV. Figure A1. Hierarchical clustering analysis of S. grosvenorii varieties. Species were grouped into four main clusters. Species S10 and S13 were categorized into Cluster I. S2, E23 and E5 were grouped into Cluster II. E1 was defined into third distinct cluster (Cluster III). The remaining species including C2, C3, C6, W4, S3, E29, E3, E12 and E2 were put into Cluster IV.