Transcriptome Analyses Revealed the Key Metabolic Genes and Transcription Factors Involved in Terpenoid Biosynthesis in Sacred Lotus

As the largest group of structurally diverse metabolites, terpenoids are versatile natural compounds that act as metabolism mediators, plant volatiles, and ecological communicators. However, few terpenoid compounds have been identified in plant parts of sacred lotus (Nelumbo nucifera Gaertn.). To elucidate the molecular genetic basis of the terpene biosynthetic pathway, terpenes from different parts of the plant, including seeds (S), young leaves (YL), mature leaves (ML), white flowers (WF), yellow flowers (YF), and red flowers (RF), were identified by LC-MS/MS and the relative contents of the same terpenes in different parts were compared. The results indicate that all plant parts primarily consist of triterpenes, with only minor quantities of sesquiterpenes and diterpenes, and there were differences in the terpene content detected in different plant parts. To illustrate the biosynthesis of various terpenoids, RNA sequencing was performed to profile the transcriptomes of various plant parts, which generated a total of 126.95 GB clean data and assembled into 29,630 unigenes. Among these unigenes, 105 candidate unigenes are involved in the mevalonate (MVA) pathway, methyl-erythritol phosphate (MEP) pathway, terpenoid backbone biosynthesis pathway, and terpenoid synthases pathway. Moreover, the co-expression network between terpene synthase (TPS) and WRKY transcription factors provides new information for the terpene biosynthesis pathway.


Introduction
Sacred lotus belongs to Nelumbonaceae, which contains only two species, sacred lotus (Nelumbo nucifera Gaertn.) and American lotus (Nelumbo lutea Pers.) [1]. Sacred lotus is one of the economically best-known aquatic plants in East Asia, with a history of cultivation for nearly 2000 years in China, and is commonly used for medicine and ornamentation [2,3]. A large number of metabolites, including alkaloids, steroids, flavonoids, triterpenoids, glycosides, and polyphenolshave, were widely detected in sacred lotus and are tightly related to their pharmacological activities, such as anti-ischaemia, antioxidant, anticancer, antiviral, as well as anti-obesity [4][5][6][7][8][9].
As the largest group of structurally diverse metabolites, terpenoids are versatile natural compounds that act as metabolism mediators, plant volatiles, and ecological communicators. Triterpenoids are important plant secondary metabolites with pharmacological effects, such as anticancer, antiviral, and cholesterol lowering. According to the number of rings in the structure, triterpenoids can be classified into monocyclic triterpenes, bicyclic triterpenes, tricyclic triterpenes, tetracyclic triterpenes, and pentacyclic triterpenes, etc. Among them, pentacyclic triterpenoids have received more attention because of more types and functions [10][11][12][13]. Biologically, triterpene saponins are considered to be defensive compounds against external stresses [14,15]. Terpenoids are sequentially biosynthesized from the universal C5 precursors as dimethylallyl diphosphate (DMAPP) and isopentenyl diphosphate (IPP) [16][17][18]. These C5 precursors are sequentially catalyzed by prenyltransferases to form prenyl diphosphates, including geranyl pyrophosphate (GPP), farnesyl pyrophosphate (FPP), and geranylgeranyl pyrophosphate (GGPP). These prenyl diphosphates are further used as the immediate precursors for the biosynthesis of various terpenoids, such as monoterpenes (C10), sesquiterpenes (C15), diterpenes (C20), triterpenes (C30), tetraterpenes (C40), as well as polyterpenes (more than C40) by the action of terpene synthase (TPS). Transcription factors (TFs) regulate the accumulation of terpenoids by activating or repressing the promoters of TPS genes to control their expression and five TFs have been identified as being involved in the regulation of terpene synthesis [19,20].
Despite the scientific and industrial interest in sacred lotus, terpenoid components have not been identified. Moreover, the genes involved in terpenoid biosynthesis were been completely studied in aquatic botany. In recent years, RNA sequencing (RNA-Seq) and metabolite analysis were developed as powerful tools for identifying genes involved in the biosynthesis of various secondary metabolites in higher plants, including Salvia officinalis Linn. [21], Cinnamomum camphora (L.) Presl. [10], and Salvia guaranitica St. Hil. [22].
As one of the best-known medicinal plants, few terpenoid compounds have been identified in plant parts of N. nucifera, and the molecular genetic basis of terpenoid biosynthesis pathways is still unveiled. Considering the specific characteristics of sacred lotus, it was of interest to profile the terpenoid biosynthesis and key genes related with the terpenoid metabolic pathway. In this study, the contents and compositions of terpenoids in seeds, young leaves, mature leaves, and flowers with different colors were analyzed with LC-MS/MS. In addition, the key genes and transcription factors involved in terpenoid biosynthesis were screened by RNA-seq. We believe these results will shed light on the mechanism of terpenoid biosynthesis in sacred lotus.

Identification of Terpenoid Components in Various Plant Parts
For unveiling the distribution of terpenoids in flowers with different colors, the widely planted cultivars with a red flower (Jinlinghuodu), white flower (Baiyinlian), and yellow flower (Jinsenianhua) were selected. In order to compare terpenoids in various plant parts, the seeds, young leaves, and mature leaves from Taikonglian 36 were harvested ( Figure 1A). In total, 909 metabolites were detected in six sets of samples (three biological replicates per set) of lotus based on the UPLC-MS/MS detection platform and the self-constructed database. Because a broad-target metabolomic analysis was performed, which had some limitations for detecting each class of triterpenoids, cluster analysis of metabolites revealed that the 909 metabolites could be classified into 12 classes ( Figure 1B). In addition, a total of 21 terpenoids were identified in different plant parts, and among these 21 terpenoids, 16 terpenoids could be detected in all plant parts, but their contents appeared different in these plant parts (Table 1).

RNA-Seq and Transcriptomic Assembly
To identify genes involved in terpenoid biosynthesis in sacred lotus, 18 RNA libraries were prepared and analyzed. In total, 249,600,000 reads were obtained from these 18 libraries (accession number: PRJNA857167). After filtering the original data, clean reads were obtained for subsequent analysis (Table S1). Additionally, clean reads were mapped to the sacred lotus genome; the percentage of sequencing reads generated by each sample successfully aligned to the genome was higher than 80%. These results indicated that the quality of these mapped genes was sufficient to conduct the subsequent analysis (Table S2).

Gene Annotation and Functional Classification
BLAST alignment was utilized to annotate the 29,630 unigenes of sacred lotus with an E-value threshold of 1e −5 in the public databases: NR (NCBI non-redundant protein sequences), KOG (euKaryotic Ortholog Groups), PFAM (Protein family), GO (Gene Ontology), and KEGG (Kyoto Encyclopedia of Genes and Genomes). In summary, 76,070 unigenes were successfully annotated by at least one database and 18,045 unigenes shared annotation in all databases (Figure 2A).  Notes: Index, Metabolite number, Class II, secondary classification of substances. Q1, the molecular weight of the parent ion after the substance was added to the ion by the electrospray ion source, Q3, the characteristic fragment ion, level, substance identification level, "1", sample substance secondary mass spectra, RT and database substance matching score of 0.7 or more; "2", sample substance secondary mass spectra, RT and database substance matching score of 0.5-0.7, "3", sample substance's five detection parameters Q1, Q3, RT, DP, CE, and database substance check are consistent. The values in the table are the relative content of metabolites, without units, calculated by calculating the peak area formed in the detector by the characteristic ions of each substance. It is not the  Notes: Index, Metabolite number, Class II, secondary classification of substances. Q1, the molecular weight of the parent ion after the substance was added to the ion by the electrospray ion source, Q3, the characteristic fragment ion, level, substance identification level, "1", sample substance secondary mass spectra, RT and database substance matching score of 0.7 or more; "2", sample substance secondary mass spectra, RT and database substance matching score of 0.5-0.7, "3", sample substance's five detection parameters Q1, Q3, RT, DP, CE, and database substance check are consistent. The values in the table are the relative content of metabolites, without units, calculated by calculating the peak area formed in the detector by the characteristic ions of each substance. It is not the absolute content of the substance, but the detection conditions are consistent and can be used to compare the differences in the same substance in different samples.
Based on sequence homology, 22,171 annotated unigenes were categorized into three ontologies with 58 GO terms ( Figure 2B). Within the category of cellular component (CC), genes matched to 18 GO terms, the most highly represented of which were 'cell' (15,881 unigenes), 'cell part' (15,817 unigenes), and 'cellular process' (13,402 unigenes). For the molecular function (MF) category, 'binding' (12,464 unigenes) and 'catalytic activity' (10,985 unigenes) were the two most abundant of 12 GO terms. The largest asso-ciated term within the 28 GO terms of the biological process (BP) was 'cellular process' (13,402 unigenes).
(10,985 unigenes) were the two most abundant of 12 GO terms. The largest associated term within the 28 GO terms of the biological process (BP) was 'cellular process' (13,402 unigenes).

Identification of Differential Expression Genes (DEGs)
To fully explore potential DEGs in various plant parts, 22,336 DEGs were identified by comparing the eleven groups (S vs. ML, S vs. RF, S vs. WF, S vs. YF, S vs. YL, WF vs. RF, WF vs. YF, YL vs. ML, YL vs. WF, and YL vs. RF). The total number of DEGs, up-regulated DEGs, and down-regulated DEGs in each group are presented in Table S3. To further characterize the expression patterns of differential genes, we performed k-means clustering analysis on all DEGs. The results showed that 22,336 DEGs were assigned to 12 clusters with different numbers of DEGs per cluster, ranging from 1012 to 4266 ( Figure 3).

Functional Classification of DEGs
The 50 GO-Terms with the lowest q-value in the enrichment analysis results were selected and a column chart of the enrichment items was drawn ( Figure 4A). Among the 33 GO terms of biological process (BP), the largest related term was "cell process". For the Cellular component (CC) category, the DEGs mainly participated in the chloroplast thylakoid, plastid thylakoid, and other organelles. In the category of molecular function (MF), "structural molecule activity" was the largest related term. RF, WF vs. YF, YL vs. ML, YL vs. WF, and YL vs. RF). The total number of DEGs regulated DEGs, and down-regulated DEGs in each group are presented in Table S further characterize the expression patterns of differential genes, we performed k-m clustering analysis on all DEGs. The results showed that 22,336 DEGs were assigned clusters with different numbers of DEGs per cluster, ranging from 1012 to 4266 (Figu

Functional Classification of DEGs
The 50 GO-Terms with the lowest q-value in the enrichment analysis results selected and a column chart of the enrichment items was drawn ( Figure 4A). Amon 33 GO terms of biological process (BP), the largest related term was "cell process". Fo Cellular component (CC) category, the DEGs mainly participated in the chloro thylakoid, plastid thylakoid, and other organelles. In the category of molecular fun (MF), "structural molecule activity" was the largest related term.

DEGs Involved in Terpenoid biosynthesis
To unveil the regulatory mechanism of terpenoid accumulation patterns in different plant parts of sacred lotus, we analyzed the expression profiles of genes involved in terpenoid biosynthesis. In total, 105 DEGs related to terpenoid biosynthesis were identified in sacred lotus and their expression patterns are shown in Figure 5. In total, 55 DEGs were annotated to the terpenoid backbone synthesis pathway (ko00900), of which 13 and 11 DEGs participated in the MVA and MEP pathways, respectively. Furthermore, DEGs encoding key enzymes in the MVA pathways exhibited a higher expression level in YL compared with S and ML. Interestingly, AACT2 and HMGS2 exhibited the highest mRNA level in S, while HMGR3 was the highest expressed in ML ( Figure 5 and Table S4). Similar to the MVA pathway, the expression levels of DEGs involved in MEP in leaves was still higher than seeds, and the expression levels of DEGs in ML were significantly higher than YL ( Figure 5 and Table S4). Moreover, six unigenes encoding IPTS were identified, including geranyl diphosphate synthase (GPPS), farnesyl diphosphate synthase (FPPS), and geranylgeranyl diphosphate synthase (GGPPS). The expression level of FPPS in YL was higher than that in S and ML and also had higher transcript levels in the three differently colored flowers, which was consistent with the expression pattern of DEGs in the MVA pathway. The expression level of GPPS in S was lower than in other parts, but there was no obvious difference in leaves and flowers. The different expression patterns in the three GGPPS transcripts are worth noting. The GGPPS2 transcript exhibited the highest expression level in mature leaves, with the highest mRNA level of GGPPS3 in WF, RF, and YF. Moreover, the expression level of GGPPS1 was relatively lower than that of GGPPS2 and GGPPS3 ( Figure 5 and Table S4).

Co-Expression Network of TPS and WRKY Transcription Factors
Transcription factors (TFs) usually control the expression of TPS genes by activ or repressing their promoters, thereby regulating the accumulation of terpenoids. In 47 DEGs were annotated as WRKY transcription factors, 27 of which were highly c lated with 15 terpenoids (Table S6, Figure S2). A co-expression network of these 27 W transcription factors and the 40 genes encoding TPS was constructed (Figure 7). Th sults showed that WRKY9 (gene-LOC104588731) was negatively correlated with GE Figure 6. Correlation analysis of TPS genes and released amounts of terpene compounds (Pearson correlation coefficient > 0.8). The circular nodes are TPS genes, the triangular nodes are terpenes, the red line represents positive correlation, and the blue line represents negative correlation.

qRT-PCR Validation of DEGs
To verify the reliability of the RNA-Seq data, the expression of 12 unigenes was detected by qRT-PCR (Figure 8). The expression levels of these genes determined by qRT-PCR were almost consistent with those inferred from the RNA-Seq FPKM data (Figure 8), except for the expression pattern of AACT1 and FPPS1 in white flower.

qRT-PCR Validation of DEGs
To verify the reliability of the RNA-Seq data, the expression of 12 unigenes was detected by qRT-PCR (Figure 8). The expression levels of these genes determined by qRT-PCR were almost consistent with those inferred from the RNA-Seq FPKM data (Figure 8), except for the expression pattern of AACT1 and FPPS1 in white flower.

Discussion
The remarkable health and disease alleviation activities of the scared lotus are associated with the high content of bioactive compounds, including polyphenols, flavonoids, phenolic acids, alkaloids, terpenoids, steroids, fatty acids, and glycosides [23]. However, most studies focus on flavonoids, alkaloids, and phenolic acids, while few have been conducted on terpenoids [24][25][26]. In this study, in total, 21 terpenoids were detected (Table 1). Although the genes involved in terpenoid biosynthesis are widely studied in different plants [20,[27][28][29][30][31][32][33], genes related to the synthesis of terpenoids in sacred lotus have not been studied so far. The precursors of monoterpenes and diterpenes come from the MEP pathway in plastids, while the precursors of sesquiterpenes and triterpenes come from the MVA pathway in the cytoplasm [34]. Considering the two rate-limiting enzymes, HMGR and DXS played important roles in the overall regulation of the MVA and MEP pathways, respectively [35][36][37]. Among five genes encoding HMGR identified in this study, HMGR1, HMGR2, and HMGR4 exhibited the highest expression in YL, with the lowest expression in ML ( Figure 5). In total, three genes encoding DXS were detected; DXS2 and DXS3 had the same expression pattern, with higher expression in leaves than in seeds, and there was no significant difference in expression in the three colors of flowers, which corresponded to the metabolic content results. This result indicated that the distribution of terpenoids in different plant parts was closely related to the expression pattern of genes involved in the MVA and MEP pathways. However, the relative contribution of each pathway to the biosynthesis of various terpenoids was still uncertain.
IPTS are the key enzymes that connect the upstream MVA and MEP pathways with downstream isoprenoid biosynthesis branch points of different structures. In recent years, GPPS, GGPPS, FPPS, and CPPS, as the key enzymes in the biosynthesis of isoprenoid compounds, were identified in higher plants. In addition, most studies showed that the transcription levels of these genes were closely related to the content of corresponding terpenoids [38][39][40][41]. For example, FPPS overexpression increased the production of ganoderic acid [42]. A high number of monoterpenes was produced by overexpression of peppermint (Mentha × piperita) GPPS.SSU in transgenic tobacco plants [43] and the content of abietane diterpenes in Salvia sclarea L. hairy roots was increased by engineering the GGPPS and CPPS [44]. In this research, six DEGs encoding IPTS were identified, including one GPPS, two FPPS, and three GGPPS. We found that the expression of FPPS was higher in young leaves than in seeds and mature leaves, which is consistent with the results detected for the metabolic content of triterpenoids. It has been shown that the GGPPS transcript was mainly concentrated in the above-ground parts, such as leaves, flowers, or fruits, and the expression patterns of different GGPPS homologous genes in the same plant were various [45][46][47][48][49][50][51]; the expression patterns of GGPPS2 and GGPPS3 identified in this study are consistent with this conclusion.
As a key enzyme playing an important role in the synthesis of terpenoids, the TPS genes were identified in A. thaliana, tomato, rice, and tobacco [52][53][54][55][56]. Further, a large number of different TPS and their products were the chief reasons for the variety in terpenes [57,58]. In this study, 40 DEGs encoding TPS exhibited a high correlation with 20 terpenes (Figure 6). In previous studies, multiple TPS in most plants were shown to be multi-product enzymes [58]. Our study indicated that most TPS in lotus also showed a high correlation with multiple products and some terpenoids were also highly correlated with multiple TPS. Therefore, it is speculated that these terpenoids may be regulated by multiple TPS. WRKY transcription factors have been reported to regulate terpenoid production by activating or inhibiting promoter binding of TPS [20,[59][60][61]. In our study, our results showed that 26 members of WRKY were closely related to 34 TPS genes (Figure 7). These phenomena suggest that WRKY transcription factors play an important role in the regulation of terpenoid production, but the exact mode of action needs to be further explored and confirmed.

LC-MS/MS Analysis
Biological samples were freeze dried using a vacuum freeze dryer (Scientz-100F) and crushed using a mixer mill (MM 400, Retsch) with a zirconia bead for 1.5 min at 30 Hz. Then 100 mg of lyophilized powder was dissolved with 1.2 mL 70% methanol solution, and vortexed 30 s every 30 min 6 times in total. The samples were refrigerated 4 • C overnight and centrifugated at 12,000 rpm for 10 min. The extracts were filtrated (SCAA-104, 0.22 µm pore size; ANPEL, Shanghai, China, http://www.anpel.com.cn/, accessed on 5 September 2021) before UPLC-MS/MS analysis. Chromatography and mass spectrometry were performed with the assistance of Wuhan Metware Biotechnology Co., Ltd. (Wuhan, China) as the method described [62]. Metabolite quantification was accomplished by multiple reaction monitoring (MRM) analysis using triple-quadrupole mass spectrometry. The characteristic ions of each substance were screened by triple quadruple rods, the signal intensity (CPS) of the characteristic ions was obtained in the detector. The sample offline mass spectrometry file was opened with MultiaQuant software. The integration and calibration of the peaks were performed, and the peak area (Area) of each peak represented the relative content of the corresponding substance. The internal standard used was 2-chloro-L-phenylalanine (CAS: 103616-89-3, Bailiwick, 106151-100 mg). The HCA (hierarchical cluster analysis) results of metabolites were presented as heatmaps and normalized signal intensities of metabolites (unit variance scaling) were visualized as a color spectrum.

Total RNA Extraction and Transcriptome Sequencing
Total RNA was extracted using the Total Plant RNA Extraction Kit (Tsingke, Beijing, China). RNA concentration and purity were determined using NanoPhotometer ® spectrophotometer (IMPLEN, München, BY, GER) and RNA concentration was measured using Qubit ® RNA Assay Kit in Qubit ® 2.0 Flurometer (Life Technologies, Waltham, CA, USA). RNA integrity was assessed using the RNA Nano 6000 Assay Kit of the Bioanalyzer 2100 system (Agilent Technologies, Palo Alto, CA, USA). A total amount of 1 µg RNA per sample was used as input material for the RNA sample preparations. Sequencing libraries were generated using NEBNext ® UltraTM RNA Library Prep Kit for Illumina ® (NEB, Ipswich, MA, USA) following manufacturer's recommendations. The cDNA libraries were sequenced on the Illumina sequencing platform by Metware Biotechnology Co., Ltd. (Wuhan, China).

RNA-Seq Analysis
The original data were filtered to remove reads with adapters. HISAT version 2.1.0 (Daehwan Kim, Baltimore, MA, USA) was used to construct the index and compare clean reads to the reference genome (https://www.ncbi.nlm.nih.gov/genome/?term=AQOG0 1, accessed on 25 September 2021) [63], and StringTie version 1.3.4d (Mihaela Pertea, Baltimore, MA, USA) was used for new gene prediction [64]. We used featureCounts version 1.6.2 (Yang Liao, Parkville, VIC, AUS) to calculate the gene alignment and then calculated the transcript per million fragments mapped (FPKM) of each gene based on the gene length. DESeq2 version 1.22.1 (Michael I Love, Heidelberg, BW, GER) [65,66] was used to analyze the original count data and to screen for DEGs, and genes satisfying |log2Fold Change| ≥ 1 and False Discovery Rate (FDR) < 0.05 were defined as differentially expressed genes (DEGs). The enrichment analysis was performed based on the hypergeometric test. The gene co-expression network analysis was performed using the Metware Cloud, a free online platform for data analysis (https://cloud.metware.cn, accessed on 3 January 2022), and Pearson correlation coefficient greater than |0.8|.

Validation of DEGs by qRT-PCR Analysis
The key DEGs in the terpenoid synthesis pathway were selected to validate their expression levels by quantitative reverse transcription-polymerase chain reaction (qRT-PCR). Synthesis of cDNA for qRT-PCR was performed with First Strand cDNA Synthesis Kit (ThermoFisher Scientific, Shanghai, China). The primers were synthesized by Tsingke Biotechnology Co., Ltd. (Wuhan, China) ( Table S7). The reaction system of qRT-PCR was as follows: cDNA template 0.5 µL at 3000 ng/µL, gene-specific upstream and downstream primers 0.4 µL each at 10 µM, 2 × ChamQ Universal SYBR qPCR Master Mix (Vazyme, Nanjing, China) 10 µL, 8.7 µL of double-distilled water (ddH2O). qRT-PCR was performed by CFX96 Real Time System (Bio-Rad, Hercules, CA, USA). Young leaves were selected as the control and 26s rRNA as the internal reference genes. The relative expression of genes in the qRT-PCR experiment was analyzed by the 2 − CT method [67]. At least three biological replicates were set for each sample.

Conclusions
In this study, 21 terpenoids were detected in sacred lotus and 105 genes involved in terpenoid synthesis were identified. WRKY, an important transcription factor, is highly correlated with the TPS gene family and might play an important regulatory role in the terpenoid synthesis process. This study illustrated the metabolic diversity of terpenoids in sacred lotus and provided a global overview of the gene expression profiles related with terpenoid biosynthesis in sacred lotus.