Integrated Chemical and Transcriptomic Analysis Reveals the Distribution of Protopanaxadiol- and Protopanaxatriol-Type Saponins in Panax notoginseng

Panax notoginseng is famous for its important therapeutic effects and commonly used worldwide. The active ingredients saponins have distinct contents in different tissues of P. notoginseng, and they may be related to the expression of key genes in the synthesis pathway. In our study, high-performance liquid chromatography results indicated that the contents of protopanaxadiol-(Rb1, Rc, Rb2, and Rd) and protopanaxatriol-type (R1, Rg1, and Re) saponins in below ground tissues were higher than those in above ground tissues. Clustering dendrogram and PCA analysis suggested that the below and above ground tissues were clustered into two separate groups. A total of 482 and 882 unigenes were shared in the below and above ground tissues, respectively. A total of 75 distinct expressions of CYPs transcripts (RPKM ≥ 10) were detected. Of these transcripts, 38 and 37 were highly expressed in the below ground and above ground tissues, respectively. RT-qPCR analysis showed that CYP716A47 gene was abundantly expressed in the above ground tissues, especially in the flower, whose expression was 31.5-fold higher than that in the root. CYP716A53v2 gene was predominantly expressed in the below ground tissues, especially in the rhizome, whose expression was 20.1-fold higher than that in the flower. Pearson’s analysis revealed that the CYP716A47 expression was significantly correlated with the contents of ginsenoside Rc and Rb2. The CYP716A53v2 expression was associated with the saponin contents of protopanaxadiol-type (Rb1 and Rd) and protopanaxatriol-type (R1, Rg1, and Re). Results indicated that the expression patterns of CYP716A47 and CYP716A53v2 were correlated with the distribution of protopanaxadiol-type and protopanaxatriol-type saponins in P. notoginseng. This study identified the pivotal genes regulating saponin distribution and provided valuable information for further research on the mechanisms of saponin synthesis, transportation, and accumulation.


Introduction
Panax notoginseng (Burk) F. H. Chen is a widely used herbal medicine with important pharmacological effects, such as anti-hypertensive, neuroprotective, anti-inflammatory estrogen-like, anti-atherosclerotic, anti-tumor, and hepatoprotective activities [1][2][3][4]. The medicinal properties of P. notoginseng are attributed to its bioactive constituents, namely, triterpene saponins, which are also known as ginsenosides. These saponins can be categorized into two groups based on the skeleton of their aglycones, namely, dammarane-type and oleanane-type saponins [5,6]. Dammarane-type saponins commonly include notoginsenoside R1 and ginsenosides Rb1, Rb2, Rc, Rd, Re, Rd, Rf, and Rg1 [7]. Oleanane-type saponins mainly comprise ginsenosides Ro and Roa [8]. Different parts of P. notoginseng contain various types of saponins and show diverse pharmacological activities [3,9]. For example, P. notoginseng roots have been widely used to treat cardiovascular diseases and other disorders [10]. P. notoginseng leaves are used to cure insomnia and alleviate anxiety [4]. P. notoginseng flowers have beneficial effects on hypertension, tinnitus, and vertigo therapy [11,12]. These results indicate that the parts of P. notoginseng elicit different therapeutic effects. The identification of the types and contents of these different parts is essential for its safe application in medicinal and health care products.
This study aimed to identify the main genes regulating the distribution of protopanaxadioland protopanaxatriol-type saponins in different parts of P. notoginseng based on the integrated analysis of high-performance liquid chromatography (HPLC) and transcriptome. First, the types and contents of saponins in different tissues of P. notoginseng were detected through HPLC. Subsequently, the transcriptome of the six parts was examined using Illumina HiSeq 4000 to identify the key genes related to saponin distribution. RT-qPCR analysis was performed to further confirm the main gene expression patterns in the tissues of P. notoginseng. Pearson's analysis was carried out to state the relationship between saponin contents and gene expression relative to the regulating saponin distribution.

Analysis of Saponin Contents in P. notoginseng
The analysis method was provided in Supplementary Table S1. The contents of the seven saponins were higher in the below ground tissues than in the above ground tissues (Figure 1). R1, Rg1, Re, Rb1, and Rd showed the highest levels in the rhizome (6.16, 48.42, 2.80, 28.13, and 7.73 mg·g −1 , respectively), followed by those in the root. The contents of Rc and Rb2 were high in the leaf (27.99 mg·g −1 ) and flower (29.45 mg·g −1 ) but were undetected in the below ground tissues (rhizome, root, and fibril). The contents of the total protopanaxatriol saponins (PTS) (NG-R1, G-Rg1, and G-Re) showed high levels in the below ground tissues, especially in the rhizome (57.38 mg·g −1 ) and root (29.09 mg·g −1 ). The contents of the total protopanaxadiol saponins (PDS) (G-Rb1, Rc, Rb2, and Rd) showed higher levels in the below ground tissues than in the above ground tissues. These results indicated that the contents of five saponins (R1, Rg1, Re, Rb1 Rd) and the total saponins were high in the below ground tissues, but Rc and Rb2 were only detected in the above ground tissues.

Analysis of Saponin Contents in P. notoginseng
The analysis method was provided in Supplementary Table S1. The contents of the seven saponins were higher in the below ground tissues than in the above ground tissues (Figure 1). R1, Rg1, Re, Rb1, and Rd showed the highest levels in the rhizome (6.16, 48.42, 2.80, 28.13, and 7.73 mg·g −1 , respectively), followed by those in the root. The contents of Rc and Rb2 were high in the leaf (27.99 mg·g −1 ) and flower (29.45 mg·g −1 ) but were undetected in the below ground tissues (rhizome, root, and fibril). The contents of the total protopanaxatriol saponins (PTS) (NG-R1, G-Rg1, and G-Re) showed high levels in the below ground tissues, especially in the rhizome (57.38 mg·g −1 ) and root (29.09 mg·g −1 ). The contents of the total protopanaxadiol saponins (PDS) (G-Rb1, Rc, Rb2, and Rd) showed higher levels in the below ground tissues than in the above ground tissues. These results indicated that the contents of five saponins (R1, Rg1, Re, Rb1 Rd) and the total saponins were high in the below ground tissues, but Rc and Rb2 were only detected in the above ground tissues. Total amount of protopanaxadiol saponins (G-Rb1, Rc, Rb2, and Rd); Total: Total amount of protopanaxatriol and protopanaxadiol saponins. Diffe rent le tters represent the significant differences between the five tissues at p < 0.05.

Transcriptome Analysis in P. notoginseng
Illumina Hiseq sequencing technology was employed to analyze the transcriptome of the 18 samples (six tissues, namely, flower, leaf, stem, rhizome, root, and fibril; three biological replicates for each part) and to clarify the mechanism of saponin synthesis in P. notoginseng. A total of 120.70 Gb of data were obtained, and the average output of each sample was 6.71 Gb of data. After filtering was performed, 805 M clean reads with an average size of 150 bp were used for the de novo assembly. Overall, 1,325,735 contigs with an average size of 846 bp were obtained, 158,551 unigenes were assembled, and the percentage of GC was 41.05% with an average contig size of 1274 bp and an N50 contig size of 2012 bp (Table 1 and Supplementary Table S2).

Transcriptome Analysis in P. notoginseng
Illumina Hiseq sequencing technology was employed to analyze the transcriptome of the 18 samples (six tissues, namely, flower, leaf, stem, rhizome, root, and fibril; three biological replicates for each part) and to clarify the mechanism of saponin synthesis in P. notoginseng. A total of 120.70 Gb of data were obtained, and the average output of each sample was 6.71 Gb of data. After filtering was performed, 805 M clean reads with an average size of 150 bp were used for the de novo assembly. Overall, 1,325,735 contigs with an average size of 846 bp were obtained, 158,551 unigenes were assembled, and the percentage of GC was 41.05% with an average contig size of 1274 bp and an N50 contig size of 2012 bp (Table 1 and Supplementary Table S2).

Functional Annotation of the Transcriptome
Distinct gene sequences were searched using BLAST with a cutoff E-value of 10 −5 against public databases, including Nr, KEGG, COG, and SWISS-PROT. A total of 47,184 unigenes (40.6%) returned a significant Blast result through this approach. Among the 116,214 unigenes, approximately 11,079 (9.5%) showed significant matches with the Nr database. A total of 1082; 1816; and 684 unigenes had significant matches with the KEGG, COG, and SWISS-PROT databases, respectively (Supplementary Figure S1).
GO assignments were used to classify the functions of the predicted P. notoginseng genes (Supplementary Figure S2A). A total of 367,640 sequences could be categorized into three major GO categories based on sequence homology, namely, biological process, cellular component, and molecular function. They were then classified into 55 functional groups. Cellular, metabolic, and single-organism processes were the most abundant GO terms in the biological process category. Cell and cell part were the most highly dominant terms in the cellular component category. Binding and catalytic activity were the most highly represented terms in the molecular function category. In addition, 35,921 sequences were annotated as a metabolic process category, suggesting that these genes were involved in the metabolite synthesis pathways.
All the unigenes obtained in the present study were subjected to a search against the COG database to predict and classify the possible functions of transcripts (Supplementary Figure S2B). A total of 90,920 unigenes were clustered into 25 function categories. Among the different COG classes, the major COG category was "general functional prediction only" (15,129), followed by "transcription" (8430), "replication, recombination, and repair" (7715), "translation, ribosomal structure, and biogenesis" (6831), "posttranslational modification, protein turnover, and chaperones" (6716), and "signal transduction mechanisms" (6305). The least-represented groups were "nuclear structure" (100) and "extracellular structures" (22). Interestingly, our dataset showed that 2233 uni-transcripts annotated with the COG database represented the "secondary metabolite biosynthesis, transport, and catabolism" category, indicating that a large number of secondary metabolites existed in P. notoginseng.
The annotated sequences were also mapped to the KEGG to identify the biological pathways of P. notoginseng ( Figure 2). As a result, 93,620 sequences were assigned to 21 KEGG pathways. The pathways with the most represented sequences were metabolism, followed by genetic information processing, cellular process, environmental information processing, organismal system, and human diseases. A total of 55,205 sequences were assigned to the metabolic pathways, including global and overview maps (21,073), carbohydrate metabolism (7938), lipid metabolism (4696), and amino acid metabolism (4643). Furthermore, a total of 2128 sequences were annotated to the metabolism of terpenoids and polyketides. These annotations provided useful information to investigate the specific functions and pathways in P. notoginseng.

Differential Expression of Transcripts in the Six tissues of P. notoginseng
Cluster dendrogram and PCA analysis were constructed in accordance with the FPKM value, which revealed the different transcripts between below and above ground tissues in P. notoginseng ( Figure 3).

Differential Expression of Transcripts in the Six tissues of P. notoginseng
Cluster dendrogram and PCA analysis were constructed in accordance with the FPKM value, which revealed the different transcripts between below and above ground tissues in P. notoginseng ( Figure 3).

Differential Expression of Transcripts in the Six tissues of P. notoginseng
Cluster dendrogram and PCA analysis were constructed in accordance with the FPKM value, which revealed the different transcripts between below and above ground tissues in P. notoginseng ( Figure 3).  The below ground and above ground tissues of P. notoginseng were clustered into two separate groups, and each part was assembled into the same branches based on the results of the clustering tree ( Figure 3A). The PCA results showed that the below ground and above ground tissues were clearly separated in the PC2 axis ( Figure 3B).
The tissue-specific expression genes were calculated to reveal the difference in the transcripts of various tissues of P. notoginseng (Figure 4). A total of 143,185 unigenes were applied to create gene intersection profiles. A total of 76,679 unigenes were shared among the six tissues. A total of 482 and 882 unigenes were shared in the below ground and above ground tissues, respectively. A total of 1351; 1724; 2042; 4668; 5206; and 12,038 unigenes were specifically expressed in the stem, rhizome, leaf, root, flower, and fibril, respectively. A large number of DEGs were found in the six tissues of P. notoginseng according to the results of the DEGs between two tissues with FC ≥ 2 and FDR ≥ 0.05 (Supplementary  Table S3). In comparison with those in rhizome, 1238; 13,983; 19,374; 16,372; and 21,897 DEGs were found in the samples of root, fibril, stem, leaf, and flower, respectively. This result suggested that the transcripts of the six tissues had different expression levels in P. notoginseng. The below ground and above ground tissues of P. notoginseng were clustered into two separate groups, and each part was assembled into the same branches based on the results of the clustering tree ( Figure 3A). The PCA results showed that the below ground and above ground tissues were clearly separated in the PC2 axis ( Figure 3B).
The tissue-specific expression genes were calculated to reveal the difference in the transcripts of various tissues of P. notoginseng (Figure 4). A total of 143,185 unigenes were applied to create gene intersection profiles. A total of 76,679 unigenes were shared among the six tissues. A total of 482 and 882 unigenes were shared in the below ground and above ground tissues, respectively. A total of 1351; 1724; 2042; 4668; 5206; and 12,038 unigenes were specifically expressed in the stem, rhizome, leaf, root, flower, and fibril, respectively. A large number of DEGs were found in the six tissues of P. notoginseng according to the results of the DEGs between two tissues with FC ≥ 2 and FDR ≥ 0.05 (Supplementary Table S3). In comparison with those in rhizome, 1238; 13,983; 19,374; 16,372; and 21,897 DEGs were found in the samples of root, fibril, stem, leaf, and flower, respectively. This result suggested that the transcripts of the six tissues had different expression levels in P. notoginseng.

Analysis of Genes Involved in Triterpenoid Saponin Biosynthesis
A total of 16 genes (175 transcripts) were identified to encode the known enzymes involved in saponin biosynthesis through the MVA and MEP pathways based on the analysis of transcriptome (FPKM ≥ 10) (Figures 5 and 6 and Supplementary Tables S4 and S5). The following major genes that were related to saponin synthesis were also identified: FPSs (2), SS (1), SEs (2), DDS (1), and Beta-AS (1).
Transcripts that participated in the MVA pathway had remarkable differential expression in the six tissues at FPKM ≥ 10 ( Figure 5B). The expression of the transcripts that encoded HMGR, HMGS, PMK, MVK, FPS, DS, SS, and SE was higher in the above ground tissues than in the below ground tissues, and the highest expression was detected in the flower. Overall, one HMGS

Analysis of Genes Involved in Triterpenoid Saponin Biosynthesis
A total of 16 genes (175 transcripts) were identified to encode the known enzymes involved in saponin biosynthesis through the MVA and MEP pathways based on the analysis of transcriptome (FPKM ≥ 10) (Figures 5 and 6 and Supplementary Tables S4 and S5). The following major genes that were related to saponin synthesis were also identified: FPSs (2), SS (1), SEs (2), DDS (1), and Beta-AS (1).
Transcripts that participated in the MVA pathway had remarkable differential expression in the six tissues at FPKM ≥ 10 ( Figure 5B). The expression of the transcripts that encoded HMGR, HMGS, PMK, MVK, FPS, DS, SS, and SE was higher in the above ground tissues than in the below ground tissues, and the highest expression was detected in the flower. Overall, one HMGS (Unigene85672_All), one HMGR (Unigene20600_All), one MVK (CL6309. Contig4_All), two FPSs (Unigene52014_All, CL4568. Contig1_All), and two SEs (CL2346. Contig4_All, CL6177. Contig2_All) were significantly expressed to a greater extent in the flower than in the other tissues. Two transcripts encoding AACT (Unigene6764_All, Unigene14400_All) were highly expressed in the rhizome, and one transcript (CL14665.Contig5_All) was highly expressed in the root. The expression of the MEP and other saponin biosynthesis pathways was similar to that of the MVA pathway ( Figure 5A). Among the eight highly abundant transcripts encoding DXS, four transcripts (CL2707. Contig 13_All, Unigene 8570_All, CL1460. Contig 14_All, and Unigene 5318_All) were highly expressed in the flower, and one transcript (CL13822. Contig3_All) was highly expressed in the leaf. One transcript (CL5346. Contig10_All) encoding DXR was highly expressed in the flower. Five transcripts (Unigene15_All, Unigene17_All, Unigene33577_All, CL2347. Contig11_All, and Unigene10721_All) encoding MECPS were highly expressed in the leaf, and one transcript (CL10747. Contig3_All) was highly expressed in the flower. were significantly expressed to a greater extent in the flower than in the other tissues. Two transcripts encoding AACT (Unigene6764_All, Unigene14400_All) were highly expressed in the rhizome, and one transcript (CL14665.Contig5_All) was highly expressed in the root. The expression of the MEP and other saponin biosynthesis pathways was similar to that of the MVA pathway ( Figure 5A). Among the eight highly abundant transcripts encoding DXS, four transcripts (CL2707. Contig 13_All, Unigene 8570_All, CL1460. Contig 14_All, and Unigene 5318_All) were highly expressed in the flower, and one transcript (CL13822. Contig3_All) was highly expressed in the leaf. One transcript (CL5346. Contig10_All) encoding DXR was highly expressed in the flower. Five transcripts (Unigene15_All, Unigene17_All, Unigene33577_All, CL2347. Contig11_All, and Unigene10721_All) encoding MECPS were highly expressed in the leaf, and one transcript (CL10747. Contig3_All) was highly expressed in the flower.

Analysis of Putative Genes Involved in Saponin Distribution in P. notoginseng
Among the synthetic pathways, specific gene families, such as CYP and UGT, enriched the diversity of saponins. A total of 75 expressed CYP transcripts had distinct expression in our samples (FPKM ≥ 10) ( Figure 6). A total of 38 CYPs were highly expressed in the below ground tissues, and the expression of 19 CYPs were the highest in the rhizome. A total of 37 CYPs were highly expressed in the above ground tissues, and the expression of 21 CYPs were the highest in the flower. In addition, one PDS (CL6973. Contig1_All) and two PTS (Unigene8203_All, CL50. Contig10_All) transcripts were identified in our CYP transcript data and annotated as CYP716A47 and CYP716A53v2, respectively. The PDS transcript was expressed in six tissues of P. notoginseng, and their expression was the highest in the flower. However, two PTS transcripts were more highly expressed in the rhizome and the root than those in the above ground tissues. Real time-qPCR was performed to detect the expression values of CYP716A47 and CYP716A53v2 genes and to further confirm their expression levels. The RT-qPCR results showed that CYP716A47 was most highly expressed in the flower, while CYP716A53v2 was the most highly expressed in the rhizome and the root. These findings were consistent with the RPKM values ( Figure 7A,B). Total of 52 highly expressed UGTs (FPKM ≥ 10) were found in the P. notoginseng transcriptome (Supplementary Figure S3).

Analysis of Putative Genes Involved in Saponin Distribution in P. notoginseng
Among the synthetic pathways, specific gene families, such as CYP and UGT, enriched the diversity of saponins. A total of 75 expressed CYP transcripts had distinct expression in our samples (FPKM ≥ 10) (Figure 6). A total of 38 CYPs were highly expressed in the below ground tissues, and the expression of 19 CYPs were the highest in the rhizome. A total of 37 CYPs were highly expressed in the above ground tissues, and the expression of 21 CYPs were the highest in the flower. In addition, one PDS (CL6973. Contig1_All) and two PTS (Unigene8203_All, CL50. Contig10_All) transcripts were identified in our CYP transcript data and annotated as CYP716A47 and CYP716A53v2, respectively. The PDS transcript was expressed in six tissues of P. notoginseng, and their expression was the highest in the flower. However, two PTS transcripts were more highly expressed in the rhizome and the root than those in the above ground tissues. Real time-qPCR was performed to detect the expression values of CYP716A47 and CYP716A53v2 genes and to further confirm their expression levels. The RT-qPCR results showed that CYP716A47 was most highly expressed in the flower, while CYP716A53v2 was the most highly expressed in the rhizome and the root. These findings were consistent with the RPKM values ( Figure 7A,B). Total of 52 highly expressed UGTs (FPKM ≥ 10) were found in the P. notoginseng transcriptome (Supplementary Figure S3).

Discussion
In this study, the types and contents of saponin components had tissue specificity in the below and above ground tissues of P. notoginseng. Protopanaxadiol-and protopanaxatriol-type saponin contents were higher in the below ground tissues than in the above ground tissues, especially in the rhizome. The transcripts were different in the below ground and above ground tissues of P. notoginseng. A total of 482 and 882 unigenes were shared in the below ground and above ground tissues, respectively. A total of 38 and 37 CYP transcripts (RPKM ≥ 10) were highly expressed in the below ground and above ground tissues, respectively. The expression level of CYP716A47 was high in the above ground tissues, especially in the flower, and the relative expression of CYP716A53v2 was high in the below ground tissues, especially in the rhizome. The contents of most saponin were remarkably related to the expression levels of CYP716A47 and CYP716A53v2, suggesting that the different expression of CYP716A subfamily genes (CYP716A47 and CYP716A53v2) resulted the specific saponin distribution in the different tissues of P. notogonseng.
Ginsenosides have uneven distribution and accumulation in Panax plants [8,24]. The levels of ginsenosides in berries are higher than those in P. ginseng roots [25]. The total ginsenoside contents in P. quinquefolius leaves are higher than those in its roots [26]. In our studies, the contents of saponins were higher in the below ground tissues than in the above ground tissues, especially in the rhizome, and these findings were similar to previous studies [27]. The rhizome is an important part of perennial plants, such as P. notoginseng, because it possesses the buds for sprouting in the succeeding year. The abundant accumulation of various saponins in the rhizome can play a defensive role against pathogens and insects. In addition, the contents of protopanaxadiol-(Rb1 and Rd) and protopanaxatriol-type (R1, Rg1, and Re) saponins were higher in the below ground tissues than those in the above ground tissues, whereas protopanaxadiol-type saponins (Rb2 and Rc) were detected in the above ground tissues only. These results were similar to those in previous studies [15,28]. Therefore, saponin types and contents had an uneven distribution in different tissues of P. notoginseng.
The transcripts in the below ground and above ground tissues of P. notoginseng differed. A total of 1351; 1724; 2042; 4668; 5206; and 12,038 unigenes were specifically expressed in the stem, rhizome, leaf, root, flower, and fibril, respectively, suggesting that the transcripts in the six tissues were differentially expressed. This observation was similar to previous findings [29]. These annotations and transcript differences provided useful information for investigating the functions and expression patterns of unigenes in P. notoginseng. A majority of the sequences involved in the carbohydrate metabolic processes were identified in P. ginseng [30]. In our data, 55,205 sequences were assigned to the metabolic pathways, and 2128 sequences were annotated to the metabolism of terpenoids and polyketides. Unigene expression has tissue-specific distribution in the roots, stems, leaves, and flowers of P. ginseng, and this finding was similar to our results [31]. Consistent with a previous study, our study demonstrated that several transcripts encoding HMGR, MGS, PMK, and MVK were highly expressed in the flowers [31]. The gene-related saponin synthesis showed high expression profiles in the above ground tissues of P. notoginseng. Five genes (i.e., FPS, SS, SE1, SE2 and DS) related to saponin biosynthesis were highly expressed in the above ground tissues, especially in the flower [13]. The high contents of saponins were detected in the below ground tissues of P. notoginseng in our study. The intermediate products of saponin synthesis may be actively synthesized in above ground tissues and transported to the below ground tissues [32], which revealed that contents of saponins were higher in the below ground tissues of P. notoginseng.
The expression patterns of CYP716A47 and CYP716A53v2 genes could regulate the distribution of protopanaxadiol-and protopanaxatriol-type saponins in the different tissues of P. notoginseng. CYP716 subfamily genes (CYP716A47 and CYP716A53v2) are involved in the dammarane-type triterpenoid production. CYP716A47 catalyzes the formation of protopanaxadiol from dammarenediol-II [21], and CYP716A53v2 catalyzes the formation of protopanaxatriol from protopanaxadiol in P. ginseng [22]. Previous studies revealed that CYP716A47 and CYP716A53v2 are highly expressed in the main root and the rhizome, especially in the periderm [30,33]. The expression level of CYP716A53v2 is higher in P. notoginseng roots than in its leaves and flowers [29], and this observation was consistent with our findings. The RNAi of CYP716A47 in the transgenic hairy root of P. ginseng and P. quinquefolius caused the reduction of ginsenoside synthesis, indicating the positive correlation between gene expression levels and ginsenoside contents [34]. In our study, CYP716A47 expression levels had a significantly positive correlation with the contents of protopanaxadiol-type saponins (Rc and Rb2). CYP716A53v2 gene expression levels had a significantly positive correlation with the contents of protopanaxadiol-type (Rb1 and Rd) and protopanaxatriol-type saponins. These results demonstrated that the expression patterns of CYP716A47 and CYP716A53v2 genes could regulate the distribution of protopanaxadiol-type and protopanaxatriol-type saponins in different tissues of P. notoginseng.
Ginsenoside Ro, oleanane-type saponin, was first isolated from P. japonicus and also found in other Panax species, such as P. ginseng, P. zingiberensis, and P. stipuleanatus [35,36]. Ginsenoside Ro was not detected in our P. notoginseng samples. One gene encoding beta-AS (Unigene 20794_All) and one gene encoding CYP716A52v2 (CL11590.Contig2_All) were annotated in our transcript data. Previous studies reported that CYP716A52v2, which served as a CYP716A subfamily gene, may be involved in the oleanane-type triterpene biosynthesis [23]. Similar to a previous study, our study indicated that the content of ginsenoside Ro might be too low to be detected in P. notoginseng [29].

Plant Samples
Nine plants of 3-year-old P. notoginseng were collected from Wenshan, Yunnan Province in China (23.5 • N and 104 • E) at their flowering stage in our plantation. P. notoginseng seedlings were separated into six different tissues, namely, root, rhizome, fibril, stem, leaf, and flower (Supplementary Figure S4). All voucher specimens were carefully washed, cut into small pieces, and immediately deposited at −80 • C for further processing. Six plants were used for chemical analysis, and other three plants were utilized for molecular experiment.

Saponin Extraction and HPLC analysis
Standards of saponins (i.e., Rg1, Re, Rd, Rb1, Rb2, Rc, and R1) were purchased from Shanghai Tauto Biotech Company (Shanghai, China). These standards have more than 98.0% purity, and their numbers are 16,042,724; 160,907; 160,924; 160,930; 160,606,121; 16,081,931; and 160,923, respectively. The standard stock solutions were dissolved with methanol (Analytical Grade, Fisher, Hampton, NH, USA).Saponins were extracted through ultrasonic extraction in accordance with previously described methods [15]. In brief, the samples were crushed, and 0.1 g of each sample was soaked in 1.0 mL of methanol solution. After more than 10 s of vortexing and 10 min of sonication, the mixture was frozen at −20 • C for 1 h and then centrifuged at 10,000 rpm for 10 min. The upper layer was collected, filtered through a 0.45 µm organic micropore filter, and transferred for HPLC analysis.
The samples were analyzed with an Agilent HPLC 1260 series system (Agilent, Santa Clara, CA, USA) equipped with a VWD detector, a quaternary pump, a column compartment, and an autosampler. A C 18 reversed phase Eclipse XDB column (5 µm, 4.6 mm × 250 mm, Agilent, Santa Clara, CA, USA) was used for the separation. The gradient was composed of acetonitrile (A) and water (B), and the linear gradient was set as follows in accordance with the Pharmacopoeia of the People's Republic of China: 0-12 min for 19% A and 12-60 min for 19% A to 36% A. The following parameters were also used: sample injection volume of 10 µL, wavelength of 203 nm, column temperature of 25 • C, and flow rate of 1.0 mL·min −1 .

RNA Extraction, Library Construction, and RNA Sequencing
Total RNA was isolated from P. notoginseng tissues by using a polysaccharide plant polyphenol quick RNA extraction kit (Bio Teke Corporation, Beijing, China) in accordance with the manufacturer's instructions. Three biological replicates were prepared. The integrity and quantity of RNA were assessed and determined by conducting gel electrophoresis and using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Hampton, NH, USA), respectively. Transcriptome libraries for RNA sequencing were constructed, detected with Agilent 2100 Bioanalyzer (Agilent, Santa Clara, CA, USA) and ABI Step One Plus Real-time PCR System (Thermo Fisher Scientific, Hampton, NH, USA), and sequenced using an Illumina HiSeq 4000 platform (Illumina, San Diego, CA, USA) to obtain 150 bp paired-end reads.

Transcriptome Analysis
The raw sequencing reads were filtered to obtain the clean reads of each sample. Three criteria were used to filter these reads: adaptors, more than 10% "N" bases, low-quality reads more than 50% of bases having a qualtiy value ≤ 10. All the sequences obtained from the transcriptome were submitted to NCBI (Accession Number: SRR7427745-SRR7427748). De novo assembly from the clean reads of all the samples was conducted using Trinity with default parameters (The completeness of our assembly was evaluate by BUSCO and test result show the complete BUSCO were 88.7%.) (Supplementary Table S6) [37,38]. Unigenes were aligned through BLASTX with the best similar hit of an E-value < 1 × 10 −5 , including NCBI non-redundant (Nr) protein (ftp://ftp.ncbi.nih.gov/blast/ db/FASTA), NCBI (ftp://ftp.ncbi.nih.gov/blast/db), and SWISS-PROT (http://www.uniprot.org/). Homology similarity was searched in the databases, including NCBI (ftp://ftp.ncbi.nih.gov/blast/db) and SWISS-PROT (http://www.uniprot.org/), through BLASTX by using the best similar hit of an E-value < 1× 10 −5 . The classification of protein function was searched in the Gene Ontology (GO) database (http://www.geneontology.org/) and the Clusters of Orthologous Groups (COG) database (www.ncbi.nlm.nih.gov/COG). The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database (http://www.genome.jp/kegg/keggl/.html) was used to determine the biological pathways of the unigenes.
The fragments per kilobase of exon model per million mapped reads (FPKM) were calculated to quantify the expression level among the six tissues of P. notoginseng. The false discovery rate (FDR) was used to compute the differences in the significance of transcript abundance. log 2 ratio ≥ 2 and FDR ≤ 0.05 were considered to identify differentially expressed genes (DEGs) (Differential expression analysis were carried out at the gene level.). Heatmaps were constructed using R programming language and software (R 3.4.2, R packages).

Real-Time qPCR Analysis of CYP716A47 and CYP716A53v2
The total RNA from P. notoginseng tissues was converted into a single-stranded cDNA by using a FastQuant RT kit with gDNase (TIANGEN, Beijing, China). Quantitative reactions were performed using a real-time qPCR system. The reaction mixture contained 10 µL of SYBR Green qPCR Master Mix (Toyobo, Osaka, Japan), 0.7 µL of 10 µM each of the forward and reverse primers, and 2 µL of the cDNA template. Sterile water was added to obtain a final volume of 20 µL. PCR amplification was conducted under the following conditions: 95 • C for 30 s, followed by 40 cycles of 95 • C for 5 s, 55 • C for 15 s, and 72 • C for 15 s, and a dissociation stage of 55 • C for 15 s and 95 • C for 15 s. All primers used in this study were listed in the Supplementary Table S7. The relative expression levels of the key genes were calculated using the 2 −∆∆ C t method [39].

Conclusions
In this study, the types and contents of saponin and the transcripts in the below ground and above ground tissues had tissue specificity. Protopanaxadiol-and protopanaxatriol-type saponins were higher in the below ground tissues than those in the above ground tissues. The transcripts were different in the below ground and above ground tissues of P. notoginseng. Among 38 and 37 CYP transcripts were highly expressed in the below ground and above ground tissues, respectively. Pearson's analysis showed that CYP716A47 and CYP716A53v2 could regulate protopanaxadiol-and protopanaxatriol-type saponin distribution in different tissues of P. notoginseng.