Transcriptome Analysis of Methyl Jasmonate-Elicited Panax ginseng Adventitious Roots to Discover Putative Ginsenoside Biosynthesis and Transport Genes

The Panax ginseng C.A. Meyer belonging to the Araliaceae has long been used as an herbal medicine. Although public databases are presently available for this family, no methyl jasmonate (MeJA) elicited transcriptomic information was previously reported on this species, with the exception of a few expressed sequence tags (ESTs) using the traditional Sanger method. Here, approximately 53 million clean reads of adventitious root transcriptome were separately filtered via Illumina HiSeq™2000 from two samples treated with MeJA (Pg-MeJA) and equal volumes of solvent, ethanol (Pg-Con). Jointly, a total of 71,095 all-unigenes from both samples were assembled and annotated, and based on sequence similarity search with known proteins, a total of 56,668 unigenes was obtained. Out of these annotated unigenes, 54,920 were assigned to the NCBI non-redundant protein (Nr) database, 35,448 to the Swiss-prot database, 43,051 to gene ontology (GO), and 19,986 to clusters of orthologous groups (COG). Searching in the Kyoto encyclopedia of genes and genomes (KEGG) pathway database indicated that 32,200 unigenes were mapped to 128 KEGG pathways. Moreover, we obtained several genes showing a wide range of expression levels. We also identified a total of 749 ginsenoside biosynthetic enzyme genes and 12 promising pleiotropic drug resistance (PDR) genes related to ginsenoside transport.


Introduction
Ginseng (P. ginseng C.A. Meyer) is well recognized as the king of herbs and widely used as a source of herbal medicine in eastern Asia and North America [1]. The genus "panax" means panacea or cure all and the root is so-called "ginseng" because of its man-like shape and thought to represent its three characteristics (body, mind and sprit) [2]. The plant has multiple clinical and pharmacological effects related to cancer, stress, obesity, diabetes and cardiovascular disease [3][4][5][6]. The major pharmacologically active compounds of ginseng are ginsenosides belonging to dammarane and oleanane type triterpene saponins. Dammarane-type consists of two groups based on their structure, i.e., the Rb group (protopanaxadiols, including Ra1, Ra2, Rb1,Rb2, Rb3, Rc, Rd, Rg3, Rh2 and others) and the Rg group (protopanaxatriols, including Rg1, Rg2, Re, Rf, Rh1 and others), while oleanane-type ginsenoside represents only one saponin, Ro [7]. They have been identified in all parts of the plant, including the root, stem, leaf and flower [8]. However, research for years on its highly valued root has largely been conducted due to its important herbal applications since ancient times.
The large size (~3.2 Gb) and high complexity of the P. ginseng genome that is reportedly tetraploid (2n = 4x = 48) [9] has made it difficult to get a whole genomic sequence for this plant. Some researchers have generated ESTs from untreated and MeJA treated ginseng hairy roots by the traditional Sanger sequencing method to detect genes involved in ginsenosides biosynthesis [10][11][12]. Over the last few years, the next-generation sequencing (NGS) technology has emerged as a cutting edge approach for high-throughput sequence determination of model and non-model organisms, especially those with large and complex genomes. Besides, this approach has considerably improved our understanding on the complexity of gene expression profiling, discovery, regulation and networks in plants [13,14]. In spite of its clear potential, the NGS technology has been applied in a few studies for the transcriptome analysis of Panax species, including P. ginseng, P. notoginseng and P. quinquefolius [15][16][17], where the technology was almost limited to the Roche/454 pyrosequencing platform to identify mainly ginsenoside biosynthetic genes. More recently, the Illumina/Solexa platform has been introduced in untreated adventitious root transcriptome analysis of two P. ginseng cultivars namely Chunpoong (CP) and Cheongsun (CS) for their gene expression profiling and ginsenoside biosynthetic genes identification [14]. However, gene discovery and candidate genes involved in ginsenosides biosynthesis and their transport in ginseng species are still limited.
In recent years, plant genetic engineering research has turned towards modulation of the secondary metabolite biosynthesis pathway, and biotic and abiotic stress and defense-related genes [18,19]. Secondary metabolites biosynthesis and their regulation is a multilayer network requiring common signal pathways and a variety of key elicitors such as MeJA or salicylic acid [20]. Hence, Jasmonic acid and its derivatives have received much attention as key molecules, particularly in regulating the secondary metabolism in many plant species. Increasing evidence shows that MeJA significantly increases the production of many highly valuable secondary metabolites of pharmaceutical and industrial importance [21]. Various secondary metabolites including ginsenosides synthesized by MeJA in medicinal plants have pharmacological effects in human health and play important roles in the adaptation of plants to a particular environment [10]. Among various elicitors, MeJA has recently been confirmed as an effective elicitor for the induction of ginsenoside contents in cultured cells and adventitious roots of P. ginseng [22,23]. In addition, MeJA significantly up-regulates PDR gene expression related to various secondary metabolites including ginsenosides accumulation and their transport in cells [24,25]. In this study, we adopted the Illumina/Solexa sequencing technology for sequencing and analyzing the MeJA treated adventitious root transcriptome of P. ginseng and a total of 71,095 all-unigenes were generated from Pg-Con and Pg-MeJA. After annotation, P. ginseng was shown as genetically more related to Vitis vinifera. Furthermore, we have conducted the identification of several candidate genes encoding almost all the enzymes associated with ginsenoside biosynthesis via the mevalonic acid (MVA) pathway. We have also discovered a number of PDR genes related to the transport of ginsenosides in cells. To the best of our knowledge, this is the first MeJA elicited NGS transcriptome sequencing and analysis of P. ginseng adventitious roots that will supply very useful resources for further functional research on this plant and its closely related species.

Figure 1.
Comparison of unigene length between hit and no-hit unigenes in Nr database.
In the Nr database, most of the matched sequences (85.44%) had E-values between 0 and 1 × 10 −15 and the rest 14.56% had E-values from 1 × 10 −15 to 1 × 10 −5 ( Figure 2A). Likewise, the E-value distribution in the Swiss-prot database revealed that the majority of matched sequences (78.84%) had E-values from 0 to 1 × 10 −15 and the remaining 21.16% had the E-values between 1 × 10 −15 and 1 × 10 −5 ( Figure 2B). The translated amino acid sequences of unigenes showed a high similarity to the sequences from the Nr database. Nearly 94.09% of the Blastx hits were in a similarity range between 40% and 100% and only 5.91% of hits had similarity values less than 40% ( Figure 2C). The similarity distribution in the Swiss-prot was also more similar to that of the Nr database. Similarities between 40% and 100% were found for 93.13% of query sequences against Swiss-prot, whereas only 6.87% had lower homologies with <40% identity ( Figure 2D). Homologies among different species are illustrated in Figure 2E, where out of the matched 54,920 unigenes in Nr, 22,083 (40.21%) were matched to V. vinifera followed by Lycopersicon esculentum (8226; 14.98%) and Amygdalus persica (5161; 9.40%). Thus P. ginseng was genetically more related to V. vinifera.

GO (Gene Ontology) Functional Classification
GO assignments were used to classify the predicted functions of P. ginseng adventitious root genes. In our work, the Blast2GO program [28] was used first to obtain GO annotation of assembled unigenes annotated by Nr and then the WEGO software [29] was utilized to perform GO functional classification of the unigenes. In total, 43,051 unigenes with Blast matches to known proteins were assigned to the three GO ontologies that were further sub-divided into 46 categories (Table S1). As shown in Figure 3, assignments to the cellular component ranked the highest (34,946; 81.17%), followed by biological process (33,708; 78%) and molecular function (32,970; 76.58%). Under biological process, cellular process (64.44%) and metabolic process (61.34%) were prominently represented indicating that some important cellular processes and metabolic activities occurred in P. ginseng adventitious roots. Under the category of cellular components, cell (76.67%), cell part (76.67%) and organelle (61.10%) represented the majorities of the category. Catalytic activity (50.70%) and binding (50.34%) were the most abundant among various molecular functions. However, we found only a small number of genes in the categories of cell killing (1 unigene), channel regulator activity (1 unigene), protein tag (3 unigenes) and metallochaperone activity (10 unigenes).

COG (Clusters of Orthologous Groups) Functional Classification
In our study, all annotated uningenes were aligned to the COG database to further assess the completeness of transcriptome library and the effectiveness of annotation process. Out of the 71,095 all-unigenes, 19,986 were assigned to 25 COG categories ( Figure 4, Table S2), where "general function prediction only" category represented the largest group (6308; 31.56%) followed by "transcription" (3459; 17.31%), "replication, recombination and repair" (3275; 16.39%) and "posttranslational modification, protein turnover, chaperones" (2672; 13.37%). The following categories: "extracellular structures" (22 unigenes) and "nuclear structure" (43 unigenes) represented the smallest groups. In addition, 1029 unigenes were assigned to "secondary metabolites biosynthesis, transport and catabolism" suggesting that these important processes might take place in P. ginseng adventitious roots.

KEGG (Kyoto Encyclopedia of Genes and Genomes) Pathway Annotation
To identify the active biological pathways and for further understanding the biological functions and interactions of genes, 71,095 all-unigenes were analyzed to the KEGG pathway database. In total, 32,200 unigenes having significant matches in the database were assigned to 5 categories (level 1), 19 sub-categories (level 2) and 128 pathways (level 3) (Table S3). In level 3, "metabolic pathways" had the largest number of unigenes (7137; 22.16%) followed by "biosynthesis of secondary metabolites" (3419; 10.62%). The "metabolism" category under level 2 contained 11 sub-categories, of which the large four are "global map", "carbohydrate metabolism", "lipid metabolism" and "amino acid metabolism" ( Figure 5A). In the "biosynthesis of other secondary metabolites" sub-category, 1280 uningenes were classified into 13 pathways and most of them were mapped to "phenylpropanoid biosynthesis", "flavonoid biosynthesis", "stilbenoid, diarylheptanoid and gengerol biosynthesis" and "flavone and flavonol biosynthesis" ( Figure 5B). Many enzymes mapped to unigenes in KEGG pathways indicate that the active metabolic processes were underway in P. ginseng adventitious roots and a variety of metabolites were synthesized as well.   (Table S4).

Identification of Candidate Genes Involved in Ginsenoside Biosynthesis
MeJA functions as an effective elicitor to boost the production of many secondary metabolites. Recent research shows that MeJA markedly promotes saikosaponin production in adventitious roots of Bupleurum falcatum [30] and regulates the phenolics, terpenes and alkaloids accumulation in plant cell suspension cultures [31]. Treatment of P. ginseng hairy roots and cultured cells by MeJA has been shown to increase ginsenoside content [22,23]. Increasing reports reveal that MeJA can up-regulate metabolite-related enzyme genes as well [32]. Supporting these premises, MeJA elicited adventitious roots of P. ginseng were used to screen genes related to ginsenoside biosynthesis. More recent evidence shows that both the MVA and the non-MVA (also known as MEP or DOXP) pathways are involved in ginsenoside biosynthesis. The synthesis is performed in the cytosol from acetyl-CoA via the MVA pathway and from pyruvate and glyceraldehyde-3-phosphate in the plastid via the non-MVA pathway [33]. It is noteworthy that a total of 749 unigenes (Table S5) in our Solexa/Illumina dataset encoding almost all the known enzymes were found to be involved in ginsenoside biosynthesis via the MVA pathway. However, we did not find any enzyme gene participating in the non-MVA pathway. The identified enzymes from our dataset involved from acetyl-CoA to ginsenoside biosynthesis are shown in three steps (Table 2, Figure 7). Table 2. Differentially expressed unigenes encoding enzymes in the ginsenoside biosynthesis pathway.

Number of Unigenes Up-Regulated Down-Regulated Not DEGs
Step I Step I: Genes encoding all the ginsenoside biosynthetic enzymes involved in this step were found in our study including acetyl-CoA acetyltransferase (AACT), HMG-CoA synthase (HMGS), HMG-CoA reductase (HMGR), mevalonate kinase (MVK), phosphomevalonate kinase (PMK), mevalonate diphosphate decarboxylase (MDD) and isopentenyl diphosphate isomerase (IDI). Here, MVK catalyzing the production of mevalonate phosphate ranked the highest (20 unigenes) and most of them (14 unigenes) were found to be up-regulated. However, no MVK gene was previously detected in untreated 11-year-old P. ginseng [34] and 4-year-old American ginseng root [35], with the exception of only 3 candidate genes in 4-year-old untreated root of P. ginseng [15]. HMGS involved in this step was not previously discovered from 4-year-old of P. notoginseng [16] and P. quinquefolius [35] untreated root, while we obtained 9 candidates from our work. The difference of the above results may be due to the MeJA induction or the type of organs, or species. IDI catalyzes the reversible conversion of isopentenyl diphosphate (IPP) and its isomer dimethylallyl diphosphate (DMAPP) that are the universal precursors for isoprenoids biosynthesis including ginsenosides in all organisms [36]. So far, one IDI candidate from the 454-EST dataset of 4-year-old untreated P. ginseng root was identified [15]. However, no IDI was discovered from MeJA-treated hairy roots of 4-year-old P. ginseng using traditional Sanger EST analyses [10]. But two unigenes encoding IDI were found from our dataset that may indicate the high coverage of Illumina/Solexa sequencing platform. Step II: We identified several candidate genes encoding geranyl diphosphate synthase (GPS), farnesyl diphosphate synthase (FPS), squalene synthase (SS) and squalene epoxidase (SE) involved from IPP to 2,3-oxidosqualene. In a previous study, two unigenes for GPS in 4-year-old P. ginseng root were detected [15], while no GPS gene for this enzyme was found in 11-year-old P. ginseng [34]. Here, 24 GPS unigenes were discovered including eight up-regulated and 16 down-regulated candidates. In a study, the expression of the FPS was shown to be up-regulated by MeJA in P. ginseng hairy roots [37]. We found one FPS unigene from our dataset that was also up-regulated. FPP is converted to squalene under the action of SS. This is the first enzymatic reaction from the central isoprenoid pathway towards sterol and triterpene biosynthesis [11] and may be a potential point for their biosynthesis regulation. Recent studies showed that only one unigene encoding SS was found in both 4-and 11-year old untreated P. ginseng root [15,34]. In another case, two up-regulated SS candidates were discovered from MeJA-treated hairy roots based on Sanger EST analyses [10]. It is noteworthy that 10 SS candidates (Table S6) encoding SS enzymes for squalene synthesis were found in this study and all of them were up-regulated, indicating that squalene production was very active in our 4-year MeJA-treated P. ginseng adventitious roots. Phylogenetic analysis revealed that some SS genes closely formed one branch in the tree, even though they were derived from different plant species (Figure 8). In particular, two contigs (CL4699.contig1 and CL4699.contig3) out of 10 belonging to cluster CL4699 showed close similarity to the P. ginseng candidate SS1 up-regulated by MeJA [37] indicating that these contigs were likely to be involved in ginsenoside biosynthesis. SE converts squalene to 2,3-oxidosqualene from which phytosterol and triterpene in plants are synthesized. The number of unigenes for SE was larger in our work than that of previously identified by the 454 pyrosequencing and Sanger EST analyses in untreated and MeJA-treated P. ginseng root respectively [15,34]. These results may represent the immense capacity of both MeJA and Illumina/Solexa sequencing technology. Step III: In P. ginseng, dammarenediol synthase (DDS), β-amyrin synthase (β-AS), cycloartenol synthase (CAS), lanosterol synthase (LS) and lupeol synthase (LUS) belonging to oxidosqualene cyclase (OSC) family is situated at the branching point of ginsenoside and phytosterol biosynthesis. Their biosynthesis is achieved by three reaction steps namely cyclization, hydroxylation and glycosidation [38]. In cyclization, 2,3-oxidosqualene is converted to cycloartenol by CAS and lanosterol by LS. 2,3-oxidosqualene is also converted to β-amyrin under the action of β-AS and subsequently oleanane-type ginsenoside (Ro) is synthesized. However, the absence of β-AS and CAS both in 4-and 11-year old P. ginseng roots [15,34] may indicate that oleanane-type ginsenoside (Ro) and phytosterol were not actively biosynthesized in their roots. Another branch of this step is the DDS enzyme catalyzing the production of dammarenediol-II from 2,3-oxidosqualene. The genes encoding this enzyme were shown to be up-regulated by MeJA [37,39]. DDS was shown to be absent in 11-year-old P. ginseng root [34] indicating that dammarane-type ginsenosides biosynthesis was not active in its roots. Moreover, many genes were discovered from transcriptome analysis of 4-year-old American ginseng roots. However, no gene responsible for the cyclization of oxidosqualene was found [35]. Interestingly, suppression of CAS caused enhanced ginsenoside content in P. ginseng hairy roots. This phenomenon was related to lower CAS and higher DDS activity, presumably due to increased availability of 2,3-oxidosqualene for DDS to produce dammarenediol-II and subsequently dammarane-type ginsenosides [40]. We have identified one candidate encoding DDS induced by MeJA induction. However, out of a total 17 CAS and 2 LS unigenes found in our dataset, 16 candidates encoding CAS and all of the LS unigenes were down-regulated indicating that lower CAS and LS activity may supply more 2,3-oxidosqualene for DDS and β-AS leading to the increased final ginsenoside levels and reduced total phytosterol contents. However, we did not find any LUS functioning for lupeol synthesis indicating that its absence may increase the final ginsenoside levels as well.
In hydroxylation, two reaction steps are catalyzed by two cytochrome P450 (P450) enzymes: the conversion from dammarenediol-II to protopanaxadiol by one P450 and the conversion from protopanaxadiol to protopanaxatriol by another P450 [41,42]. One hundred and thirty-three P450s and 25 P450s candidates were identified previously in a cDNA library generated from 4-year-old MeJA-induced and 11-year-old untreated hairy roots of P. ginseng respectively [10,34]. In another case, among 27 P450s, six were up-regulated and eight were down-regulated in response to MeJA [17]. By comparison, we found a total of 335 P450s unigenes in our study; of those 161 were up-regulated and 168 down-regulated and six candidates did not show any differential expression level. This large number of P450 candidates gives a prospective gene resource in the detection of special P450s related to ginsenoside biosynthesis in P. ginseng.
Glycosidation occurs in the last reaction step where different ginsenosides are synthesized by adding one or several monosaccharides to triterpene aglycones by UDP-glycosyltransferase (UGT) [43,44]. 235 UGTs from 11-year-old P. ginseng root transcriptome have previously been identified [34]. Among 27 UGTs discovered in P. quinquefolius, 11 UGTs were up-regulated and three were down-regulated by MeJA [17]. In this study, a total of 116 unigenes encoding UGTs were identified, of which 71 candidates were up-regulated and 45 were down-regulated in response to MeJA. These observations show that MeJA as a signal transducer leads to the regulation of the enzyme genes involved in ginsenoside biosynthesis.

Identification of Candidate Genes Involved in Ginsenoside Transport
Interestingly, MeJA may enhance transporter synthesis to assist metabolite transport in a temporal and spatial manner in plant organs [45]. Moreover, as compared to ginsenoside biosynthesis, little is known about ginsenoside transport and regulation in cells. Therefore, we have also focused on detecting a number of genes related to the transport of secondary metabolites through plant PDR transporters belonging to the ABCG subfamily of ATP-binding cassette (ABC) transporters. Among several full length ABC subfamilies, the best-characterized three subfamilies are PDR, multidrug resistance (MDR) and multidrug resistance-associated protein (MRP) involved in the transport of various molecules across membranes [46,47]. In the present study, many ABC members such as MDR, MRP and PDR genes along with their expression were screened from our transcriptome dataset. Among the unigene hits, a total of 595 unigenes for ABC (from ABCA to ABCG), 229 for MDR, 76 for MRP and 125 for PDR were found (Tables S7-S10). Further evidence demonstrates that PDR transporters take part in exporting a wide range of substrates across membranes in different environmental conditions [46]. In our previous study, three PgPDR genes (PgPDR1, PgPDR2 and PgPDR3) were isolated from P. ginseng, where PgPDR3 was induced by MeJA and its expression was highly related to ginsenoside accumulation. Our findings show that PgPDR3 participates in the export of ginsenosides from the cells [19,25]. Of the identified 125 PDR unigenes in the current study, 70 were up-regulated by MeJA, such as the three PgPDRs (mentioned above). Homology and similarity analysis shown in Figure 9 and Table S6 provided 12 promising PDR candidates out of 125 PDR unigenes. Of those, in particular three unigenes, "CL1490.Contig1", "CL1490.Contig4" and "Unigene12975" showed a much higher degree of similarity to the three PgPDR genes, indicating that they were most likely to be involved in the export of ginsenosides. Likewise, we found three contigs of "CL6349" and six of "CL10813" from our dataset closely related to FvPDR3 of Fragaria vesca subsp. Vesca and VvPDR3 of V. vinifera respectively. With respect to these points of view, our MeJA-treated sequencing results provided a multitude potential genes associated with the transport of a wide range of substrates including ginsenosides and other secondary metabolites.

Discussion
Transcriptome sequencing is one of the most effective tools for gene identification. Understanding the transcriptome analysis is essential to know which genes are expressed at various developmental stages or physiological conditions of a cell. Plant genomics, especially for non-model species, is always hard due to its large and complex genome, high levels of ploidy and large proportions of repeat sequences. Instead of the whole genome sequencing, transcriptome sequencing is a better alternative for rapid and efficient access to genetic information [48]. High-throughput Illumina/Solexa sequencing technology has recently been an efficient, fast, inexpensive and reliable tool for transcriptome characterization and gene discovery in non-model organisms as well. Previously, a number of ESTs were generated from leaves, seeds, rhizomes and roots of P. ginseng by the Sanger sequencing technology [11,12,49,50]. However, deep EST sequencing using this traditional method is highly expensive, labor intensive and time-consuming. Therefore, with the development of large scale genomics, the Sanger method could not meet the needs of the development. However, over the past few years, the NGS technologies have overcome the current limitations of the Sanger method and at the same time, the NGS platform has become the prime approach for high-throughput gene discovery on a genome-wide scale both for model and non-model organisms. These technologies can produce a large number of quality reads with high sampling rates of cDNA libraries providing a complete view of transcriptomes. Thus, plant genomics in recent years has developed rapidly with the application of various NGS technologies with respect to costs, efficiency and speed compared with that of the traditional Sanger method. Hence, over the past several years, these technologies have made a revolution in genomics and provided a rapid and economic way to sequence extremely large amounts of genetic material [51,52]. Among NGS technologies, the Illumina/Solexa platform has been one of the most useful and widely used technologies due to its fast, inexpensive, accurate, efficient and reliable transcriptome characterization, gene discovery and various functional studies in many organisms [53]. Using this technology, comprehensive sequencing resources have been successfully constructed from non-model plants including P. ginseng [14], Lentinula edodes [54] and Auricularia polytricha [55]. However, transcriptome assembly was highly challenging including mis-assembled contigs [56]. Recently, "Trinity", [26] a novel assembly method has been introduced most notably through an expand capability for transcriptome de novo assembly using relatively short reads. The results from this study suggested that short reads from Illumina/Solexa sequencing can effectively be assembled through "Trinity" giving the advantage of a high assurance and reliable assembly of putative unigenes, including potential and valuable splice variant information. Therefore, de novo sequencing and assembly of transcriptomes or genomes has effectively been utilized for model [57,58] and non-model organisms [59][60][61]. Figure 9. Phylogenetic analysis of 12 potential PDR unigenes of our dataset and other plant PDRs. Among two kinds of unigenes, some having 70% similarity among them, are under one cluster (CL) initiated with "CL" along with the same number such as "CL1490", followed by different number of contigs, whereas others start directly with "Unigene" such as "Unigene12975".  indicates the unigenes from our current study. The box in red outline indicates one group of genes that are more similar with each other.
Despite the commercial and medicinal values of P. ginseng, NGS-based MeJA-elicited transcriptome information on this species in public databases was scarce prior to our study. In our work, Illumina/Solexa sequencing technology was applied to perform this work. Compared to the Sanger method, this technology provides a large amount of genetic resources in a fast, efficient and economic way. Through this program, we obtained about 55 millions raw reads separately from Pg-Con and Pg-MeJA and subsequently 53 millions clean reads from each sample. Finally, all of these clean reads assembled into 71,095 all-unigenes were annotated with five above mentioned main databases (Table S11). Furthermore, the Trinity assembly algorithm designed specifically for assembling of putative transcripts has also given us biologically significant results of high confidence. More recent works obtained similar advantages to the use of the Illumina HiSeq™2000 platform in the assembly of a transcriptome for untreated P. ginseng adventitious roots and Lentinula edodes C91-3 respectively [14,54].
More importantly, stimulation of ginsenoside synthesis by MeJA is performed by the up-regulation of genes including SS, SE, and DDS involved in ginsenoside biosynthesis [37,62]. Although many candidate genes related to ginsenoside biosynthesis have previously been identified both in untreated and MeJA treated P. ginseng, several genes encoding ginsenoside biosynthetic enzymes such as MVK, GPS, β-AS and DDS were not found [10,34]. For example, DDS thought to be a rate-limiting and the first enzyme in the biosynthetic pathway of dammarane-type saponins was absent in the 454 EST dataset of untreated P. ginseng roots [34]. Some recent works have discovered only one unigene from 4-year-old untreated P. ginseng [15] and three unigenes from American ginseng roots [17] based on the 454 pyrosequencing method. In another case, only two up-regulated SS candidates were found from the same year-old MeJA-treated P. ginseng hairy roots via the traditional Sanger EST analyses [10]. However, we have obtained 10 up-regulated SS unigenes from our dataset. Moreover, nearly all types of major ginsenoside biosynthetic enzyme genes were discovered in the transcriptome of our MeJA-treated P. ginseng adventitious roots (mentioned above). Based on comparison, the findings indicate that the information of this study is relatively comprehensive and highlights the immense capacity of both MeJA and Illumina/Solexa technology. On the other hand, only a few ABC transporters have been identified in medicinal plants related to ginsenosides and transport of other secondary metabolites. Moreover, several MeJA-induced plant PDRs have been discovered associated with defense reactions against pests, pathogens and herbivores by transporting antimicrobial natural secondary metabolites in cells [63][64][65]. Based on the knowledge that secondary metabolites can be induced by MeJA, the identified 12 potential candidate PDR unigenes from our study through phylogenetic analysis will provide more important functions related to the transport of various substrates including secondary metabolites and their transport in cells. Moreover, the annotated unigenes and associated resources from our work provide valuable information for P. ginseng adventitious roots to investigate specific processes and allow the identification of novel genes related to ginsenosides and other secondary metabolites biosynthesis, their accumulation, as well as transport, that are responsive to MeJA treatment.

Preparation of Plant Materials
Chinese Jilin ginseng cv. Damaya cultivated mainly in the shade is a cold resistant and fast growing variety of P. ginseng with a high content of ginsenosides. Commercially cultivated 4-year-old P. ginseng (Damaya) was collected on the 28 July 2013 from the forest of the Changbai mountain of Fusong county, Baishan, China. The fresh roots were sterilized as described earlier [23], cut into small pieces and then cultured in solid MS media supplemented with 1.0 mg·L −1 2,4-dichlorophenoxy acetic acid (2,4-D) and 0.1 mg·L −1 kinetin (KT) and kept 28 days at 25 °C in dark for callus induction.
Afterwards, the callus was transferred to new solid MS media supplemented with 5.0 mg·L −1 indole-3-butyric acid (IBA) and kept 28 days at 25 °C in the dark for adventitious roots induction. Newly established adventitious roots were moved to liquid MS media supplemented with 5.0 mg·L −1 IBA and maintained on a rotary shaker at 160 rpm and 25 °C in the dark and then sub-cultured once every 2 weeks. MeJA (200 µM, desired concentration) was added to the 2 weeks cultured adventitious roots for 24 h (Pg-MeJA). Equal volumes of solvent (ethanol) was dissolved in the experimental controls and kept in the same condition (Pg-Con).

RNA Extraction and cDNA Library Construction for Sequencing
The total RNA from two samples (Pg-Con and Pg-MeJA) was extracted with E.Z.N.A. ® Plant RNA Kit (Omega Bio-Tek, Doraville, GA, USA) following the manufacturer's procedure and then treated with DNaseI to remove DNA residues. RNA (31.26 µg) at a concentration of 665 ng·μL −1 from Pg-Con and 30.40 µg at a concentration of 351 ng·μL −1 from Pg-MeJA were used for cDNA library preparation. For Illumina/Solexa sequencing, mRNA with poly(A) tail was isolated from total RNA using oligo(dT) magnetic beads. Poly(A) RNA was then purified and fragmented into small pieces by using divalent cations at elevated temperature. Then double-stranded cDNA was synthesized from cleaved mRNA fragments. Short cDNA fragments were purified and resolved with Elution Buffer (10 mM Tris-HCl, pH 8.5) for end reparation and poly(A) addition. After that, the short cDNA fragments were attached with sequencing adapters. For PCR amplification, appropriate fragments were selected as templates based on the agarose gel electrophoresis result. Finally, the cDNA library products were sequenced using Illumina HiSeq™2000 at BGI (http://www.genomics.cn/en/index), Shenzhen, China.

Data Filtering and Sequence Assembly
Prior to assembly, raw reads including adaptors, and unknown or low quality bases were filtered to obtain high-quality transcriptome sequence data. De novo assembly of the short clean reads was carried out by assembling program "Trinity" (http://trinityrnaseq.sourceforge.net) [26]. The resulting sequences from "Trinity" are called unigenes. If multiple samples from the same species were sequenced, unigenes from each sample's assembly can be taken into further process of sequence splicing and redundancy, removing with sequence clustering software to get non-redundant unigenes called all-unigene.

Protein Coding Sequence (CDS) Prediction
Blastx alignment (E-value < 1 × 10 −5 ) between public databases like Nr (http://www.ncbi.nlm.nih.gov), Swiss-prot (http://www.expasy.ch/sprot), KEGG (http://www.genome.jp/kegg) and COG (http://www.ncbi.nlm.nih.gov/cog) were performed and the best aligning results were used to decide sequence direction (5'-3') of unigenes. If the searching results obtained from different databases conflicted with each other, a priority order of Nr, Swiss-prot, KEGG and COG would have been followed while deciding sequence direction of unigenes. When some unigene happened to be unaligned to none of the above mentioned databases, "ESTScan" (www.ch.embnet.org/software /ESTScan.html) [66] was used to predict its coding regions and to decide the sequence direction.

Unigene Annotation and Classification
The annotation of unigenes was carried out through a variety of bioinformatics procedures using Blastx (E-value < 1 × 10 −5 ). With Nr annotation, the Blast2GO program [28] was applied to obtain GO annotation of unigenes according to molecular function, biological processes and cellular component. After obtaining GO annotation for every unigene, WEGO software [29] was used to perform GO functional classification for all the annotated unigenes and to understand the distribution of species' gene functions at the macro level. The COG database was applied to classify orthologous gene products. We therefore aligned unigenes to the COG database to classify and predict the possible functions of the unigenes. The KEGG database helps to analyze the gene product during the metabolism process and gene function in the cellular processes. So, we used the KEGG annotation to assign the pathway annotations of the unigenes.

Conclusions
Recently, with the application of NGS, genomic information of medicinal plants has developed rapidly. Such invaluable information is useful for the development of herbal medicines. However, MeJA-elicited insufficient transcriptomic and genomic data on P. ginseng in the public databases has limited our understanding of the molecular mechanism underlying gene discovery and gene profiling. Biosynthesis of many types of secondary metabolites including ginsenosides and their transport and accumulation are induced by MeJA [67]. Thus, the study was conducted on the MeJA-treated highly valued P. ginseng roots containing ginsenosides. Seventy-one thousand and ninety-five unigenes obtained from Illumina/Solxa sequencing demonstrate an imperative transcriptomic level of information for P. ginseng and will surely accelerate further functional genomics research for related medicinal plants as well. Additionally, from these generated sequences, a number of genes were identified encoding nearly all the known enzymes related to ginsenoside biosynthesis. More importantly, a handful of candidate PDR genes were discovered playing important roles in the transport of secondary metabolites including ginsenosides in medicinal plants. Furthermore, our results also represent an imperative capacity of Illumina/Solexa sequencing as a fast, reliable, less experimental complexity and cost-effective approach for transcriptome characterization and gene discovery in non-model organisms, especially those with complex and large genomes such as P. ginseng.