Comparative Transcriptome Analysis Identifies Genes Putatively Involved in 20-Hydroxyecdysone Biosynthesis in Cyanotis arachnoidea

Cyanotis arachnoidea contains a rich array of phytoecdysteroids, including 20-hydroxyecdysone (20E), which displays important agrochemical, medicinal, and pharmacological effects. To date, the biosynthetic pathway of 20E, especially the downstream pathway, remains largely unknown. To identify candidate genes involved in 20E biosynthesis, the transcriptomes of C. arachnoidea leaves and roots were sequenced and compared. In total, 86.5 million clean reads were obtained and assembled into 79,835 unigenes, of which 39,425 unigenes were successfully annotated. The expression levels of 2427 unigenes were up-regulated in roots, which accumulate more 20E than leaves. Further assignment to Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways identified 49 unigenes related to phytoecdysteroid backbone biosynthesis (including 15 mevalonate pathway genes, 15 non-mevalonate pathway genes, and 19 genes for the biosynthesis from farnesyl pyrophosphate to cholesterol). Moreover, higher expression levels of mevalonate pathway genes in roots of C. arachnoidea were confirmed by real-time quantitative PCR. Twenty unigenes encoding CYP450s were identified as new candidate genes for the conversion of cholesterol to 20E. In addition, 90 transcription factors highly expressed in the roots and 15,315 unigenes containing 19,158 simple sequence repeats (SSRs) were identified. The transcriptome data of our study provide a valuable resource for understanding 20E biosynthesis in C. arachnoidea.


Introduction
Cyanotis arachnoidea C. B. Clarke (Commelinaceae), a traditional Chinese medicinal herb, has been used for the treatment of limb numbness and rheumatoid arthritis, as well as for promoting blood circulation and muscle relaxation [1]. A rich array of phytoecdysteroids, including 20-hydroxyecdysone (20E), dihydroxyrubrosterone, rubrosterone, poststerone, and cyanosterone B, has been found in this plant [1,2]. 20E has been reported to have agricultural applications in enhancing the synchronous development of Bombyx mori, elevating silk yield [3], and shortening the molting cycle of Alpheus heterochelis [4]. 20E and its derivatives exhibit a wide variety of pharmacological effects, including anti-depression, antioxidation, anti-diabetes, and neuron protection [5,6]. 20E is primarily obtained from ecdysteroid-rich plants, such as C. arachnoidea, Ajuga turkestanica, and Serratula wolffi [6]. Dried roots and aerial parts of C. arachnoidea have been reported to contain 20E at 5.50% and 0.52%, respectively [7]. However, because wild resources of C. arachnoidea are limited, 20E is in short supply and its application is restricted [6].
The ecdysteroid biosynthetic pathway was originally described as a cytosolic pathway starting from mevalonate (MVA) as a precursor [8]. Subsequently, it was shown that the isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP) required for triterpenoid synthesis can be produced not only by the MVA pathway in the cytosol but also by the 2-C-methyl-D-erythritol 4-phosphate (MEP) pathway in plastids [9]. However, the studies to date on phytoecdysteroid biosynthesis are limited to the parts mediated by the MVA pathway [8]. In many plants, ecdysteroids are synthesized from cholesterol [10,11]. However, Adler and Grebenok [12] identified lathosterol as a precursor of 20E in spinach. It therefore remains controversial whether cholesterol or lathosterol is the preferred substrate for phytoecdysteroid biosynthesis [8]. Conversion of a sterol to an ecdysteroid requires several structural modifications, such as hydroxylation of the sterol nucleus and side chain [12]. In insects, cytochrome P-450 monooxygenases (CYP450s), which catalyze the conversion of cholesterol to 20E, have been cloned and identified. In plants, however, only one CYP450 enzyme has been identified as being involved in 20E biosynthesis, in the hairy roots of Ajuga reptans [13]. To date, the biosynthesis of 20E, especially the origin of the precursor IPP and the downstream pathway, remains largely unknown.
The accumulation of 20E in C. arachnoidea differs among tissues. Mu et al. [7] reported that the 20E content in roots was 4-20 times higher than that in leaves. Biosynthesis and accumulation of secondary metabolites are often tissue-specific [14]. Tomás et al. [6] reported that the ecdysteroid content was related to organized structures of whole plants and in vitro propagated plantlets in A. reptans, while calluses from leaves or roots could not produce ecdysteroids [15]. The cultured plantlets of A. reptans produced seven ecdysteroids upon tissue differentiation. Tomás et al. revealed that ecdysteroid production is root-specific in A. reptans [16]. In recent years, comparative transcriptome analysis has been applied to investigate the biosynthesis of secondary metabolites. Yang et al. compared the transcriptomes of leaves and roots from Salvia miltiorrhiza and identified candidate genes involved in the biosynthesis of tanshinones [17]. Biosynthesis of withanolide A, a medicinal component synthesized specifically in roots of Withania somnifera, was also investigated by comparative transcriptome analysis [18]. Despite the important pharmacological value of 20E, there is little genetic information on the biosynthesis of 20E in C. arachnoidea. Thus, as a follow-up to our previous work on the biotechnological production of 20E [19] and the cloning of key enzyme genes in 20E biosynthesis [20], we conducted a comparative transcriptome analysis of the leaves and roots of C. arachnoidea to identify genes putatively involved in 20E biosynthesis. To our knowledge, this study reports the first de novo transcriptome sequencing of C. arachnoidea. Transcription factors (TFs), which have been found to regulate secondary metabolism in plants, were also searched for in the transcriptome database. Furthermore, we provide information for developing simple sequence repeats (SSRs) as molecular markers. This study was designed to characterize the transcriptome of C. arachnoidea and provide a valuable basis for further elucidating the biosynthetic pathway of 20E.

Quantification of 20E from C. arachnoidea
As shown in Figure 1, 20E was found in both C. arachnoidea leaves and roots, but at different levels. The 20E content was 6-fold higher in roots (18.0 mg/g DW) than in leaves (3.0 mg/g DW). It has been reported that the ecdysteroid content varies among different organs of plants [6]. The 20E content in roots of C. arachnoidea was reported to be 4-20 times higher than that in leaves [7]. Zhu et al. [21] demonstrated that, in air-dried whole plants of C. arachnoidea, the roots had the highest 20E content, amounting to 2.9% (w/w). Although 20E production is not root-specific in C. arachnoidea, this spatial distribution of 20E suggests a root-preferential expression pattern of the biosynthetic pathway genes.
Figure 1 note: Data are expressed as means ± standard deviation of three replicates. Statistical significance in comparison with the corresponding control values is indicated by ** p < 0.01.

Library Sequencing and De Novo Assembly and Annotation
To determine the transcriptome of C. arachnoidea leaves and roots, two cDNA libraries were established and sequenced using a HiSeq™2500 platform (Illumina, San Diego, CA, USA). In total, 45,368,044 and 41,095,316 clean reads were obtained from leaves and roots, respectively (Table 1). We generated a unique transcript library for C. arachnoidea using both sets of reads and obtained a total of 79,835 unigenes with an average length of 894 bp and an N50 length of 1268 bp. The length distribution of unigenes is presented in Figure S1. Clean reads from leaves and roots were separately mapped to the C. arachnoidea transcript library, yielding 69,782 and 70,556 unigenes for the leaves and roots, respectively (Table 1). For sequence annotation, BLASTX alignment was performed against the non-redundant (NR), Swiss-Prot, Gene Ontology (GO), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. As a result (Figure S2), 39,057 (48.92%) unigenes received gene descriptions through comparisons with known proteins in the NR database, and 49.55% of annotated unigenes exhibited strong homology (E value < 1.0e−60). The largest share of these annotated unigenes (30.23%) was best matched to sequences from Musa acuminata subsp. malaccensis, followed by Elaeis guineensis (19.74%) and Phoenix dactylifera (16.04%). A total of 28,791 unigenes (36.06%) were annotated using the Swiss-Prot database. Additionally, 27,111 (33.96%) and 9922 (12.43%) unigenes were annotated using the GO and KEGG databases, respectively. Figure S3a shows the number of unigenes annotated by all the databases.
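The N50 length quoted for the assembly can be computed from the unigene lengths alone. The following is a minimal sketch of that calculation and not the authors' pipeline; the function name `n50` and the toy length list are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy unigene lengths in bp (illustrative values, not data from this study).
toy_lengths = [200, 300, 500, 800, 1200, 2000]
toy_n50 = n50(toy_lengths)
toy_mean = sum(toy_lengths) / len(toy_lengths)
```

Because long sequences contribute more bases, the N50 is usually larger than the mean length, as is the case for the assembly reported here (1268 bp versus 894 bp).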

Differentially Expressed Genes (DEGs) Identified Between Leaves and Roots
The transcript abundance of each unigene was calculated using the fragments per kilobase per million reads (FPKM) method, and unigenes with |log2 FoldChange| ≥ 1 and a p value ≤ 0.05 were considered DEGs. In total, 4811 DEGs were identified, of which 2427 unigenes were up-regulated and 2384 unigenes were down-regulated in C. arachnoidea roots (Table S2). In addition, we identified 1180 DEGs that were expressed only in leaves and 1250 DEGs that were expressed specifically in roots (Figure S3b).
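The DEG criterion stated here (|log2 FoldChange| ≥ 1 with a p value ≤ 0.05) can be expressed as a small classification routine. This is a sketch under stated assumptions: fold change is taken as root FPKM divided by leaf FPKM, the p value is assumed to come from a separate statistical test, and the handling of zero FPKM values (infinite fold change) is our own illustrative choice rather than the authors' procedure. The function names are hypothetical.

```python
import math


def log2_fold_change(fpkm_root, fpkm_leaf):
    """log2 of root/leaf FPKM; zero values are handled explicitly (assumption)."""
    if fpkm_root == 0 and fpkm_leaf == 0:
        return 0.0
    if fpkm_leaf == 0:
        return math.inf   # expressed only in roots
    if fpkm_root == 0:
        return -math.inf  # expressed only in leaves
    return math.log2(fpkm_root / fpkm_leaf)


def classify_deg(fpkm_root, fpkm_leaf, p_value, lfc_cutoff=1.0, p_cutoff=0.05):
    """Return 'up', 'down', or 'ns' (not significant) relative to roots."""
    lfc = log2_fold_change(fpkm_root, fpkm_leaf)
    if abs(lfc) >= lfc_cutoff and p_value <= p_cutoff:
        return "up" if lfc > 0 else "down"
    return "ns"


# Illustrative values only (not taken from Table S2).
example_calls = [
    classify_deg(8.0, 2.0, 0.01),  # 4-fold higher in roots, significant
    classify_deg(2.0, 8.0, 0.01),  # 4-fold higher in leaves, significant
    classify_deg(8.0, 2.0, 0.20),  # large change but p value above the cutoff
    classify_deg(3.0, 2.0, 0.01),  # significant but below the 2-fold cutoff
]
```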

Identification of Unigenes Involved in Secondary Metabolism
The biological function and pathway assignment of unigenes were determined through KEGG pathway annotation. In total, 9922 (12.43%) unigenes were mapped to five categories (i.e., organismal systems, metabolism, genetic information processing, environmental information processing, and cellular processes) and 32 sub-categories (Figure S4). The sub-categories with the highest proportions of unigenes were signal transduction, carbohydrate metabolism, and translation. The majority of unigenes were classified into the metabolism category, and the numbers of unigenes related to different secondary metabolic pathways are listed in Table 2. Among all unigenes involved in secondary metabolite pathways, those in "Phenylpropanoid biosynthesis (ko00940)" had the highest proportion, followed by "Stilbenoid, diarylheptanoid, and gingerol biosynthesis (ko00945)", "Terpenoid backbone biosynthesis (ko00900)", and "Flavonoid biosynthesis (ko00941)". Additionally, 59 unigenes were mapped to the "Steroid biosynthesis (ko00100)" pathway (Figure S5). Among the 1070 unigenes related to secondary metabolism, 192 unigenes were differentially expressed between C. arachnoidea leaves and roots.

Identification of Unigenes Related to 20E Biosynthesis
20E is an isoprenoid-derived compound built from the five-carbon units IPP and its isomer DMAPP, which can be synthesized by both the MVA and MEP pathways in plants. Previous studies of phytoecdysteroid biosynthesis have been limited to the parts mediated by the MVA pathway [8]. In the C. arachnoidea transcriptome, unigenes encoding all the known structural enzymes in the upstream pathways up to IPP were found, including 11 unigenes for six enzymes in the MVA pathway, 15 unigenes for seven enzymes in the MEP pathway, and four unigenes for isopentenyl diphosphate isomerase (IDI) (Step 1 in Table 3 and Figure 2). According to the FPKM values, unigenes involved in the MVA pathway mostly showed higher expression levels in C. arachnoidea roots, such as AACT (comp70319_c0_seq8), PMK (comp65552_c0_seq7), and PMD (comp59477_c0_seq1). In contrast, unigenes involved in the MEP pathway had higher expression levels in leaves, such as DXR (comp47417_c0_seq1, comp74054_c0_seq1), MCS (comp30840_c0_seq1), HDS (comp33211_c0_seq1, comp67980_c0_seq1), and HDR (comp73401_c0_seq1). Moreover, the expression profiles of unigenes involved in the MVA and MEP pathways were validated by real-time quantitative PCR (RT-qPCR) (Figure 3). Unigenes with |relative expression level| ≥ 2 are presented in Figure 3A. The expression level of IDI was up-regulated 3.82-fold in roots, and that of AACT was up-regulated 3.80-fold. Additionally, unigenes encoding HMGR, HMGS, PMK, and PMD in the MVA pathway all had relatively higher expression levels in C. arachnoidea roots than in leaves. In the MEP pathway, unigenes encoding DXR and HDS, but not DXS, were down-regulated in C. arachnoidea roots (Figure 3A). Because 20E is highly accumulated in roots, the genes encoding enzymes involved in 20E biosynthesis are expected to show a root-preferential expression pattern. These results imply that the backbone of 20E could be synthesized mainly through the MVA pathway rather than the MEP pathway in roots.
In many plants, ecdysteroids are synthesized from the C27 sterol cholesterol [22]. Lathosterol has also been shown to be a precursor for ecdysteroid biosynthesis [12]. The pathway assignment described above showed that 59 unigenes were mapped to the "Steroid biosynthesis (ko00100)" pathway. Based on the map shown in Figure S5, the downstream steps of 20E biosynthesis are presented in Step 2 of Figure 2. Unigenes encoding most of the enzymes involved in lathosterol and cholesterol biosynthesis were found in our transcriptome database (Step 2 in Table 3). These unigenes showed higher expression levels in roots than in leaves. For example, SQLE (comp35298_c0_seq1), ERG24 (comp52043_c0_seq5), DHCR24 (comp29143_c0_seq1, comp48248_c0_seq1), and EBP (comp47084_c0_seq3) all had higher FPKM values in roots. To validate the expression levels of these unigenes, RT-qPCR analysis was conducted. As shown in Figure 3B, the expression level of DHCR24 was 673.45-fold higher in roots than in leaves. The expression of EBP and ERG24 was up-regulated in roots by 14.8- and 11.4-fold, respectively. In all, we identified 49 unigenes encoding 24 enzymes involved in 20E biosynthesis in C. arachnoidea.
Figure 3 note: The relative expression level of unigenes in roots was calculated by the 2^−ΔΔCt method with actin as an internal control and was compared to that in leaves, which was set to 1. A negative value represents a higher expression level in leaves than in roots. Error bars indicate standard deviations of three biological replicates. Asterisks represent significant differences: * p < 0.05 and ** p < 0.01. The full names of the abbreviations (unigenes) are provided in Table 3.

Validation of Differentially Expressed CYP450s
Conversion of a sterol to an ecdysteroid requires several structural modifications, such as hydroxylation of the sterol nucleus and side chain [12]. In insects, the biosynthesis of 20E from cholesterol is catalyzed by CYP450 enzymes [23]. The hydroxylation steps in plants are presumably mediated by CYP450s as they are in insects [12]. Tsukagoshi et al. [13] identified a CYP450 enzyme, CYP71D443, which catalyzes C-22 hydroxylation in 20E biosynthesis in Ajuga hairy roots. So far, this is the only CYP450 enzyme reported to be involved in 20E biosynthesis in plants [13]. To identify CYP450s possibly involved in 20E biosynthesis, CYP450 unigenes up-regulated in roots were selected from the DEGs. As a result, 20 unigenes encoding CYP450s were identified, and their up-regulated expression levels in roots were validated using RT-qPCR (Table 4). For example, the expression level of unigene comp36620_c0_seq1 was 1666.05-fold higher in roots than in leaves, and that of comp67837_c0_seq4 was 597.69-fold higher. Much work remains to be done to narrow down these candidate CYP450 genes and reveal those responsible for the conversion of cholesterol to 20E.
Table 4 note: Foldchange equals the ratio S1/S2, where S1 is the FPKM value of a unigene in roots and S2 is the FPKM value of the unigene in leaves. RT-qPCR represents the relative expression level of unigenes in roots, measured by RT-qPCR, compared to that in leaves. Values are reported as mean ± SD from three independent experiments. Asterisks represent significant differences: * p < 0.05 and ** p < 0.01.

Transcription Factors Predicted and Statistics of Simple Sequence Repeats
TFs were searched for in the C. arachnoidea transcriptome, and a total of 1312 TFs were found (Table 5). Among all the TFs identified, members of the basic helix-loop-helix (bHLH), ethylene response factor (ERF), and C2H2-type zinc finger families accounted for the highest proportions. Furthermore, 90 TFs were up-regulated and 58 TFs were down-regulated in C. arachnoidea roots. Among the up-regulated TFs, MYB ranked the highest (15), followed by bHLH (13) and WRKY (10). In plants, TFs have been found to regulate secondary metabolism pathways [14]. For example, an ERF transcription factor named JRE4 stimulates the biosynthesis of steroidal glycoalkaloids (SGAs), which are cholesterol-derived metabolites, by activating the transcription of SGA biosynthetic genes such as HMGS, HMGR, IDI, and SQS [24]. Considering the root-preferential accumulation of 20E, these up-regulated TFs are worth further investigation for their function in 20E biosynthesis.
Table 5 note: The transcript abundance of each unigene was calculated using the FPKM method, and unigenes with |log2 FoldChange| ≥ 1 and a p value ≤ 0.05 were considered DEGs. Positive and negative log2 FoldChange values indicate up- and down-regulated unigenes, respectively.
SSRs are among the most important molecular markers and have various applications in genetics and plant breeding [25]. Thus, SSRs in C. arachnoidea were analyzed in the present study. Out of 79,835 unigenes, 15,315 unigenes containing 19,158 SSRs were identified (Table 6). Of these SSRs, 1194 were present in compound formation. Mononucleotide repeats were the most abundant (14,598), followed by tri-nucleotide repeats (2270) and di-nucleotide repeats (2100). Penta-nucleotide repeats were the least abundant (17). We also checked for SSRs in unigenes involved in 20E biosynthesis. Unigenes comp68996_c0_seq2 (HMGR) and comp63129_c0_seq5 (ERG26) contained compound SSRs. Comp71987_c0_seq5, encoding DXR, had a di-nucleotide repeat. Additionally, eight unigenes encoding AACT, HMGS, DXS, DXR, MCT, HDS, DHCR24, and EBP contained mononucleotide repeats. SSRs at different positions in a gene can influence the regulation of gene expression and the function of the protein produced [25]. Sharopova [26] reported that genes containing five or more SSRs had the highest average level of expression in Arabidopsis. In rice, amylose content was correlated with variation in the number of SSRs in the 5'-untranslated region of the waxy gene [27]. The SSRs identified in the C. arachnoidea transcriptome are valuable genetic resources for future studies of this species.

Plant Materials
C. arachnoidea seeds were collected in September 2011 from the suburbs of Luquan County, Yunnan Province, China. The voucher specimen (SCU-110923), identified by C.Y. Liu, was deposited in the herbarium of Soochow University. Germination and transplantation were performed in the horticultural nursery of Soochow University, Suzhou, China. One month after germination, the seedlings were transplanted into a plastic pot (35 × 25 × 7 cm in length, width, and height, respectively) containing sand and vermiculite in a 1:2 (v/v) ratio. Plants were maintained in a growth chamber at 25 ± 2 °C under a 14/10 h light/dark photoperiod with white fluorescent light at 1500 lux. Leaf and root tissues of four-month-old C. arachnoidea were collected separately, frozen in liquid nitrogen, and stored at −80 °C until use. At least three biological replicates were used for subsequent studies, and each replicate contained leaf or root tissues from at least 15 seedlings.

20E Extraction and Analysis
The extraction and quantification of 20E in C. arachnoidea leaves and roots were conducted as described in our previous report [20] with slight modifications. Dry leaves or roots (0.5 g) were ground into a powder and extracted with 30 mL methanol under sonication for 90 min. The extract was then evaporated to dryness and dissolved in 1 mL methanol. High performance liquid chromatography (HPLC) was performed on an Agilent 1280 HPLC system equipped with a 250 × 4.6 mm Agilent HC-C18 column; samples were eluted with 20:80 (v/v) acetonitrile/water at a flow rate of 1 mL/min and monitored at 242 nm. 20E was quantified against an authentic standard (Sigma, St. Louis, MO, USA). Figure 1A presents a typical chromatogram of 20E in C. arachnoidea leaves and roots under these conditions.

cDNA Library Construction, Sequencing and Quality Control
For cDNA library construction, total RNA was first extracted from leaf or root tissues of four-month-old C. arachnoidea using a mirVana™ RNA isolation kit (Applied Biosystems, Foster City, CA, USA) and then treated with DNase I for 30 min at 37 °C. RNA integrity and purity were confirmed using the Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA). For each pool, total RNA (10 µg) was prepared for the cDNA library. The mRNA was purified from total RNA using magnetic beads with Oligo (dT). Subsequently, the mRNA was sheared into small fragments for cDNA synthesis. Double-stranded cDNA was synthesized using random hexamers. These cDNA fragments were subjected to end repair and adapter ligation, and were then enriched by PCR to create the final libraries. Paired-end sequencing at 125 bp was performed using a HiSeq™2500 platform (Illumina, San Diego, CA, USA). High-quality reads were obtained by removing adaptor fragments, reads containing more than 5% ambiguous bases, and low-quality reads containing more than 20% of bases with a Q value ≤ 20.
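The two read-level quality criteria stated above (more than 5% ambiguous bases, or more than 20% of bases with a Q value ≤ 20) can be sketched as a simple filter. This is an illustration only: adaptor removal is not shown, Phred+33 quality encoding is assumed, and the function names and example reads are hypothetical rather than part of the authors' pipeline.

```python
def decode_phred33(quality_string):
    """Convert a FASTQ quality string to Phred scores (Phred+33 assumed)."""
    return [ord(ch) - 33 for ch in quality_string]


def passes_quality_filter(seq, quals, max_n_frac=0.05,
                          max_lowq_frac=0.20, lowq_threshold=20):
    """Keep a read unless it exceeds the ambiguous-base or low-quality limits."""
    if not seq or len(seq) != len(quals):
        raise ValueError("sequence and qualities must be non-empty and equal in length")
    # Fraction of ambiguous base calls (N) in the read.
    n_frac = seq.upper().count("N") / len(seq)
    # Fraction of bases whose Phred score is at or below the threshold.
    lowq_frac = sum(1 for q in quals if q <= lowq_threshold) / len(quals)
    return n_frac <= max_n_frac and lowq_frac <= max_lowq_frac


# Illustrative 20-base reads (not data from this study).
good_read = passes_quality_filter("ACGT" * 5, [30] * 20)
too_many_n = passes_quality_filter("NN" + "ACGT" * 4 + "AC", [30] * 20)
too_low_q = passes_quality_filter("ACGT" * 5, [10] * 5 + [30] * 15)
```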

De Novo Assembly and Sequence Annotation
De novo assembly of all clean reads was performed using the Trinity program (version: trinityrnseq_r20131110). The raw RNA-seq data were submitted to NCBI's Gene Expression Omnibus (GEO) repository under accession number SRP144398 (http://www.ncbi.nlm.nih.gov). The assembled sequences were further processed into longer sequences using the software TGICL (http://compbio.dfci.harvard.edu/tgi/software/) to obtain the final unigenes. All unigenes were assigned a putative gene description following BLASTX alignment to the NR (ftp://ftp.ncbi.nih.gov/blast/db), Swiss-Prot (http://www.uniprot.org/downloads), GO (http://www.geneontology.org/), and KEGG (http://www.genome.jp/kegg/pathway.html) databases, with a cut-off E value of ≤ 1e−5. To gain an overview of gene pathway networks, all unigenes were mapped to KEGG pathways using the KEGG Automatic Annotation Server. The number of unigenes corresponding to each KEGG pathway was calculated.

Identification of DEGs
Clean reads from C. arachnoidea leaves and roots were separately mapped to the assembled transcripts, and the transcript abundance of each unigene was calculated using the FPKM method. Bowtie2 (http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml) and eXpress (http://www.rna-seqblog.com/express-a-tool-for-quantification-of-rna-seq-data/) were used to map the sequencing reads and calculate the FPKM values. In this work, unigenes with |log2 FoldChange| ≥ 1 and a p value ≤ 0.05 were considered significantly differentially expressed.
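For reference, the FPKM normalization named above can be written out explicitly. The sketch below uses the textbook definition with the full transcript length; eXpress applies its own abundance model, so this is an approximation for illustration and not a reimplementation of that tool. The function name and the example numbers are assumptions.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    FPKM = fragments * 1e9 / (transcript length in bp * total mapped fragments)
    """
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Illustrative example: 500 fragments on a 1000 bp unigene
# in a library of 10 million mapped fragments.
example_fpkm = fpkm(500, 1000, 10_000_000)
```

Dividing by both transcript length and library size is what allows expression values to be compared between unigenes of different lengths and between the leaf and root libraries.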

Real-Time Quantitative PCR Analysis
The expression levels of selected unigenes were analyzed by RT-qPCR. Total RNA from C. arachnoidea leaves and roots was isolated with the RNAprep Pure Plant Kit (Tiangen, Beijing, China). First-strand cDNA was synthesized using the RevertAid First Strand cDNA Synthesis Kit (Thermo Scientific, San Jose, CA, USA) according to the manufacturer's instructions. RT-qPCR was performed using the CFX96™ Real-Time System (Bio-Rad, Hercules, CA, USA). The reaction mixture included 2 µL five-fold diluted cDNA template, 1 µL of 10 mM forward primer, 1 µL of 10 mM reverse primer, 10 µL FS Universal SYBR Green Master (Roche, Indianapolis, IN, USA), and 6 µL ddH2O. Amplification conditions were 94 °C for 4 min, followed by 40 cycles of 94 °C for 1 min, 56 °C for 30 s, and 72 °C for 15 s. All primers used are listed in Table S1.
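Relative expression in this study is reported by the 2^−ΔΔCt method with actin as the internal control and leaves as the calibrator (set to 1), with negative values indicating higher expression in leaves. A minimal sketch of that arithmetic follows; it assumes roughly 100% amplification efficiency for both target and reference, and the Ct values and function names are illustrative, not measurements from this work.

```python
def relative_expression(ct_target_root, ct_ref_root, ct_target_leaf, ct_ref_leaf):
    """Root expression relative to leaf (leaf = 1) by the 2^-ddCt method."""
    delta_root = ct_target_root - ct_ref_root  # normalize root to reference gene
    delta_leaf = ct_target_leaf - ct_ref_leaf  # normalize leaf to reference gene
    delta_delta = delta_root - delta_leaf      # root relative to leaf calibrator
    return 2 ** (-delta_delta)


def signed_fold_change(ratio):
    """Report ratios below 1 as negative fold changes (higher in leaves)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return ratio if ratio >= 1 else -1 / ratio


# Illustrative Ct values: the target amplifies 3 cycles earlier in roots
# once both tissues are normalized to the reference gene.
ratio_up = relative_expression(22.0, 18.0, 25.0, 18.0)
ratio_down = relative_expression(25.0, 18.0, 23.0, 18.0)
signed_down = signed_fold_change(ratio_down)
```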

Identification of Simple Sequence Repeats
SSRs in the C. arachnoidea transcriptome were identified using the microsatellite identification tool MISA (http://pgrc.ipk-gatersleben.de/misa/). SSRs with motifs ranging from mononucleotides to hexa-nucleotides were searched for in all unigenes. Both perfect and compound repeats were identified.
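To illustrate what a search for perfect SSRs involves, the sketch below scans a sequence for tandem repeats of 1 to 6 bp motifs using regular expressions. It is not MISA and does not reproduce its output: compound repeats are not handled, and the minimum repeat counts are assumed values chosen for illustration, since the thresholds used in this study are not stated in the text.

```python
import re

# Assumed minimum repeat counts per motif length (1 = mono ... 6 = hexa).
# These thresholds are NOT given in the text; they are illustrative only.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_count in sorted(min_repeats.items()):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_count - 1))
        pos = 0
        while True:
            match = pattern.search(seq, pos)
            if match is None:
                break
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // unit_len))
                pos = match.end()
            else:
                # e.g. "AA" is a mononucleotide run, not a di-nucleotide SSR;
                # step forward one base and keep scanning.
                pos = match.start() + 1
    return sorted(hits)


# Toy sequence (hypothetical): a 12 bp poly-A run and an (AG)7 repeat.
toy_seq = "GGC" + "A" * 12 + "CGT" + "AG" * 7 + "TTC"
toy_hits = find_perfect_ssrs(toy_seq)
```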

Statistical Analysis
To test for significant differences between the means of two groups, Student's t-test was conducted using Microsoft Excel. Values are reported as the mean ± SD from three independent experiments. Asterisks represent significant differences: * p < 0.05 and ** p < 0.01.

Conclusions
Our research established the transcriptome database of the important medicinal plant C. arachnoidea for the first time. In total, 79,835 unigenes were assembled and 39,425 unigenes were successfully annotated. The comparative analysis of the C. arachnoidea leaf and root transcriptomes demonstrated that the expression levels of 2427 unigenes were up-regulated in roots, which accumulate more 20E than leaves. Forty-nine unigenes encoding enzymes in the MVA and MEP pathways and in the downstream steps of phytoecdysteroid backbone biosynthesis were identified. The higher expression levels of MVA pathway genes in roots of C. arachnoidea imply that 20E biosynthesis could be mainly mediated by the MVA pathway. Moreover, twenty unigenes encoding CYP450s were identified as new candidate genes for the conversion of cholesterol to 20E. In addition, 90 TFs up-regulated in roots and SSR molecular markers were analyzed to support further research on gene regulation and plant breeding. Our work will be helpful for understanding phytoecdysteroid biosynthesis and provides important genetic information on C. arachnoidea.