Representational Difference Analysis of Transcripts Involved in Jervine Biosynthesis

Veratrum-type steroidal alkaloids (VSA) are the major bioactive ingredients that strongly determine the pharmacological activities of Veratrum nigrum. Biosynthesis of VSA at the molecular and genetic levels is not well understood. Next-generation sequencing of representational difference analysis (RDA) products after elicitation and precursor feeding was applied to identify candidate genes involved in VSA biosynthesis. A total of 12,048 contigs with a median length of 280 bases were received in three RDA libraries obtained after application of methyl jasmonate, squalene and cholesterol. The comparative analysis of annotated sequences was effective in identifying candidate genes. GABAT2 transaminase and hydroxylases active at C-22, C-26, C-11, and C-16 positions in late stages of jervine biosynthesis were selected. Moreover, genes coding pyrroline-5-carboxylate reductase and enzymes from the short-chain dehydrogenases/reductases family (SDR) associated with the reduction reactions of the VSA biosynthesis process were proposed. The data collected contribute to better understanding of jervine biosynthesis and may accelerate implementation of biotechnological methods of VSA biosynthesis.


Introduction
Veratrum-type steroidal alkaloids (VSA) are valuable bioactive compounds important for the pharmaceutical industry [1]. Exclusive, natural sources of VSA are limited to the Melanthiaceae family and mainly the Veratrum genus [2]. Although, over 40 Veratrum species are distributed all over the Northern Hemisphere, only two species (Veratrum nigrum L and V. album) occur naturally in Europe as a natural source of VSA [3][4][5][6]. V. nigrum (2n = 16) belongs to a critically endangered species and a few isolated populations have been reported in Poland [4,7,8]; thus, understanding of VSA biosynthesis is essential for effective biosynthesis of jervine in heterologous systems.
Diverse VSA can be synthesized by the enzymatic machinery of Veratrum spp. and structural aspects allow us to discern cevanine, veratramine and jervine as the main types of VSA. A common feature of VSA is C-nor-D-homo-ring system and the types are characterized by various linkages between the steroidal and alkaloidal components ( Figure 1) [15]. There is a demand for low cost production of cyclopamine, a derivative of jervine [30], yet natural resources are limited. Chemical synthesis of VSA is not efficient [31,32]. In Veratrum plants, concentration of cyclopamine is generally low (about 0.01%), whereas jervine is more abundant (0.1%) and can be efficiently converted to cyclopamine [33]. The early steps of the jervine biosynthesis pathway leading to cycloartenol and cholesterol have already been described in detail [34][35][36]. Cholesterol is further transformed into solanidine that is the parent compound of VSA; however, corresponding enzymes are generally unknown [30]. Four enzymes that catalyze biosynthesis of cyclopamine [36] have been characterized in V. californicum. Recently, organ-specific transcriptomes of V. nigrum have been compared to identify 73 candidate genes involved in VSA biosynthesis [37]. Veratrum nigrum is threatened with extinction and is not cultivated on an industrial scale, so there is a growing interest in application of biotechnological methods to increase VSA production. Genetic transformation of Veratrum dahuricum has been developed to increase synthesis of cyclopamine, jervine, and veratramine [38]. Advances in synthetic biology and metabolic engineering also enable the production of plant secondary metabolites in microorganisms. Genome and transcriptome sequencing has become a strategy for discovering genes activated in the biosynthesis of high-value secondary metabolites. These genes can be subsequently used for the gradual reconstruction of the plant metabolic pathway in the cells of microorganisms [39][40][41].
Representational difference analysis (RDA) has been developed to identify unique DNA sequences out of two complex and highly related genomes. This method has been adapted for the study of cDNA libraries [42] and for isolating DNA fragments that are differentially methylated [43]. Veratrum nigrum is threatened with extinction and is not cultivated on an industrial scale, so there is a growing interest in application of biotechnological methods to increase VSA production. Genetic transformation of Veratrum dahuricum has been developed to increase synthesis of cyclopamine, jervine, and veratramine [38]. Advances in synthetic biology and metabolic engineering also enable the production of plant secondary metabolites in microorganisms. Genome and transcriptome sequencing has become a strategy for discovering genes activated in the biosynthesis of high-value secondary metabolites. These genes can be subsequently used for the gradual reconstruction of the plant metabolic pathway in the cells of microorganisms [39][40][41].
Representational difference analysis (RDA) has been developed to identify unique DNA sequences out of two complex and highly related genomes. This method has been adapted for the study of cDNA libraries [42] and for isolating DNA fragments that are differentially methylated [43]. The advantage of the RDA method is its capability of isolating rare mRNAs differentially expressed in Life 2020, 10, 88 3 of 20 two cell populations. RDA can detect sequences represented at 0.0001% in the starting mRNA [44]. The reduction of the complexity is achieved by restriction enzyme digestion, ligation to adaptors and PCR amplification. In the subsequent differential hybridization and amplification steps, sequences in the range of 150-1200 bp unique to the 'tester' DNA are enriched [45]. Combination of cDNA RDA with NGS was first used to identify genes involved in diosgenin biosynthesis [46].
Transcriptome sequencing opens the most straightforward way to elucidate biosynthesis of plant specialized metabolites [47]. Substrate feeding and methyl jasmonate signaling to unravel enzymes involved in the biochemical pathway of jervine biosynthesis, with the perspective of designing and optimizing biotechnological process, has been applied. Sequenced libraries enriched with target genes and subtracted background transcripts have been developed in Representational Difference Analysis (RDA).

Plant Material
Seeds of Veratrum nigrum (accession 594) were obtained from Masaryk University (Brno, Czech Republic) sterilized and germinated on MS medium [48]. The plants were maintained in vitro at 25 • C with 16/8 h (day/night) photoperiod for one year, and then sprayed (150 µL) with 10% ethanol solutions of cholesterol (CHL) (0.26 mM), methyl jasmonate (MeJ) (0.44 mM) and squalene (SQ) (0'2 mM). Control samples were treated with 10% solution of ethanol. Three plants per treatment were collected after 9 days, frozen immediately in liquid nitrogen and stored at −80 • C until used in phytochemical and transcriptomic analysis.

Jervine Quantification
Jervine was extracted and quantified by UHPLC-MS/MS as described previously [6]. The one-year-old whole plants of V. nigrum from in vitro cultures were ultrasonicated with chloroform and ammonia hydroxide solution (20:3). The residue after evaporation was resolved in methanol and analyzed by ultra-performance liquid chromatography (UHPLC, Shimadzu) coupled with QTRAP 4500 mass spectrometry (AB Sciex) with triple quadrupole. Acetonitrile (solvent A) and water with 0.01% formic acid (solvent B) were used as the mobile phase and separation was carried out using Kinetex XB-C18 LC column (50 mm × 2.1 mm I.D., 1.7 µm). For detecting and measuring jervine content in plant materials, the MRM (multiple reaction monitoring) method was used. The following transitions were analyzed: m/z 425.89/313.10 Da and 425.89/114.0 Da for jervine. Differences in concentration of jervine in fresh weight were analysed by Mann-Whitney U test (with continuity correction) in STATISTICA 12 package (TIBCO Software, Palo Alto, CA, USA).

Representational Difference Analysis and Library Construction
RNA was isolated from the whole plants with GeneMATRIX Universal RNA Purification kit (EURx, Gdańsk, Poland). The total RNA was quantified and qualified on Agilent 2100 BioAnalyzer using RNA 6000 Nano Kit (Agilent Technologies, Santa Clara, CA, USA). High quality samples were selected for ProtoScript II First Strand cDNA Synthesis Kit and mRNA Second Strand Synthesis Module (New England Biolabs, Ipswich, MA, USA) to obtain the first and the second strand of cDNA, respectively. Representational difference analysis was performed on cDNA [42] with small modifications. Bulked cDNAs from three plants characterized with the highest jervine contents, obtained after cholesterol, methyl jasmonate and squalene treatment, were used as factor specific testers. Untreated, control plants with the lowest concentration of jervine were bulked and used as a driver. Three rounds of subtractive hybridization were utilized to obtain respective differential libraries of sizes from 200 to 350 bp for subsequent sequencing. The paired-end cDNA sequencing libraries were prepared starting from 1 ng of cDNA per sample using Nextera XT DNA Sample Preparation Kit (Illumina, San Diego, CA, USA) according to manufacturer instructions. The libraries were run (2 × 300 cycles) on the MiSeq

De Novo Assembly and Functional Annotation
Raw reads were cleaned by removal of read-through adapter sequences, low quality sequences and ambiguous nucleotides. Clean reads were de novo assembled (Trinity) in CLC Genomics Workbench 12 (Qiagen, Germantown, MD, USA) at default parameters. Cd-hit-est program was used to reduce the redundant unigenes with cut-off set to 0.9 [49]. Unigenes were annotated against non-redundant proteins (NR), nucleotide (NT), EuKaryotic Orthologous Groups (KOG), and Pfam databases using WebMGA server [50] or CLC Genomics Workbench 12. Kyoto Encyclopedia of Genes and Genomes (KEGG) database and Automatic Annotation Server (KAAS) with bi-directional best hit method associated sequences to the KEGG pathways and orthology (KO) assignment [51]. The gene ontology (GO) annotations were analyzed using the Blast2GO program [52] and visualized with WEGO software [53]. The transcription factor (TF) families were identified in the Plant Transcription Factor Database [54]. The expression levels of unigenes were defined by the FPKM values (fragments per kb per million reads) using CLC Genomics Workbench 12 program. Functionally grouped unigenes in RDA libraries were compared (blastn, blastx) with local databases of reference sequences and respective unigenes obtained from organ-specific libraries of Veratrum nigrum [37] at the E-value threshold set to 0.05. The contigs in RDA pools were further assembled in Geneious v 8.1.9 in pairwise combinations. Contigs that were not reassembled between two compared pools were treated as unique and specific to RDA library. Sets of contigs were compared at different levels to obtain data for the Venn charts.

Jervine Concentration
Veratrum nigrum plants were fed with squalene and cholesterol as precursors in order to overexpress genes involved in jervine biosynthesis. Alternatively, production of secondary metabolites was enhanced by the treatment with methyl jasmonate as elicitor. Average jervine concentration in CHLand MeJ-treated plants was over four-fold higher than in control plants ( Figure 2). The content of jervine in individual plants sprayed with cholesterol varied up to 543.62 ng g −1 . Jervine biosynthesis was not significantly affected by squalene treatment. The results provide evidence that cholesterol and methyl jasmonate efficiently stimulate jervine accumulation in Veratrum plants. Methyl jasmonate has been reported to elicit the production of secondary metabolites in plants, including steroidal alkaloids in Solanum. Exposition of transgenic tomato to 100 µM MeJ vapor for 4 days resulted in two-fold increase of α-tomatine level relative to control [55]. Furthermore, the positive effect of cholesterol feeding on the synthesis of steroidal alkaloids was also found in S. lyratum cells. Application of cholesterol showed a direct dose effect (0.05, 0.5, and 20 mg L −1 ) on both solanidine and solasodine synthesis [56].

Sequence De Novo Assembly
cDNA libraries prepared after three rounds of subtraction were sequenced in the Illumina technology. A total of 1966, 1849 and 3047 thousand raw reads corresponding to 203, 207 and 342 million bp were generated for CHL-RDA, MeJ-RDA and SQ-RDA libraries, respectively. Cleaning and filtering stages attributed to the selection of 1905, 1810 and 2981 thousand reads for CHL, MeJ and SQ treatments, respectively. A total of 12,048 contigs with a median length of 280 bases were obtained by de novo assembly of clean reads (Table 1).   The number of unique transcripts was similar in CHL and MeJ libraries and accounted for 3214 and 3232 unigenes, respectively. The largest amount of unigenes (40%) was obtained for subtracted library after squalene treatment, and 94 (0.8%) contigs were common for the three differential products. Furthermore, the samples after squalene and methyl jasmonate treatment had the highest number of common transcripts (321) compared to other pairs ( Figure 3).
Clustering of transcripts to remove redundancy is a very sensitive step of sequence de novo assembly. Nucleotide identity between allelic variants of CYP genes from Veratrum californicum [36] was above 99%, whereas identity of paralogous genes CYP94N1 vs. CYP94N2, CYP90G1 vs. CYP90G2 and GABAT1 vs. GABAT2 was 95%, 97.5%, and 78.5%, respectively. The presence of extensive duplications in cytochrome P450s and transporter families provide additional reason for an increase of similarity threshold at clustering sage. In this view, the default level of 90% similarity seems to be overly relaxed and may lead to elimination of paralogues and allelic variants. In case of cDNA RDA approach, redundancy reduction is mostly addressed during library preparation and it can be assumed that at least a fraction of highly homologous DNA fragments derived from paralogs have been deleted. Both reduced threshold level at clustering stage and properties of the RDA library preparation method may contribute to underestimating of gene family diversity.
Life 2020, 10, 88 6 of 20 an increase of similarity threshold at clustering sage. In this view, the default level of 90% similarity seems to be overly relaxed and may lead to elimination of paralogues and allelic variants. In case of cDNA RDA approach, redundancy reduction is mostly addressed during library preparation and it can be assumed that at least a fraction of highly homologous DNA fragments derived from paralogs have been deleted. Both reduced threshold level at clustering stage and properties of the RDA library preparation method may contribute to underestimating of gene family diversity.

Functional Annotation
The contigs were reduced to 11,891 unigenes and annotated at NT, NR, KOG, Pfam, KEGG and GO databases ( Table 2). From 95% to 85% of unigenes representing the three RDA products showed significant similarity to sequences in NT and NR databases. Furthermore, a total of 5562 unigenes showed similarity to proteins in the KOG database. The distribution of the transcripts in 25 KOG categories was similar for all three cDNA-RDA products. The highest number of unigenes in CHL, MeJ and SQ subtractive libraries belonged to the cluster of the posttranslational modification, protein turnover, and chaperones (259, 261 and 313, respectively). 210 unique transcripts were assigned to the cluster of secondary metabolites biosynthesis, transport and catabolism including 16, 24 and 21 unigenes corresponding to cytochrome P450 family in respective libraries.

Functional Annotation
The contigs were reduced to 11,891 unigenes and annotated at NT, NR, KOG, Pfam, KEGG and GO databases ( Table 2). From 95% to 85% of unigenes representing the three RDA products showed significant similarity to sequences in NT and NR databases. Furthermore, a total of 5562 unigenes showed similarity to proteins in the KOG database. The distribution of the transcripts in 25 KOG categories was similar for all three cDNA-RDA products. The highest number of unigenes in CHL, MeJ and SQ subtractive libraries belonged to the cluster of the posttranslational modification, protein turnover, and chaperones (259, 261 and 313, respectively). 210 unique transcripts were assigned to the cluster of secondary metabolites biosynthesis, transport and catabolism including 16, 24 and 21 unigenes corresponding to cytochrome P450 family in respective libraries. Unigenes of V. nigrum were most frequently (30%) similar to sequences of Medicago truncatula. Gene ontology (GO) revealed annotations for 60%, 72% and 70% of unique transcripts in CHL-RDA, MeJ-RDA and SQ-RDA libraries. These sequences were categorized into biological processes, cellular components, and molecular function (35%, 25% and 40%, respectively). Within the above categories, the most abundant subcategories corresponded to metabolic process (GO: 0008152), cell (GO: 0005623) and catalytic activity (GO: 0003824) ( Figure 4). Moreover, detailed analysis of unigenes in the metabolic process GO subcategory, revealed sequences (595, 710 and 884 from CHL-RDA, MeJ-RDA and SQ-RDA libraries) assigned to organonitrogen compound metabolic process that includes biosynthesis of steroidal alkaloids. Unigenes of V. nigrum were most frequently (30%) similar to sequences of Medicago truncatula. Gene ontology (GO) revealed annotations for 60%, 72% and 70% of unique transcripts in CHL-RDA, MeJ-RDA and SQ-RDA libraries. These sequences were categorized into biological processes, cellular components, and molecular function (35%, 25% and 40%, respectively). Within the above categories, the most abundant subcategories corresponded to metabolic process (GO: 0008152), cell (GO: 0005623) and catalytic activity (GO: 0003824) ( Figure 4). Moreover, detailed analysis of unigenes in the metabolic process GO subcategory, revealed sequences (595, 710 and 884 from CHL-RDA, MeJ-RDA and SQ-RDA libraries) assigned to organonitrogen compound metabolic process that includes biosynthesis of steroidal alkaloids. In the molecular function GO category, a total of 225 unigenes coded transcription factors (TF, GO term: 0003700; DNA-binding transcription factor activity). Most of these unigenes (179) were further separated into 26 TF families at Plant Transcription Factor Database [54]. The most In the molecular function GO category, a total of 225 unigenes coded transcription factors (TF, GO term: 0003700; DNA-binding transcription factor activity). Most of these unigenes (179) were further separated into 26 TF families at Plant Transcription Factor Database [54]. The most frequently represented TF belonged to basic leucine-zipper (bZIP) superfamily (16%), WRKY family (12%), basic helix-loop-helix (bHLH) (11%), MYB proteins (11%) and ERF superfamilies (9%). ( Figure 5). All these families were reported to play an important role in the biosynthesis of sterol backbone, terpenoids and alkaloids. The WRKY transcription factors regulate biosynthesis of alkaloids (e.g., terpene indole alkaloids) and terpenoids (for example sesquiterpenes and diterpenes) [57][58][59][60]. Similar activities were observed for MYC2 and bHLH family proteins [61,62]. Furthermore, the bHLH transcription factors TSAR1 and TSAR2 regulate saponin biosynthesis in Medicago truncatula [63]. The MYB transcription factors regulate the flux in various branches of terpene metabolism. For example, the sweet basil data showed that MsMYB can affect both monoterpene and sesquiterpene synthesis [64]. Finally, the AP2/ERF transcription factors regulate the biosynthesis of steroidal alkaloids in the Solanaceae family. The GAME9 from ERF superfamily controls several upstream mevalonate and cholesterol precursor pathway genes in potato and tomato [65]. Identification of transcription factors effective in stimulation of selected biosynthetic pathways facilitates overexpression of these TFs to increase concentration of target compound [66].
backbone, terpenoids and alkaloids. The WRKY transcription factors regulate biosynthesis of alkaloids (e.g., terpene indole alkaloids) and terpenoids (for example sesquiterpenes and diterpenes) [57][58][59][60]. Similar activities were observed for MYC2 and bHLH family proteins [61,62]. Furthermore, the bHLH transcription factors TSAR1 and TSAR2 regulate saponin biosynthesis in Medicago truncatula [63]. The MYB transcription factors regulate the flux in various branches of terpene metabolism. For example, the sweet basil data showed that MsMYB can affect both monoterpene and sesquiterpene synthesis [64]. Finally, the AP2/ERF transcription factors regulate the biosynthesis of steroidal alkaloids in the Solanaceae family. The GAME9 from ERF superfamily controls several upstream mevalonate and cholesterol precursor pathway genes in potato and tomato [65]. Identification of transcription factors effective in stimulation of selected biosynthetic pathways facilitates overexpression of these TFs to increase concentration of target compound [66]. The unigenes from subtractive libraries were annotated at KEGG database to identify the active biological pathways. In total, unique KO identifiers were allocated to 1415, 1438 and 1845 transcripts from CHL-RDA, MeJ-RDA and SQ-RDA, respectively. All the unique transcripts mapped to 277 predicted metabolic KEGG pathways within five main categories of metabolism (43%), genetic information processing (13%), environmental information processing (13%), cellular processes (12%) and organismal systems (19%) ( Table S1). Among these groups, signal transduction, carbohydrate metabolism and amino acid metabolism pathways were most frequently represented (by 1001, 953 and 554 unigenes, respectively). Moreover, 21, 11 and 23 enzymes involved in steroid and terpenoid backbone biosynthesis were identified in CHL-RDA, MeJ-RDA and SQ-RDA libraries.
Based on KEGG database, from 850 to 1156 unigenes representing three differential products matched with six classes of enzymes ( Figure 6). Most sequences coded transferases (34%) and hydrolases (22%). Within the oxidoreductases, sequences corresponding to cytochrome P450 family accounted for 16  The unigenes from subtractive libraries were annotated at KEGG database to identify the active biological pathways. In total, unique KO identifiers were allocated to 1415, 1438 and 1845 transcripts from CHL-RDA, MeJ-RDA and SQ-RDA, respectively. All the unique transcripts mapped to 277 predicted metabolic KEGG pathways within five main categories of metabolism (43%), genetic information processing (13%), environmental information processing (13%), cellular processes (12%) and organismal systems (19%) ( Table S1). Among these groups, signal transduction, carbohydrate metabolism and amino acid metabolism pathways were most frequently represented (by 1001, 953 and 554 unigenes, respectively). Moreover, 21, 11 and 23 enzymes involved in steroid and terpenoid backbone biosynthesis were identified in CHL-RDA, MeJ-RDA and SQ-RDA libraries.
Based on KEGG database, from 850 to 1156 unigenes representing three differential products matched with six classes of enzymes ( Figure 6). Most sequences coded transferases (34%) and hydrolases (22%). Within the oxidoreductases, sequences corresponding to cytochrome P450 family accounted for 16, 15 and 18 in libraries developed after CHL, MeJ and SQ treatment, respectively.

Identification of Genes Involved in Terpenoid Backbone and Cholesterol Biosynthesis
In plants, terpenoid backbone is synthesized from isopentenyl pyrophosphate (IPP) and dimethylallyl pyrophosphate (DMAPP) by cytosolic mevalonate (MVA) and plastidal 2-C-methyl-d-erythrirtol-4-phosphate (MEP) pathways [34]. 21 unigenes which encode all of the enzymes involved in the MVA route have been identified. In most cases these transcripts were overexpressed in subtraction libraries (FPKM values ranged from 4.82 to 401.47) (Table S2). Furthermore, in this pathway, the AACT enzyme was the best represented by 7 variants. Unlike the MVA pathway, two enzymes from the MEP route were missing. Transcripts encoding 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase (CMK/IspE) and (E)-4-hydroxy-3-methylbut-2-enyl-diphosphate synthase (HDS/IspG) were not identified. Incomplete data of MEP pathway suggests that the MVA remains the main route in steroid skeletons production. In higher plants grown under normal physiological conditions, most of the carbon atoms in sterol molecules come from mevalonate [67]. In addition, experiments with incorporation of 14C-labeled cholesterol and acetate in the jervine and veratramine structures confirm that the main biosynthesis route of steroidal alkaloids in Veratrum plants begins from acetate converted to cholesterol via the mevalonic acid pathway [1]. Subsequent stages catalyzed by isopentenyl-diphosphate delta-isomerase (IPI), geranyl diphosphate synthase (DPS), farnesyl diphosphate synthase (FPS), squalene synthase (SQS) and squalene monooxygenase (SQE) lead to the 2,3-oxidosqualene chain formation. From one to eight variants of all of the abovementioned enzymes were identified in sequenced differential products with high FPKM values (from 5.25 to 2662.72).
Previous studies indicate that steroidal alkaloids are synthesized via cycloartenol and cholesterol [1]. Two possible cholesterol biosynthesis pathways from cycloartenol have been suggested [68]. These routes differ in the location of delta 24-sterol reductase (DWF1) in transformation. DWF1 can convert desmosterol to cholesterol in the last stage of biosynthesis [69] or reduce cycloartenol already in the initial stages of the pathway [70,71]. According to these studies, conversion of cycloartenol to cholesterol requires at least ten stages catalyzed by enzymes:

Identification of Genes Involved in Terpenoid Backbone and Cholesterol Biosynthesis
In plants, terpenoid backbone is synthesized from isopentenyl pyrophosphate (IPP) and dimethylallyl pyrophosphate (DMAPP) by cytosolic mevalonate (MVA) and plastidal 2-C-methyl-d-erythrirtol-4-phosphate (MEP) pathways [34]. 21 unigenes which encode all of the enzymes involved in the MVA route have been identified.
In most cases these transcripts were overexpressed in subtraction libraries (FPKM values ranged from 4.82 to 401.47) (Table S2). Furthermore, in this pathway, the AACT enzyme was the best represented by 7 variants.
Unlike the MVA pathway, two enzymes from the MEP route were missing. Transcripts encoding 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase (CMK/IspE) and (E)-4-hydroxy-3-methylbut-2-enyl-diphosphate synthase (HDS/IspG) were not identified. Incomplete data of MEP pathway suggests that the MVA remains the main route in steroid skeletons production. In higher plants grown under normal physiological conditions, most of the carbon atoms in sterol molecules come from mevalonate [67]. In addition, experiments with incorporation of 14C-labeled cholesterol and acetate in the jervine and veratramine structures confirm that the main biosynthesis route of steroidal alkaloids in Veratrum plants begins from acetate converted to cholesterol via the mevalonic acid pathway [1]. Subsequent stages catalyzed by isopentenyl-diphosphate delta-isomerase (IPI), geranyl diphosphate synthase (DPS), farnesyl diphosphate synthase (FPS), squalene synthase (SQS) and squalene monooxygenase (SQE) lead to the 2,3-oxidosqualene chain formation. From one to eight variants of all of the abovementioned enzymes were identified in sequenced differential products with high FPKM values (from 5.25 to 2662.72).
In conclusion, a total of 119 transcripts involved in steroidal backbone biosynthesis were found in the three differential RDA products, including 53 unigenes participating in transition from cycloartenol to cholesterol. The highest number of unigenes annotated to cholesterol synthesis (49 transcripts) was observed in the samples treated with squalene. A high range of FPKM values indicated that 93% of transcripts annotated to backbone biosynthesis route were overexpressed in variable level.

Identification of the Genes in Later Stages of Jervine Biosynthesis
At the molecular and biochemical level, information about genes involved in the downstream steps from cholesterol to jervine biosynthesis in V. nigrum plants is limited. Studies with 14 C-labeled cyclopamine indicated its role as a possible precursor in jervine biosynthesis [72]. In Veratrum californicum, four enzymes catalyzing the first six steps of verazine biosynthesis (Figure 7) have been identified and functionally confirmed in Sf9 cells [36]. Verazine, which is the precursor of cyclopamine, can be biosynthesized from cholesterol by a series of hydroxylation/oxidation reactions at C-22 and C-26 and transamination at the C-26 position. The hydroxylation/oxidation of cholesterol is catalyzed by three specific cytochromes P450 (CYP90B27, CYP94N1 and CYP90G1), and introduction of amine group is mediated by γ-aminobutyrate transaminase (GABAT1).
Although the chemical mechanism of cyclopamine synthesis has been proposed [30], the genes involved in the downstream steps of transition from verazine to jervine remain largely unknown. The conversion of verazine to the Veratrum alkaloids featuring a C-nor-D-homosteroidal skeleton includes many stages. The E-ring of solanidine is formed after hydroxylation at C-16 position and the creation of a piperidine ring. Afterwards, stereo-unspecific oxidase hydroxylates at the C-12 position and 12-hydroxy product is rearranged to the C-nor-D homosteroidal skeleton by Wagner-Meerwein type reaction. Next, husokinidine is formed by reductive opening of the E ring (pyrrolidine ring) and is oxidatively cyclized to form the tetrahydrofuran ring of cyclopamine. Finally, hydroxylation and oxidation at C-11 cyclopamine leads to jervine alkaloid [1,30] (Figure 8).
Life 2020, 10, x FOR PEER REVIEW 13 of 20 CYP734, CYP736). Eight transcripts with Pfam domain characteristic for P450 cytochromes were not classified within these subfamilies. In order to select the best candidate genes, these unigenes were blasted to 25 known reference sequences encoding enzymes responsible for hydroxylation/oxidation reactions at C-22 and C-26 and with other CYP genes with established C-11, C-12 and C-16 hydroxylation/oxidation activity (Table S4). At the first stage, nine reference genes with C-22 hydroxylase activity were compared with RDA transcripts at protein level. The reference genes included two Arabidopsis thaliana splicing variants and sequences with established activity in verazine biosynthesis in closely related V. californicum (Table S4) [36]. The comparative analysis indicated the main candidate with C-22 hydroxylase activity. The candidate with C-22 hydroxylase activity refers to two contigs (1929 and 1930) specific to CHL-RDA libraries similar to potato PGA2 gene (Table 3 and Table S3).
The length of fragments generated by RDA procedure in the study was rather short (about 280 bp), therefore extended sequences of the corresponding genes (Table S5, S7, S9 and S11) expressed in different organs of V. nigrum were explored [37]. The study has shown that contigs 1929 and 1930 correspond to contig_454 overexpressed (FPKM = 169.18) in V. nigrum roots (Table S5). Moreover, two sequences of V. nigrum homologous to CYP90B27 from V. californicum (stalk_contig_678, and root_contig_641) were found in stalk and root specific libraries (Table S5). RDA and organ-specific sequencing data indicate that root_contig_454 codes alternative enzyme with C-22 or C-16 hydroxylation activity in V. nigrum.  At the first stage, nine reference genes with C-22 hydroxylase activity were compared with RDA transcripts at protein level. The reference genes included two Arabidopsis thaliana splicing variants and sequences with established activity in verazine biosynthesis in closely related V. californicum (Table S4) [36]. The comparative analysis indicated the main candidate with C-22 hydroxylase activity. The candidate with C-22 hydroxylase activity refers to two contigs (1929 and 1930) specific to CHL-RDA libraries similar to potato PGA2 gene (Table 3 and Table S3). The length of fragments generated by RDA procedure in the study was rather short (about 280 bp), therefore extended sequences of the corresponding genes (Tables S5, S7, S9 and S11) expressed in different organs of V. nigrum were explored [37]. The study has shown that contigs 1929 and 1930  (Table S5). Moreover, two sequences of V. nigrum homologous to CYP90B27 from V. californicum (stalk_contig_678, and root_contig_641) were found in stalk and root specific libraries (Table S5). RDA and organ-specific sequencing data indicate that root_contig_454 codes alternative enzyme with C-22 or C-16 hydroxylation activity in V. nigrum.
RDA procedure failed to indicate candidates involved in C-22 oxidation. Sequences of CHL-RDA_contig_263 and MeJ-RDA_contig_1967 were similar to reference gene CYP90G1 in V. californicum at low stringency, and showed homology to leaf_contig_45093 with putative C-16 hydroxylation activity (Table S3). However, sequences (root_contig_4383, stalk_contig_6773 and root_contig_510) coding variants of CYP90G1 in V. nigrum were identified in organs (Table S5). It seems that the RDA approach may not be effective in case of genes with multiple active variants showing high homology and may result in elimination of target fragments.
The RDA approach resulted in the selection of correct CHL-RDA_contig_3050 with C-26 hydroxylase activity that corresponds to leaf_contig_707 homologous to reference gene CYP94N1 in V. californicum [36]. The second candidate represented by MeJ-RDA_contig_1334 similar to tomato CYP734A7 genes was highly overexpressed in the analyzed RDA libraries (Table 3). Similarly, subtractive libraries retained their best candidate for γ-aminobutyrate transaminase (GABAT2) with established role in verazine biosynthesis [36]. Over 40 unigenes with Pfam domains characteristic for aminotransferases (PF00202, PF00155 and PF00266) were found within transcripts in subtractive libraries of V. nigrum (Table S6). Over 90% of these unigenes were overexpressed (FPKM value from 6.01 to 1604.27). Blast analyses clearly indicated MeJ-RDA_contig_1251 with the highest homology to the reference GABAT2 gene (Table 3). Five highly homologous to GABAT2 sequences were found in organs of V. nigrum (Table S7) including root_contig_544 matching MeJ-RDA_contig_1251.
To reveal the candidate genes encoding enzymes participating in transition from verazine to cyclopamine, information from related species was used. Two C-11 hydroxylases/oxidases CYP87D18 from Siraitia grosvenorii [87] and CYP88D6 from Glycyrrhiza glabra [90] were used as references (Table S4). The results of RDA analysis indicate CHL-RDA_contig_648 (FPKM = 295) similar to CYP88D6 as putative candidate with C-11 activity (Table 3 and Table S3). Introduction of reference CYP716A47 from Panax ginseng, with steroid C-12 hydroxylase activity in ginsenoside biosynthesis pathway [91], was not successful in identifying homologous genes in V. nigrum (Table S4).
Five reference genes coding enzymes with the C-16 oxidation activity in steroidal alkaloid biosynthesis were compared with RDA and organ transcripts (Tables S3-S5). The best candidate with C-16 hydroxylase activity was SQ-RDA_contig_3964 (FPKM = 41) homologous to Arabidopsis thaliana CYP86A2 corresponding to stalk_contig_5333 (Table 3). Additional genes similar to CYP86A2 at protein level were found in organ data. Alternatively, the C-16 oxidation might be catalyzed by substrate specific enzymes from the 2-oxoglutarate-dependent dioxygenase superfamily. Annotation of transcripts from subtractive libraries of V. nigrum resulted in 27 2OGD-encoding unigenes with characteristic Pfam domains (PF03171 and PF14226) corresponding to 21 contigs (Table S8). The comparative analysis to the 16DOX reference gene from Solanum lycopersicum [82] indicates MeJ-RDA_contig_2593 and MeJ-RDA_contig_81 as the best candidates with sterol C-16 oxidation activity, expressed in stalk and roots respectively (Table 3 and Tables S8 and S9).
In the last stages of jervine biosynthesis, two reactions were associated with the reduction of heterocycles containing one nitrogen atom, i.e., reduction of 2,3,4,5-tetrahydropyridine and then reduction of the pyrrolidine ring. Based on 98% amino acid identity to pyrroline-5-carboxylate reductase (EC 1.5.1.2) from Medicago truncatula, SQ-RDA_contig_2664 transcript (FPKM = 18.09) was selected as candidate able to catalyze etioline reduction (Table 3 and Table S10). This enzyme belongs to the oxidoreductases family, especially those which act on the CH-NH group of donors and are capable of using delta-1-piperideine-6-carboxylate as a substrate [92].
The reduction of C = N binding in plants might be also catalyzed by enzymes from short-chain dehydrogenases/reductases family (SDR), even in case of cyclic substrate [93]. The SDR enzyme family is one of the largest among all enzyme families and classifies approximately 46,000 members with differentiated activities, e.g., oxidoreductases, lyases and isomerases [94]. In the cDNA-RDA products, 21 unknown sequences containing the SDR domain were found (PF00106.20). Three unigenes (CHL-RDA_contig_2945, MeJ-RDA_contig_1516 and SQ-RDA_contig_1577) characterized with the highest FPKM values in the range between 3843.51 and 1967.9 (Table S10) may be selected as candidates involved in this process. 87 sequences with SDR domain were identified in organs of V. nigrum and 3 root specific contigs 13569, 13569 and 12724 matched 4 respective RDA products (Tables S10 and S11).

Conclusions
In this study, the next-generation sequencing (NGS) technology was applied to cDNA-RDA products obtained after cholesterol, methyl jasmonate and squalene treatment. cDNA-RDA procedure allowed the reduction of the number of overrepresented transcripts and increased the possibility of identifying low frequency, unique candidate genes involved in VSA biosynthesis, which were overexpressed by elicitation and precursor feeding. Here, a total of 12048 contigs were generated with a median length of 280 for three samples. As a result of the analysis, the majority of candidate genes involved in the unique VSA biosynthesis pathway have been identified. Transcripts encoding enzymes involved in C-22, C-26, C-11 and C-16 hydroxylation, transamination at C-26 and heterocycles reduction have been proposed. No transcript corresponding to C-22 oxidation and C-12 hydroxylation/oxidation was found. Furthermore, enzymes from the short-chain dehydrogenases/reductases family (SDR) associated with the reduction reactions in VSA biosynthesis pathway were identified. Further work is necessary to validate the role of the selected candidate genes in VSA biosynthesis.
Supplementary Materials: The following are available online at http://www.mdpi.com/2075-1729/10/6/88/s1, Table S1: KEGG classification of the unigenes from V. nigrum after cDNA-RDA experiments; Table S2: Enzymes annotated to terpenoid backbone and sterols biosynthetic pathway with FPKM values. Table S3: Annotation of transcripts from subtractive libraries identified as CYP candidates; Table S4: Reference genes used for selection of candidates catalyzing stages in synthesis of jervine; Table S5: Annotation of transcripts from organs libraries identified as CYP candidates; Table S6: Annotation of transcripts from subtractive libraries identified as potential transaminases; Table S7: Annotation of transcripts from organs libraries identified as potential transaminases; Table S8: Annotation of transcripts from subtractive libraries identified as potential C16-hydroxylation candidates (from 2OGD enzymes family); Table S9: Annotation of transcripts from organs libraries identified as potential C16-hydroxylation candidates (from 2OGD enzymes family); Table S10. Annotation of transcripts associated with reduction of heterocycles from oxidoreductases family (unigene no 1) and from short-chain dehydrogenases/reductases family (SDR) (unigenes no 2-22) in all three differential products; Table  S11: Characteristics of transcripts coding enzymes from short-chain dehydrogenases/reductases family (SDR) in all V. nigrum organs. Funding: This work was supported by EU Regional Operational Program "The use of molecular and proteomic tools for identification of genes and enzymes with potential in biotechnology" [UDA-RPPK.01.03.00-18-018/13-0].