Identification of Putative Candidate Genes from Galphimia spp. Encoding Enzymes of the Galphimines Triterpenoids Synthesis Pathway with Anxiolytic and Sedative Effects

Galphimia spp. is popularly used in Mexican traditional medicine. Some populations of Galphimia exert anxiolytic and sedative effects due to the presence of the modified triterpenoids galphimines. However, the galphimine synthesis pathway has not yet been elucidated. Hence, in this study, a comparative transcriptome analysis between two contrasting populations of Galphimia spp., a galphimine-producer, and a non-galphimine-producer, is performed using RNA-Seq in the Illumina Next Seq 550 platform to identify putative candidates genes that encode enzymes of this metabolic pathway. Transcriptome functional annotation was performed using the Blast2GO in levels of gene ontology. For differential expression analysis, edgeR, pheatmap, and Genie3 library were used. To validate transcriptome data, qPCR was conducted. In producer and non-producer plants of both populations of Galphimia spp., most of the transcripts were grouped in the Molecular Function level of gene ontology. A total of 680 differentially expressed transcripts between producer and non-producer plants were detected. In galphimine-producer plants, a larger number of highly expressed transcripts related to acyclic and polycyclic terpene synthesis were identified. As putative candidate genes involved in the galphimine synthesis pathway, P450 family members and enzymes with kinase activity were identified.

Different metabolomic studies have already been carried out to detect the presence of galphimines as well as their concentration among six Mexican populations of Galphimia spp. [13,14], from which only two (located in Jalpan, (Querétaro) and Dr. Mora Table 1. Transcripts were obtained after de novo assembly of transcriptomes.

Sample and Replica Number of Transcripts
Leaves of galphimine producing-population 9023 Leaves of galphimine producing population, replica 8915 Leaves of non-galphimine producing-population 8667 Leaves of non-galphimine producing-population, replica 8896 Roots of galphimine producing-population 5004 Roots of galphimine producing population, replica 4667

Transcriptome Functional Annotation
The gene ontology analysis showed a different number of annotated genes: 5900 in GP plants, 3600 in the NGP ones, and 1500 in the controls. In GP plants and controls, most of the transcripts were grouped in gene ontology categories MF and CC. In the NGP plants, most of the transcripts were grouped mostly in the MF and BP categories. At the level of gene ontology BP, the largest number of transcripts were associated with cellular processes, biological processes, and biological regulation (Figure 1), while at the MF level, most of the transcripts annotated were related to catalysis, binding, and transporter. In the CC, transcripts of three samples are located as a cellular anatomical entity and protein-containing complex.

2.3.Transcriptome Functional Annotation
The gene ontology analysis showed a different number of annotated genes: 5900 in GP plants, 3600 in the NGP ones, and 1500 in the controls. In GP plants and controls, most of the transcripts were grouped in gene ontology categories MF and CC. In the NGP plants, most of the transcripts were grouped mostly in the MF and BP categories. At the level of gene ontology BP, the largest number of transcripts were associated with cellular processes, biological processes, and biological regulation (Figure 1), while at the MF level, most of the transcripts annotated were related to catalysis, binding, and transporter. In the CC, transcripts of three samples are located as a cellular anatomical entity and proteincontaining complex. According to the enzyme distribution code in the GP population, several transcripts were identified (Figure 2), such as 1300 with transferase activity, 1020 with hydrolase activity, and 810 with oxidoreductase activity; while in the NGP population transcripts were observed as follows: 2250 with oxidoreductase activity, 2220 with transferase activity, and 2050 with translocase activity. In control, 270 transcripts with hydrolase activity, 260 with transferase activity, and 250 with translocase activity were observed.

Percentage of annotated sequences
galfimine-producer plants non-galfimine-producer plants control According to the enzyme distribution code in the GP population, several transcripts were identified (Figure 2), such as 1300 with transferase activity, 1020 with hydrolase activity, and 810 with oxidoreductase activity; while in the NGP population transcripts were observed as follows: 2250 with oxidoreductase activity, 2220 with transferase activity, and 2050 with translocase activity. In control, 270 transcripts with hydrolase activity, 260 with transferase activity, and 250 with translocase activity were observed.

Differential Expression Analysis
When comparing the GP population with the NGP population, a total of 680 transcripts with differential expressions were found. Of these, 313 were positively regulated and 367 negatively regulated. The volcano plot shows that both positively regulated (represented by red dots) and negatively regulated transcripts (represented by green dots) have a large-magnitude logFC due to the significant separation from the center, and a high level of statistical significance since the location of dots over 100 of log10 (PValue). The greatest statistical significance was observed in positive regulated transcripts. ( Figure  3a).

Differential Expression Analysis
When comparing the GP population with the NGP population, a total of 680 transcripts with differential expressions were found. Of these, 313 were positively regulated and 367 negatively regulated. The volcano plot shows that both positively regulated (represented by red dots) and negatively regulated transcripts (represented by green dots) have a large-magnitude logFC due to the significant separation from the center, and a high level of statistical significance since the location of dots over 100 of log10 (PValue). The greatest statistical significance was observed in positive regulated transcripts. (Figure 3a).

Differential Expression Analysis
When comparing the GP population with the NGP population, a total of 680 transcripts with differential expressions were found. Of these, 313 were positively regulated and 367 negatively regulated. The volcano plot shows that both positively regulated (represented by red dots) and negatively regulated transcripts (represented by green dots) have a large-magnitude logFC due to the significant separation from the center, and a high level of statistical significance since the location of dots over 100 of log10 (PValue). The greatest statistical significance was observed in positive regulated transcripts. ( Figure  3a).   A total of 203 transcripts were differentially regulated between GP plants and control. The number of negatively regulated transcripts (logFC < 2) was 116, higher than positively regulated transcripts (logFC > 2), which was 87 in number. The positively regulated transcripts presented a higher statistical significance (Figure 3b). In addition, there were changes in the logFC and high values of -log10 (PValue), indicators of the statistical significance of some transcripts.
In the GP plants, a greater number of differentially expressed transcripts with positive regulation were observed as compared to the NGP and the control (Figure 4), suggesting a high metabolic activity in these plants. Moreover, the transcripts with the highest expression values in the GP plants were mostly hypothetical proteins, ribosomal proteins, transferases, endonucleases, and carboxypeptidases. However, in NGP plants, the transcripts with the highest expression encoded mitochondrial receptors, ribosomal proteins, enzymes with synthase activity, and polygalacturonases. expressed genes p-values, respectively. The significantly up-regulated and down-regulated genes are indicated with red and green dots, respectively, while non-significant genes are shown as black dots.
A total of 203 transcripts were differentially regulated between GP plants and control. The number of negatively regulated transcripts (logFC < 2) was 116, higher than positively regulated transcripts (logFC > 2), which was 87 in number. The positively regulated transcripts presented a higher statistical significance ( Figure 3b). In addition, there were changes in the logFC and high values of -log10 (PValue), indicators of the statistical significance of some transcripts.
In the GP plants, a greater number of differentially expressed transcripts with positive regulation were observed as compared to the NGP and the control (Figure 4), suggesting a high metabolic activity in these plants. Moreover, the transcripts with the highest expression values in the GP plants were mostly hypothetical proteins, ribosomal proteins, transferases, endonucleases, and carboxypeptidases. However, in NGP plants, the transcripts with the highest expression encoded mitochondrial receptors, ribosomal proteins, enzymes with synthase activity, and polygalacturonases. In the comparison between GP plants and controls, the transcripts with the highest expression values in the GP plants were ribosomal proteins, proteins associated with senescence, and enzymes with endonuclease, polymerase, and dehydrogenase activities ( Figure 5). In controls, the transcripts with higher expression values code for receptors, aquaporins, chaperones, endonucleases, and methyltransferases. In the comparison between GP plants and controls, the transcripts with the highest expression values in the GP plants were ribosomal proteins, proteins associated with senescence, and enzymes with endonuclease, polymerase, and dehydrogenase activities ( Figure 5). In controls, the transcripts with higher expression values code for receptors, aquaporins, chaperones, endonucleases, and methyltransferases.

Identification of Putative Candidate Genes That Encode Enzymes of the Galphimine Synthesis Pathway
Based on the results of functional annotation, 44 transcripts associated with the terpenes synthesis pathway were identified in GP plants ( Table 2). Out of those, two transcripts encode for 3-hydroxy-3-methylglutaryl-CoA reductase (a regulatory enzyme of the mevalonic acid pathway), and 13 transcripts encode enzymes of the methylerythritol phosphate pathway. Among other transcripts, three encode for enzymes associated with intermediates in the isoprenoid biosynthesis, one related to terpenes synthesis, and 25 with triterpenes. Table 2. Annotation and gene counts of transcripts related to terpene synthesis of two populations of Galphimia spp.

Identification of Putative Candidate Genes That Encode Enzymes of the Galphimine Synthesis Pathway
Based on the results of functional annotation, 44 transcripts associated with the terpenes synthesis pathway were identified in GP plants ( Table 2). Out of those, two transcripts encode for 3-hydroxy-3-methylglutaryl-CoA reductase (a regulatory enzyme of the mevalonic acid pathway), and 13 transcripts encode enzymes of the methylerythritol phosphate pathway. Among other transcripts, three encode for enzymes associated with intermediates in the isoprenoid biosynthesis, one related to terpenes synthesis, and 25 with triterpenes.
In the NGP plants, 35 transcripts were found associated with terpene synthesis, among which one encodes the regulatory enzyme of the mevalonic acid pathway, 10 associated with methylerythritol phosphate pathway, five with the intermediary in the isoprenoid biosynthesis, three transcripts related to terpenes synthesis and 16 with triterpenes ( Table 2). In the control, the following transcripts were observed: one that encodes for 3-hydroxy-3methylglutaryl-CoA reductase, one for P450, and five for terpene synthase.
The GP plants presented a greater number and diversity of transcripts related to the synthesis of terpenes with high values of gen counts compared to the NGP plants. This suggests a higher metabolic activity in the terpene synthesis pathway in these plants. P450 TBP, P450, and terpene synthase were the enzymes with the highest gen count. In the NGP plants, the enzymes with the highest gen counts were terpene synthase and CYP82G1.
In the co-expression network, 22 nodes were observed ( Figure 6), among which, HMGCR (3-hydroxy-3-methylglutaryl-CoA reductase), IPPI (isopentenyl diphosphate isomerase), and FT (farnesyl transferase) showed the highest number of interactions with other nodes. The expression of the majority of P450 family members was closely related.

Validation of Transcriptome Data by qPCR
By qPCR analysis, the results of the transcriptomic study were validated. The expression levels of putative candidate genes related to galphimine synthesis were found to be similar in both differential gene expression analysis ( Table 2) and qPCR (Figure 7). The expression was higher in GP plants because CT values were lower than in NGP plants ( Figure 8) and significant differences between plants of both populations in fold change value were observed ( Figure 6). In GP plants the genes with a higher level of relative expression were P450 TBP, 5'-AMP-activated protein kinase, and P450. Overall, the relative expression of these genes in NGP plants was low.

Validation of Transcriptome Data by qPCR
By qPCR analysis, the results of the transcriptomic study were validated. The expression levels of putative candidate genes related to galphimine synthesis were found to be similar in both differential gene expression analysis ( Table 2) and qPCR (Figure 7). The expression was higher in GP plants because CT values were lower than in NGP plants ( Figure 8) and significant differences between plants of both populations in fold change value were observed ( Figure 6). In GP plants the genes with a higher level of relative expression were P450 TBP, 5'-AMP-activated protein kinase, and P450. Overall, the relative expression of these genes in NGP plants was low.

Discussion
Galphimia spp, is an important medicinal plant in Mexico, with several applications in traditional medicine; thus, the main investigations with this plant are based on its pharmacological effects, especially for the modified triterpenoids galphimines. However, genes that encode enzymes of the galphimines synthesis pathway are unknown. Nevertheless, identifying new genes is a complex and great important process, especially when the genome is not available. The least complex and economical strategy is the use of omics

Discussion
Galphimia spp, is an important medicinal plant in Mexico, with several applications in traditional medicine; thus, the main investigations with this plant are based on its pharmacological effects, especially for the modified triterpenoids galphimines. However, genes that encode enzymes of the galphimines synthesis pathway are unknown. Nevertheless, identifying new genes is a complex and great important process, especially when the genome is not available. The least complex and economical strategy is the use of omics

Discussion
Galphimia spp, is an important medicinal plant in Mexico, with several applications in traditional medicine; thus, the main investigations with this plant are based on its pharmacological effects, especially for the modified triterpenoids galphimines. However, genes that encode enzymes of the galphimines synthesis pathway are unknown. Nevertheless, identifying new genes is a complex and great important process, especially when the genome is not available. The least complex and economical strategy is the use of omics tools such as transcriptome and digital gene expression profile analysis. These strategies allowed to explore and analyze genes with a differential expression which increases the chance of discovering new genes as well as the characterization of their structure. In this study, the Illumina Next seq 550 platform has been used to investigate the transcriptomes of Galphimia spp, which allowed the successful identification of several galphimine synthesis-related genes.
The GO annotation provides a preliminary indication of the nature of a gene product. In this study, most of the transcripts have been recorded at the MF and CC gene ontology levels, which provided an approximation to the elementary functions in which the transcripts participate as well as the parts of the cells or extracellular environment they're located in. Given that galphimines are triterpenoids of polycyclic structure and that the synthesis of these metabolites includes oxidation and transfer of functional groups, it is predicted that most of the transcripts identified are associated with transferase and oxidoreductase activities.
Triterpenes are one of the most important groups of secondary metabolites in plants. These are generally produced from the acyclic 30-carbon precursors 2,3-oxidosqualene in eukaryotes. The conversion of 2,3-oxidosqualene to various cyclic triterpenes by squalene cyclase is the first step in the diversification of the biosynthetic pathway of the triterpenes. Subsequently, the generated intermediaries undergo a large number of modifications regio and stereospecific structures catalyzed by cytochrome P450 (CYP), acetylases, and UDP glucosyltransferases that catalyze oxidation, methylation, acetylation, malonylation, and glycosylation, [23]. Plant CYPs transfer two electrons from cofactors NADPH and atmospheric oxygen to their substrates. Generally, plants require a CYP reductase to support electron transfer [24]. The oleanane, derived from β-amyrin, is one of the largest representatives of triterpene scaffolds pentacyclic. Approximately 55 CYPs act on scaffolds of pentacyclic triterpenes of plants. Most of them belong to members of the CYP716 family, although other families such as CYP51, CYP71, CYP72, CYP87, CYP88, and CYP93 also participate in modifications of pentacyclic triterpenes [23].
Also, three oxidosqualene cyclases (OSC) TwOSC1, TwOSC2, and TwOSC3 were isolated and characterized from Tripterygium wilfordii. TwOSC1 and TwOSC3 were multiproduct friedelin synthases and were involved in celastrol biosynthesis, while TwOSC2 was a β-amirin synthase [26]. In that work, the authors proposed a biogenetic pathway for celastrol in which hydroxyfriedelanes synthesis was mediated by P450 [26]. On the other hand, in Aralia elata, CYP716A295 and CYP716A296 were identified as the candidate genes most likely associated with oleanolic acid synthesis [27].
Galphimines are modified triterpenoids, classified as Nor-secofriedelanes. These are derived from oleanane and later from friedelin. The arrangements in the intermediates between friedelin and norsecofriedelanes involve oxidation-reduction reactions for opening a ring and the C3-C4 bond break, resulting in the two functional groups required for the formation of the seven-membered lactone characteristic of galphimines. Considering the functions that have been reported for the P450 family in the synthesis of triterpenes and the gene count of the transcripts encoding members of the P450 family in the GP population (Table 2), as well as the absence of many of these in the NGP population, these enzymes could be considered potential candidates for galphimines synthesis, as suggested in Figure 9. Plants 2022, 11, x FOR PEER REVIEW 11 of 16 Figure 9. Proposed galphimines synthesis pathway mediated by cytochrome P450 in Galphimia spp. plants. Solid arrows indicate pathway reactions identified in previous work [18,28]. Dotted arrows indicate reactions proposed in this study.
In differential expression analysis, the GP plants presented a greater number of genes related to the synthesis of terpenes as compared to the NGP plants. In both analyzed populations, some transcripts encode the mevalonic acid and methylerythritol phosphate pathways as well as precursors of the biosynthesis of isoprenoids, terpenes, and triterpenes. However, in the GP plants, the number and diversity of identified enzymes were higher. Since terpenes are a group of secondary metabolites that are highly represented in plants [29], they are present in both studied populations. However, the GP plants presented a higher expression of the enzymes of this pathway and, as a consequence, a higher content of these metabolites. This corroborates with the results of the study by Sharma et al. where the chemical profile of the Galphimia spp., the populations used in this study was analyzed by thin-layer chromatography (TLC), suggesting the presence of galphimines in the GP population, while other terpenes were in the NGP population [15].
Among the enzymes with the highest levels of expression in both populations were terpene synthase and CYP82G1. This is expected because terpene synthase diversifies isoprenoid precursors into a large number of mono and sesquiterpenes [30]. CYP82G1 participates in the synthesis of volatile homoterpenes. These homoterpenes are considered among the most common plant compounds that act as part of defense mechanisms against herbivores [31].
Co-expression networks are a new resource for understanding convergent pathways and their relations. This resource has been used to address various biological questions  [18,28]. Dotted arrows indicate reactions proposed in this study.
In differential expression analysis, the GP plants presented a greater number of genes related to the synthesis of terpenes as compared to the NGP plants. In both analyzed populations, some transcripts encode the mevalonic acid and methylerythritol phosphate pathways as well as precursors of the biosynthesis of isoprenoids, terpenes, and triterpenes. However, in the GP plants, the number and diversity of identified enzymes were higher. Since terpenes are a group of secondary metabolites that are highly represented in plants [29], they are present in both studied populations. However, the GP plants presented a higher expression of the enzymes of this pathway and, as a consequence, a higher content of these metabolites. This corroborates with the results of the study by Sharma et al. where the chemical profile of the Galphimia spp., the populations used in this study was analyzed by thin-layer chromatography (TLC), suggesting the presence of galphimines in the GP population, while other terpenes were in the NGP population [15].
Among the enzymes with the highest levels of expression in both populations were terpene synthase and CYP82G1. This is expected because terpene synthase diversifies isoprenoid precursors into a large number of mono and sesquiterpenes [30]. CYP82G1 participates in the synthesis of volatile homoterpenes. These homoterpenes are considered among the most common plant compounds that act as part of defense mechanisms against herbivores [31].
Co-expression networks are a new resource for understanding convergent pathways and their relations. This resource has been used to address various biological questions [32].
In the present study, the enzymes selected as possible candidates for the galphimines synthesis pathway interacted closely with each other, as well as with other enzymes involved in stages before the triterpenes synthesis. It has shown the genes that participate in the same metabolic pathway or those are under the same regulatory mechanism. It is worth remarking on the interaction of 3-hydroxy-3-methylglutaryl-CoA reductase, with a considerable number of the identified enzymes that are associated with the synthesis of polycyclic triterpenes. This enzyme is a regulator of the mevalonic acid pathway, which seems to be the proper route for synthesizing triterpenes [33]. The 3-hydroxy-3methylglutaryl-CoA reductase seems to control the core of the terpene's metabolism in the cytosol, regulating the production of the compound obtained, as well as the activity of the gene products involved in these reactions.
In this study, relative expression determined by qPCR corroborated the results from the transcriptomic analysis. Our results allow us to support that the genes selected as candidates involved in galphimine biosynthesis are present in high concentrations in the GP plants and are in low concentrations or absent in the NGP plants.
Given the limited availability of reference genomes in plants, the transcriptomic analysis of bioactive metabolites in producer and nonproducer plants is a valuable strategy to detect the genes involved in the synthesis pathways of the compounds of interest.

Plant Material
The collection of plants and seeds was carried out manually under the permission of Six RNA-Seq libraries (leaves samples from a GP population plant, leaves samples from the NGP population, root samples from the GP population, and one replica of each sample) were constructed using the Illumina TruSeq Stranded Total RNA protocol. Roots were used as a control since the production of galphimines is negligible in this organ. The quantity and quality of enriched dscDNA were assessed using a Qubit fluorometer (Thermo Fisher, Waltham, MA, USA) and Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). The libraries were sequenced using the Illumina Next Seq 550 platform; for 2 × 76 sequencing, cycles to obtain paired-end readings.

Reads Quality Control and De Novo Transcriptome Assembly
The quality of the reads obtained by sequencing was analyzed using the FastQC software (v.0.11.8) and Trimmomatic (v.0.38) tool was used to trim the sequence reads. Afterward, de novo transcriptomes assembly was carried out, for each sample, by Trinity (v.2.9.1) using the default settings mode.

Transcriptome Functional Annotation
Transcriptome functional annotation was performed using the Blast2GO (v.4.1.9). The assembled transcripts were aligned by a tblastx with the NCBI database. A BLAST expectation value of 1.0 × 10 −3 was applied, and for each sequence, 20 alignments were made. A sequence mapping was conducted, followed by functional annotation in levels of gene ontology Biological Process (BP), Molecular Function (MF), and Cellular Component (CC). Sequences that were not annotated by gene ontology were identified by InterPro, UniProt, and TAIR databases.

Differential Expression Analysis
To identify corresponding gene pairs between individuals in the two populations of Galphimia spp., Corset programs (v.1.09) (clustering de novo assembled transcripts) was used. After transcripts were compared and grouped, results were also verified by functional annotation. An abundance quantification of the transcripts assembled in each transcriptome was conducted using the Kallisto program (v.0.46.0.4). Then, the counting matrix was built in R (v.4.0.4) for Windows, with the use of the plyr, dplyr, and stats libraries. Genes with low library counts were filtered using the CPM Filter on a per-million count (CPM) basis. For the analysis of differential expression, the edgeR library was used. Samples in the counting matrix were normalized on a transcript-per-million (TPM) basis. Transcripts expression in the leaves of the GP plants was compared with transcripts of the NGP and with the control. The differentially expressed transcripts were considered to be those with a false positive rate (FDR) <0.01 and classified as positively regulated when log fold Change (logFC) > 2 and negative when logFC < −2. The transcripts with the highest differential expression values in each sample were visualized on a heat map constructed with the pheatmap library.

Identification of Putative Genes Related to the Galphimines Synthetic Pathway
Data obtained from the functional annotation of transcriptomes and the results of the differential expression analysis were used to identify genes involved in the synthesis of terpenes and putative candidate genes for the synthesis of galphimines. To identify the interactions between the genes related to the synthesis of terpenes and the possible candidate genes for the synthesis of galphimines, a co-expression network was also built, using the libraries of R, GENIE3, igraph, RCy3, and Rgraphviz. The co-expression network was calculated in the form of a weighted adjacency matrix, using ensembles of regression trees. Candidate regulatory genes were not used. The tree method was Random Forests. The number of candidate regulators randomly selected at each tree node was the square root of the total number of candidate regulators. The number of trees in each target was 1000. The co-expression network was visualized by Cytoscape (v.3.8.2).

Validation of Transcriptome Data by qPCR
cDNA of GP plants and NGP plants was obtained using the PrimerScript RT Reagent Kit with gDNA Eraser (TaKaRa, San Jose, CA, USA). The concentration and purity of cDNA were evaluated by Nano-Drop 2000 spectrophotometer (NanoDrop Technologies, Technologies, Wilmington, DE). According to transcriptome data of GP plants, nucleotide sequences of ten putative candidate genes related to galphimines synthetic pathway (CYP82C4, P450 TBP, PLAC8, 5'-AMP-activated protein kinase, serine/threonine-protein kinase AtPK2/AtPK19, P450, CYP71A24, CYP71B34, CYP72A, CYP81D), were used for primers designing (Table 3) by Primer3 plus program. The quantitative PCR was conducted using the TB Green ® Advantage ® qPCR Premix (Takara Bio USA, Inc., San Jose, CA, USA) following the manufacturer's protocol in a StepOne™ Real-Time PCR System (Applied Biosystems, Waltham, MA, USA). The amplification conditions used were as follows: activation/denaturation at 95 • C for 10 s, 45 cycles of denaturation at 95 • C for 5 s, and annealing/extension at 60 • C for 30 sec followed by a melting curve analysis ranging from 56-95 • C. Three biological replicates with three technical replicates were used for each sample. The relative expression of the RNA was quantified by the 2 −∆∆Ct method using U6 as the reference gene [34].

Conclusions
Transcriptome analysis of two populations of Galphimia spp. allowed the detection of differences between a galphimine producer GP plant and a non-galphimine producer NGP plant. The higher expression of P450 family members and kinases in GP plants and their well-known role in triterpene synthesis make them putative candidates to encode enzymes involved in the galphimine synthesis pathway.