Reference Genes Screening and Gene Expression Patterns Analysis Involved in Gelsenicine Biosynthesis under Different Hormone Treatments in Gelsemium elegans

Reverse transcription quantitative polymerase chain reaction (RT-qPCR) is an accurate method for quantifying gene expression levels. Choosing appropriate reference genes to normalize the data is essential for reducing errors. Gelsemium elegans is a highly poisonous but important medicinal plant used for analgesic and anti-swelling purposes. Gelsenicine is one of the vital active ingredients, and its biosynthesis pathway remains to be determined. In this study, G. elegans leaf tissue with and without the application of one of four hormones (SA, MeJA, ETH, and ABA) known to affect gelsenicine synthesis, was analyzed using ten candidate reference genes. The gene stability was evaluated using GeNorm, NormFinder, BestKeeper, ∆CT, and RefFinder. The results showed that the optimal stable reference genes varied among the different treatments and that at least two reference genes were required for accurate quantification. The expression patterns of 15 genes related to the gelsenicine upstream biosynthesis pathway was determined by RT-qPCR using the relevant reference genes identified. Three genes 8-HGO, LAMT, and STR, were found to have a strong correlation with the amount of gelsenicine measured in the different samples. This research is the first study to examine the reference genes of G. elegans under different hormone treatments and will be useful for future molecular analyses of this medically important plant species.


Introduction
Reverse transcription quantitative polymerase chain reaction (RT-qPCR) is now widely used in molecular biology research as a reliable means of measuring gene expression levels, which is highly sensitive, specific, accurate, and rapid [1][2][3].However, RT-qPCR results are susceptible to interference by reference genes.Reference genes are often required to normalize the data for the target gene to minimize errors [4,5].However, a large number of studies have revealed that these genes might not be stably expressed among all species, or under specific experimental conditions.For example, different stable reference genes for Luffa cylindrica [6], Cryptomeria fortunei [7], Allium sativum L. [8] have been identified under different conditions.Therefore, it is essential to select the best reference genes based on various factors to ensure accurate RT-qPCR results.
A number of analyses such as GeNorm, NormFinder, BestKeeper, ∆CT, and RefFinder are commonly used to estimate the stability of candidate reference genes [9][10][11][12][13][14].Among these, GeNorm is used to rank potential reference genes by the mean pairwise change in expression of a potential reference gene compared to every other gene in the set, and to assess the variability between genes [15].NormFinder compares candidate reference genes in pairs within and between groups, and BestKeeper uses an analytical procedure based on the mean cycle threshold (Ct) values of candidate reference genes [16][17][18].The ∆CT method compares the relative expression levels of pairs of genes in each sample [19].RefFinder is used to combine the above algorithms to obtain a final result [20].Currently, suitable reference genes that can be used for normalizing RT-qPCR gene expression analyses in Gelsemium elegans under different experimental conditions have not been reported.
G. elegans is an evergreen woody vine of the genus Gelsemium in the family Loganiaceae, mainly found in China, and southeastern Asia [21].The whole plant is highly poisonous and is a traditional Chinese herb remedy for many symptoms [22].Its roots are rich in monoterpene indole alkaloids (MIAs), which have demonstrated excellent antiinflammatory, anti-tumor, anti-anxiety, immunomodulatory, and analgesic effects with promising medical applications [23][24][25][26][27][28].Gelsenicine is an ingredient of MIAs and is the most toxic alkaloid of G. elegans.It has many pharmacological effects [29].Although the biosynthesis of gelsenicine is still poorly understood, it has been found that its upstream biosynthesis pathways are closely related to that found in the iridoids and indole pathways (Figure 1) [30].The iridoid pathway starts with the condensation of isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP) to form geranyl diphosphate (GPP).Its dephosphorylation by the action of geraniol synthase (GES) leads to the formation of geraniol, followed by various enzymic reactions (ten steps) to form secologanin.The indole pathway begins with chorismite that is catalyzed by anthranilate synthase (AS) to form anthranilate, followed by various enzymic reactions (six steps) to give tryptamine.Finally, strictosidin synthase (STR) catalyzes the formation of strictosidine from secologanin and tryptamine [31][32][33][34].Gelsemium alkaloids are then obtained.However, the molecular understanding of gelsemium alkaloids and the associated biosynthetic pathways have been little studied in detail.
Several studies have shown that exogenous hormones such as salicylic acid (SA), methyl jasmonate (MeJA), ethylene (ETH), and abscisic acid (ABA), induced or otherwise, alter the biosynthesis of secondary metabolites [35][36][37][38].Catharanthine in Catharanthus roseus [39], isoquinoline alkaloids in Dendrobium officinale PLBs [40], camptothecin (CPT) in Camptotheca acuminata [41], have been found in response to hormones on biosynthetic pathways.Nevertheless, there is still a gap in research on the effects of exogenous plant hormones on the MIA biosynthesis of G. elegans.Therefore, in order to further understand the biosynthesis pathway to gelsemium alkaloids, identifying the expression patterns of pathway-related genes in response to different environmental stimuli is an important step.
In this study, ten candidate reference genes were selected based on the genomic data [42] and then assessed for stability of gene expression under SA, MeJA, ETH, and ABA treatments.Five algorithms were used for the systematic screening of optimal reference genes.Finally, STR, an important target gene in MIA upstream biosynthetic pathways, was used to validate the reference genes identified.In addition, we also screened 14 genes potentially involved in the gelsenicine upstream pathways based on a "genome + transcriptome + metabolome" co-expression analysis and analyzed their expression patterns.This work identified the best reference genes for RT-qPCR analysis of G. elegans under the different hormone treatments performed and illustrates the expression patterns of biosynthesis-related genes.This provides a foundation for further molecular studies of the MIA biosynthesis in G. elegans.
biosynthesis-related genes.This provides a foundation for further molecular studies of the MIA biosynthesis in G. elegans.

The Content of Gelsenicine under Four Hormone Treatments
The chromatograms of the standard solution and sample solution are shown in Figures S1-S4.The different hormone treatments of G. elegans leaves significantly affected the content of gelsenicine over a 48 h period compared to the control (Figure 2).After 48 h, the content of gelsenicine after treatment with SA (0.176 mg•g −1 , 51.33%) and MeJA (0.205 mg•g −1 , 59.93%) showed highly significant decreases (p < 0.01); ABA (0.381 mg•g −1 , 111.28%) showed a significant increase (p < 0.05) and there was no significant change observed with ETH.Thus, addition of SA and MeJA had a very positive impact on the content of gelsenicine in G. elegans leaves.

The Content of Gelsenicine under Four Hormone Treatments
The chromatograms of the standard solution and sample solution are shown in Figures S1-S4.The different hormone treatments of G. elegans leaves significantly affected the content of gelsenicine over a 48 h period compared to the control (Figure 2).After 48 h, the content of gelsenicine after treatment with SA (0.176 mg•g −1 , 51.33%) and MeJA (0.205 mg•g −1 , 59.93%) showed highly significant decreases (p < 0.01); ABA (0.381 mg•g −1 , 111.28%) showed a significant increase (p < 0.05) and there was no significant change observed with ETH.Thus, addition of SA and MeJA had a very positive impact on the content of gelsenicine in G. elegans leaves.

Amplification Specificity, Efficiency, and Expression Profile Analysis of Reference Genes in G. elegans
The specificity of the primers was measured by the RT-qPCR melting curve, and the results showed that all reference genes had a distinct single peak, indicating good specificity of all primers (Figure S5a).Also, the amplification efficiencies of the 10 candidate genes ranged from 97.9% to 110.1%, with the lowest and highest values for 18S and EF1α, respectively (Table 1).In addition, standard curves, calculated using five-fold dilution of cDNA, had correlation coefficients (R 2 ) greater than 0.99 for all candidate genes, which indicated a good linear relationship of the data obtained by RT-qPCR.The expression levels of the 10 candidate genes can be gauged by calculating the RT-qPCR cycle threshold (Ct) values, assuming reaction efficiencies are comparable.A lower Ct value is associated with a higher gene expression abundance.The data showed that the expression patterns of these reference genes were very diverse (Figure 3), with Ct values ranging between 16.97-28.10.The highest expression level candidate was EF1-α (18.35 ± 0.99), and the lowest expression level candidate normalization gene was SAND (25.78 ±

Amplification Specificity, Efficiency, and Expression Profile Analysis of Reference Genes in G. elegans
The specificity of the primers was measured by the RT-qPCR melting curve, and the results showed that all reference genes had a distinct single peak, indicating good specificity of all primers (Figure S5a).Also, the amplification efficiencies of the 10 candidate genes ranged from 97.9% to 110.1%, with the lowest and highest values for 18S and EF1-α, respectively (Table 1).In addition, standard curves, calculated using five-fold dilution of cDNA, had correlation coefficients (R 2 ) greater than 0.99 for all candidate genes, which indicated a good linear relationship of the data obtained by RT-qPCR.The expression levels of the 10 candidate genes can be gauged by calculating the RT-qPCR cycle threshold (Ct) values, assuming reaction efficiencies are comparable.A lower Ct value is associated with a higher gene expression abundance.The data showed that the expression patterns of these reference genes were very diverse (Figure 3), with Ct values ranging between 16.97-28.10.The highest expression level candidate was EF1-α (18.35 ± 0.99), and the lowest expression level candidate normalization gene was SAND (25.78 ± 1.07).Furthermore, GAPDH (19.26 ± 1.39) showed the largest variation in expression level, whereas PP2A (22.61 ± 0.50) showed a small range of variation.The relatively small Ct ranges of UBC, PP2A, and Actin as compared to the other candidate genes suggest that their expression is likely to be more stable across the samples analyzed.1.07).Furthermore, GAPDH (19.26 ± 1.39) showed the largest variation in expression level, whereas PP2A (22.61 ± 0.50) showed a small range of variation.The relatively small Ct ranges of UBC, PP2A, and Actin as compared to the other candidate genes suggest that their expression is likely to be more stable across the samples analyzed.

Stability Analysis of Candidate Reference Genes
Five tools (GeNorm, NormFinder, BestKeeper, ΔCT, and RefFinder) were used to assess the expression stability of ten candidate genes for G. elegans leaves after hormone treatment (Table 2).The experimental datasets were divided into six groups, Control (0 h), SA, MeJA, ETH, ABA, and Total group (all samples), and all genes in these six groups were analyzed independently.

Stability Analysis of Candidate Reference Genes
Five tools (GeNorm, NormFinder, BestKeeper, ∆CT, and RefFinder) were used to assess the expression stability of ten candidate genes for G. elegans leaves after hormone treatment (Table 2).The experimental datasets were divided into six groups, Control (0 h), SA, MeJA, ETH, ABA, and Total group (all samples), and all genes in these six groups were analyzed independently.GeNorm assessed gene stability by calculating M-values.Smaller M-values (M) indicate better stability.The M-value for each gene was less than 1.5 in all groups (Figure 4).The results showed that CDC25 and EF1-α had the highest stability both in the groups Control and SA, while Actin and EF1-α were the two most stable genes for groups MeJA and ABA.For ETH and Total group, CDC25 and PP2A, 18S and UBC were the two most stable genes, respectively.In contrast, 18S was the least stable gene for groups Control and SA, while ranked the last for its stability in groups MeJA and Total.In addition, EF1-α and GAPDH were the bottom gene in groups ETH and ABA.Therefore, there are obvious differences observed in gene expression stability for different treatments, and follow-up studies were required to identify the most suitable reference genes.GeNorm analysis also requires the pairwise variation (V) values, by which Vn/n+1 determines the optimum number of reference genes to be selected.The V2/3 values in four treatments and Control group were less than 0.15 (Figure 5), which means that at least two genes are required to improve the reliability and accuracy of the results when selecting normalized reference genes under these five treatments.In contrast, the total group required at least three reference genes for data processing.

NormFinder Analysis of Reference Genes in G. elegans
NormFinder was used to assess gene expression stability by analyzing intra-and inter-group variation.The results differ a little from GeNorm (Table 2) and showed that TUA and GAPDH, EF1-α, and GAPDH, UBC and TUA, Actin and PP2A, SAND and TUB, Actin and UBC, were the two most stable genes in groups Control, SA, MeJA, ETH, ABA, and Total, respectively.In contrast, the least stable genes were identical to GeNorm in all groups.

BestKeeper Analysis of Reference Genes in G. elegans
BestKeeper judges stability by using CV and standard deviation (SD, usually SD < 1) values, with smaller values indicating better stability of candidate reference genes.As shown in Table 2, the stabilities of Actin and UBC were the highest under the Control GeNorm analysis also requires the pairwise variation (V) values, by which V n/n+1 determines the optimum number of reference genes to be selected.The V 2/3 values in four treatments and Control group were less than 0.15 (Figure 5), which means that at least two genes are required to improve the reliability and accuracy of the results when selecting normalized reference genes under these five treatments.In contrast, the total group required at least three reference genes for data processing.GeNorm analysis also requires the pairwise variation (V) values, by which Vn/n+1 determines the optimum number of reference genes to be selected.The V2/3 values in four treatments and Control group were less than 0.15 (Figure 5), which means that at least two genes are required to improve the reliability and accuracy of the results when selecting normalized reference genes under these five treatments.In contrast, the total group required at least three reference genes for data processing.

NormFinder Analysis of Reference Genes in G. elegans
NormFinder was used to assess gene expression stability by analyzing intra-and inter-group variation.The results differ a little from GeNorm (Table 2) and showed that TUA and GAPDH, EF1-α, and GAPDH, UBC and TUA, Actin and PP2A, SAND and TUB, Actin and UBC, were the two most stable genes in groups Control, SA, MeJA, ETH, ABA, and Total, respectively.In contrast, the least stable genes were identical to GeNorm in all groups.

BestKeeper Analysis of Reference Genes in G. elegans
BestKeeper judges stability by using CV and standard deviation (SD, usually SD < 1) values, with smaller values indicating better stability of candidate reference genes.As shown in Table 2, the stabilities of Actin and UBC were the highest under the Control

NormFinder Analysis of Reference Genes in G. elegans
NormFinder was used to assess gene expression stability by analyzing intra-and inter-group variation.The results differ a little from GeNorm (Table 2) and showed that TUA and GAPDH, EF1-α, and GAPDH, UBC and TUA, Actin and PP2A, SAND and TUB, Actin and UBC, were the two most stable genes in groups Control, SA, MeJA, ETH, ABA, and Total, respectively.In contrast, the least stable genes were identical to GeNorm in all groups.

BestKeeper Analysis of Reference Genes in G. elegans
BestKeeper judges stability by using CV and standard deviation (SD, usually SD < 1) values, with smaller values indicating better stability of candidate reference genes.As shown in Table 2, the stabilities of Actin and UBC were the highest under the Control group.The SD values of all reference genes were low (<0.49)across the SA treatment group, with the two most stable genes being GAPDH and CDC25.UBC and GAPDH showed the highest stability under MeJA treatment.The stability of ETH treatment was best with Actin and PP2A.Under ABA treatment, PP2A and CDC25 were shown as the two most stable genes.From the Total group, the two genes that had the highest stability were PP2A and UBC.Interestingly, the bottom ranked genes were consistent with GeNorm and NormFinder except for the ETH and Total groups, with UBC and GAPDH being the most unstable genes in these two groups, respectively.In addition, the SD of the candidate genes under most treatments was <1.

∆CT Method of Reference Genes in G. elegans
The ∆CT method assesses the stability based on the SD, and the SD value is inversely proportional to the stability using the raw Ct value with while the minimum SD corresponds to the most stable gene (Table 2).The results showed that, the best two stable genes were TUA and GAPDH, EF1-α and GAPDH, UBC and TUA, Actin and PP2A, Actin and SAND, Actin and UBC for groups Control, SA, MeJA, ETH, ABA, and Total, respectively, which were similar to the results of NormFinder.In summary, the results of this method were very similar to those assessed by NormFinder.

RefFinder Analysis of Reference Genes in G. elegans
The results of the four algorithms mentioned above varied, and RefFinder was used to derive the geometric mean of the results from them (Table 2).The analysis showed that the best reference genes available under each treatment differed (Table 3).The top two reference genes in Control group were TUA and GAPDH, while 18S ranked last.The two most stable genes were EF1-α and CDC25 across the SA treatment and 18S at the bottom, the least stable.In the MeJA treatment, UBC and Actin showed the highest ranking while TUB showed the lowest ranking.Actin and PP2A in the ETH treatment group were the most stable genes and the least stable gene was EF1-α.Across the ABA treatment group, Actin and SAND ranked the most stable, with GAPDH ranked last.Furthermore, from the Total group, UBC, Actin, 18S, and PP2A were the top ranked genes, which correlated well with the results obtained from the four hormone treatments.

Validation of the Stability of Reference Genes
To validate the reliability of reference genes, the expression of STR was normalized using these genes (Figure 6).STR, a central gene encoding the enzyme that catalyzes the rate-limiting step of MIAs, can strongly influence the production of MIAs.Hormones like MeJA, which cause changes in the external environment, lead to changes in the expression of STR (Cao and Wang 2021).Since its expression is associated with changes in the MIA content, the accumulation of gelsemium alkaloids also undergoes relevant changes.
The relative expression patterns of STR showed similar trends when the two most stable genes were used individually or in combination as the reference genes for normalization.However, when the two unstable genes were used as references, the relative expression levels of STR showed significant fluctuations.For instance, under ETH treatment for 0-48 h, the expression level of STR was the highest at 0 h and the lowest at 8 h when using the stable genes (Actin, PP2A and Actin + PP2A) as the reference genes, with no significant change at 48 h compared to 0 h; this was shown to be the same as gelsenicine.In contrast, the expression level of STR was highest at 48 h and the general trends were different when using the least stable genes (UBC and EF1-α).In addition, the other three hormone The relative expression patterns of STR showed similar trends when the two most stable genes were used individually or in combination as the reference genes for normalization.However, when the two unstable genes were used as references, the relative expression levels of STR showed significant fluctuations.For instance, under ETH treatment for 0-48 h, the expression level of STR was the highest at 0 h and the lowest at 8 h when using the stable genes (Actin, PP2A and Actin + PP2A) as the reference genes, with no significant change at 48 h compared to 0 h; this was shown to be the same as gelsenicine.In contrast, the expression level of STR was highest at 48 h and the general trends were different when using the least stable genes (UBC and EF1-α).In addition, the other three hormone treatment groups were validated on the basis of the results obtained in Table 3 for the most and least stable genes.However, the expression levels and trends of STR under different groups pointed to similar conclusions.It is evident that using unstable reference genes for gene expression analysis in G. elegans can lead to unreliable results.

Expression Patterns of Pathway Genes Involved in the Biosynthesis of Gelsenicine
To further enrich the study of the biosynthetic pathways involved in MIAs, 15 genes related to the upstream pathways were screened based on genomic and transcriptome datasets and used the "genome + transcriptome + metabolome" co-expression association analysis method (descriptions for the 15 genes are shown in Table 4).The expression patterns of related enzyme genes on the MIA biosynthetic pathway were analyzed using the two most stable genes from each treatment as references to identify potential links between gelsenicine and MIA biosynthesis pathways.After different hormone treatments, almost all gene expressions changed to a certain extent over the six time periods between 0-48h.The results are presented as a combination of heatmaps and correlation analyses based on the trend of relative expression (Figures 7-9).

Expression Patterns of Pathway Genes Involved in the Biosynthesis of Gelsenicine
To further enrich the study of the biosynthetic pathways involved in MIAs, 15 genes related to the upstream pathways were screened based on genomic and transcriptome datasets and used the "genome + transcriptome + metabolome" co-expression association analysis method (descriptions for the 15 genes are shown in Table 4).The expression patterns of related enzyme genes on the MIA biosynthetic pathway were analyzed using the two most stable genes from each treatment as references to identify potential links between gelsenicine and MIA biosynthesis pathways.After different hormone treatments, almost all gene expressions changed to a certain extent over the six time periods between 0-48h.The results are presented as a combination of heatmaps and correlation analyses based on the trend of relative expression (Figures 7-9).

Co-Expression Screening for MIA Pathway Genes
The co-expression association analysis is highly predictive for the screening of multigene or super-gene family candidates and allows for a more precise study of key sequences.Based on established research, the trend in the relative content of gelsenicine in different parts of G. elegans was highest in roots, followed by flowers, stems, and leaves in that order.We used 8-HGO and gelsenicine for association analysis as an example and found ten 8-HGO-related candidate sequences after identification of genomic data.From the results (Figure 10), there are three genetic sequences that are likely to be involved in gelsenicine biosynthesis.As the two nearest genes clustered with gelsenicine were lowly expressed in the leaf based on transcriptome information, the focus sequence Contig3.509,which is likely to be involved in gelsenicine biosynthesis, was selected.Other genes were screened by the same approach.The co-expression association analysis is highly predictive for the screening of multigene or super-gene family candidates and allows for a more precise study of key in that order.We used 8-HGO and gelsenicine for association analysis as an example and found ten 8-HGO-related candidate sequences after identification of genomic data.From the results (Figure 10), there are three genetic sequences that are likely to be involved in gelsenicine biosynthesis.As the two nearest genes clustered with gelsenicine were lowly expressed in the leaf based on transcriptome information, the focus sequence Contig3.509,which is likely to be involved in gelsenicine biosynthesis, was selected.Other genes were screened by the same approach.The expression patterns of genes coding for enzyme in the MIA biosynthesis pathway was analyzed by using the two most stable genes from each treatment as references to identify potential links between gelsenicine and MIA biosynthesis pathways.After different hormone treatments, almost all gene expressions were changed to a certain extent during the six time periods between 0-48 h.A combination of heatmaps and correlation analyses based on the trend of relative expression is shown (Figures 7-9).

Expression Pattern Analysis of Gelsenicine-Related Genes under SA Treatments
Among the SA treatments, the 15 genes with the most similar expression patterns to gelsenicine accumulation were SGD, STR, and LAMT, which all showed a general trend of decreasing-increasing-decreasing and were finally significantly down-regulated at 48 h compared to 0 h (Figure 7a).Clearly, SGD expression showed a quite significant positive correlation with gelsenicine accumulation, with both STR and LAMT expression being quite similar (Figures 8a and 9a).Conversely, for instance, AnPRT, 8-HGO, and other six genes nearly all showed significantly increasing expression at 2 h and 48 h (Figure 7a).The expression patterns of genes coding for enzyme in the MIA biosynthesis pathway was analyzed by using the two most stable genes from each treatment as references to identify potential links between gelsenicine and MIA biosynthesis pathways.After different hormone treatments, almost all gene expressions were changed to a certain extent during the six time periods between 0-48 h.A combination of heatmaps and correlation analyses based on the trend of relative expression is shown (Figures 7-9).

Expression Pattern Analysis of Gelsenicine-Related Genes under SA Treatments
Among the SA treatments, the 15 genes with the most similar expression patterns to gelsenicine accumulation were SGD, STR, and LAMT, which all showed a general trend of decreasing-increasing-decreasing and were finally significantly down-regulated at 48 h compared to 0 h (Figure 7a).Clearly, SGD expression showed a quite significant positive correlation with gelsenicine accumulation, with both STR and LAMT expression being quite similar (Figures 8a and 9a).Conversely, for instance, AnPRT, 8-HGO, and other six genes nearly all showed significantly increasing expression at 2 h and 48 h (Figure 7a).These genes were not consistent with changes in gelsenicine and showed a significant negative correlation.

Expression Pattern Analysis of Gelsenicine-Related Genes under MeJA Treatments
After MeJA treatment, the changes of gelsenicine content were mainly up-regulated significantly at 8 h and down-regulated significantly at 4 h and 48 h (Figure 2).The trends of GES and AnPRT were similar to that of gelsenicine, while the trends of 8-HGO, 7-DLS, and another four genes were also similar to that of gelsenicine, with a significant increase and peak at 2 h; most of them were significantly down-regulated at 48 h compared to 0 h (Figure 7b).Interestingly, there was a highly significant positive correlation between GES, AnPRT, and 8-HGO with gelsenicine, as well as a significant positive correlation between 7-DLS, SLS, and another four genes with gelsenicine (Figures 8b and 9b).This suggests that with MeJA treatment, it is likely that all of these genes are positively connected with gelsenicine production, although there are differences in the degree of response.

Expression Pattern Analysis of Gelsenicine-Related Genes under ETH Treatments
Following ETH hormone treatment, gelsenicine content increased significantly at 24 h, and finally there was no significant change at 48 h compared with 0 h (Figure 2).Most of the expression profiles showed a tendency to decrease and then increase in expression, such as 8-HGO, LAMT, STR, and another four genes, which reached their lowest expression level at 2 h or 8 h (Figure 7c).The expression profiles of AS, PRAI, and TSA genes were very similar, which rebounded at 8 h after decreasing at 0 h, with the expression of most of them then significantly decreasing at 48 h compared with 0 h (Figure 7c).Their overall expression profile was positively correlated with gelsenicine (Figures 8c and 9c).In addition, ETH treatment indicated up-regulation of the most MEP pathway-related genes versus down-regulation of the indole pathway-related genes, which may account for the lack of significant changes in gelsenicine content.

Expression Pattern Analysis of Gelsenicine-Related Genes under ABA Treatments
Under ABA treatment, gelsenicine content decreased and then increased at 48 h and was up-regulated by 1.11-fold relative to 0 h (Figure 2).Similar to the gelsenicine change, 8-HGO, LAMT, and STR gene expression profiles showed an overall trend of decreasing and then increasing (Figure 7d), in which 8-HGO showed a highly significant positive correlation with gelsenicine, and the other two genes showed a significant positive correlation with gelsenicine (Figures 8d and 9d).Conversely, IS and another three gene expression profiles first rose, peaking at 8 h and then decreased (Figure 7d).The trends of TSB, IGPS, and SLS were not exactly the same from those genes, as they showed a complex response (Figure 7d).Under the influence of ABA, the genes responded at varying degrees, but ultimately there was an up-regulation of most genes, which may have been responsible for the significant up-regulation of gelsenicine levels.
Overall, the expression patterns between gelsenicine and the genes on the pathway were somewhat correlated at different time intervals after different treatments.In particular, the expression patterns of 8-HGO, LAMT, and STR had the most similar expression profile relative to gelsenicine content, suggesting that these genes may be involved in coordinating the biosynthetic process of gelsenicine, as well as potentially the biosynthetic pathways of other MIA classes in G. elegans.

Discussion
Gene expression analysis is a key technique to access the connection between the changes in plant secondary metabolism and the activity of related genes in the biosynthesis pathway [44].RT-qPCR is one of the most accurate technique to analyze gene expression profiles, and the screening of optimal reference genes is a crucial step [45].In many plant species, optimal reference genes have been identified under various experimental conditions [46].At present, EF-1α and TUA are the best two genes for Chenopodium quinoa seedlings after treating with ABA [47].EF-1α, and TUB were selected as the best reference genes for studying Petroselinum crispum gene expression profiles under different hormone treatments and abiotic stress [48].Actin was found to be the most stable reference gene in Celery subjected to various abiotic stresses as well as hormone treatments [49].Thus, reference genes can be different for different experimental conditions.In this work, we selected ten candidate reference genes to identify the most suitable reference genes in G. elegans under specific conditions, thus contributing to the gelsenicine biosynthesis pathway-related genes for expression pattern analysis by RT-qPCR.Our results showed that EF1-α and CDC25, UBC and Actin, Actin and PP2A, Actin and SAND were the optimal reference genes under SA, MeJA, ETH, and ABA, respectively.In contrast, 18S, TUB, and GAPDH were considered to be inappropriate genes for normalization in G. elegans leaves under SA, MeJA, and ABA respectively.Notably, EF1-α showed the least stability in ETH treatment, which is opposite to the SA treatment.As identifying reference genes in different experimental conditions is crucial for the accuracy of RT-qPCR studies, our work lays the foundation for future gene expression analysis and functional studies of G. elegans.
To ensure the reliability and accuracy of the results of reference gene stability analysis, GeNorm, NormFinder, BestKeeper, ∆CT, and RefFinder tools are commonly used.However, the genes identified as stable depend on the specific analysis performed.For example, under SA treatment, analysis by NormFinder and ∆CT methods identified that the most stable genes were EF1-α and GAPDH, GeNorm placed these two genes in second and fifth place, while BestKeeper placed these genes in third and first place.Although there were slight differences between rankings of these genes assessed by the various algorithms, the top five stable genes across these algorithms were similar for each group.For instance, UBC, TUA, Actin, GAPDH, and EF1-α were the top five stable genes based on the BestKeeper and ∆CT under the MeJA treatment, and NormFinder showed similar results except for EF1-α.In addition to GAPDH, the other four genes are also the most stable in GeNorm.A number of other studies showed similar differences between the results of these tools that occur from the use of different statistical approaches [50,51].We found that RefFinder results were ranked similar to these four tools, which further supports the reliability and accuracy of these results.It is noteworthy that the V value obtained in GeNorm played an important role in deciding the optimal number for selecting the reference gene.In this study, the optimal number of reference genes selected for the four treatments was two genes, and the optimal reference gene combinations were the top two genes according to RefFinder (Table 3).A further validation using STR indicated that we had identified the best reference gene combinations for use in G. elegans leaves under the different hormone treatments that we studied.
As gelsenicine is formed from the MIA upstream biosynthesis pathway via a series of reactions, we carried out co-expression analysis of the "genome + transcriptome + metabolome" dataset to find candidate upstream pathway-related genes, whose expression profiles could be associated with the content accumulation of gelsenicine in different tissues of G. elegans.Previously, some studies had used similar co-expression analysis to identify pathwayassociated genes.In Salvia miltiorrhiza, it was hypothesized that tanshinone biosynthetic enzymes would be expressed in the rhizome, due to the accumulation of tanshinones in this organ, and six of the probable 14 CYP450s were then further screened on this basis.One CYP450 could finally be validated as being related to tanashinone biosynthesis [52].In another study in Siraitia grosvenorii, using the terms "genome + transcriptome + metabolome" seven candidate CYP450s (out of 80) and five candidate UDPGs (out of 90) with potential to be involved in the mogroside V biosynthesis pathway were targeted for their successful association with this pathway [53][54][55].In this work, we used co-expression analysis to identify 15 candidate genes related to the gelsenicine biosynthesis pathway.We found that the expression levels of these genes as well as the gelsenicine content significantly changed under different hormone treatments.We hypothesize that candidate gelsenicine biosynthesis related genes that are strongly associated with changes in content are more likely to be involved in the gelsenicine biosynthesis.8-HGO, LAMT, and STR expression profiles were similar to the variation trends in the content of gelsenicine, suggesting these genes are likely to be involved in coordinating the biosynthetic process of gelsenicine.For instance, after treating with ABA, the expression pattern of 8-HGO, LAMT, and STR showed good correlation with the content trend of gelsenicine.These genes will be subjected to further molecular examination to determine their actual association with the gelsenicine biosynthesis.If any one of these three candidates is important for gelsenicine biosynthesis, this methodology with be of great use for quick identification of other MIA associated genes in G. elegans.

Determination of the Content of Gelsenicine by HPLC
Samples were fully ground into powder under liquid nitrogen and dried in a MODU-LYOD freeze dryer (Thermo Scientific, Santa Clara, CA, USA).Samples to be analyzed were weighed (2 g) and extracted twice by ultrasonication with 80% ethanol (1:25, w/v) for 0.5 h at 60 • C. The extraction solutions were combined for filtration, using 1mL as the sample solution.The prepared MeOH (stock) solution with 1 mg gelsenicine mL −1 (Chengdu Must Bio-technology Co., Chengdu, China) was added to give a final concentration for treatments of 0.04 mg•mL −1 .
Analysis was performed on a Thermo Scientific Ultimate 3000 HPLC instrument (Thermo Scientific, Santa Clara, CA, USA).Samples were separated on an Agilent C 18 column (5 µm, 4.6 mm × 250 mm).The flow rate was 1 mL•min −1 , and the column temperature was maintained at 30 • C. The mobile phase consisted of acetonitrile (B) and phosphoric acid aqueous solution (D).The gradient elution for 40 min was as follows: 0-2 min, 10% B; 2-7 min, 10% B to 15% B; 7-30 min, 15% B; 30-40 min, 15% B to 90% B; 40-43 min, 90% B; 43.01-50 min, 10% B. The injection volume was 5 µL.Five concentrations of gelsenicine were injected, and then calibration curves were set by plotting the peak area versus the concentration.The sample solution was passed through a 0.22 µM organic microporous membrane filter (Tianjin Jinteng Experimental Equipment Co., Tianjin, China), and was analyzed at the above chromatographic conditions.Combined with the HPLC results and the linear relationship, the content of gelsenicine under different treatments was calculated.All samples were biologically repeated three times.

RNA Extraction and cDNA Synthesis
Total RNA was extracted from samples of G. elegans by the SteadyPure Plant RNA Extraction Kit (Hunan Acres Biological Engineering Co., Changsha, China).RNA mass and RNA purity were measured in an ultra-micro spectrophotometer (item no.912A0959, purchased from BIO-DL, Shanghai, China) and RNA integrity was assessed by electrophoresis on a 1.0% agarose gel.cDNA from samples of G. elegans was synthesized according to the Evo M-MLV RT Mix Kit with the gDNA Clean product protocol (Hunan Acres Bioengineering Co., Changsha, China) and the amount of total RNA used was 700 ng.

RT-qPCR
RT-qPCR was performed with the ABI 7300 Real-Time Fluorescence PCR System (Applied Biosystems, Foster City, CA, USA) using the SYBR ® Green Premix Pro Taq HS qPCR Kit (Hunan Acres Biological Engineering Co., Changsha, China).A total of 20 µL of reaction mixture containing 2 µL of diluted cDNA (~25 ng/µL), 10 µL of 2 × SYBR Green Pro Taq HS premix, 0.4 µL of forward primer (10 µM), 0.4 µL of reverse primer (10 µM), 0.4 µL of ROX (20 µM), and 6.8 µL of ddH 2 O was used.PCR reaction conditions were as follows: 95 • C for 3 min, 40 cycles at 95 • C for 30 s, and 60 • C for 30 s. Melting curves were generated by heating the amplicon from 60 • C to 95 • C to confirm primer specificity.Each RT-qPCR reaction was repeated with three technical replicates and Ct was measured automatically.Relative changes in gene expression were calculated using the method of comparing 2 −∆∆CT [57].Additionally, a 5-fold dilution series (5 0 , 5 −1 , 5 −2 , 5 −3 and 5 −4 ) of template cDNA was used for all samples and (~100 ng/µL) for the standard curve.Correlation coefficients (R 2 ) were measured from the standard curve based on these cDNA templates.The primer specificity was verified by the presence of a single peak in the melting curve analysis.Amplification efficiency (E) of the primer pair could be calculated from the slope value with the following formula: E = 10 −1÷k .

Evaluation of Reference Genes
When performing GeNorm and NormFinder analyses, the original Ct value is first converted into a relative quantitative value (Q value) with the following equation Q = 2 −∆CT [58,59].The expression stability (M) value and the pairwise variation (V) value are used for GeNorm.A smaller M value reflecting higher stability, and V value (V n/n+1 < 0.15 is a cut-off) determines the optimum number of reference genes to select.NormFinder also obtains the M values based on the variation of reference genes within and between samples [60].BestKeeper requires the raw Ct value, which is then assessed by the standard deviation (SD) and coefficient of variance (CV) as the criteria for stable expression of the reference gene [61].The ∆CT method assesses gene expression stability by calculating the mean standard deviation (SD).The RefFinder website (http://blooge.cn/RefFinder/,accessed on 30 November 2022) combined the above four algorithms to select the most suitable reference genes [62].

Validation of Reference Genes
To verify the stability of the screened reference genes, STR was selected as the target gene and the relative expression levels of STR under different treatments were observed by using the two most stable and the least stable reference genes.The expression patterns of 15 genes related to the upstream biosynthesis pathway of MIAs were also further analyzed under different treatments.The Ct values obtained from the raw data were also calculated based on 2 −∆∆CT .Three biological replicates were performed for each sample.In addition, relevant graphs were produced by using Prism9.0.0 software, TBtools2.012,Adobe Illustrator 2021, and OmicStudio online website (https://www.omicstudio.cn/,accessed on 8 June 2023), while statistical analysis was carried out using SPSS 17.0 software.

Conclusions
The identification of functional reference genes is a prerequisite for qualification of gene expression in specific experimental conditions when using RT-qPCR.In this research, we calculated the gene expression stability of ten candidate reference genes of G. elegans leaves with or without the application of one of four hormone treatments associated with changes in gelsenicine biosynthesis.Our data recommends that two reference genes be used to normalize RT-qPCR data.When leaves were treated with SA, EF1-α and CDC25 should be used as reference genes whereas, UBC and Actin were the best reference genes for the MeJA treated leaves, and Actin and PP2A for ETH treated samples.Actin and SAND were identified to be the most suitable genes when G. elegans leaves were treated with ABA.Our research helps to accurately quantify the biosynthesis pathway-related genes of MIAs in G. elegans and provides a basis for further analyses by RT-qPCR of genes that participate in the complex regulation of the MIA biosynthesis pathway.

Figure 3 .
Figure 3. Distribution of Ct values among the 10 candidate reference genes.Lines shown in the boxplot graph of Ct value display the median values.Lower and upper boxes represent the 25th percentile to the 75th percentile.Whiskers indicate the maximum and minimum values.

Figure 3 .
Figure 3. Distribution of Ct values among the 10 candidate reference genes.Lines shown in the box-plot graph of Ct value display the median values.Lower and upper boxes represent the 25th percentile to the 75th percentile.Whiskers indicate the maximum and minimum values.

Figure 4 .
Figure 4. Average expression stability values (M) of 10 candidate reference genes using GeNorm.

Figure 5 .
Figure 5. Pairwise variation (V) values obtained by GeNorm analysis on 10 candidate reference genes in the six groups.

Figure 4 .
Figure 4. Average expression stability values (M) of 10 candidate reference genes using GeNorm.

Figure 4 .
Figure 4. Average expression stability values (M) of 10 candidate reference genes using GeNorm.

Figure 5 .
Figure 5. Pairwise variation (V) values obtained by GeNorm analysis on 10 candidate reference genes in the six groups.

Figure 5 .
Figure 5. Pairwise variation (V) values obtained by GeNorm analysis on 10 candidate reference genes in the six groups.

Figure 6 .
Figure 6.Validation of the reference gene by the relative expression of the target gene STR in different hormone treatments.Graphs (a-d) are the results of SA, MeJA, ETH and ABA.The (*) indicates p < 0.05, and the (**) indicates p < 0.01.

Figure 6 .
Figure 6.Validation of the reference gene by the relative expression of the target gene STR in different hormone treatments.Graphs (a-d) are the results of SA, MeJA, ETH and ABA.The (*) indicates p < 0.05, and the (**) indicates p < 0.01.

Figure 7 .
Figure 7. (a) The results for the expression patterns of 15 candidate genes related to the upstream pathway of MIA biosynthesis at 0-48 h of SA treatment; (b) the results for the expression patterns of 15 candidate genes related to the upstream pathway of MIA biosynthesis at 0-48 h of MeJA treatment; (c) the results for the expression patterns of 15 candidate genes related to the upstream pathway of MIA biosynthesis at 0-48 h of ETH treatment; (d) the results for the expression patterns of 15 candidate genes related to the upstream pathway of MIA biosynthesis at 0-48 h of ABA treatment.The (*) indicates p < 0.05, and the (**) indicates p < 0.01.

Figure 7 .
Figure 7. (a) The results for the expression patterns of 15 candidate genes related to the upstream pathway of MIA biosynthesis at 0-48 h of SA treatment; (b) the results for the expression patterns of 15 candidate genes related to the upstream pathway of MIA biosynthesis at 0-48 h of MeJA treatment; (c) the results for the expression patterns of 15 candidate genes related to the upstream pathway of MIA biosynthesis at 0-48 h of ETH treatment; (d) the results for the expression patterns of 15 candidate genes related to the upstream pathway of MIA biosynthesis at 0-48 h of ABA treatment.The (*) indicates p < 0.05, and the (**) indicates p < 0.01.

Figure 8 .
Figure 8.(a) The results for the heatmaps of gelsenicine and the 15 upstream biosynthetic pathway genes associated with it under SA treatment; (b) the results for the heatmaps of gelsenicine and the 15 upstream biosynthetic pathway genes associated with it under MeJA treatment; (c) the results for the heatmaps of gelsenicine and the 15 upstream biosynthetic pathway genes associated with it under ETH treatment; (d) the results for the heatmaps of gelsenicine and the 15 upstream biosynthetic pathway genes associated with it under ABA treatment.The darker the colors, the stronger is the correlation.

Figure 8 .
Figure 8.(a) The results for the heatmaps of gelsenicine and the 15 upstream biosynthetic pathway genes associated with it under SA treatment; (b) the results for the heatmaps of gelsenicine and the 15 upstream biosynthetic pathway genes associated with it under MeJA treatment; (c) the results for the heatmaps of gelsenicine and the 15 upstream biosynthetic pathway genes associated with it under ETH treatment; (d) the results for the heatmaps of gelsenicine and the 15 upstream biosynthetic pathway genes associated with it under ABA treatment.The darker the colors, the stronger is the correlation.

Figure 9 .
Figure 9. (a) The results for the correlation analyses between gelsenicine and 15 upstream biosynthetic pathway genes under SA treatment; (b) the results for the correlation analyses between gelsenicine and 15 upstream biosynthetic pathway genes under MeJA treatment; (c) the results for the correlation analyses between gelsenicine and 15 upstream biosynthetic pathway genes under ETH treatment; (d) the results for the correlation analyses between gelsenicine and 15 upstream biosynthetic pathway genes under ABA treatment.Thick lines (weak correlation), thin lines (strong correlation), red and rightward slash (positive correlation), blue and leftward slash (negative correlation) only the more relevance cases are shown with asterisks [43].The darker the colors, the more asterisks, the stronger is the correlation.The (*) indicates p < 0.05, and the (**) indicates p < 0.01.The (***) indicates p < 0.001.

Figure 9 .
Figure 9. (a) The results for the correlation analyses between gelsenicine and 15 upstream biosynthetic pathway genes under SA treatment; (b) the results for the correlation analyses between gelsenicine and 15 upstream biosynthetic pathway genes under MeJA treatment; (c) the results for the correlation analyses between gelsenicine and 15 upstream biosynthetic pathway genes under ETH treatment; (d) the results for the correlation analyses between gelsenicine and 15 upstream biosynthetic pathway genes under ABA treatment.Thick lines (weak correlation), thin lines (strong correlation), red and rightward slash (positive correlation), blue and leftward slash (negative correlation), only the more relevance cases are shown with asterisks [43].The darker the colors, the more asterisks, the stronger is the correlation.The (*) indicates p < 0.05, and the (**) indicates p < 0.01.The (***) indicates p < 0.001.

Figure 10 .
Figure 10.Screening of genes related to the biosynthetic pathway of MIAs based on co-expression correlation analysis, shown here with 8-HGO as an example.

Figure 10 .
Figure 10.Screening of genes related to the biosynthetic pathway of MIAs based on co-expression correlation analysis, shown here with 8-HGO as an example.

Table 1 .
Description and primer information for 10 candidate reference genes.

Table 1 .
Description and primer information for 10 candidate reference genes.

Table 3 .
Most stable and least stable combination of reference genes based on RefFinder.

Table 4 .
Description and primer information for 15 candidate pathway genes.

Table 4 .
Description and primer information for 15 candidate pathway genes.