Identification, In Silico Characterization, and Differential Expression Profiles of Carotenoid, Xanthophyll, Apocarotenoid Biosynthetic Pathways Genes, and Analysis of Carotenoid and Xanthophyll Accumulation in Heracleum moellendorffii Hance

Heracleum moellendorffii Hance is a non-woody forest plant widely used in China, Korea, and Japan because of its various therapeutic properties. However, the genetic details of the carotenoid pathway (CP), xanthophyll pathway (XP), and apocarotenoid pathway (AP) genes have not been studied. Thus, the CP, XP, and AP genes of H. moellendorffii were detected and analyzed. A total of fifteen genes were identified, of which eight, four, and three belonged to CP, XP, and AP, respectively. All identified genes possessed full open reading frames. Phylogenetic characterization of the identified gene sequences showed the highest similarity with other higher plants. Multiple alignments and 3D dimensional structures showed several diverse conserved motifs, such as the carotene-binding motif, dinucleotide-binding motif, and aspartate or glutamate residues. The results of real-time PCR showed that the CP, XP, and AP genes were highly expressed in leaves, followed by the stems and roots. In total, eight different individual carotenoids were identified using HPLC analysis. The highest individual and total carotenoid content were achieved in the leaves, followed by the stems and roots. This study will provide more information on the gene structure of the CP, XP, and AP genes, which may help to increase the accumulation of carotenoids in H. moellendorffii through genetic engineering. These results could be helpful for further molecular and functional studies of CP, XP, and AP genes.


Introduction
Heracleum moellendorffii Hance is a perennial shade herb that belongs to the Apiaceae family and is widely distributed in China, Korea, and Japan [1]. It is mainly grown in forested areas and is commonly used as a wild vegetable because of its high nutritional value and trace elements. H. moellendorffii has long been used to treat rheumatoid arthritis and colds [1,2], and the roots have been used as traditional medicine for healing various inflammatory diseases, including back pain, arthritis, and fever [3,4]. Few studies have

Mining and Sequence Analysis of CP, XP, and AP Genes
The CP, XP, and AP genes of H. moellendorffii were mined from transcriptomic data acquired in our laboratory. The retrieved genes were searched for using BLASTN. The results showed that all retrieved gene sequences had high similarity with other higher plant sequences. To identify the full open reading frame (ORF) in the retrieved sequences, the sequences were analyzed using the ORF finder program in the NCBI database. From the transcriptomic data, nucleotide sequences of different lengths were obtained for each gene, and genes that possessed the highest number of nucleotides and full ORFs were used for structural and functional analyses. A total of 15 full ORFs genes were mined, of which 8 CP (HmGGPS, HmPSY, HmPDS, HmZ-ISO, HmZDS, HmCrtISO, HmL-CYB, and HmLCYE), 4 XP (HmCHXB, HmCHXE, HmVDE, and HmZEP), and 3 AP (Hm-CCD, HmNCED, and HmAO) genes were identified ( Figure 1). The above sequences were submitted to GenBank with the following accession numbers: 8 CP (OM732401, OM732402, OM732403, OM732404, OM732405, OM732406, OM732407, and OM732408), 4 XP (OM732409, OM732410, OM732411, and OM732412), and 3 AP (OM732413, OM732414, and OM732415). Moreover, several physicochemical parameters were analyzed using the free online ProtParam software (Figure 1). The estimated MWs and isoelectric points of the retrieved CP, XP, and AP sequences are shown in Figure 1. The MWs of the CP, XP, and AP genes ranged from 34.11 kDa (HmCHXB) to 150.15 kDa (HmAO). The isoelectric points ranged from 5.87 to 9.14. The predicted MW and isoelectric points were more or less similar to other higher plant species, such as I. dentate [28], Brassica napus [36], L. chinensis [27], C. majus [19], S. baicalensis [31,37], and N. officinale [18]. The highest aliphatic and instability indices of the identified AA sequences were for HmZ-ISO (105. 49) and HmCHXB (51.93), whereas the lowest were observed in HmCCD (74.12) and HmZ-ISO (28.72), respectively. The relatively high aliphatic index values indicated that all enzymes were thermostable over a wide range of temperatures [38]. In addition, instability index analysis showed that HmZDS and HmZEP enzymes were more stable than HmPDS, HmCHXB, and HmCHXE. GRAVY indices of the AA sequence were used to identify the hydrophilicity and hydrophobicity of the protein. The results of the GRAVY index analysis showed that most of the CP, XP, and AP pathway genes possessed negative values, indicating that most of these enzymes were hydrophilic. In contrast, HmZ-ISO was hydrophobic with a positive GRAVY value (0.263). Negative GRAVY index values indicated that these enzymes interacted better with water ( Figure 1). A similar result was obtained with microalgae, indicating that these enzymes are hydrophobic [39]. Signal IP analyses revealed that HmZDS had the highest original shearing site score (C score), followed by HmLCYB, HmCHXE, HmPSY, HmZEP, HmCCD, HmCrtISO, HmZ-ISO, HmAO, HmNCED, HmCHXB, HmVDE, HmPDS, and HmLCYE. HmCHXE had the highest synthesized shearing site score (Y score), followed by HmPSY, HmZDS, HmZEP, HmLCYB, HmCrtISO, HmVDE, HmLCYE, HmZ-ISO, HmPDS, HmCCD, HmNCED, HmCHXB, and HmAO also had the highest signal peptide score (S score), followed by HmCHXE, HmPSY, HmZEP, HmCrtISO, HmVDE, HmZDS, HmLCYE, HmZ-ISO, HmPDS, HmCCD, HmCHXB, HmNCED, HmAO, and HmLCYB (Table S1). None of these genes possesses any signal peptide transmembrane region (Table S1) or cleavage site (data not shown). In several other higher plants such as Brassica napus [40], Musa acuminate [41], C. majus [19], Triticum aestivum [42], N. officinale [18], and the marine green algae Tetraselmis suecica [15], similar results regarding the signal peptide transmembrane regions in the CP genes were reported. In addition, transmembrane helices were predicted for all enzymes using online free TMHMM software. The results showed that the enzymes HmZ-ISO, HmCHXB, and HmLCYE consisted of six, three, and one transmembrane helices, respectively, whereas none of the other enzymes contained any transmembrane helices (Table S1). This result was in agreement with the experimental study of D. salina, which reported that ZDS is not a potential membrane protein [43]. Subsequently, NetNGlyc, PSIPRED, and radar tools were used to predict the N-glycosylation sites, disordered regions, and internal repeats of the CP, XP, and AP proteins (Table S2). In addition, secondary structure analysis of CP, XP, and AP genes showed that most of the genes were rich in alpha-helices. In contrast, HmCrtISO, HmZEP, HmVDE, HmCCD, HmNCED, and HmAO were rich in random coils, indicating that these proteins might be essential for flexibility and conformational changes during catalysis (Table S2) [44]. Narang et al. [39] also reported that most CP genes in microalgae contained a higher percentage of alpha-helices than extended strands and random coils. These results support the sequence analysis results that the CP, XP, and AP genes possess highly conserved AA sequences in higher plants and green algae.
Structural analyses of the AA sequences are necessary to identify the residues that play an important role in the catalytic function of the enzymes and are the main focus for the development of genetically engineered strains that can accumulate higher carotenoid content. Therefore, we identified the conserved motifs of the CP, XP, and AP genes using the MEME software, and the resulting sequence motif is shown in Figures S1 and S2. Motif analysis of HmGGPS and HmPSY enzymeS showed that they consist of conserved Asp-rich motifs (DDILD) and DELVD motifs, respectively ( Figure S1). In CHXE, the fourth variable position of the pentapeptide consensus sequence (ATIXT) is a histidine residue in all plants and algae [39]. A similar result was obtained for the HmCHXE enzyme ( Figure S1). For the ZEP enzyme, the conserved motif YFVXSD was found in a distinct organism, and the fourth position was replaced by aspartic acid, serine, valine, or threonine. However, in HmZEP, it is occupied by serine ( Figure S1). The predicted dinucleotide FAD/NAD-binding domain (DX4GXG) was found in LCYB, whereas HmLCYB possesses a DLAVVGGG motif in their AA sequence. A recent study reported that they identified a few novel conserved motifs, such as RXXRHPQ, TPXINXGMV, and NNYGVW in Z-ISO, CrtISO, and LCYE sequences, in algae [39]. Similar conserved genes have been identified in H. moellendorffii ( Figure S1). These conserved domains were present in various microalgae and plant AA sequences, indicating that these enzymes play vital roles in their biological function, substrate-binding activity, catalytic activity, and structural maintenance [39,47]. Hence, in the algae and plant kingdoms, the AA sequences of CP, XP, and AP are highly conserved [15,18,19,41,48].

Phylogenetic Tree and Percent Identity Matrix Analysis
Phylogenetic tree analysis between H. moellendorffii and other CP, XP, and AP genes was performed using the neighbor-joining method. The phylogenetic tree showed that it formed a cluster with other higher plants, whereas bacteria, chlorophytes, heterokonts, and dinoflagellates formed a separate cluster ( Figure S2). Similar results have been obtained for several other plants [18,19,37,[49][50][51]. The identity matrix results of the AA sequence showed that it shared the highest percentage of sequence identity with higher plants, especially Daucus carota. In addition, other species, such as bacteria, chlorophytes, dinoflagellates, and heterokonts showed a lower percentage of sequence identity with H. moellendorffii (Table S3). Previously, several higher plant studies, such as C. majus, S. baicalensis, I. dentate, and N. officinale, showed higher similarity with other higher plant AA sequences of CP, XP, and AP [18,19,28,29,37]. These results showed that the CP, XP, and AP genes in higher plants are highly conserved and may share the highest percentage of sequence identity with other higher plants.
In addition, we constructed a concatenated phylogenetic tree based on the 15 AA sequences of the CP, XP, and AP genes. The phylogenetic results showed that these 16 higher plant species share a high bootstrap value with other higher plants, indicating that the CP, XP, and AP genes are highly conserved ( Figure 3). In addition, these 15 CP, XP, and AP genes in H. moellendorffii share a similar putative conserved sequence; therefore, the function of these genes was also similar to those of other higher plants, especially D. carota ( Figure 3). Similarly, the pairwise identity matrix of all AA sequences showed the highest percentage sequence identity with D. carota ( Figure 3). This result was in agreement with a previous study, in which concatenated phylogenetic tree analysis and percentage identity matrix analysis of 11 N. officinale CP genes formed a cluster and the highest percentage identity matrix with A. thaliana [18]. These results indicate that all CP, XP, and AP genes share highly conserved sequences in higher plants.  Figure S2 and Table S3. Arabidopsis thaliana-A. thaliana, Brassica napus-B. napus, Brassica oleracea-B. oleracea, Brassica rapa-B. rapa, Camelina sativa-C. sativa, Capsella rubella-C. rubella, Citrus sinensis-C. sinensis-1, Eutrema salsugineum-E. salsugineum, Jatropha curcas-J. curcas, Nicotiana tabacum-N. tabacum, Pistacia vera-P. vera, Raphanus sativus-R. sativus, Solanum lycopersicum-S. lycopersicum, Spinacia oleracea-S. oleracea. Accession numbers of the amino acid sequences are shown in Table S4.

Multiple Alignment and Protein Tertiary Structure Analysis of CP, XP, and AP Genes
Multiple alignments and protein tertiary structure (3D) prediction of H. moellendorffii CP, XP, and AP proteins revealed conserved domains similar to those in other higher plants [18,19,28,32,[51][52][53] and microalgae [45,[54][55][56] (Figures 4-6 and S3). Although there were few changes in AA, the protein's function is primarily based on its 3D structure and stability [57]. The 3D structures of AA sequences exhibited similar substrate-binding pockets and configurations of α and β secondary structural elements as those of C. reinhardtii, A. thaliana, C. majus, D. salina, and N. officinale (data are not shown). However, slight structural differences were observed in the variable loop regions of the CP, XP, and AP protein models, which could be due to their comparatively low sequence similarities [58]. These results support the multiple alignment and percentage identity results of this study ( Figure S3 and Table S3).
The predicted 3D structure of H. moellendorffii CP, XP, and AP genes possesses a hydrophobic substrate-binding pocket at the center, formed by the folding of α-helices and β-sheet strands. The substrate-binding pocket present in the structure was almost hidden within the core of the α-helices. Other domains, such as the aspartate-rich domain (ARD), dinucleotide-binding domain (DBD), and carotene-binding domain (CBD) were located near the cavity, which might play an important role in enzymatic activity [45]. Briefly, the key CP enzyme HmPSY possesses a conserved ARD and trans-isoprenyl diphosphate synthase motif in their sequence ( Figure 4 and Figure S3). This result is in agreement with previous studies showing that these conserved domains are also present in higher plants (C. majus, I. dentate, N. officinale, and S. baicalensis) [18,19,28,31]. The second step in CP is the chemical reaction catalyzed by HmPDS, which possesses both CBD and DBD in its sequence. The third enzyme in the CP is CmZDS which shares structural features similar to those of HmPDS, consisting of a DBD and CBD at the N-and C-terminal regions, respectively. Similar structural features were found in the PDS and ZDS amino acid sequences of other higher plants (Carica papaya, I. dentate, C. majus, S. baicalensis, and N. officinale) and microalgae (D. salina) [18,19,28,31,56,59]. Both HmLCYB and HmLCYE contain a DBD in their AA sequences (Figure 4 and Figure S3). This conserved domain binds flavin adenine dinucleotides (FAD) and is found in all lycopene cyclases. Moreover, plant-type cyclases specifically consist of a plant β-conserved domain in its AA sequence, whereas it is absent in bacteria, and this β-conserved domain might play a significant role in the interaction between the components associated with membrane-bound enzymes and cyclases [60]. In addition, cyclase motif 1, cyclase motif 2, and charged regions are also present in lycopene cyclases, which are involved in the catalytic role of substrate-binding interactions [24]. Similar conserved motifs were identified in some higher plants (C. majus, Arabidopsis, Capsicum annuum, and N. officinale) and green microalgae (Haematococcus pluvialis) [18,19,24,60,61].
The most common genes in CP and XP were HmCHXB and HmCHXE, respectively. HmCHXB possesses four histidine residues in its structure, which may be helpful for the binding of Fe 2+ ions during hydroxylation [28,63]. The VDE gene in C. sativus and C. majus consists of three conserved domains: lipocalin, Glu-rich, and Cys-rich motifs [19,51]. The central lipocalin domain serves as an attachment site for the hydrophobic V substrate [64]. In addition, a high number of glutamate residues and a transit peptide that targets the protein to chloroplasts were found in the C-and N-terminal regions, respectively [64,65]. HmVDE consisted of similar conserved protein domains in their AA sequences in this study (Figures 5 and S3). HmZEP possessed various phosphopeptide-binding sites, two short motifs of lipocalin family proteins, one forkhead-associated binding motif, and one FADbinding motif in its AA sequence. Similar motifs have been identified in the AA sequences of some higher plants (C. majus, I. dentate, S. baicalensis, and N. officinale) [18,19,28,31].
Among XP genes, HmCCD and HmNCED consisted of four highly conserved histidine residues in their AA sequences (Figures 6 and S3), similar to the CCD and NCED structures of C. majus, N. officinale, and citrus plants [18,19,66]. Previous studies indicated that these highly conserved histidine moieties aid in coordinating the Fe 2+ cofactor required for the activity or the aspartate or glutamate moieties that help fix the histidine positions [67][68][69]. In Pisum sativum and C. majus, the AO gene consists of an FAD-binding motif, a molybdenum cofactor (Moco) binding motif, and a consensus sequence for two iron-sulfur centers [19,70]. In this study, HmAO also consisted of similar conserved domains (Figures 6 and S3). Multiple alignments and 3D structural analysis indicated that most of the H. moellendorffii CP, XP, and AP genes have highly conserved sequences. Therefore, these genes are closely related to those from other higher plants and algae. However, further studies are needed to identify the functions of the H. moellendorffii CP, XP, and AP genes.

In Silico Subcellular Localization Prediction of CP, XP, and AP Genes
The subcellular localization prediction of H. moellendorffii CP, XP, and AP amino acid sequences were analyzed using free online programs such as CELLO2GO, DeepLOC-1.0, Plant-PLoc, WoLF PSORT, and TargetP 1.1. All CP, XP, and AP genes were directed to the chloroplast, whereas some of these genes were also targeted to different organelles, such as the cytoplasm, cell membrane, endoplasmic reticulum, nucleus, mitochondrion, thylakoid membrane, and plasma membrane (Table 1). Several previous studies in higher plants (A. thaliana, C. majus, transgenic Ipomoea batatas, and N. officinale) and microalgae have reported that most of the CP, XP, and AP genes are targeted to the chloroplast [18,19,46,71,72], confirming that methylerythritol 4-phosphate/1-deoxy-D-xylulose 5-phosphate (MEP/DOXP) and CP occur in the chloroplast/plastid of higher plants and algae [39,73]. Therefore, this result confirmed that all CP, XP, and AP proteins in H. moellendorffii share highly conserved sequences with those in higher plants and algae; therefore, the prediction of subcellular localization analysis of all these AA sequences showed similar results.  [62]. The amino (NH 2 ), carboxyl (COOH) termini, α-helices, and β-strands are shown in blue, red, green, and pink, respectively. Multiple alignments of each XP gene are shown in Figure S3.

Differential Expression of CP, XP, and AP Genes in Different Parts of H. moellendorffii
The results of qRT-PCR showed that the CP, XP, and AP genes were highly expressed in the leaves of H. moellendorffii, whereas none of these genes were significantly expressed in the roots. The highest gene expression was achieved in HmCCD and HmNCED, whereas the lowest expression level was achieved with HmPDS ( Figure 7). All the CP genes (HmPSY, HmPDS, HmZ-ISO, HmCrtISO, HmLCYB, and HmLCYE) were significantly expressed in leaves. The expression level of HmPSY was elevated in leaves, which was 980.39-and 1.21-times higher than that in the roots and stems, respectively. In addition, most XP genes, such as HmCHXB, HmCHXE, HmZEP, and HmVDE, were highly upregulated in leaves compared to stems and roots. In addition, the HmVDE expression level was elevated in leaves, which was 333.33-and 2.0-times higher than that in roots and stems, respectively. The expression level of HmCCD was significantly higher in the leaves than in the stems. In contrast, HmNCED had the highest expression in the stem, which was 71.0-and 1.42-times higher than that in roots and leaves, respectively (Figure 7). HmAO was significantly expressed in leaves, which was 3.85-and 50-times higher than that in the stem and root, respectively. Previous studies reported that the CP, XP, and AP genes were significantly expressed in the leaves and flowers of plants, such as C. majus [19], Brassica rapa [74], and N. officinale [18], when compared with different parts of the plants. In D. carota, the expression level of CP, XP, and AP genes was significantly expressed in the leaves compared to that of the roots [75]. A similar result was obtained in this study, in that most of the CP, XP, and AP genes were highly expressed in leaves. This indicates that most gene sequences are highly conserved, and all these genes have roles similar to their orthologs in other species. In Arabidopsis, it has been reported that genes involved in CP, such as AtPSY, AtPDS, AtZDS, and AtZEP play important roles [76]. PSY is the rate-limiting enzyme for CP, resulting in the highest accumulation of individual and total carotenoids in plants [9]. The results showed that HmPSY was involved in controlling carotenoid flux in the leaves of H. moellendorffii. In addition, enzymes such as PSY, PDS, and ZDS genes are facilitated by light, which results in carotenoid accumulation in carrot leaves [75]. A similar result was obtained in the present study due to the exposure of leaves and stems to light; the expression levels of HmPSY, HmPDS, and HmZDS were higher, which led to the highest accumulation of carotenoids in leaves and stems. In addition, the roots showed the lowest expression level and carotenoid accumulation when compared to leaves and stems due to the lack of exposure to light. Moreover, β-carotene degradation genes such as HmZEP, HmVDE, HmAO, and HmCCD4 were highly expressed in leaves compared to roots. A previous study reported that the leaf-specific enzyme CCD4 is an important enzyme involved in apocarotenoid glycoside formation [77]. In this study, the expression of HmCCD4 in H. moellendorffii leaves might be due to the association with apocarotenoid glycoside formation in the leaves; however, the exact functions need to be studied in the future. This study showed that, except for HmNCED, all other genes were highly expressed in leaves. This expression analysis of CP, XP, and AP genes will provide immense knowledge in the future to perform molecular studies on H. moellendorffii to enhance carotenoid accumulation through genetic engineering approaches. Expression levels of CP, XP, and AP genes were analyzed in different parts such as leaf, stem, and root of H. moellendorffii using qRT-PCR. Transcriptional levels in leaves were set as a control (the value of 1) to calculate the relative gene expression in other tissues. The relative gene expression levels of CP, XP, and AP genes were normalized to α-tubulin as a reference gene. Results are given as the means of three replicates ± standard deviation. Different alphabetical letters a-c denote significant differences (p < 0.05).

Carotenoid and Xanthophyll Content in Different Organs of H. moellendorffii
Eight different carotenoids were identified in different organs of H. moellendorffii using high-performance liquid chromatography (HPLC) (Figure 8). The total carotenoid level varied between 0.73 and 1668.20 µg/g of dry weight (DW). Among the different organs, the highest total carotenoid level was found in leaves (1668.20 µg/g DW), 2285.21and 16.61-times higher than that in the roots and stems, respectively. All eight individual carotenoids were significantly higher in leaves (Figure 8). Specifically, E-β-carotene, lutein, 9Z-β-carotene, and 13Z-β-carotene levels were considerably higher in leaves than in other organs.  Among the carotenoids, E-β-carotene, lutein, and 9Z-β-carotene were detected in all the plant organs. Specifically, the E-β-carotene level was higher in leaves, which was 2112.0and 18.11-times higher than that in roots and stems, respectively. The lutein content was also higher in leaves than in stems and roots. In addition, 9Z-β-carotene accumulation was the highest in leaves, and it was 1400.11-and 19.03-times higher than that in roots and stems, respectively. Antheraxanthin, violaxanthin, zeaxanthin, and β-cryptoxanthin were detected only in leaves. The 13-Z-β-carotene was considerably high in leaves (128.70 µg/g DW), followed by that in stems (7.45 µg/g DW), and it was not detected in stem roots. Violaxanthin, antheraxanthin, zeaxanthin, and β-cryptoxanthin levels were considerably higher in the leaves than in the stems and roots. Among the specific carotenoids, β-cryptoxanthin, antheraxanthin, and violaxanthin showed the lowest accumulation in different organs of H. moellendorffii (Figure 8). These findings are in agreement with those of previous studies of Allium sativum [78], C. majus [19,79], N. officinale [18], B. rapa [30], D. carota [18], and M. charantia [78,80,81]. In this study, the CP, AP, and XP genes were highly expressed in the leaves, which led to the highest accumulation of carotenoids in the leaves (Figure 9). A previous study reported that in red and yellow carrots, the accumulation of main carotenoids and total carotenoids might be due to the participation of transcriptional levels of genes that lead to CP [82]. In addition, CHXB overexpression regulated α-carotene and total carotenoid levels in orange carrots [83]. The carotenoid genomic database of H. moellendorffii may be a useful tool for researchers to create a model to enhance the accumulation of carotenoids in different plant parts.

Plant Growth
Heracleum moellendorffii seeds were obtained from Aram Seed Co., Ltd., Seoul, Korea, and grown in the greenhouse of Chungnam National University, Daejeon, Korea. The seeds were sown in a plastic pot containing perlite and grown in a greenhouse for three months. Plants were watered every second day. Various plant parts such as the leaves, stems, and roots were harvested after three months (April 2021-July 2021), frozen in liquid nitrogen, and immediately stored at −80 • C. Each sample was harvested in triplicate.

Mining and Sequence Analysis of CP, XP, and AP Genes
CP, XP, and AP gene sequences were retrieved from H. moellendorffii transcriptome data acquired in our laboratory. The identified sequences were analyzed using Basic Local Alignment Search Tool (BLAST). Consequently, the gene sequences were also examined using Conserved Domain Database (CCD) [84] and PFAM [85] on the National Center for Biotechnology Information (NCBI) GenBank database to identify the signature motifs present in the gene sequences. The theoretical pI (isoelectric point)/molecular weight (MW), aliphatic index, instability index, and grand average of hydropathicity (GRAVY) values were estimated using the ExPASy ProtParam platform [86]. The in silico subcellular locations of the CP, XP, and AP proteins were predicted using free online software, such as TargetP 1.1 [87], CELLO2GO [88], WoLF PSORT [89], DeepLOC-1.0 [90], and Plant-PLoc [91]. Signal peptide, secondary structure analyses, and the disordered region of proteins were determined using the SignalP 4.0 server [92], SOPMA program [93], and PSIPRED tool [94], respectively. The total number of transmembrane helices present in the CP, XP, and AP genes was predicted using the TMHMM-2.0 server [95]. Moreover, the internal repeats and N-glycosylation sites in the CP, XP, and AP genes were identified using the RADAR tool [96] and NetNglyc 1.0 server [97], respectively.

Multiple Alignment and Tertiary Structural Analysis of CP, XP, and AP Genes
The BioEdit 7.2.5 program [98] was used to perform the multiple alignment of the identified sequences. Tertiary structural analysis of the CP, XP, and AP protein sequences was performed using the Chimera 1.14 software [62]. The MEME tool [99] was used to detect the conserved motifs.

Phylogenetic Tree Construction and Percent Identity Matrix
For phylogenetic analysis, all the identified sequences were created using MEGA 7.0 [100]. A phylogenetic tree was constructed using the neighbor-joining (NJ) algorithm with 1000 bootstrap replicates [101,102] The percent identities of the CP, XP, and AP amino acid sequences were estimated using the free online software Clustal Omega [103], and pairwise sequence alignment [104] was used to calculate the identities of the sequences.

RNA Isolation and cDNA Synthesis
The leaves, stems, and roots of H. moellendorffii were used for total RNA extraction. Each sample was crushed into a fine powder with liquid nitrogen using a mortar and pestle. Each sample (100 mg) was aliquoted into new microcentrifuge tubes. The Plant Total RNA Mini Kit (Geneaid, Taiwan) was used for extraction according to the manufacturer's instructions. RNA quality and concentration were estimated using a NanoVue Plus spectrophotometer (GE Healthcare Life Sciences, MA, USA). cDNA was then synthesized using a ReverTra Ace-α-kit (Toyobo Co. Ltd., Osaka, Japan), according to the manufacturer's instructions. The cDNA was diluted 20-fold with sterile RNase-free water for analysis.

Genes Expression Analysis
The primers, listed in Table S5, were designed using Gene Runner 5.0 software (http:// www.generunner.net accessed on 22 October 2021). The house-keeping gene α-tubulin was used as an internal control to calculate the relative gene expression levels. The quantitative reverse transcription-polymerase chain reaction (qRT-PCR) was performed according to the protocol described by Sathasivam et al. [18,19]. All reactions were performed in triplicate.

Extractions of Carotenoid and HPLC Analysis
Extraction and analysis of carotenoids were performed according to the procedure described by Sathasivam et al. [18,19], with slight modifications. Finely ground samples (300 mg) were collected, and 3 mL of 0.1% ascorbic acid (w/v) in ethanol was added. The samples were immediately placed on ice for 5 min to terminate the reaction. After termination, 0.05 mL of β-apo-8 -carotenal (internal standard) and 1.5 mL of ice-cold water were added to the mixture. Subsequently, 1.5 mL hexane was added, and the sample was centrifuged at 4 • C for 5 min at 12,000 rpm. After centrifugation, the mixture was dried under nitrogen and liquefied in 250 µL of CH 2 Cl 2 /MeOH (50:50 v/v). These extracts were filter-sterilized in glass screw-cap vials. The carotenoids were analyzed by using an Agilent Technologies 1100 Series HPLC system (Palo Alto, CA, USA) that consisted of a C30 YMC column (250 × 4.6 mm, 3 m; Waters Corporation, Milford, MA, USA), a degasser, and a PDA detector. The mobile phase and gradient program were as follows: solvent A, methanol/water (92:8 v/v) with 10 mM ammonium acetate; solvent B, methyl tert-butylether; 0 min, 90% A; 20 min, 83% A; 29 min, 75% A; 35 min, 30% A; 40 min, 30% A; 42 min, 25% A; 45 min, 90% A; 55 min, 90% A (Total 50 min). The separation of carotenoid compounds was detected at 450 nm at 40 • C with the flow rate of 1.0 mL/min, and the injection volume was 10 µL. The carotenoid content of each part was quantified with reference to a corresponding calibration curve.

Statistical Analysis
The results are expressed as the mean ± standard deviation of three replicates. Data were evaluated by one-way analysis of variance (ANOVA) with Duncan's multiple range tests (DMRT) to compare means. p-values < 0.05, were measured as statistically significant using SPSS version 26 (SPSS, Chicago, IL, USA).

Conclusions
In conclusion, we have studied the genes involved in CP, XP, and AP because of their importance in plant metabolism and function, such as physiology, ecology, development, and evolution. Recently, researchers have focused on carotenoid accumulation at various regulatory levels. Therefore, mining of CP, XP, and AP genes from the available transcriptomic data, in silico characterization of the mined genes using free online bioinformatics software, and understanding the expression levels of CP, XP, and AP genes will help in finding the underlying connection between the transcriptomic and metabolomic profiles. In addition, mining of CP, XP, and AP genes and analyzing their 3D structure will be useful for researchers in gene manipulation, potential gene engineering, and transformation of multiple CP, XP, and AP genes into the desired host to enhance carotenoid biosynthesis and enrich the desired or novel carotenoid products in stable crops. These structural analyses, gene expression profiles of CP, XP, and AP genes, and carotenoid accumulation in different parts of the plant will increase our knowledge regarding the accumulation of carotenoids at the molecular level in H. moellendorffii. This information might be a useful resource for altering genes through genetic engineering to enhance the carotenoid content of H. moellendorffii. However, further studies are needed to identify new H. moellendorffii CP, XP, and AP genes using a genome-wide approach which will aid in identifying several homologous genes, gene families, and alleles.