Metabolome and Transcriptome Association Analysis Reveals Dynamic Regulation of Purine Metabolism and Flavonoid Synthesis in Transdifferentiation during Somatic Embryogenesis in Cotton

Plant regeneration via somatic embryogenesis (SE) is a key step during genetic engineering. In the current study, integrated widely targeted metabolomics and RNA sequencing were performed to investigate the dynamic metabolic and transcriptional profiling of cotton SE. Our data revealed that a total of 581 metabolites were present in nonembryogenic staged calli (NEC), primary embryogenic calli (PEC), and initiation staged globular embryos (GE). Of the differentially accumulated metabolites (DAMs), nucleotides, and lipids were specifically accumulated during embryogenic differentiation, whereas flavones and hydroxycinnamoyl derivatives were accumulated during somatic embryo development. Additionally, metabolites related to purine metabolism were significantly enriched in PEC vs. NEC, whereas in GE vs. PEC, DAMs were remarkably associated with flavonoid biosynthesis. An association analysis of the metabolome and transcriptome data indicated that purine metabolism and flavonoid biosynthesis were co-mapped based on the Kyoto encyclopedia of genes and genomes (KEGG) database. Moreover, purine metabolism-related genes associated with signal recognition, transcription, stress, and lipid binding were significantly upregulated. Moreover, several classic somatic embryogenesis (SE) genes were highly correlated with their corresponding metabolites that were involved in purine metabolism and flavonoid biosynthesis. The current study identified a series of potential metabolites and corresponding genes responsible for SE transdifferentiation, which provides a valuable foundation for a deeper understanding of the regulatory mechanisms underlying cell totipotency at the molecular and biochemical levels.


Introduction
Somatic embryogenesis (SE) plays a crucial role in the genetic transformation of plants and in their in vitro rapid propagation. SE refers to the process of transformation from the somatic to the embryogenic state and is a unique phenomenon similar to what occurs during the development of zygotic embryos. The SE process can be artificially controlled in vitro conditions. It is also a classic example of cell totipotency. The underlying mechanisms of SE are significant for revealing important scientific theoretical problems that involve cell development, differentiation, and morphogenesis [1][2][3][4].
Nic-Can et al. [5] reported that SE is a complex process of molecular regulation in which somatic cells acquire totipotency to transform into embryonic cells. How does a somatic cell become a whole plant? This question has been reported to be one of the 25 questions most important within the scientific community [6]. However, the regulatory networks that are involved in the transition from the nonembryogenic staged callus (NEC) to the somatic embryo during SE remain poorly understood [7]. In addition to environmental factors, external stimuli and hormones, many specifically expressed genes also participate in the SE process and play a decisive role [8][9][10][11]. Therefore, it is particularly important to identify and isolate key genes that regulate SE. Candidate genes that may regulate SE were identified using bioinformatics tools [12]. The results showed that auxin response factor (ARF) [11,13,14], leafy cotyledon (LEC) [10,15], Wuschel (WUS) [16][17][18], somatic embryogenesis receptor kinase (SERK) [8], shoot meristemless (STM), and baby boom (BBM) [10,19,20] were involved in SE. In addition, the salicylic acid (SA) and jasmonic acid (JA) signaling pathways were also predicted to regulate SE [12,[21][22][23][24][25][26][27].
During the process of cotton SE, Hu et al. [28] found that interference of the high-mobility group box 3 (GhHmgb3) gene enhanced the proliferation and differentiation of embryogenic callus. Poon et al. [29] found that embryogenic callus could secrete a specific AGP protein (GhPLA1), which was extracted and added to the medium to enhance SE. Min et al. [7] showed that the process of SE was negatively regulated by a casein kinase gene (GhCKI) via a complex regulatory network. Several differentially expressed small RNAs and their target genes were identified via small RNA sequencing and degradation sequencing, which indicated that the SE process in cotton was regulated by complex gene expression networks [30]. Genes related to stress, such as SERK1, abscisic aldehyde synthesis enzyme 2 (ABA2), abscisic acid insensitive 3 (ABI3), jasmonate ZIM-domain 1 (JAZ1), late embriogenesis abundant protein 1 (LEA1), and transcription factors (NACs, WRKYs, MYBs, ERFs, Zinc finger family proteins) were also involved in callus induction [31][32][33][34][35][36][37]. Transcriptome sequencing was carried out during callus dedifferentiation and redifferentiation. Auxin and stress response molecules were found to be involved in SE in cotton [38]. Xu et al. [39] also found that plant growth regulators played an important regulatory role in SE.
In recent years, metabolomics, which involves the collection of all low molecular weight metabolites that are present in a certain organism, cell, or tissue during a specific physiological period, has attracted a great deal of attention. It mainly involves the examination of endogenous small molecules with molecular weights less than 1000 Da. Plants can produce 200,000 to 1 million kinds of metabolites, which can be divided into primary metabolites and secondary metabolites [40].
Metabolites are the ultimate result of gene transcription and protein expression in organisms that are under the influence of internal and external factors. They form a bridge between genes and phenotypes and can directly reflect the physiological phenomena within plants. At the same time, metabolites can regulate gene transcription and protein expression. As the final products of cellular regulatory processes, metabolites represent the building blocks of macromolecules and are also an essential component of cellular energy pathways [41]. Metabolomic technologies enable the examination and identification of endogenous biochemical reaction products and thereby reveal information about the metabolic pathways and processes occurring within a living cell [42].
Metabolomics data can provide a wealth of information about the biochemical status of tissues, and the interpretation of such data offers an effective approach that can be used for the functional characterization of genes [43][44][45]. Transcriptomic and proteomic studies are only able to predict changes at the gene expression and protein levels, respectively, while metabolomics studies investigate the changes in functioning exhibited by these genes or proteins [46]. Compared with the genome, transcriptome, or proteome, the metabolome more accurately reflects the phenotype of the organism, and the minor changes in the genome and proteome can be reflected and amplified by the metabolome [47]. The applicability of metabolomics to the SE process has been demonstrated in many plant genera [48][49][50][51][52].
The development of 'omics' technology has allowed for comprehensive analysis of the SE process at the transcript, protein, and metabolite levels [50,52]. The combination of modern omics technologies such as transcriptomics and metabolomics provides a great opportunity to acquire a deeper understanding of the mechanisms of cell totipotency at the molecular and biochemical levels [52]. To date, there have been few studies that have used transcriptomics and metabolomics techniques to identify potential key factors involved in the SE process [48,49,51,52]. Additionally, the relationship between callus tissues and the media used to culture them is not well understood at the biochemical level in terms of nutrient uptake by callus from the medium or release of chemicals into the medium [52]. Integrated metabolomic and transcriptomic network analyses can elucidate the functioning of a series of secondary metabolites along with changes in content, as well as the corresponding differentially expressed genes, which can broaden the global view of SE regulation [52].
In this study, we comparatively investigated the dynamic metabolomic and transcriptomic profiles of two transdifferentiation processes, embryogenic differentiation and somatic embryo development, during the SE process in cotton. By interactively comparing metabolomic and transcriptomic data, we were able to identify potential metabolites and the corresponding differentially expressed genes at the molecular and biochemical levels, as well as draw a comprehensive picture of the events underpinning SE development to provide a valuable foundation for uncovering the regulatory mechanisms underlying plant cell totipotency.

Transdifferentiation Staged Cultures Derived from Cotton Somatic Embryogenesis
We established an efficient cotton SE system and obtained cultures from different developmental stages, including nonembryogenic staged calli (NEC), primary embryogenic calli (PEC), and initiation staged embryos with globular-like enriched (GE) based on our previous approach as published recently [53] (Figure 1). These representative staged samples would be highly enriched and collected to establish the metabolic and transcriptional profiles. metabolome [47]. The applicability of metabolomics to the SE process has been demonstrated in many plant genera [48][49][50][51][52]. The development of 'omics' technology has allowed for comprehensive analysis of the SE process at the transcript, protein, and metabolite levels [50,52]. The combination of modern omics technologies such as transcriptomics and metabolomics provides a great opportunity to acquire a deeper understanding of the mechanisms of cell totipotency at the molecular and biochemical levels [52]. To date, there have been few studies that have used transcriptomics and metabolomics techniques to identify potential key factors involved in the SE process [48,49,51,52]. Additionally, the relationship between callus tissues and the media used to culture them is not well understood at the biochemical level in terms of nutrient uptake by callus from the medium or release of chemicals into the medium [52]. Integrated metabolomic and transcriptomic network analyses can elucidate the functioning of a series of secondary metabolites along with changes in content, as well as the corresponding differentially expressed genes, which can broaden the global view of SE regulation [52].
In this study, we comparatively investigated the dynamic metabolomic and transcriptomic profiles of two transdifferentiation processes, embryogenic differentiation and somatic embryo development, during the SE process in cotton. By interactively comparing metabolomic and transcriptomic data, we were able to identify potential metabolites and the corresponding differentially expressed genes at the molecular and biochemical levels, as well as draw a comprehensive picture of the events underpinning SE development to provide a valuable foundation for uncovering the regulatory mechanisms underlying plant cell totipotency.

Transdifferentiation Staged Cultures Derived from Cotton Somatic Embryogenesis
We established an efficient cotton SE system and obtained cultures from different developmental stages, including nonembryogenic staged calli (NEC), primary embryogenic calli (PEC), and initiation staged embryos with globular-like enriched (GE) based on our previous approach as published recently [53] (Figure 1). These representative staged samples would be highly enriched and collected to establish the metabolic and transcriptional profiles.

UPLC-MS/MS-Based Quantitative Metabolomic Analysis and Overall Metabolite Identification
Ultra-performance liquid chromatography (UPLC) and tandem mass spectrometry (MS/MS) were conducted to assess the dynamic metabolite changes in NEC, PEC, and GE in cotton. During the process of instrumental analysis, one quality control (QC) sample was injected after every ten samples to monitor the repeatability of the analysis process. The repeatability of metabolite

UPLC-MS/MS-Based Quantitative Metabolomic Analysis and Overall Metabolite Identification
Ultra-performance liquid chromatography (UPLC) and tandem mass spectrometry (MS/MS) were conducted to assess the dynamic metabolite changes in NEC, PEC, and GE in cotton. During the process of instrumental analysis, one quality control (QC) sample was injected after every ten samples to monitor the repeatability of the analysis process. The repeatability of metabolite extraction and detection can be judged using overlapping analysis of the total ion current (TIC) in the different QC samples ( Figure 2). The results showed the overlap in the TIC curves during metabolite detection. The retention times and peak intensities were consistent, which indicated that the signal was stable when an identical sample was detected at a different time. The instrumental stability provided an important guarantee of the repeatability and reliability of our metabonomic data. extraction and detection can be judged using overlapping analysis of the total ion current (TIC) in the different QC samples ( Figure 2). The results showed the overlap in the TIC curves during metabolite detection. The retention times and peak intensities were consistent, which indicated that the signal was stable when an identical sample was detected at a different time. The instrumental stability provided an important guarantee of the repeatability and reliability of our metabonomic data. The determination of Pearson's correlation coefficient reflected the repeatability among the intragroup samples ( Figure 3a). Principal component analysis (PCA) was used to determine the rate of contribution of the first two primary components (67.49%). The three period materials were obviously separated, and each formed a cluster ( Figure 3b). These results suggested that there was sufficient reproducibility of the materials, which made them suitable for the following qualitative and quantitative analyses.
After quality validation, a total of 581 metabolites with known structures were identified in NEC, PEC and GE, each of which was analyzed using three biological replicates (Table S1). Of the 581 metabolites, amino acids (15%), flavones (15%), organic acids (12%), lipids (11%), and nucleotides (10%) accounted for a large proportion (Figure 3c). Detailed information about the identified metabolites, including the compounds, classes, molecular weights (Da), ionization models, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and quantities for each of the three periods is shown in Table S1. The determination of Pearson's correlation coefficient reflected the repeatability among the intragroup samples ( Figure 3a). Principal component analysis (PCA) was used to determine the rate of contribution of the first two primary components (67.49%). The three period materials were obviously separated, and each formed a cluster ( Figure 3b). These results suggested that there was sufficient reproducibility of the materials, which made them suitable for the following qualitative and quantitative analyses.
After quality validation, a total of 581 metabolites with known structures were identified in NEC, PEC and GE, each of which was analyzed using three biological replicates (Table S1). Of the 581 metabolites, amino acids (15%), flavones (15%), organic acids (12%), lipids (11%), and nucleotides (10%) accounted for a large proportion (Figure 3c). Detailed information about the identified metabolites, including the compounds, classes, molecular weights (Da), ionization models, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and quantities for each of the three periods is shown in Table S1.

Identification of Differentially Accumulated Metabolites (DAMs)
Differentially accumulated metabolites (DAMs) were defined as those exhibiting a fold change ≥2 or a fold change ≤0.5 and a variable importance in project (VIP) ≥1 between PEC versus NEC, GE versus PEC, or GE versus NEC (p < 0.05). In total, 156, 139, and 159 DAMs were identified, respectively (Table 1 and Table S2). For PEC vs. NEC, 124 of 156 (79.5%) were upregulated and 32 of 156 (20.5%) were downregulated. Of the 139 DAMs identified between GE and PEC, 86 (61.9%) and 53 (38.1%) metabolites were upregulated and downregulated, respectively. Of the 159 metabolites differentially accumulated in GE compared to NEC, 128 (80.5%) and 31 (19.5%) metabolites were upregulated and downregulated, respectively. Volcano plots were generated to represent the significant differences between PEC vs. NEC, GE vs. PEC, and GE vs NEC ( Figure  4a-c). A hierarchical cluster analysis was also performed to assess the DAM accumulation patterns (Figure 4d-f).

Identification of Differentially Accumulated Metabolites (DAMs)
Differentially accumulated metabolites (DAMs) were defined as those exhibiting a fold change ≥2 or a fold change ≤0.5 and a variable importance in project (VIP) ≥1 between PEC versus NEC, GE versus PEC, or GE versus NEC (p < 0.05). In total, 156, 139, and 159 DAMs were identified, respectively (Table 1 and Table S2). For PEC vs. NEC, 124 of 156 (79.5%) were upregulated and 32 of 156 (20.5%) were downregulated. Of the 139 DAMs identified between GE and PEC, 86 (61.9%) and 53 (38.1%) metabolites were upregulated and downregulated, respectively. Of the 159 metabolites differentially accumulated in GE compared to NEC, 128 (80.5%) and 31 (19.5%) metabolites were upregulated and downregulated, respectively. Volcano plots were generated to represent the significant differences between PEC vs. NEC, GE vs. PEC, and GE vs NEC (Figure 4a-c). A hierarchical cluster analysis was also performed to assess the DAM accumulation patterns (Figure 4d-f).

Functional Annotation and KEGG Enrichment Analysis of DAMs
Differentially accumulated metabolites in PEC vs. NEC, GE vs. PEC, and GE vs. NEC were functionally annotated using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, which was summarized in Table S4. To further investigate the functioning of SE-related metabolites, we analyzed the differences and dynamic changes of biological processes among the three groups. For PEC vs. NEC (embryogenic differentiation), the KEGG enrichment analysis showed that the terms 'purine metabolism', 'cGMP-PKG signaling pathway', and 'olfactory transduction' were significantly enriched ( Figure 6a). However, the terms 'phenylpropanoid biosynthesis', 'flavone and flavonol biosynthesis', 'flavonoid biosynthesis', and 'biosynthesis of phenylpropanoids' were enriched in GE vs. PEC (somatic embryo development) (Figure 6b). Meanwhile, for GE vs. NEC, the DAMs were most strongly associated with the terms 'purine metabolism', 'microbial metabolism in diverse environments', 'biosynthesis of plant secondary metabolites', and 'cGMP-PKG signaling pathway' (Figure 6c). Differentially accumulated metabolites in PEC vs. NEC, GE vs. PEC, and GE vs. NEC were functionally annotated using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, which was summarized in Table S4. To further investigate the functioning of SE-related metabolites, we analyzed the differences and dynamic changes of biological processes among the three groups. For PEC vs. NEC (embryogenic differentiation), the KEGG enrichment analysis showed that the terms 'purine metabolism', 'cGMP-PKG signaling pathway', and 'olfactory transduction' were significantly enriched (Figure 6a). However, the terms 'phenylpropanoid biosynthesis', 'flavone and flavonol biosynthesis', 'flavonoid biosynthesis', and 'biosynthesis of phenylpropanoids' were enriched in GE vs. PEC (somatic embryo development) (Figure 6b). Meanwhile, for GE vs. NEC, the DAMs were most strongly associated with the terms 'purine metabolism', 'microbial metabolism in diverse environments', 'biosynthesis of plant secondary metabolites', and 'cGMP-PKG signaling pathway' (Figure 6c).

Association Analysis of the Metabolome and Transcriptome Data Sets
To provide an overview of the system-wide changes that occur during SE in cotton, transcriptomic profiling based on RNA-seq was performed. Significant differentially expressed genes (DEGs) were detected among NEC, PEC, and GE (unpublished data). By interactively comparing the metabolomic and transcriptomic data, we were able to identify potential metabolites and the corresponding differentially expressed genes at the molecular and biochemical levels.

KEGG Enrichment Analysis of DAMs and DEGs During Cotton SE
To investigate the relationships between genes and metabolites involved in the two transdifferentiation processes during SE in cotton, the DAMs and DEGs in the two compared groups (PEC vs. NEC and GE vs. PEC) were mapped according to the KEGG database. There were 96 and 55 co-mapped pathways in PEC vs. NEC and GE vs. PEC, respectively (Table S5).

Association Analysis of the Metabolome and Transcriptome Data Sets
To provide an overview of the system-wide changes that occur during SE in cotton, transcriptomic profiling based on RNA-seq was performed. Significant differentially expressed genes (DEGs) were detected among NEC, PEC, and GE (unpublished data). By interactively comparing the metabolomic and transcriptomic data, we were able to identify potential metabolites and the corresponding differentially expressed genes at the molecular and biochemical levels.

KEGG Enrichment Analysis of DAMs and DEGs During Cotton SE
To investigate the relationships between genes and metabolites involved in the two transdifferentiation processes during SE in cotton, the DAMs and DEGs in the two compared groups (PEC vs. NEC and GE vs. PEC) were mapped according to the KEGG database. There were 96 and 55 co-mapped pathways in PEC vs. NEC and GE vs. PEC, respectively (Table S5).

Correlation Analysis of DAMs and DEGs during SE in Cotton
Log 2 conversion data for the differentially accumulated metabolites and differentially expressed genes were selected that had a Pearson's correlation coefficient (PCC) > 0.8. To obtain a systematic view of the variations in metabolites and their corresponding genes with PCC > 0.8, nine quadrant diagrams were generated for the embryogenic differentiation and somatic embryo development processes (Figure 7).
The black dotted lines divide each graph into 9 quadrants. Upregulated metabolites and downregulated genes are displayed in quadrant 1, upregulated metabolites and unchanged genes are displayed in quadrant 2, upregulated metabolites and upregulated genes are displayed in quadrant 3, unchanged metabolites and downregulated genes are displayed in quadrant 4, unchanged metabolites and unchanged genes are displayed in quadrant 5, unchanged metabolites and upregulated genes are displayed in quadrant 6, downregulated metabolites and downregulated genes are displayed in quadrant 7, downregulated metabolites and unchanged genes are displayed in quadrant 8, and downregulated metabolites and upregulated genes are displayed in quadrant 9. Moreover, the DAMs and DEGs shown in quadrant 3 and quadrant 7 are positively correlated and have similar consistent patterns, while the DAMs and DEGs shown in quadrant 1 and quadrant 9 are negatively correlated and have opposite patterns.
Notably, in quadrant 3, several representative factors that are related to receptor-like protein kinases, signal recognition, transcription, stress regulation, lipid binding, hormone responses, and histone modification were significantly activated during embryogenic differentiation process (Table 4). The black dotted lines represent the differential thresholds. Outside of the threshold lines, there were significant differences in the genes/metabolites, and within the threshold lines are shown the unchanged genes/metabolites. Each point represents a gene/metabolite. Black dots represent the unchanged genes/metabolites; green dots represent differentially accumulated metabolites with unchanged genes; red dots represent differentially expressed genes with unchanged metabolites; blue dots represent both differentially expressed genes and differentially accumulated metabolites. NEC, nonembryogenic staged calli; PEC, primary embryogenic calli; GE, initiation staged embryos with globular-like enriched.  The black dotted lines represent the differential thresholds. Outside of the threshold lines, there were significant differences in the genes/metabolites, and within the threshold lines are shown the unchanged genes/metabolites. Each point represents a gene/metabolite. Black dots represent the unchanged genes/metabolites; green dots represent differentially accumulated metabolites with unchanged genes; red dots represent differentially expressed genes with unchanged metabolites; blue dots represent both differentially expressed genes and differentially accumulated metabolites. NEC, nonembryogenic staged calli; PEC, primary embryogenic calli; GE, initiation staged embryos with globular-like enriched.

Transcript-Metabolite Correlation Network Representing DAMs and DEGs during SE in Cotton
To model the synthetic and regulatory characteristics of DAMs and DEGs, subnetworks were constructed to determine transcript-metabolite correlations. Only the correlation pairs with a correlation coefficient > 0.8 were included in the analysis.
These results indicated that several classic SE-related genes were highly correlated with their corresponding metabolites involved in purine metabolism and flavonoid biosynthesis, which reconfirmed the specific importance of purine metabolism during embryogenic differentiation, as well as the importance of flavonoid biosynthesis during somatic embryo development. The transcriptome data validated the authenticity and accuracy of the metabolic analysis.     (a)

Discussion
Metabolomics is an emerging omics technology that, like genomics and proteomics, can be used to qualify and quantify all metabolites of small molecular weight within the cells of an organism. Plant metabolomics has been widely applied to the investigation of patterns of metabolite accumulation and their underlying genetic basis via the identification of genes involved in metabolism, which is currently a topic of interest in modern plant biology. As the final products of genome expression, metabolites directly define the biochemical characteristics of a cell or tissue, which allows metabolomic data to be used to explain the biochemical mechanisms underlying SE. Meanwhile, integrated transcriptomic and metabolomic analysis allows for the more precise representation of gene-to-metabolite networks and thus will be an effective method that can be used to decipher the mechanisms involved in SE regulation in cotton. In the current study, we combined transcriptome and metabolome analyses to generate dynamic profiles of two transdifferentiation processes, embryogenic differentiation, and somatic embryo development, during SE in cotton, with the aim to provide a better understanding of the processes of SE transdifferentiation that resulted in cell totipotency at the molecular and biochemical levels.

Accumulated DAMs Specifically Involved in Two SE Transdifferentiation Processes in Cotton
In this study, a hierarchical cluster analysis (HCA) was performed to assess the patterns of accumulation of the metabolites among the different samples. Results showed that PEC and GE can be clustered together while NEC forms a separate cluster, which suggests that there is significant differential accumulation of metabolites between the two transdifferentiation processes, embryogenic differentiation (PEC-NEC) and somatic embryo developmental initiation (GE-PEC).

Discussion
Metabolomics is an emerging omics technology that, like genomics and proteomics, can be used to qualify and quantify all metabolites of small molecular weight within the cells of an organism. Plant metabolomics has been widely applied to the investigation of patterns of metabolite accumulation and their underlying genetic basis via the identification of genes involved in metabolism, which is currently a topic of interest in modern plant biology. As the final products of genome expression, metabolites directly define the biochemical characteristics of a cell or tissue, which allows metabolomic data to be used to explain the biochemical mechanisms underlying SE. Meanwhile, integrated transcriptomic and metabolomic analysis allows for the more precise representation of gene-to-metabolite networks and thus will be an effective method that can be used to decipher the mechanisms involved in SE regulation in cotton. In the current study, we combined transcriptome and metabolome analyses to generate dynamic profiles of two transdifferentiation processes, embryogenic differentiation, and somatic embryo development, during SE in cotton, with the aim to provide a better understanding of the processes of SE transdifferentiation that resulted in cell totipotency at the molecular and biochemical levels.

Accumulated DAMs Specifically Involved in Two SE Transdifferentiation Processes in Cotton
In this study, a hierarchical cluster analysis (HCA) was performed to assess the patterns of accumulation of the metabolites among the different samples. Results showed that PEC and GE can be clustered together while NEC forms a separate cluster, which suggests that there is significant differential accumulation of metabolites between the two transdifferentiation processes, embryogenic differentiation (PEC-NEC) and somatic embryo developmental initiation (GE-PEC).
To investigate the differential metabolites specifically accumulated in (PEC-NEC) vs. (GE-PEC) vs. (GE-NEC), a Venn diagram was generated (Figure 5a; Table S3 We examined the metabolites that were specifically accumulated between the two transdifferentiation processes, embryogenic differentiation (PEC-NEC), and somatic embryo development (GE-PEC). The results showed that there were 111 DAMs that were specifically accumulated in PEC vs. NEC, of which amino acids, organic acids, nucleotides, flavones, and lipids were the most represented (Figure 5b). Meanwhile, 94 DAMs were accumulated specifically in GE vs. PEC, of which flavones hydroxycinnamoyl derivatives, other metabolites, amino acids and organic acids accounted for a large proportion (Figure 5c). Specifically, nucleotides and lipids were more greatly accumulated during embryogenic differentiation, whereas greater amounts of flavones and hydroxycinnamoyl derivatives were accumulated during the somatic embryo development process. These data indicated that nucleotides and lipids may play important and special roles during embryogenic differentiation, whereas flavonoids are more important during the embryo maturation process.

Enrichment of Purine Metabolism in Embryogenic Differentiation
Purine metabolism refers to the metabolic pathways that synthesize and break down the purines that are present in many organisms. In our study, to investigate the functioning of SE-related metabolites, we analyzed the differences and dynamic metabolite changes among the three groups, PEC vs. NEC, GE vs. PEC, and GE vs. NEC. KEGG enrichment analysis showed that the 'purine metabolism' pathway was significantly enriched during embryogenic differentiation (Figure 6b). All of the differentially accumulated metabolites related to purine metabolism were upregulated. Meanwhile, enrichment analysis of DAMs and DEGs showed that 'purine metabolism' was co-mapped based on results from the KEGG database (Table S5).
In our study, several major DEGs of regulatory factors were identified and significantly associated with purine metabolism in embryogenic differentiation, including receptor-like protein kinases, signal recognition, transcription, stress regulation, lipid binding, hormone responses, and histone modification were significantly upregulated during the embryogenic differentiation process ( Table 4).
The role of purine metabolism in the SE process has been demonstrated in many plants. A comparative omic analysis of tree fern Cyathea delgadii explants that were undergoing direct SE was performed. The results revealed that the differentially regulated proteins adenine phosphoribosyl transferase 3 and adenosine kinase 2 were assigned to the purine metabolism category and were associated with direct SE in C. delgadii [60]. To understand the molecular mechanisms that regulate early SE in Eleutherococcus senticosus Maxim, a high-throughput RNA-seq technology was used to investigate its transcriptome. The initiation of SE affected gene expression in many KEGG pathways but predominantly affected expression in metabolic pathways and those related to the biosynthesis of secondary metabolites. Other unigenes were classified into the purine metabolism pathway [61]. Similar results were obtained in cotton (Gossypium hirsutum L.). RNA-seq was performed to analyze the genes expressed during SE and their expression dynamics using RNAs isolated from nonembryogenic callus (NEC), embryogenic callus (EC), and somatic embryos (SEs). The differentially expressed genes in NEC, EC, and SEs were identified, annotated, and classified. Partial DEGs were identified that were related to purine metabolism [62]. A possible critical role for purines during embryogenesis in geranium hypocotyl tissues (Pelargonium x hortorum) has also been reported [63]. The above results revealed the significant role of purine metabolism in EC proliferation.

Enrichment of Flavonoid Biosynthesis in Somatic Embryo Developmental Initiation
Flavonoids are synthesized by the phenylpropanoid metabolic pathway in which the amino acid phenylalanine is used to produce 4-coumaroyl-CoA. The metabolic pathway continues through a series of enzymatic modifications to yield flavanones, dihydroflavonols, and then anthocyanins. Along this pathway, many products can be formed, including flavonols, flavan-3-ols, proanthocyanidins (tannins) and a host of other various polyphenolics. In the current study, 139 differentially accumulated metabolites were identified in GE vs. PEC, and flavone was largely detected (Figure 4h). 94 DAMs accumulated specifically in somatic embryo development, of which flavone accounted for a large proportion (Figure 5c). KEGG enrichment analysis of DAMs suggested that 'phenylpropanoid biosynthesis', 'flavone and flavonol biosynthesis', 'flavonoid biosynthesis', and 'biosynthesis of phenylpropanoids' were enriched in GE vs. PEC (Figure 6b). Meanwhile, in co-mapped pathways between genes and metabolites, 'phenylpropanoid biosynthesis' (ko00940), 'flavonoid biosynthesis' (ko00941), and 'flavone and flavonol biosynthesis' (ko00944) showed significant accumulation in GE vs. PEC (Table S5).
The significant role of flavonoid biosynthesis in the somatic embryo development process has been investigated. In citrus cell cultures, there was no detectable accumulation of flavonoid in the undifferentiated calli, but flavonoid accumulated after the morphological changes to embryos. Two chalcone synthase (CHS) genes differentially expressed during citrus SE and CHS gene may regulate the accumulation of flavonoid [64].
With the goal to better understand SE development and to improve the efficiency of SE conversion in Theobroma cacao L., gene expression differences between zygotic and somatic embryos were examined using a whole genome microarray. Expression levels of genes involved in fatty acid metabolism and flavonoid biosynthesis were differentially expressed in the two types of embryos. The relatively higher expression of flavonoid related genes during SE suggested that the developing tissues may be experiencing high levels of stress during SE maturation caused by the in vitro environment [65].
Moreover, Wang et al. reported that overexpression of GhSPL10, a target of GhmiR157a, activated the flavonoid biosynthesis pathway, and promoted initial cellular dedifferentiation and callus proliferation [66]. The synthesis of flavonoids like anthocyanin in several plant tissues has been associated with increased phenylalanine ammonia lyase (PAL) activity. The increased PAL level was detected in samples collected from different growth phases during SE in Silybum marianum. As intermediary products of phenylpropanoid metabolism, flavonoids may stimulate differentiation and create a situation that is more favorable for embryogenesis [67]. These conclusions demonstrated that flavonoids biosynthesis might be frequently associated with somatic embryo development during cotton SE transdifferentiation.

Plant Materials and Culture Conditions
Upland cotton (Gossypium hirsutum cv. YZ-1) seeds were sterilized in 0.1% HgCl 2 (w/v) for 8 min and were then rinsed 3-4 times with distilled water. The seeds were then germinated in Murashige and Skoog (MS) medium supplemented with 3% (w/v) sucrose and 0.25% (w/v) phytagel. Hypocotyl explants (0.5-1.0 cm) from 7-day-old seedlings were cultured in MS plus B 5 vitamin (MSB) medium containing 0.45 µmol·L −1 2,4-dichlorophenoxyacetic acid (2,4-D) and 0.46 µmol·L −1 kinetin (KT). NEC were maintained in MSB medium for 6 weeks at 28 • C in a 16/8 h light/dark photoperiod and were then subcultured in fresh MSB medium without hormones. Following an additional 3-4 weeks of growth, the somatic-to-embryogenic transition progressed to the point where it induced the development of PEC and GE. Based on our previous approach as published recently [53], these representative staged samples of NEC, PEC, and GE could be highly enriched and collected, frozen immediately in liquid nitrogen, and stored at −80 • C for subsequent metabolic and transcriptomic profiling. The sample from each stage was prepared using three biological replicates.

Sample Preparation and Extraction for Widely Targeted Metabolic Profiling
Chemical extraction was carried out on nine samples (three biological replicates for each of three developmental stages) for the purposes of metabolic analyses. Each freeze-dried sample was crushed using a mixer mill (MM 400, Retsch, Haan, Germany) with a zirconia bead for 1.5 min at 30 Hz. One hundred milligrams of powder was weighed and extracted overnight at 4 • C in 1.0 mL 70% aqueous methanol. Following centrifugation at 10000 g for 10 min, the extracts were absorbed using a CNWBOND Carbon-GCB SPE Cartridge (250 mg, 3 mL; ANPEL, Shanghai, China, www.anpel.com.cn/cnw) and filtered with a SCAA-104 filter (0.22 µm pore size; ANPEL, Shanghai, China, http://www.anpel.com.cn/) prior to LC-MS analysis.

ESI-Q TRAP-MS/MS
LIT and triple quadrupole (QQQ) scans were acquired using a triple quadrupole-linear ion trap mass spectrometer (Q TRAP) API 6500 Q TRAP LC/MS/MS System equipped with an ESI Turbo ion-spray interface, operated in positive ion mode and controlled using Analyst 1.6 software (AB Sciex). The ESI source operation parameters were as follows: ion source, turbo spray; source temperature, 500 • C; ion spray voltage (IS), 5500 V; ion source gas I (GSI), gas II (GSII), and curtain gas (CUR), 55, 60, and 25.0 psi, respectively; the collision gas (CAD) was set to high. The instrument tuning and mass calibration were performed using 10 and 100 µmol/L polypropylene glycol solutions in QQQ and LIT modes, respectively. QQQ scans were acquired during MRM experiments in which the collision gas (nitrogen) was set to 5 psi. DP and CE for individual MRM transitions were conducted after further DP and CE optimization. A specific set of MRM transitions were monitored during each period based on the metabolites that eluted during this period [68,69].

Widely Targeted Metabolic Profiling
The samples were analyzed using a metabolomic platform that combined ultra-performance liquid chromatography (UPLC) and tandem mass spectrometry (MS/MS). Metabolite identification was performed using the MWDB metware database (Metware Biotechnology Co., Ltd. Wuhan, China) and other public databases according to standard metabolic operating procedures.

Statistical Analysis
The metabolite abundances were quantified using the peak areas. The data obtaining from the metabolite profiling were normalized for the principal component analysis (PCA) and partial least squares-discriminant analysis (PLS-DA) [70,71]. Metabolites with significant differences in content were defined as having a variable importance in project (VIP) ≥ 1 and a fold change ≥ 2 or ≤ 0.5. Fisher's exact test was applied to identify the significant KEGG pathways with a false discovery rate (FDR) < 0.05 [72]. Gene-metabolite pairs with a Pearson's correlation coefficient > 0.8 were used to construct the transcript-metabolite network.

Conclusions
Somatic embryogenesis is the developmental reprogramming of somatic cells toward the embryogenesis pathway. In this study, the dynamic metabolomic and transcriptomic profiling of cotton SE transdifferentiation processes, embryogenic differentiation, and somatic embryo development, were comparatively investigated.
During embryogenic differentiation (PEC vs. NEC), nucleotides and lipids were specifically accumulated. And the metabolome wide DAMs significantly enriched in purine metabolism (Table 2).
During somatic embryo development (GE vs. PEC), flavones and hydroxycinnamoyl derivatives were largely accumulated. DAMs were most significantly associated with flavonoid biosynthesis (Table 3). Classic SE genes that control somatic embryo development, including WUS, CLV1, CUC2, SHR, and SCW, were highly correlated with the corresponding metabolites that were involved in flavonoid biosynthesis (Table 6).
By interactively comparing metabolomic and transcriptomic data, we identified a series of potential metabolites and the corresponding differentially expressed genes candidate for a relevant role in SE transdifferentiation. The findings in our work provide new insights into the underlying molecular and biochemical basis associated with embryogenic competence acquisition underpinning cotton SE development.