Metabolome and Transcriptome Association Analysis Reveals Regulation of Flavonoid Biosynthesis by Overexpression of LaMIR166a in Larix kaempferi (Lamb.) Carr

Somatic embryogenesis is an ideal model process for studying early plant development. Embryonic cell lines of Larix kaempferi (Lamb.) Carr overexpressing LaMIR166a were obtained in our previous study. Here, a combination of de novo transcriptomics and extensively targeted metabolomics was used to study the transcriptional profiles and metabolic changes in wild-type and LaMIR166a-overexpressed embryonic cell lines. A total of 459 metabolites were found in the wild-type and transgenic cell lines. Compared to those in the wild-type cell lines, transcripts and metabolites were significantly altered in the LaMIR166a-overexpressed cell lines. Among differentially expressed genes (DEGs), phenylalanine and flavonoid synthesis genes were significantly enriched, and among differentially accumulated metabolites (DAMs), phenolic acids and flavonoids accumulated in particularly high amounts. Thus, the flavonoid biosynthetic pathway seems to be the most abundant pathway in response to LaMIR166a overexpression. Based on the Kyoto Encyclopedia of Genes and Genomes database, the association analysis of metabolome and transcriptome data showed that flavonoid biosynthesis and plant hormone signal transduction processes were significantly changed in miR166a-overexpression lines, suggesting that miR166 might be involved in these processes. The present study identified a number of potential metabolites associated with LaMIR166a overexpression, providing a significant foundation for a better understanding of the regulatory mechanisms underlying miR166.


Introduction
Larix kaempferi (Lamb.) Carr is a coniferous tree species used for afforestation in the northern parts of China [1]. It a significant source of forest timber and plays an important role in ecological construction. The somatic embryogenesis system (SE) is an ideal model system for studying the growth and development of gymnosperms, including L. kaempferi [2].
Micro RNAs (miRNAs) are a class of small molecular non-coding RNAs with a length of [19][20][21][22][23][24] nucleotides and are widely found in eukaryotes [3]. Since the discovery of the first miRNA [4], a growing number of miRNAs have been discovered and recognized, and nearly 39,000 miRNAs have been registered in the miRBase database (release 22.1) (http://www.mirbase.org/). Mir165/166 (miR166) is a rich and highly conserved RNA family present in all terrestrial plants [5,6], and it plays a key regulatory role in plant growth and development as well as in plant response to different stresses by specific binding to class III homeodomain leucine zipper proteins (HD-ZIP III) [7][8][9][10]. Previous studies have made significant progress in exploring the role of miR166a, including in shoot apical meristem (SAM) maintenance and the development of roots, stems, leaves, flowers, and seeds, as well as in various developmental processes [11][12][13][14]. Lateral root generation [10] and responses to stresses (such as drought, cold, and heavy metal stress) [15,16] also help mediate plant embryo development and seed germination [17,18]. Previous studies have found that LaMIR166a may indirectly regulate the development of SAM by affecting WOX expression [19]. LaMIR166a plays an important role in auxin biosynthesis and signal transmission during somatic embryo maturation and germination [20,21]. However, the role of miR166 and its regulatory mechanisms in the proembryo stage are not clear.
Previous studies have indicated that miRNAs are involved in plant metabolism; e.g., light-responsive miRNAs appear to play important roles in secondary metabolic pathways in potato (Solanum tuberosum L.) [22]. In Taxus cell lines, miRNAs could up-regulate primary metabolism [23]; this led to a hypothesis that miRNAs act as metabolic pathway regulators. In recent years, metabolomics has become a hot research topic because metabolites are the final products of gene expression, and they can also regulate gene transcription and protein expression [24]. The integration of metabolomics and transcriptomic network analysis can help elucidate the changes in function and content of many secondary metabolites as well as the changes in corresponding genes [25,26].
Previously, we obtained five LaMIR166a-overexpressed embryonic cell lines [19]. In the present study, we randomly selected three lines and conducted transcriptomics and metabolomics research. By comparing the interaction and transcription data, our goal was to identify potential metabolic products and their corresponding genes related to LaMIR166a overexpression and to predict the relationship between genes and metabolites. Besides providing a theoretical basis for the use of L. kaempferi somatic embryo technology in forestry production, the results of our study can also be used as a valuable reference for effective development and utilization of other gymnosperms.

Plant Materials
Wild embryogenic cultures were obtained from L. kaempferi immature zygotic embryos on the induction medium, LaMIR166a was cloned into a psuper1300+ binary vector and successfully transformed into embryogenic suspensor masses (ESMs), and five transgenic embryogenic cell lines were obtained, which increased lateral root number and biomass at the germination stage [19][20][21]. Based on our previous studies, we selected three of these soft and fresh cell clusters (named a-3, a-4, and a-5), which were grown for 20 days on the proliferation medium at the same developmental stage; they were cultured in triplicate and taken from different culture masses. The wild-type embryogenic cultures (WT) were cultured in triplicate and used as control. The cultures were immediately frozen in liquid nitrogen and stored at −80 • C until further RNA extraction. Part of the samples was used for transcriptome library construction and sequencing, whereas the other part was used for metabolome analysis.

Sample Preparation
Sample preparation, metabolome profiling, and statistical analyses were performed at Wuhan MetWare Biotechnology Co., Ltd. (Wuhan, China, www.metware.cn) following standard procedures. Vacuum-freeze-dried cell samples of WT and a-3, a-4, and a-5 were crushed to powder using a grinder (MM 400, Retsch, Haan, Germany). A total of 100 mg of crushed powder was weighed, and aliquots were extracted at 4 • C with 0.6 mL of 70% aqueous methanol. Then, the samples were vortexed six times during the extraction process to increase the extraction rate. The obtained aliquots were centrifuged at 10,000× g for 10 min to obtain the supernatant (CNWBOND Carbon-GCB SPE Cartridge, 250 mg, 3 mL; ANPEL, Shanghai, China, www.anpel.com.cn/cnw), and the supernatants were filtered through a microporous membrane (SCAA-104, 0.22 µm pore size; ANPEL, Shanghai, China, http://www.anpel.com.cn/) and subsequently stored for UPLC-MS/MS analysis.
In addition, all the tissues were ground in liquid nitrogen and the total RNA was prepared by isolation with the RNAiso Plus and RNAiso-mate for Plant Tissue kits (Takara, Japan) following the manufacturer's instructions, total RNA was treated with DNase (Takara, Japan) to remove DNA. The same amount of RNA from each sample (WT, a-3, a-4, and a-5) was used for sequencing analysis.

Chromatographic Mass Spectrometry Conditions
The sample extracts were analyzed using a UPLC-ESI-MS/MS system (UPLC, Shim-pack UFLC SHIMADZU CBM30A system, www.shimadzu.com.cn/; MS, Applied Biosystems 4500 Q TRAP, www.appliedbiosystems.com.cn/). The liquid phase conditions were as follows: UPLC column, Agilent SB-C18 (1.8 µm, 2.1 mm × 100 mm); mobile phase: solvent A, pure water with 0.1% formic acid and solvent B, acetonitrile. Sample measurements were performed with a gradient program using starting conditions of 95% A and 5% B. Within 9 min, a linear gradient to 5% A and 95% B was programmed, and these conditions were retained for 1 min. Subsequently, the conditions of 95% A and 5% B were adjusted within 1 min and retained for 2 min. The column oven temperature was set to 40 • C, and the injection volume was 4 µL. The effluent was alternatively connected to an ESI-triple quadrupole-linear ion trap (QTRAP)-M.
Linear ion trap (LIT) and triple quadrupole (QQQ) scans were acquired with a triple quadrupole-linear ion trap mass spectrometer (Q TRAP, AB Sciex, Waltham, MA, USA ) API 4500 Q TRAP UPLC/MS/MS system equipped with an ESI Turbo Ion-Spray interface, operating in positive and negative ion mode and controlled by the Analyst 1.6.3 software (AB Sciex). The ESI source operation parameters were as follows: ion source, turbo spray; source temperature, 550 • C; ion spray voltage (IS), 5500 V (positive ion mode)/−4500 V (negative ion mode); ion source gas I (GSI), gas II(GSII), and curtain gas (CUR) were set at 50, 60, and 30.0 psi, respectively; the collision gas (CAD) level was set at high. Instrument tuning and mass calibration were performed with 10 and 100 µmol/L polypropylene glycol solutions in QQQ and LIT modes, respectively. QQQ scans were acquired from multiple reaction monitoring (MRM) experiments with collision gas (nitrogen) set to 5 psi. Declustering potential (DP) and collision energy (CE) for individual MRM transitions were performed with further DP and CE optimization, respectively. A specific set of MRM transitions was monitored for each period according to the metabolites eluted within the period.

Metabolomic and Transcriptome Data Analysis
Unsupervised principal component analysis (PCA) was performed using the statistics function prcomp in R (www.r-project.org). The data were scaled to unit variance before the unsupervised PCA.
The results of the hierarchical cluster analysis (HCA) of the samples and metabolites were presented as heatmaps with dendrograms, whereas Pearson correlation coefficients (PCC) between samples were calculated using the cor function in R and presented only as heatmaps. Both HCA and PCC were carried out using the R package pheatmap. For HCA, normalized signal intensities of metabolites (unit variance scaling) were visualized as a color spectrum. Significantly regulated metabolites between groups were determined by VIP ≥ 1 and absolute Log 2 C (fold change) ≥ 1. VIP values were extracted from orthogonal partial least squares discriminant analysis (OPLS-DA) results. The data were log-transformed (log 2 ) and mean-centered before OPLS-DA. In order to avoid overfitting, a permutation test (200 permutations) was performed.
The identified metabolites were annotated using the Kyoto Encyclopedia of Genes and Genomes (KEGG) Compound database (http://www.kegg.jp/kegg/compound/), after which they were mapped to the KEGG Pathway database (http://www.kegg.jp/kegg/pathway.htmL). The pathways with significantly regulated metabolites mapped were then fed into the metabolite set enrichment analysis (MSEA), and their significance was determined by hypergeometric test p-values.
EdgeR software was used to analyze the p-value and false discovery rate (FDR) values of differential expression of unigenes in each sample based on the alignment results. FDR values were combined with FoldChange (FC) to screen for differential unigenes, whose FC was ≥1 and whose FDR was <0.05 in a comparison as the significantly differentially expressed screening condition.

Library preparation, RNA sequencing and De Novo Assembly
The RNA library were constructed and sequenced on an Illumina NovaSeq6000 at the Berry. Genomics (Beijing, China). Filtering the raw reads which affected alignment and subsequent analysis quality to obtain clean reads. A small number of spliced, repetitive, and low-quality reads (the mass value Q ≤ 3 and the base number accounts for more than 50% of the whole read) were removed to gain more reliable results. De novo assembly of L. kaempferi transcriptome was using the Trinity-v2.4.0 software.

Combined Analysis of Transcriptome and Metabolome
Differentially expressed genes (DEGs) and differentially accumulated metabolites (DAMs) were mapped onto the KEGG pathway at the same time. The enrichment results were used to show the degree of enrichment pathways with a false discovery rate (FDR) < 0.05. The transcript-metabolite network was constructed with a Pearson's correlation coefficient > 0.8.

Qualitative and Quantitative Metabolomic Analysis Based on UPLC-MS/MS and Identification of Metabolites
The results showed that in the process of metabolite detection, the total ion current (TIC) curves of quality control (QC) in different samples overlapped, and the retention time and peak intensity were consistent, indicating that the same sample signal was detected to be stable at different times ( Figure 1). Cluster analysis can reflect the differential patterns of accumulated metabolites. Cluster analysis results showed that the same groups had similar metabolite accumulation patterns, and different groups had different metabolite accumulation patterns ( Figure 2a). Pearson's correlation coefficient reflects the repeatability between samples within a group (Figure 2b). PCA showed that WT and transgenic samples were separately aggregated, each forming a cluster except a-4-2, indicating that there were significant differences between WT and transgenic lines (Figure 2c).

Identification of DAMs
DAMs were defined as those showing a fold change of ≥ 2 or a fold change of ≤ 0.5 and a variable importance in project (VIP) ≥ 1 between WT and a-3, a-4, and a-5 (p < 0.05). A total of 77, 72, and 76 DAMs were identified between WT and a-3, a-4, and a-5, respectively. For WT and a-3, 55 of the 77 DAMs (71.4%) were up-regulated and 22 (28.6%) were down-regulated. Among the 72 DAMs identified between WT and a-4, 54 (75%) and 53 (25%) metabolites were up-regulated and down-regulated, respectively. Among the 76 DAMs identified between WT and a-5, 45 (59.2%) and 31 (40.8%) were up-regulated and down-regulated, respectively (Table 1 and Table S2). The volcano map shows significant differences between WT and a-3, a-4, and a-5 (Figure 3a-c). The results of the hierarchical cluster analysis show the DAM accumulation models (Figure 3d-f).
Group Group The number of DAM categories in each group is listed in Table 2. An overview of the three groups showed that compared with WT, the transformant lines had the largest differences in phenolic acids, flavonoids, lipids, and amino acids (Figure 3g-i) following LaMIR166a overexpression. DAMs involved in flavonoid biosynthesis are shown in Table 3. We also used Venn analysis to study the differences in metabolites accumulated by WT and those accumulated by a-3, a-4, and a-5 ( Figure 4). In addition, there were a total of 43 common metabolites in the three transformant groups (Table 4).

DAM Functional Annotation and KEGG Enrichment Analysis
Metabolites were functionally annotated using the KEGG database, and the DAMs in WT and a-3, a-4, and a-5 are shown in Table S3. In order to further study the metabolites that may have changed as a consequence of LaMIR166a overexpression, we analyzed the differences in biological processes between the control and transformant lines. In general, KEGG enrichment analysis showed that "flavonoid synthesis", "auxin signal transduction", and "phenylpropane biosynthesis" were significantly enriched in WT and transformant lines (Figure 5a-c). DAMs involved in flavonoid biosynthesis, plant hormone signal transduction, and phenylpropanoid biosynthesis are shown in Table 5.

Association Analysis of Metabolome and Transcriptome Data Sets
In order to provide an overview of the changes that occur after the overexpression of LaMIR166a in embryonic masses, we performed RNA-seq-based transcription analysis. The raw data have been submitted to Sequence Read Archive (SRA) (http://www.ncbi.nlm.nih.gov/) under the accession number PRJNA680899. The RNA-seq yielded a total of 82.5 Gb clean data, and 203,256 non-redundant unigenes were obtained after assembly. 39276 unigenes have length of more than 1 kb (Table S4). Significant DEGs were detected between WT and a-3, a-4, and a-5. By comprehensive analysis of metabolomic and transcriptomic data, DEGs and the corresponding potential metabolites could be found.

KEGG Enrichment Analysis of DAMs and DEGs in L. kaempferi Embryonic Masses
In order to study the relationship between genes and metabolites after the overexpression of LaMIR166a, the DAMs and DEGs of WT, a-3, a-4, and a-5 were analyzed. According to KEGG database mapping, there were 29, 26, and 32 co-mapped pathways in WT vs. a-3, WT vs. a-4, and WT vs. a-5, respectively. The number of common pathways in the three groups was 16. Interestingly, among these commonly mapped pathways, "phenylpropane biosynthesis" (ko00940), "flavonoid biosynthesis" (ko00941), "Cutin, suberine and wax biosynthesis", and "Plant hormone signal transduction" (ko04075) showed significant levels of accumulation. The above results can be found in Table S5.

Correlation Analysis of DAMs and DEGs
In order to obtain a systematic overview of metabolites and their corresponding gene PCC > 0.8 variants, nine quadrant diagrams were generated (Figure 6a-c).
The black dashed line divides each graph into nine quadrants: the first quadrant shows up-regulated metabolites and down-regulated genes; the second quadrant shows up-regulated metabolites and unchanged genes; the third quadrant shows up-regulated metabolites and genes; the fourth quadrant shows unchanged metabolites and down-regulated genes; the fifth quadrant shows unchanged metabolites and genes; the sixth quadrant shows unchanged metabolites and up-regulated genes; the seventh quadrant shows down-regulated metabolites and genes; the eighth quadrant shows down-regulated metabolites and unchanged genes; and the ninth quadrant shows down-regulated metabolites and up-regulated genes. In addition, the DAMs and DEGs shown in the third and seventh quadrant were positively correlated and had similar consistent patterns. It is worth noting that the genes and metabolites related to the flavonoid synthesis pathway and phenylpropane synthesis pathway were all in the third quadrant, and they were all up-regulated, indicating that the up-regulated genes were involved in flavonoid biosynthesis. The up-regulated genes included chalcone synthase (CHS) TRINITY_DN101081_c0_g2, TRINITY_DN101081_c0_g5, TRINITY_DN94950_c1_g3, flavonol synthase (FLS) TRINITY_DN114346_c3_g1, TRINITY_DN112973_c0_g1, dihydroflavonol-4-reductase (DFR) TRINITY_DN106194_c1_g2, and cytochrome P450 (CYP75A) TRINITY_DN104908_c1_g1, TRINITY_DN94249_c6_g1. The metabolites correlative with genes were mainly dihydroquercetin, catechin, and procyanidins. The DEGs and corresponding DAMs involved in the "flavonoid biosynthesis process" and "phenylpropane biosynthesis" are listed in Table S6. The abscissa represents the fold changes in the transcriptome data, and the ordinate represents the fold changes in the metabolome data. The black dotted lines represent the differential thresholds. Each point represents a gene/metabolite. Blue dots represent that genes and metabolites were both changed; black dots represent unchanged genes/metabolites; green dots represent DAMs with unchanged genes; red dots represent differentially expressed genes (DEGs) with unchanged metabolites.

Representing the Transcript-Metabolite Correlation Network of DAMs and DEGs
In order to simulate the synthesis and regulation characteristics of DAMs and DEGs, a subnet was constructed to determine the transcript-metabolite correlation. Only related pairs with a correlation coefficient > 0.8 were included in the analysis. KEGG pathways related to flavonoid biosynthesis (ko00940 and ko00941) were highlighted. Related tests included several classic key genes, which are specifically activated by the overexpression of LaMIR166a (Figure 7, Table S6).

Discussion
Combining metabolomics and transcriptomics to analyze the regulatory mechanisms of overexpressed miRNA in plants can further deepen our understanding of the function of miRNA. In the present study, we analyzed the metabolomics and transcriptomics of LaMIR166a-overexpressed proembryo mass in L. kaempferi. The ESMs stage is the necessary material signal for embryo formation, and it is the key to transformation into embryos. This was the first time that multi-omics were combined to analyze the ESMs in order to better understand the regulatory role of LaMIR166a in L. kaempferi.
Studies have shown that L. kaempferi contains up to 30% of the non-structural hemicellulose polysaccharide ArGal, which has high medical value [27] because of its immune-enhancing properties of stimulating the cytotoxicity of natural killer (NK) cells, enhancing other functions of the immune system and inhibiting the metastasis of tumor cells to the liver [28]. Compared to spruce, L. kaempferi contains a much higher level of hydrophilic phenols; as hydrophilic flavonoids appear to be the key components in wood protection, large amounts of flavonoids play a key role in L. kaempferi's defense mechanism [29]. Studies have demonstrated that conifer shoots contain high levels of healthy compounds such as phenolics [30]. Using metabolomic studies, researchers can significantly increase the value of existing data, and multigroup analysis has become a powerful tool in recent years [24,31,32]. Combined analysis of transcriptome and proteome provided the basis for the molecular regulation of ovule development in gymnosperms [33], whereas metabolomics combined with potential target prediction, degradome validation, and transcriptome sequencing can expand our knowledge of metabolite profiling and potential functional roles in gymnosperm pollination drops [34]. Transcriptome and metabonomic analysis revealed the metabolic variation in sex differences in Ginkgo biloba L. [35]. Therefore, identification and analysis of L. kaempferi metabolites have become an important topic.
The investigated cell lines a-3, a-4, and a-5 are produced by the same gene transformation, and the background of the transformed cell lines was consistent. In previous studies, the relative expression level of LaMIR166a in the transgenic cell line a-4 was more than five times higher than in WT, which was the highest among the five transgenic cell lines [19]. In the present study, we analyzed the differences and metabolite changes between WT and a-3, a-4, and a-5. The results of this study showed that the accumulation of flavonoids in the three transgenic lines was significantly higher than that in WT; moreover, a-4 had a higher flavonoid accumulation than that of a-3 and a-5, suggesting that LaMIR166a had a great influence on the flavonoid biosynthesis pathway. In addition, 43 differential metabolites were the same among a-3, a-4, and a-5; phenolic acids, amino acids, and flavonoids were the most abundant metabolites in all three cell lines. In particular, flavonoids, lipids, alkaloids, organic acids, and tannins were all up-regulated, whereas the other metabolites were either up-regulated or down-regulated. Accordingly, a-3, a-4, and a-5 had 11, 14, and 21 unique accumulated metabolites, respectively. This indicated that the function of LaMIR166a is very complex. LaMIR166a overexpression had different intensity effects as a consequence of different expression levels and gene insertion locations, which might be the main reason for the differences between the three transgenic lines.
Studies have shown that the leaf extracts of flavonoids in Myrcia guianensis (Aubl.) DC. altered the expression of miR166 [36]. Gregory S. found that flavonols played a positive role in the formation of lateral roots in tomato [37]. Substantial results support the role of flavonoids in the modulation of auxin transport in plants [38,39]. Based on previous studies, we found that LaMIR166a was involved in auxin synthesis and signaling regulation during the germination of somatic embryos [21], and we hypothesized that this may also be associated with flavonoid biosynthesis. In the present study, KEGG enrichment analysis showed that "phenylpropane biosynthesis" (KO00940), "flavonoid biosynthesis" (Ko00941), and "flavonoid and flavonoid biosynthesis" (Ko00944) were significantly enriched in the LaMIR166a-overexpressed cell lines, and flavonoid compounds were synthesized via the phenylpropane metabolism pathway. A number of products can be produced in this metabolic pathway, including flavonols, flavano-3-alcohols, proanthocyanidins (tannic acids), and many other polyphenols. In the present study, DAMs identified in WT and three transgenic cell lines had a high amount of flavonoids, the accumulation of polyphenols, anthocyanins, flavanones, flavanones, and isoflavones determined by the metabonomics map of LaMIR166a-overexpressed cell lines was consistent with the transcriptome data. These compounds are regulated by genes such as CHS, FLS, and DFR, which have been found in correlation analysis of transcriptome and metabolome (Table S6). In particular, the correlation network of DAMs and DEGs found that those genes might participate in the biosynthesis of metabolites. These results give us a hint that we should pay more attention to flavonoid biosynthesis during somatic embryo maturation and germination in our future work. Therefore, the accumulation of flavonoid biosynthesis-related genes and metabolites in the overexpressed cell lines indicated that miR166 might be involved in flavonoid biosynthesis in L. kaempferi.

Conclusions
In conclusion, our results demonstrated that the function of miR166 is far more complex than indicated by previous studies, and the combination of multi-omics can help us obtain more information about its function in plants, although it will make the results difficult to understand.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4907/11/12/1367/s1, Table S1: title Total identified metabolites in WT, a-3, a-4 and a-5; Table S2: title Differentially accumulated metabolites (DAMs) in WT, a-3, a-4 and a-5 respectively; Table S3: title KEGG enrichment analysis of differentially accumulated metabolites in WT vs a-3, WT vs a-4, WT vs a-5; Table S4: title Distribution of the unigenes length; Table S5: title Metabolome and transcriptome association analysis of co-mapped KEGG pathways enriched in WT vs a-3, WT vs a-4 and WT vs a-5; Table S6: title Metabolome and transcriptome association analysis of flavonoid biosynthesis and phenylpropane biosynthesis enriched in WT vs a-3, WT vs a-4 and WT vs a-5.

Conflicts of Interest:
The authors declare no conflict of interest.