Volatile Profiling and Transcriptome Sequencing Provide Insights into the Biosynthesis of α-Pinene and β-Pinene in Liquidambar formosana Hance Leaves

Liquidambar formosana Hance is a pinene-rich deciduous plant species in the Altingiaceae family that is used as a medicinal plant in China. However, the regulatory mechanisms underlying α-pinene and β-pinene biosynthesis in L. formosana leaves remain unknown. Here, a joint analysis of the volatile compounds and transcriptomes of L. formosana leaves was performed to comprehensively explore the terpene synthase (TPS) that may participate in α-pinene and β-pinene biosynthesis. Headspace solid-phase microextraction (HS-SPME) and gas chromatography–mass spectrometry (GC–MS) jointly detected volatile L. formosana leaves. Trees with high and low levels of both α-pinene and β-pinene were defined as the H group and L group, respectively. RNA sequencing data revealed that DXR (1-deoxy-D-xylulose-5-phosphate reductoisomerase), HDS [(E)-4-hydroxy-3-methylbut-2-eny-l-diphosphate synthase], and TPS may be the major regulators of monoterpenoid biosynthesis. We identified three TPSs (LfTPS1, LfTPS2, and LfTPS3), which are highly homologous to α-pinene and β-pinene synthases of other species in phylogenetic analysis. Four TPS genes (LfTPS1, LfTPS2, LfTPS4, LfTPS5) may be critically involved in the biosynthesis and regulation of α-pinene and β-pinene in L. formosana. Bioinformatic and transcriptomic results were verified using quantitative real-time PCR. We identified LfTPS1, LfTPS2 as candidate genes for α-pinene and β-pinene biosynthesis that significantly improve the yield of beneficial terpenoids.


Introduction
Liquidambar formosana is a deciduous tree species in the Altingiaceae family distributed in subtropical China [1]. It is a common forest tree species widely used for forestry and landscaping [2,3]. This species produces high-quality timber and is a well-known source of pharmaceuticals and cosmetics. Diverse bioactive products (antibacterial [4,5], fruit preservation [6], anti-inflammatory [7], antioxidant [8], and antiseptic [9]) extracted from L. formosana leaves are widely used [10]. However, L. formosana is mainly used in landscaping and its utilization in other production forms is extremely deficient.
The fragrance of L. formosana leaves is unique. Volatile organic compounds (VOCs) from L. formosana account for 95.51% of the total volatile components using a solid-phase microextraction (SPME) pipeline [11]. The primary components are α-pinene, β-pinene, caryophyllene, and limonene. Diverse VOCs were identified in L. formosana forest varieties using thermal desorption-gas chromatography-mass spectrometry (TD-GC-MS) [12,13]; fragrance. The results of this study may help to select high terpenoid-yielding species using molecular marker-assisted selection.
In this study, transcriptome sequencing data analysis showed that there were 1202 DEGs in group H vs. group L, including 689 upregulated and 513 downregulated genes. Two genes (LfTPS1, LfTPS2,) containing typical TPS domains were identified from differentially expressed genes, and the expression levels of the above two genes were highly correlated with the relative content of α-pinene and β-pinene, which were defined as candidate genes regulating the biosynthesis of α-pinene and β-pinene in Liquidambar formosana leaves.

Materials and Instruments
L. formosana plants were kindly provided by Dr. Li (Dongjiang Forestry Station, Guangdong Province, China), (114 • 44 E, 23 • 38 N). The field conditions included an average annual temperature of 20.9-21.5 • C, and the average annual rainfall was 1600-1900 mm. Afforestation occurred in May 2005, belongs to native broad-leaved seed generation test forest. Healthy L. formosana samples from 12 different individual plants were harvested between 14:00 and 16:00 in August 2021. Samples for headspace solid-phase microextraction gas chromatography-mass spectrometry (HS-SPME-GC-MS) were collected and stored at 15 • C before analysis. According to the results of HS-SPME-GC-MS, 6 plants with low and high pinene content were selected. The transcriptome sequencing materials were harvested and immediately frozen in liquid nitrogen. Later, the materials were stored at −80 • C for transcriptome sequencing and reverse transcription-quantitative polymerase chain reaction (RT-qPCR) analyses.

HS-SPME of Volatile Components from L. formosana Leaves
Samples were quickly ground into powder by pulverizer, approximately 1 g was weighed and immediately sealed in 20 mL SPME vials for further analysis. Solid phase microextraction fibers of DVB/CAR/PDMS (divinylbenzene/carboxen/poly-dimethylsiloxane, 50/30 µm) (Shanghai Amp Experimental Technology Co., Ltd., Shanghai, China) were used to extract the VOCs of L. formosana. The fibers were conditioned for 60 min according to the temperature recommended by the manufacturer. The SPME fiber was equilibrated for 30 min to account for the distance between the sample and the headspace. Then, the SPME fiber was heated at 98 • C for 33 min to extract volatile compounds from each sample. The fiber was quickly moved into the GC-MS for thermal desorption and analysis performed. Three technical replicates were used from each of the three biological replicates.

GC/MS Analysis
Samples were analyzed and identified using an Agilent 7890A Series GC system coupled with an Agilent 5975C Mass Selective Detector (Agilent Technologies, Santa Clara, CA, USA). Helium was used as the carrier gas at a flow rate of 1.0 mL/min with an HP-INNOWAX capillary column (30 m × 250 µm × 0.25 µm). The inlet temperature was 200 • C. The column temperature program of the GC was initially set at 40 • C for 3 min, heated to 100 • C at 3 • C/min, maintained for 5 min, increased to 200 • C at 20 • C/min, and maintained for 7 min. Mass spectrometry detection used an electron impact (EI) ionization system at 70 eV, the ionization source temperature was 280 • C, the quadrupole temperature was 150 • C, the full-scan acquisition mode was performed with a mass range of 33-350 (m/z), and the tuning file included standard tuning. Constituents were identified by comparing the mass spectra with the National Institute of Standards and Technology (NIST08, http://webbook.nist.gov/chemistry/ (accessed on 29 November 2022)) library and published data. Furthermore, the retention time (RT) of the standard substances for the representative compounds was measured according to the above experimental conditions. Relative percentages (RPs) were calculated to determine the proportion of each component.
RNA-seq transcriptome libraries were created using 1 g of total RNA and the Illumina TruSeq RNA sample preparation kit (San Diego, CA, USA). Briefly, messenger RNA was isolated using polyA beads and fragmented using a fragmentation buffer. Illumina-indexed adaptors were ligated after cDNA synthesis, end repair, A-base addition, and ligation, according to the Illumina protocol. Libraries were size selected for cDNA target fragments of 200-300 bp on 2% low-range ultra-agarose, which were then PCR-amplified for 15 cycles with Phusion DNA polymerase (NEB). Paired-end libraries were sequenced using Illumina NovaSeq 6000 (150 bp × 2, Shanghai BIOZERON Co., Ltd., Shanghai, China) after quantification using a TBS-380 mini-fluorometer. The Illumina paired-end sequencing service was provided by Shanghai Ling En Biotechnology Co., Ltd., Shanghai, China.

Transcriptome Data Analysis and Annotation
Processing of the transcriptome sequence data was performed through a filtering process considering a Q-value of 30 using the fastqc and Trimmomatic toolkit. We first removed adaptor sequences, empty reads, and low-quality sequences. The total clean reads were aligned to the L. formosana reference genome (unpublished) using Hisat 2 software. The alignment results were quantified using the R language package feature Counts, and the read count for each gene was normalized to transcripts per million (TPM) [40]. The effects of gene length and sequencing depth can be eliminated using TPM, which can also directly compare gene expression differences between different samples. Differential expression analysis was conducted using DESeq2 v1.10.1 (Michael Love, 2016). Multiple testing hypothesis correction was performed to estimate relative expression levels. An adjusted p-value < 0.05 and |log2FC| ≥ 1 were used as the threshold parameters of significantly differentially expressed genes (DEGs).
Functional annotation was performed according to the Gene Ontology (GO) terms and the Kyoto Encyclopedia of Genes and Genomes (KEGG) to analyze the biosynthesis of terpenoid pathways and genetic pathways involved in this study. Pathway analysis was performed in KOBAS v3.0.3 (Dechao Bu, Haitao Luo, et al., 2017), to systematically explore the functions of DEGs. The main biological functions were determined and the terpenoid synthesis pathways were obtained. The coding sequence of key enzymes involved in this pathway was screened by searching for the gene name of the EC number.

Screening of TPS Genes Related to α-Pinene and β-Pinene
We initially selected DEGs that were directly annotated as TPSs by KEGG. Then, a local BLASTP analysis was performed for TPS sequences using previously reported related sequences from other plants. Then, phylogenetic analysis of the TPSs of L. formosana was performed in MEGA 11.0 using the neighbor-joining (NJ) method with n = 1000 bootstrap replicates. The TPS amino acid sequence was analyzed by MEME (https://meme-suite. org/meme/doc/meme.html; accessed on 18 March 2022) online analysis website, and the conserved motif was predicted. The predicted motif was further verified by the Pfam (https://pfam.xfam.org; accessed on 20 March 2022) online website. According to the differential genes containing specific domains, the information of phylogenetic tree and the prediction results of structural motifs, the genes containing TPS domains were screened. IBM SPSS (version 21.0; Armonk, NY, USA) was used to obtain Spearman's correlation coefficients between terpene synthase and the content of α-pinene and β-pinene. Finally, a gene metabolite network map was established using Cytoscape (v3.9.1, Shannon P et al., 2003) [41] based on Spearman's correlation coefficients.

Quantitative Real-Time PCR (qRT-PCR) Analysis
Quantitative real-time PCR was used to validate the reliability of RNA sequencing data as previously described [42]. The total RNA of the samples was extracted and reversed into cDNA for RT-qPCR analysis using applied biosystems 7500. The PCR procedure was pre-denaturation at 95 • C for 3 min, denaturation at 95 • C for 10 s, annealing at 60 • C for 30 s, 40 cycles. Three biological replicates per reaction, relative expression levels calculated using the 2 −∆∆Ct method, with actin serving as the reference gene. Measurements were obtained in triplicate from three biological replicates. All primers were designed using Primer Premier software (version 5.0, PREMIER Biosoft, San Francisco, CA, USA) for the six unigenes (Supplementary Table S1).

Gene Expression Quantification and DEG Identification
The Illumina HiSeq 6000 platform was used to determine the transcrip quences of the six trees (H and L groups). A total of 88 Gb of raw data was Quality control filtered reads were compared to the L. formosana reference gen published) using Hisat 2(Mihaela Pertea et al,2020). The reads of unassigned un unassigned nofeatures, and unassigned ambiguity were at a lower level, and reads were high enough. The overall alignment rates of H1, H2, H3, L1, L2, and 86.91%, 86.54%, 86.89%, 88.13%, 87.31%, and 88.50%, respectively (Table 1; Fi These results indicated that the transcriptome sequencing and alignment rat study were high enough for subsequent analyses.

Gene Expression Quantification and DEG Identification
The Illumina HiSeq 6000 platform was used to determine the transcriptome sequences of the six trees (H and L groups). A total of 88 Gb of raw data was obtained. Quality control filtered reads were compared to the L. formosana reference genome (unpublished) using Hisat 2 (Mihaela Pertea et al., 2020). The reads of unassigned unmapped, unassigned nofeatures, and unassigned ambiguity were at a lower level, and assigned reads were high enough. The overall alignment rates of H1, H2, H3, L1, L2, and L3 were 86.91%, 86.54%, 86.89%, 88.13%, 87.31%, and 88.50%, respectively (Table 1; Figure 2a). These results indicated that the transcriptome sequencing and alignment rates in this study were high enough for subsequent analyses. These results indicated that the transcriptome sequencing and alignment rates in this study were high enough for subsequent analyses. The DEGs were designated according to the expression level of |log2(fold change)|>1 and p-value < 0.05 in the H vs. L group. A total of 1202 DEGs were identified between the two groups (689 upregulated and 513 downregulated) (Figure 2b and Supplementary Table S3).

Gene Ontology Functional Annotation and KEGG Enrichment Analysis of DEGs
The synthesis of related secondary metabolites may be regulated by DEGs at the transcriptional level. Gene Ontology enrichment analyses were performed to reveal the biological functions of L. formosana leaves to exhaustively explore the presumed function of DEGs (Supplementary Table S4). The main molecular function (MF), cellular component (CC), and biological process (BP) categories were identified. A total of 1202 DEGs were clustered into 21 biological processes, eight cellular components, and six molecular functions (Figure 3a). Most subcategories were related to the cellular response to blue light, entrainment of the circadian clock, starch catabolic processes, and positive regulation of circadian rhythm in the BP categories. The largest percentages of DEGs in the CC category were related to the thylakoid membrane, protein-folding chaperones, and plastoglobules. There were only eight subcategories enriched in cellular components. This mainly included secondary active sulfate transmembrane transporter activity and water channel activity.
Additionally, enrichment analysis focused on the metabolic pathways and the (KEGG) direct homology system [KEGG Orthology (KO)] to gain insight into the biological functions of different proteins in coordination with each other. The DEGs from the H vs. L group comparison were assigned to 13 pathways, while the top three enriched pathways were "arachidonic acid metabolism," "circadian rhythm-plant," and "photosynthesis-antenna proteins" (Figure 3b). The top 10 terms with the largest number of differential genes were taken for the plot, the horizontal coordinate is the number of differential genes in that GO term, and the vertical coordinate is the specific GO term. (b) KEGG enrichment classification of DEGs. The horizontal coordinate is the −log10 p-value of the DEGs in the KEGG pathway name, and the vertical coordinate is the specific pathway name. The darker the orange-red color, the smaller the −log10 p-value; the darker the blue-green color, the larger the −log10 p-value. The top 10 terms with the largest number of differential genes were taken for the plot, the horizontal coordinate is the number of differential genes in that GO term, and the vertical coordinate is the specific GO term. (b) KEGG enrichment classification of DEGs. The horizontal coordinate is the −log10 p-value of the DEGs in the KEGG pathway name, and the vertical coordinate is the specific pathway name. The darker the orange-red color, the smaller the −log10 p-value; the darker the blue-green color, the larger the −log 10 p-value.
Additionally, enrichment analysis focused on the metabolic pathways and the (KEGG) direct homology system [KEGG Orthology (KO)] to gain insight into the biological functions of different proteins in coordination with each other. The DEGs from the H vs. L group comparison were assigned to 13 pathways, while the top three enriched pathways were "arachidonic acid metabolism", "circadian rhythm-plant", and "photosynthesis-antenna proteins" (Figure 3b).

Identification of Genes Related to Monoterpene Biosynthesis
Kobas software was used to identify contigs related to the biosynthesis of the MEP and MVA pathways. KEGG metabolic pathway analysis showed that the MEP pathways related to L. formosana monoterpene synthesis were the terpene skeleton biosynthesis pathways (ko00900) and the monoterpene biosynthesis pathways (ko00902). We focused on the expression patterns of genes involved in terpene biosynthesis to further assess the differences in volatile terpene biosynthesis between the H and L groups (Figure 4). Nine DEGs participating in the terpene synthesis pathway showed significant activities. Gene expression of key enzymes in the MEP pathway is more active than that in the MVA pathway. These nine DEGs consisted of two homologs of HDS and one homolog of DXR. Differentially expressed genes were substantially increased in the H group compared to those in the L group. Of these, DXR (LfDXR1), was the highest in the H group, and the HDS (LfHDS1, LfHDS2), displayed the highest level in the H group. Five TPS-encoding genes related to monoterpene synthesis in the L. formosana transcriptome. LfTPS1, LfTPS3 and LfTPS4 were upregulated and LfTPS2 and LfTPS5 were downregulated in the H group. This correlated with the content of α-pinene and β-pinene. We believe that this will help identify novel genes involved in the synthesis of pathways of secondary metabolites or terpenoid biosynthesis in L. formosana. of gene changes in qRT-PCR was roughly consistent with that detected in the transcriptome. The correlation between the expression of LfTPS3 in each sample and qRT-QPCR reached 0.90. Three genes (LfTPS3, LfTPS1, and LfTPS4) exhibited similar transcriptional patterns to α-pinene and β-pinene production. This suggested that they directly affected α-pinene and β-pinene production. Simultaneously, we identified five TPS-encoding genes related to monoterpene synthesis in the L. formosana transcriptome. Quantitative real-time PCR was used to detect the expression levels of five DEGs in the H and L groups (Supplementary Table S5). The trend of gene changes in qRT-PCR was roughly consistent with that detected in the transcriptome. The correlation between the expression of LfTPS3 in each sample and qRT-QPCR reached 0.90. Three genes (LfTPS3, LfTPS1, and LfTPS4) exhibited similar transcriptional patterns to α-pinene and β-pinene production. This suggested that they directly affected α-pinene and β-pinene production.
Phylogenetic analysis was performed based on published terpene synthase genes associated with the synthesis of α-pinene and β-pinene, combined with the corresponding amino acid sequences (Figure 5a). LfTPS3, LfTPS2, and LfTPS1 were highly homologous to α-pinene and β-pinene synthases of other species, such as Vitis vinifera. LfTPS4 and LfTPS5 are members of distinct branches. Conserved motif analysis performed with the MEME program revealed that TPS members clustered in the same branch had similar motif characteristics (Figure 5b). The predicted results indicated that LfTPS3, LfTPS1, LfTPS2, and other published terpene synthase genes contained 10 identical motifs with the same sequence. For example, the RR (X) 8W and DD (X) 2D motifs (Figure 5c,d). An identical motif was not detected in LfTPS4 and LfTPS5. Furthermore, Pfam analysis of all motif sequences showed that motifs 1, 2, 3, 4, 5, and 9 were located in the C-terminal domain of TPS, and motifs 6, 7, and 10 were located in the N-terminal domain of TPS. LfTPS3, LfTPS1, and LfTPS2 all contained the C-terminal and N-terminal TPS domains.
Phylogenetic analysis was performed based on published terpene synthase genes associated with the synthesis of α-pinene and β-pinene, combined with the corresponding amino acid sequences (Figure 5a). LfTPS3, LfTPS2, and LfTPS1 were highly homologous to α-pinene and β-pinene synthases of other species, such as Vitis vinifera. LfTPS4 and LfTPS5 are members of distinct branches. Conserved motif analysis performed with the MEME program revealed that TPS members clustered in the same branch had similar motif characteristics (Figure 5b). The predicted results indicated that LfTPS3, LfTPS1, LfTPS2, and other published terpene synthase genes contained 10 identical motifs with the same sequence. For example, the RR (X) 8W and DD (X) 2D motifs (Figure 5c,d). An identical motif was not detected in LfTPS4 and LfTPS5. Furthermore, Pfam analysis of all motif sequences showed that motifs 1, 2, 3, 4, 5, and 9 were located in the C-terminal domain of TPS, and motifs 6, 7, and 10 were located in the N-terminal domain of TPS. LfTPS3, LfTPS1, and LfTPS2 all contained the C-terminal and N-terminal TPS domains.  A correlation analysis was performed to clarify the relationship between the TPM values of the five TPS genes and terpene content and form correlation networks (Supplementary Table S6). The correlation coefficients between LfTPS3 and α-pinene and β-pinene were >0.6, and the p-values were >0.05; therefore, this gene was not selected for further correlation analysis. The correlation network expression pattern showed that LfTPS1 expression positively correlated with α-pinene and β-pinene accumulation, whereas  Table S6). The correlation coefficients between LfTPS3 and α-pinene and β-pinene were >0.6, and the p-values were >0.05; therefore, this gene was not selected for further correlation analysis. The correlation network expression pattern showed that LfTPS1 expression positively correlated with α-pinene and β-pinene accumulation, whereas LfTPS5 and LfTPS2 negatively correlated with the production of α-pinene and β-pinene. LfTPS4 positively correlated with α-pinene production ( Figure 6).
A correlation analysis was performed to clarify the relationship between the values of the five TPS genes and terpene content and form correlation networks (Su mentary Table S6). The correlation coefficients between LfTPS3 and α-pinene and nene were >0.6, and the p-values were >0.05; therefore, this gene was not selected fo ther correlation analysis. The correlation network expression pattern showed that L expression positively correlated with α-pinene and β-pinene accumulation, wh LfTPS5 and LfTPS2 negatively correlated with the production of α-pinene and β-p LfTPS4 positively correlated with α-pinene production ( Figure 6).

Discussion
Epigenetically related metabolites are associated with epigenetic modifications and epigenetic modification patterns vary from species to species [43]. Terpenoids are natural products that have a complex biosynthesis and release mechanism and a diverse spectrum of biological functions, which play an important role in adaptation to specific ecological conditions, information transfer, and chemical defense in plant life processes [44,45]. For example, floral volatile terpenoids attract pollinators, while toxic terpenoids are released as phytotoxins to defend against herbivores, harmful insects, pathogenic microorganisms, and other threats. Some can even mediate the interaction of plants with the surrounding biotic and abiotic factors [46][47][48][49][50]. L. formosana leaves are rich in terpenoids, and its primary component (pinene) has great potential for application in food preservation, and pharmaceutical and cosmetic industries. This study showed that the vast majority of volatile compounds from L. formosana leaves are α-pinene, β-pinene, caryophyllene, and limonene based on HS-SPME-GC-MS characterization. This correlated with previous studies [11,14,51]. In addition, the germplasm of L. formosana in this study contained high concentrations of α-pinene and β-pinene. This germplasm has not been effectively utilized and protected, and its leaf fragrance has not been comprehensively evaluated, hindering better exploitation and utilization of high-quality L. formosana resources. Pharmacological activities of α-pinene and β-pinene include antibacterial, antiviral, anti-spasmodic, anti-fungal, anti-viral, anti-cancer, anti-malarial, anti-oxidation, anti-inflammatory, and sedative effects [52][53][54].
Terpenoid content is a complex and diverse cause of species differences in various environments [55,56]. Pathways related to α-pinene and β-pinene biosynthesis and molecular regulation of L. formosana are not extensively elucidated compared with other well-studied plant families such as Lamiaceae [57], Pinaceae [36], and Apiaceae. Highthroughput sequencing techniques have rapidly increased our knowledge and understanding of the biosynthesis and molecular regulatory frameworks of valuable phytochemicals by generating transcriptome sequence data from important plants [58]. This study performed transcriptome sequencing of six trees of L. formosana leaves. This provides insights into the molecular mechanisms underlying terpenoid synthesis in L. formosana. The H group had mostly higher expression of the DEGs relevant to the terpenoid biosynthesis pathway compared with those in the L group. This indicated that the H group had en-hanced terpenoid synthesis. Terpenoids consist of IPP (C5) units that are derived from two distinguishable biochemical pathways in plants: the plastidial MEP pathway and the cytosolic acetate-MVA pathway. However, monoterpenoids are solely produced by the MEP pathway from IPP [58,59]. The MEP pathway is involved in the synthesis of IPP and DMIPP by seven enzymes (DXS, DXR, MCT, CMK, MDS, HDS, and HDR). DXS is a key enzyme in the MEP pathway and some studies suggest that DXS regulates monoterpene production and is mainly involved in the buildup of linalool, geraniol, α-terpinol, and other substances in grape aroma and Freesia hybrida [35,60,61]. Kiwifruit has two DXS genes, with co-expression of DXS1 and TPS genes significantly increasing monoterpenoid accumulation [62]. DXS overexpression in peppermint enhances the accumulation of essential oil substances [63]. DXR encodes the second key enzyme in the MEP pathway, is the process's rate-limiting enzyme, is an effective regulatory target, and is involved in terpene synthesis [64]. HDR genes influence the ultimate synthesis of terpenoids by indirectly regulating isoprene synthesis. GPPS6 is distributed in secretory gland cells and chloroplasts and indirectly regulates monoterpene biosynthesis. This study showed that the DXR, HDR, and GPPS genes were highly expressed in the H group, and we presumed that they are involved in the regulation of principal compounds in L. formosana leaf fragrance.
IPP is synthesized into monoterpenes, sesquiterpenes, and diterpenes with different carbon chain lengths by the action of isopentenyl transferases and terpene synthases. The structural diversity of terpenoids is mainly based on terpene synthases and related genes [36,65,66], and a series of monoterpenes and sesquiterpenes are synthesized by different TPSs with FPP and GPP as substrates. Phylogenetic analysis revealed that LfTPS3, LfTPS2, and LfTPS1 are highly homologous and similar to α-pinene and β-pinene synthases of other species. Four TPS genes (LfTPS1, LfTPS2, LfTPS4, and LfTPS5) might be critically involved in the biosynthesis and regulation of α-pinene and β-pinene in L. formosana according to interaction networks. However, predicting the function of the TPS gene solely based on sequence similarity is rather unpromising because the sequence homology between terpene synthase proteins is independent of function [67]. Furthermore, most TPSs are multiproduct enzymes capable of producing a diverse set of compounds [68]. For example, 2-limonene synthase in spearmint (Mentha spicata) and peppermint (Mentha × piperita) produces α-pinene and (-)-β-pinene, together with the monocyclic product [69,70]. Therefore, we propose that future research should focus on characterizing the specific functions of these three genes.
L. formosana leaves contain various high-value volatile compounds, including α-pinene, β-pinene, caryophyllene, and limonene. They are important raw ingredients in the pharmaceutical, chemical, and spice industries. At present, L. formosana leaf resources are underutilized because of a lack of good asexual lines, and the extraction of essential oils and chemical analysis methods are subject to the season, degree of growth and development, and other factors. We performed a fully integrated analysis of the transcriptome and volatile profiles in L. formosana leaves to gain fresh insight into how α-pinene and β-pinene accumulate and to distinguish the genes involved. These findings provide a promising pool of genes involved in the biosynthesis of α-pinene and β-pinene in L. formosana leaves. This offers a reference for further exploration into its economical, medicinal, and ecological values.

Conclusions
Overall, individual plants with significantly high (H) and low (L) contents of α-pinene and β-pinene were identified in this study. α-pinene and β-pinene were the most abundant volatile compounds in the "H group".
Transcriptome sequencing data analysis showed that there were 1202 DEGs in group H vs. group L, including 689 upregulated and 513 downregulated genes. Specifically, DXR, HDS, and TPS may be the major regulators of leaf terpene biosynthesis. We identified three terpene synthases (LfTPS3, LfTPS2, LfTPS1) that were highly homologous to α-pinene and β-pinene synthases from other species in the phylogenetic analysis. The above three genes were identified as containing typical TPS domains, and four TPS genes (LfTPS1, LfTPS2, LfTPS4, and LfTPS5) were highly correlated with α-pinene and β-pinene. These may be critically involved in the biosynthesis and regulation of α-pinene and β-pinene according to interaction networks. Therefore, LfTPS1 and LfTPS2 are defined as the key candidate genes regulating the biosynthesis of α-pinene and β-pinene. This study sheds light on the genes involved in the biosynthesis of α-pinene and β-pinene in L. formosana leaves. The results lay the groundwork for future research on the functional characterization of candidate TPSs.
Supplementary Materials: The following supporting information can be downloaded at: https:// www.mdpi.com/xxx/s1, Figure S1: Comparison of L. formosana leaf compounds in all samples.; Figure S2: shows ten predicted motifs; Table S1: Primer information used in qRT-PCR assays of five genes related to monoterpene pathways; Table S2: GC-MS detection data of the metabolome; Table  S3: The expression of all the genes in the transcriptome.; Table S4: GO functional annotation and KEGG enrichment analysis data of DEGs.; Table S5: qRT-PCR results.; Table S6: Interaction network data between the 4 TPSs of L. formosana, α-pinene, and β-pinene.; Table S7: Provenance and accession number of terpene synthase.

Conflicts of Interest:
The authors declare no conflict of interest.