Transcriptomic Analysis for the Identification of Metabolic Pathway Genes Related to Toluene Response in Ardisia pusilla

Toluene is an industrial raw material and solvent that can be found abundantly in our daily life products. The amount of toluene vapor is one of the most important measurements for evaluating air quality. The evaluation of toluene scavenging ability of different plants has been reported, but the mechanism of plant response to toluene is only partially understood. In this study, we performed RNA sequencing (RNA-seq) analysis to detect differential gene expression in toluene-treated and untreated leaves of Ardisia pusilla. A total of 88,444 unigenes were identified by RNA-seq analysis, of which 49,623 were successfully annotated and 4101 were differentially expressed. Gene ontology analysis revealed several subcategories of genes related to toluene response, including cell part, cellular process, organelle, and metabolic processes. We mapped the main metabolic pathways of genes related to toluene response and found that the differentially expressed genes were mainly involved in glycolysis/gluconeogenesis, starch and sucrose metabolism, glycerophospholipid metabolism, carotenoid biosynthesis, phenylpropanoid biosynthesis, and flavonoid biosynthesis. In addition, 53 transcription factors belonging to 13 transcription factor families were identified. We verified 10 differentially expressed genes related to metabolic pathways using quantitative real-time PCR and found that the results of RNA-seq were positively correlated with them, indicating that the transcriptome data were reliable. This study provides insights into the metabolic pathways involved in toluene response in plants.


Introduction
As a toxic volatile organic solvent, toluene is widely used in the production of daily life products, such as gasoline mixtures, benzene, cosmetics, ink, adhesives, paints, coatings, and glues [1]. It is difficult to degrade toluene by ultraviolet rays or atmospheric oxidation [2]. With the increase in population and development of cities, people stay indoors longer, and interior air quality has become more important. Toluene is easily absorbed by humans through the respiratory tract because of its widespread use at the individual and industrial levels [3]. In addition, the concentration of toluene in indoor air is usually higher than that in outdoor air, which makes it inevitable for humans to be exposed to toluene [4]. Toluene is hazardous to human health. Both short-term exposure to high concentrations or long-term exposure to low concentrations of toluene may cause serious health-related issues, such as dizziness, nausea, vomiting, loss of appetite, dermatitis, lung injury, and damage to the central nervous and immune systems, and it may even cause death in severe cases [5][6][7].
Ornamental plants are receiving greater attention for indoor purification of air because they are economically viable and environmentally friendly. To date, the ability of many plants to specifically remove indoor harmful gases has been studied [8,9]. In these studies, the purification capacity of different plants for toluene has also been investigated [8,[10][11][12].
Among the 12 plant species tested, Sansevieria trifasciata has been found to have the highest toluene removal capacity [12]. However, little is known about the toluene metabolism pathway in plants. Therefore, there is an urgent need to understand the metabolic response mechanism of plants to toluene and develop new germplasms with the ability to enhance toluene metabolism. These new germplasms can be used as purification plants to improve air quality economically and effectively.
Ardisia is a flowering plant in the subfamily Myrsinoideae. At present, more than 700 species of this plant have been reported, located mainly in the tropics [13]. Ardisia pusilla (A. pusilla) is a shrub with light green leaves, white flowers, and red berries. It is a popular indoor ornamental plant in Korea [14]. According to Song, the removal efficiency of toluene by A. pusilla is significantly higher than that of other plants [15]. Ahn et al. overexpressed the AtNDPK2 gene in A. pusilla to further improve its ability to absorb toluene [14]. Previous studies on the AtNDPK2 gene have mainly reported that it can improve abiotic stress tolerance in plants [11,[16][17][18].
As a high-throughput next-generation sequencing (NGS) technology, RNA sequencing (RNA-seq) has developed rapidly and has become an indispensable tool for analyzing differential gene expression and gene regulatory networks in transcriptome research. Particularly for atypical plants, even if there is no reference genome, the genome can be constructed by de novo transcriptome assembly [19]. Chen et al. revealed the metabolic defense mechanism of rice against predominant polybrominated diphenyl ether (PBDE) pollution by performing RNA-seq on rice grown to maturity in soil containing PBDE [20]. To date, RNA-seq has not been used to study the metabolic defense mechanisms of plants against toxic volatile organic compounds.
In this study, we performed RNA-seq analysis using Illumina NovaSeq 6000 (Illumina, San Diego, CA, USA) to conduct an in-depth sequencing of A. pusilla after toluene treatment, followed by comparison of toluene-treated and untreated plants. To understand the reactive response mechanism of plants to toluene exposure and the metabolic pathway of toluene in plants, we analyzed the differential gene expression of the transcriptome and verified it by quantitative real-time PCR (qRT-PCR). We also examined the gene expression related to toluene metabolism in AtNDPK2-transgenic A. pusilla plants with enhanced toluene absorption capacity.

Illumina Sequencing and De Novo Assembly
To gain additional insight into the transcriptomic response of A. pusilla to toluene treatment, the leaves of the toluene-treated (1-1, 1-2) and untreated groups (2-1, 2-2) were chosen to construct cDNA libraries. Four cDNA libraries were generated containing 135. 59 Trimmed reads for each sample were aligned to the assembled reference using the Bowtie program [21]. For DEG analysis, the abundance of unigenes across the samples was estimated into read count as an expression measure using the RSEM algorithm [22]. Table 1 shows the overall read mapping ratio for each sample; 1-1, 1-2, 2-1, and 2-2 had 74.33%, 72.08%, 76.99%, and 71.92% of the mapped sequencing reads, respectively, obtained after aligning to the assembled reference genome.
Trimmed reads for every sample were merged into one file to construct a combined reference. Since there were no genomic sequences of A. pusilla, the merged data were assembled into transcript contigs using Trinity software. The statistics of the assembled transcript contigs are listed in Table S2. The sequence reads were assembled into 99,322 contigs covering 93,764,604 bases, with N50 values of 1137 bp and an average contig length of 694.06 bp (Table S2). The CD-HIT-EST program (http://weizhongli-lab.org/cd-hit, accessed on 22 February 2020) was used to filter and cluster 88,444 unigenes from the longest contigs.

Annotation Results
For functional annotation, 49 Trimmed reads for every sample were merged into one file to construct a reference. Since there were no genomic sequences of A. pusilla, the merged da sembled into transcript contigs using Trinity software. The statistics of the assem script contigs are listed in Table S2. The sequence reads were assembled into 9 tigs covering 93,764,604 bases, with N50 values of 1137 bp and an average co of 694.06 bp (Table S2). The CD-HIT-EST program (http://weizhongli-lab.org cessed on 19 April 2021) was used to filter and cluster 88,444 unigenes from t contigs.

Annotation Results
For functional annotation, 49 Annotated unigene ratio for merge

Annotation on GO Database
The GO database was used to classify the annotated unigenes for functional annotation of the A. pusilla leaf transcriptome. The GO terms belonging to BP, CC, and MF were identified. There were 52 subcategories related to BP, including representative subcategories as the "cellular process" (21,167 unigenes) and "metabolic process" (18,724 unigenes). There were 26 subcategories related to CC, mainly including the "cellular part" (30,669 unigenes) and "organelle" (19,538 unigenes). There were 16 subcategories related to MF, which were mainly involved in "catalytic activity" (17,323 unigenes) and "binding" (16,558 unigenes) (Figure 2a and: Table S3). The results revealed that most of the identified genes were involved in the "cellular part" and "metabolic process" in the response of A. pusilla to toluene.

Annotation on GO Database
The GO database was used to classify the annotated unigenes for functional annotation of the A. pusilla leaf transcriptome. The GO terms belonging to BP, CC, and MF were identified. There were 52 subcategories related to BP, including representative subcategories as the "cellular process" (21,167 unigenes) and "metabolic process" (18,724 unigenes). There were 26 subcategories related to CC, mainly including the "cellular part" (30,669 unigenes) and "organelle" (19,538 unigenes). There were 16 subcategories related to MF, which were mainly involved in "catalytic activity" (17,323 unigenes) and "binding" (16,558 unigenes) (Figure 2a and: Table S3). The results revealed that most of the identified genes were involved in the "cellular part" and "metabolic process" in the response of A. pusilla to toluene.

Annotation on EggNOG Database
To identify the proteins distributed in eukaryotic orthologous groups, clusters of orthologous groups and non-supervised orthologous groups, the annotated unigenes were mapped to the annotation of the corresponding orthologous groups in the EggNOG database. For EggNOG annotation, 42,197 unigenes were divided into 25 EggNOG categories. Some of the unigenes were assigned to more than one category. The largest proportion of unigenes belonged to "undefined function" (44.66%), followed by "posttranslational modification, protein turnover, chaperones" (8.7%), "translation, ribosomal structure and biogenesis" (5.65%), "transcription" (5.04%), and "signal transduction mechanisms" (4.88%) (Figure 2b). In particular, "secondary metabolites biosynthesis, transport and catabolism" category accounted for 1.77% of total annotated unigenes by EggNOG (Figure 2b), and this category was further analyzed for its role in the metabolic pathway underlying toluene response in A. pusilla plants.

DEG Analysis Results
Based on the results of the annotation analysis, DEGs were analyzed, and significant results were selected according to the following conditions: |fc| ≥ 2 and p-value < 0.05. Comparing the transcription profiles of toluene-treated and untreated groups, 4101 DEGs were obtained, of which 2100 were upregulated and 2001 were downregulated ( Table 2).  Figure S1). The number of upregulated DEGs (1456) was similar to that of downregulated DEGs (1398) ( Table 2). GO analysis demonstrated that a large number of GO subcategories, such as cellular processes (1504), metabolic processes (1340), biological regulation (809), and response to stimulus (626), were the most enriched categories in BP ( Figure S1). The highest number of DEGs involved in CC belonged to cell part (2253), organelle (1454), membrane (675), and organelle part (584) ( Figure S1). In addition, in terms of MF, catalytic activity (1245), binding (1130), transcription regulator activity (229), and structural molecular activity (194) were the most enriched subcategories ( Figure S1).
To identify different pathways of activation in A. pusilla plants after toluene treatment, we analyzed the KEGG pathway of all DEGs, in which 1610 genes were upregulated and 1491 genes were downregulated ( Table 2). In this study, DEGs were mapped to 95 pathways (data not shown), and the top 20 major pathways are listed in Figure 3. In addition, the metabolic pathway, biosynthesis of secondary metabolites, ribosome, and carbon metabolic pathways had the highest number of DEGs between toluene-treated and untreated A. pusilla plants ( Figure 3). Interestingly, all 48 genes of the ribosome pathway were upregulated ( Figure S2). A total of 11 genes involved in endocytosis were downregulated ( Figure S3). In particular, we focused on the AT4G09320 gene (NDPK1) of the MAPK signaling pathway. The expression of NDPK1 in the toluene-treated group was 23.2-fold higher than that in the untreated group (Additional file 1: Table S4). A previous study also reported that NDPK2 transgenic A. pusilla plants show significantly increased toluene uptake compared to non-transgenic plants [14].

Toluene Metabolic Pathway
Based on the KEGG analysis of DEGs, we constructed a metabolic pathway for toluene in A. pusilla plants (Figure 5a). Our results revealed that the expression of the malate dehydrogenase gene (MDH) of the tricarboxylic acid cycle (TCA cycle) was upregulated, indicating that the production of oxaloacetate and phosphoenolpyruvate had increased. Phosphoenolpyruvate is catalyzed by enolase (encoded by LOS2) to form 2phosphoglycerate, which enters glycerophospholipid metabolism and starch and sucrose metabolism through glycolysis and glycolysis, respectively. In addition, phosphoenolpyruvate is catalyzed by pyruvate kinase (encoded by GGPS1) to form pyruvate, which enters carotenoid, phenylpropanoid, and flavonoid biosynthesis through terpenoid backbone biosynthesis (Figure 5a,b). Moreover, the expression of three genes, GDH2, GLN1.3, and GSR_1, involved in alanine, aspartate, and glutamate metabolism, respectively, was upregulated ( Figure 5b).

Toluene Metabolic pathway
Based on the KEGG analysis of DEGs, we constructed a metabolic pathway for toluene in A. pusilla plants (Figure 5a). Our results revealed that the expression of the malate dehydrogenase gene (MDH) of the tricarboxylic acid cycle (TCA cycle) was upregulated, indicating that the production of oxaloacetate and phosphoenolpyruvate had increased. Phosphoenolpyruvate is catalyzed by enolase (encoded by LOS2) to form 2-phosphoglycerate, which enters glycerophospholipid metabolism and starch and sucrose metabolism through glycolysis and glycolysis, respectively. In addition, phosphoenolpyruvate is catalyzed by pyruvate kinase (encoded by GGPS1) to form pyruvate, which enters carotenoid, phenylpropanoid, and flavonoid biosynthesis through terpenoid backbone biosynthesis (Figure 5a,b). Moreover, the expression of three genes, GDH2, GLN1.3, and GSR_1, involved in alanine, aspartate, and glutamate metabolism, respectively, was upregulated ( Figure 5b).

Validation of RNA-Seq Results by qRT-PCR
To verify the reliability of the RNAseq results, ten genes related to the metabolic pathway were randomly selected for qRT-PCR analysis, including eight upregulated genes and two downregulated genes (Table S1). The relative expression changes in qRT-PCR results were largely consistent with the RNA-seq data. When we compared the toluenetreated group to the untreated group, the expression of OMT1, APG1, HAI, GAPCP1, YUC4, MDH, EFE, and TIM genes was upregulated by 5.  Figure 6). In RNA-seq analysis, two downregulated genes, NIR −4.7-fold) and AT4G17260 (−6.6-fold) were detected with similar trends to those revealed by qRT-PCR analysis with −2.9-fold and −2.2-fold downregulation, respectively ( Figure 6).

Validation of RNA-Seq Results by qRT-PCR
To verify the reliability of the RNAseq results, ten genes related to the metabolic pathway were randomly selected for qRT-PCR analysis, including eight upregulated genes and two downregulated genes (Table S1). The relative expression changes in qRT-PCR results were largely consistent with the RNA-seq data. When we compared the toluene-treated group to the untreated group, the expression of OMT1, APG1, HAI, GAPCP1, YUC4, MDH, EFE, and TIM genes was upregulated by 5.  Figure 6). In RNA-seq analysis, two downregulated genes, NIR −4.7-fold) and AT4G17260 (−6.6-fold) were detected with similar trends to those revealed by qRT-PCR analysis with −2.9-fold and −2.2-fold downregulation, respectively ( Figure 6). Our previous study showed that the detoxification ability of AtNDPK2-transgenic A. pusilla plants is enhanced by the exogenous application of toluene [14]. In the current study, the results of DEG analysis showed that AtNDPK1 expression was upregulated in toluene-treated A. pusilla. AtNDPK1-and AtNDPK2-coded proteins displayed 57% se-

Gene Expression Analysis Related to Toluene Purification in AtNDPK2-Transgenic A. pusilla Plants
Our previous study showed that the detoxification ability of AtNDPK2-transgenic A. pusilla plants is enhanced by the exogenous application of toluene [14]. In the current study, the results of DEG analysis showed that AtNDPK1 expression was upregulated in toluene-treated A. pusilla. AtNDPK1and AtNDPK2-coded proteins displayed 57% sequence identity. We speculated that the increase in NDPK expression may contribute to the absorption and metabolism of toluene in plants. The expression of genes related to the toluene metabolic pathway was detected in the NDPK2 transgenic plants by qRT-PCR. The expression of OMT1, APG1, HAI, GAPCP1, YUC4, MDH, EFE, and TIM genes in four AtNDPK2-transgenic plants was considerably higher than that in non-transgenic plants. In addition, the expression of NIR and AT4G17260 was higher in the non-transgenic plants than in transgenic plants. These results indicate that the introduction of AtNDPK2 could be helpful in increasing or decreasing the expression of these genes (Figure 7). RNA-seq analysis and qRT-PCR validation results showed that these genes were associated with toluene metabolism.

Discussion
In recent years, RNA-seq has become a powerful technology for understanding various molecular mechanisms and solving several biological problems. Even if there is no reference genome available, the short reads generated by Illumina sequencing can be assembled into contigs, and the DEGs can be identified by de novo transcriptome assembly [10].
In this study, we treated A. pusilla plants with toluene, followed by RNA-seq analysis of the treated and untreated leaves with Illumina, to obtain a large amount of cDNA se-

Discussion
In recent years, RNA-seq has become a powerful technology for understanding various molecular mechanisms and solving several biological problems. Even if there is no reference genome available, the short reads generated by Illumina sequencing can be assembled into contigs, and the DEGs can be identified by de novo transcriptome assembly [10].
In this study, we treated A. pusilla plants with toluene, followed by RNA-seq analysis of the treated and untreated leaves with Illumina, to obtain a large amount of cDNA sequence data, which provides a basis for the identification of DEGs and a more detailed study of the toluene metabolic pathway in plants. Based on Illumina RNA-seq data, each sample produced more than 14 GB of clean read segments, which were assembled into 88,444 unigenes with an average length of 637 bp and a median size of 1049 bp. Through BLASTX or BLASTN protein database annotation, including that of EggNOG, UniProt, NR, GO, Pfam, NT, and KEGG functional databases, 49,623 unigenes (56.11%) were successfully compared to known proteins, and 43.89% of the unigenes showed no similarity, indicating that more genetic data are required for annotation. GO is an international standard classification system for gene function [23]. It provides dynamic, controlled terminology, and strictly defined concepts to comprehensively describe the characteristics of genes and their products in any organism. Moreover, GO classification can provide an opportunity to understand the distribution of gene function at the macro level and predict the physiological function of each unigene [24]. In the present study, according to GO analysis, the most involved gene subcategory was "cell part," followed by cellular process, organelle, and metabolic process. DEGs were also distributed among the four gene sub-categories. There are nearly 90 reference metabolic pathways in the KEGG database, which is very helpful for researchers to study metabolic and regulatory processes [25]. Through the analysis of the KEGG enrichment pathway, 182 DEGs were identified, which helped us elucidate the toluene response metabolic pathway.
Our study aimed to explore the metabolic pathways of toluene response in A. pusilla plants by RNA-seq and qRT-PCR analyses. Experiments on labeled (14C) aromatic hydrocarbons have shown that after plants absorb toluene, the aromatic ring of toluene is cleaved during its metabolic transformation [26,27]. Ugrekhelidze et al. speculated that the oxidative cleavage of toluene could occur through two ways: oxidation of a methyl group to a carboxyl group, followed by ring hydroxylation to produce α-carboxymucic acid, and ring hydroxylation of toluene without preliminary oxidation of the methyl group to produce α-methylmucic acid [28]. Our results showed that the expression of MDH associated with the TCA cycle was significantly upregulated. This is consistent with the high radioactivity levels of malic and oxalic compounds reported previously by Ugrekhelidze et al. [28]. We also found that a large number of genes were upregulated in glycolysis/gluconeogenesis, starch and sucrose metabolism, glycerophospholipid metabolism, carotenoid biosynthesis, phenylpropanoid biosynthesis pathway, and flavonoid biosynthesis. Based on the above results, we constructed a comprehensive metabolic pathway map, clearly explaining the metabolic pathway of toluene (Figure 5a,b). Finally, we verified 10 genes in the metabolic pathway by qRT-PCR, which proved the reliability of the RNA-seq results.
Transcriptional regulation of gene expression depends on the recognition of promoter elements by TFs. In the past few years, several TFs have been identified in plants [29]. TFs involved in toluene metabolism in plants have not yet been reported. It is necessary to analyze the expression levels of TFs to understand their roles in toluene metabolism. In the present study, 53 TF-coding DEGs, associated with the most representative TF families (MYB, EREBP, AP2/B3, and WRKY), were identified. TFs play important roles in plant development and responses to the environment [30]. In the past decade, extensive studies have been carried out on R2R3-MYB-coding genes, and it has been found that the MYB family is large and has diverse functions, participating in various biological functions, including responses to biotic and abiotic stresses, development, differentiation, metabolism, and defense. Some examples include phenylpropanoid biosynthesis [31], carotenoid biosynthesis [32,33], and flavonoid biosynthesis [34,35]. MYB20 TF regulates phenylalanine biosynthesis [36], and MYB96 can activate Arabidopsis cuticle wax biosynthesis [37]. The expression of MYB121 in Arabidopsis is upregulated after exposure to formaldehyde [38]. In the present study, after A. pusilla plants were treated with toluene, RNA-seq analysis showed that the 14 genes encoding TFs of the MYB family were upregulated and three genes were downregulated (Figure 4). These results indicate that some TF-coding genes of the MYB family play important roles in toluene metabolism.
The expression of AP2/B3 TF family-coding genes (TEM1, RAV1, and RAV2) and WRKY TF family-coding genes (WRKY22, WRKY29, and WRKY33) were also downregulated (Figure 4b). TEM1 controls flower senescence and abscission [44]. Arabidopsis TEM1 and TEM2 double mutants are more tolerant than plants overexpressing TEM under conditions of increasing soil salinity [45]. Similarly, RAV1 and RAV2 are multifunctional negative regulators of plant growth, abiotic stress, drought, and salt stress [46]. This is consistent with our results; the expression of these genes was downregulated during toluene response, which may be negative regulators. WRKY TFs play an important regulatory role in various stress adaptations. WRKY22 and its homolog, WRKY29, are part of class IIE WRKYs, which are markers of pathogen-triggered immunity [47]. WRKY33 is related to the ROS detoxification mechanism, and overexpression of WRKY33 can improve Arabidopsis tolerance to salt stress [48]. In our results, we determined that WRKY22, WRKY29, and WRKY33 were highly sensitive to toluene treatment and that their expression was downregulated.
Nucleoside diphosphate kinase (NDPK) is a highly conserved multifunctional protein that is mainly involved in maintaining the balance between NDP and NTP, and it can regulate various eukaryotic cell activities, including cell proliferation, development, and differentiation [49]. At present, five NDPK-coding genes have been identified in plant species, namely, NDPK1 to NDPK5. The roles of NDPK1, NDPK2, and NDPK3 in signal transduction and oxidative stress have been demonstrated [50,51]. RNA-seq analysis in the current study showed that the expression of NDPK1 increased after toluene treatment. We speculate that NDPK1 plays an important role in toluene detoxification and metabolism. Previous studies have shown that the overexpression of NDPK2 enhances tolerance to various abiotic stresses [17,18], especially in favor of the detoxification ability of A. pusilla to toluene [14]. To verify whether the overexpression of AtNDPK2 affects the expression of other genes related to toluene reaction or metabolism, we used qRT-PCR to detect the expression of related DEGs after toluene treatment. The results showed that the overexpression of NDPK2 promoted or decreased the expression of genes related to toluene reaction (Figure 7).

Plant Materials and Toluene Treatment Conditions
A. pusilla plants were collected from the Floriculture Research Division, National Institute of Horticultural and Herbal Science, Rural Development Administration, Wanju 55365, South Korea. A. pusilla plants (20 cm height) grown in a greenhouse were placed in a sealed chamber at room temperature (25 ± 5 • C) under a pressure of 760 mm Hg with 12/12 h light/dark cycle. After that, toluene treatment was applied at a concentration of 1.5 ppm inside the chamber for 5 h. A. pusilla plants without toluene treatment were used as controls. The leaves of three plants in each of the toluene treatment group and the control group were collected, and immediately frozen in liquid nitrogen. This process was repeated twice, and all leaves were stored at −80 • C until further analysis.

RNA Extraction
A. pusilla leaf samples were ground into powder in a mortar with liquid nitrogen, and total RNA was extracted from the samples using TRIzol reagent (MRC, Cincinnati, OH, USA) according to the manufacturer's protocol. The quality and quantity of RNA were assessed using a spectra max quick drop microvolume spectrophotometer (Molecular Devices, San Jose, CA, USA) and analyzed on 1% agarose gel. The transcriptome sequencing libraries were prepared by mixing equal quantities of RNA from the samples [14].

Deep Sequencing
Two replicates of each toluene-treated and untreated RNA samples were sent to Macrogen Inc. (Seoul, Korea) for deep sequencing and generation of datasets. TruSeq Stranded mRNA LT Sample Prep Kit (Illumina, San Diego, CA, USA) was used to prepare a cDNA library according to the TruSeq Stranded mRNA Sample Preparation Guide, Part # 15031047 Rev. E. Briefly, DNase was used to remove DNA contamination, then the poly-A mRNAs were purified, and the mRNA was disintegrated into short fragments. Cleaved RNA fragments were synthesized into cDNA using random hexamer primers. After purification, the cDNA fragments were ligated to universal adapters containing sequencing priming sites. Subsequently, fragments were amplified using PCR and analyzed by agarose gel electrophoresis, and fragments with insert sizes between 200 and 400 bp were subjected to deep sequencing. For paired-end sequencing, both ends of the cDNA were sequenced using an Illumina NovaSeq 6000 (Illumina, San Diego, CA, USA).

Sequence Data Analysis and De Novo Assembly
To analyze the quality control of the sequenced raw reads, overall read quality, total bases, total reads, GC (%), and basic statistics were calculated. In order to reduce biases in analysis, artifacts such as low-quality reads, adaptor sequences, contaminant DNA, or PCR duplicates were removed using the Trimmomatic program [52]. The sliding window method was used, and bases of reads that did not qualify inside the window size 4, and mean quality 15 were trimmed [52]. Afterwards, reads with lengths shorter than 36 bp were dropped to produce trimmed data. The trimmed reads for all samples were merged into one file to perform transcriptome assembly. Subsequently, merged data were assembled using Trinity software (https://github.com/trinityrnaseq/trinityrnaseq/wiki) [53], which is generally utilized for de novo reconstruction of transcriptomes and combining read sequences of a certain length of overlap to form longer fragments without N gaps, called contigs. For assembled genes, the longest contig of the assembled contigs was filtered and clustered into non-redundant transcripts using the CD-HIT-EST program [54,55]. We defined these transcripts as unigenes. The unigenes were used for subsequent annotation and further differentially expressed gene (DEG) analysis.

Identification of DEGs
The read count values of contigs obtained through RSEM were used as the original raw data. Low-quality transcripts were filtered during data preprocessing. Subsequently, trimmed mean of M-values (TMM) normalization was performed, and statistical analysis was performed using fold change, exact test using edgeR per comparison pair [58]. Significant results were selected on the conditions of |fc| ≥ 2 and p-value < 0.05. For significant lists, hierarchical clustering analysis was performed to group similar samples and contigs. These results are graphically depicted using a heatmap and dendrogram. Classification of GO terms was subsequently performed using an in-house script [59]. The GO terms belonging to biological process (BP), cellular component (CC), and molecular function (MF) are listed. For KEGG pathway analysis of the DEGs, bidirectional best hit (BBH) was used to search against the KEGG database to obtain the KO (reference pathway) number of the KEGG annotations [60].

Validation of RNA-Seq Analysis by qRT-PCR
To validate the transcriptome-based DEGs results, 10 DEGs related to metabolic pathways were selected and examined by qRT-PCR using actin as a housekeeping gene. Primers were designed using Primer 3 software (www.embnet.sk/cgi-bin/primer3_www. cgi). The primer names and sequences used for primer design are listed in Table S1. In detail, the first-strand cDNA was used as the template for each reaction and 2 µL of cDNA (1:9 diluted) was placed in a 20 µL reaction mixture containing 10 µL of iQ™ SYBR-Green Supermix (2×) and 0.2 µL of each primer (10 pmol). The qRT-PCR was run on a CFX96 qPCR-PCR machine (Bio-Rad, Hercules, CA, USA), and the thermal cycling conditions were adjusted as follows: 95 • C for 3 min, followed by 40 cycles of 95 • C for 10 s and 63 • C for 30 s. Actin was used as the reference gene to normalize the gene expression level, and relative gene expression was calculated using the Ct (2 −∆∆Ct ) method. The above experiment was repeated three times.

Gene Expression Analysis Related to Toluene Purification in AtNDPK2-Transgenic A. pusilla Lines
To further analyze the metabolic pathways of plants in the process of toluene absorption and decomposition, four transgenic A. pusilla plants with AtNDPK2 obtained previously in our laboratory were used along with non-transgenic A. pusilla as a control [14]. Briefly, the plants were transferred into pots (20 cm in diameter) containing autoclaved soil and placed inside sealable plastic bags. The bags were sealed and placed in the culture room. After one week, the bags were partially opened, and two weeks later, they were fully opened. In the third week, the plants were cultured under ambient greenhouse conditions. Two months later, the surviving plants were placed in a sealed chamber at room temperature (25 ± 5 • C) under a pressure of 760 mm Hg with 12/12 h light/dark cycle. After toluene treatment was carried out at a concentration of 1.5 ppm inside the chamber for 5 h, the leaves of three plants in each line were mixed and immediately frozen in liquid nitrogen for further analysis. RNA extraction, cDNA synthesis, and qRT-PCR were performed as described above. The differential expression of metabolic genes related to toluene absorption and decomposition in transgenic A. pusilla lines was analyzed. The experiment was repeated three times.

Conclusions
In conclusion, we performed RNA-seq analysis and screened 4101 DEGs from A. pusilla plants after toluene treatment. The results of cluster analysis and functional annotation of DEGs enabled us to further investigate the metabolic pathway of toluene in A. pusilla plants. Toluene first enters the TCA cycle through ring cleavage and oxidation and then approaches other metabolic pathways, such as glycolysis/gluconeogenesis; starch and sucrose metabolism; glycerophospholipid metabolism, alanine, aspartate, and glutamate metabolism; terpenoid backbone biosynthesis; phenylpropanoid biosynthesis; flavonoid biosynthesis; and carotenoid biosynthesis. Our study provides insights into the metabolic mechanism underlying toluene response in plants. The identification of DEGs provides us with new genetic resource information, which would be helpful in developing plants for toluene air pollution remediation. In addition, qRT-PCR data supported that the overexpression of AtNDPK2 was helpful for the expression of genes related to toluene metabolism, thus improving the toluene absorption capacity of the plant.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/plants10051011/s1, Figure S1: GO functional analysis of DEGs identified in toluene treated and untreated A. pusilla, Figure S2: KEGG analysis DEGs in ribosome after toluene treatment in A. pusilla, Figure S3: KEGG analysis DEGs in endocytosis after toluene treatment in A. pusilla, Table S1: Primer sequences for metabolic pathways and reference genes used in this study, Table S2: Assembly statistics and statistics of clustered contigs (unigene) in A. pusilla leaf transcriptome, Table S3