Chromatographic Profiling with Machine Learning Discriminates the Maturity Grades of Nicotiana tabacum L. Leaves

Nicotiana tabacum L. (NTL) is an important agricultural and economic crop, and leaf maturity is one of the key factors affecting its quality. Traditionally, maturity is judged visually by humans, which is subjective and empirical. In this study, we focused on detecting as many compounds as possible in NTL leaves of different maturity grades using ultra-performance liquid chromatography ion trap time-of-flight mass spectrometry (UPLC-IT-TOF/MS). A low-dimensional embedding of the LC-MS dataset by t-distributed stochastic neighbor embedding (t-SNE) clearly separated the leaves of different maturity grades. Discriminant models between maturity grades were established using orthogonal partial least squares discriminant analysis (OPLS-DA). The quality metrics of the models were R²Y = 0.939 and Q² = 0.742 (unripe and ripe), R²Y = 0.900 and Q² = 0.847 (overripe and ripe), and R²Y = 0.972 and Q² = 0.930 (overripe and unripe). Differential metabolites were screened by their variable importance in projection (VIP) and p-values. An existing tandem mass spectrometry library of plant metabolites, a user-defined library of structures, and MS-FINDER were combined to identify these metabolites. A total of 49 compounds were identified, including 12 amines, 14 lipids, 10 phenols, and 13 others. The results can be used to discriminate the maturity grades of the leaves and ensure their quality.
Figure 2. At the first stage, metabolites were identified by searching the self-built library of Solanaceae plants (NTL, eggplant). At the second stage, metabolites were identified by searching the plant metabolomic libraries in MS-FINDER (KNApSAcK, LipidMAPS, PlantCyc, NANPDB) and matching their spectra. At the third stage, metabolites were putatively identified by searching METLIN, which stores a large number of small-molecule metabolites.


Introduction
Nicotiana tabacum L. (NTL) is a Solanaceae plant of considerable economic significance. Leaf maturity is the primary factor in grading and an important index of quality [1]. If the maturity grade of the leaves can be determined accurately and the right harvesting time chosen precisely, the field loss rate and the curing loss rate of the leaves can be reduced significantly. Studies have indicated that ripe leaves are fully developed, with a loose tissue structure, coordinated chemical components, and rich aroma substances [2]. At present, the maturity grade of the leaves is generally distinguished visually by experts, which is highly subjective and empirical. Hence, it is important and necessary to study the differences in the metabolites of leaves from different maturity grades.
Metabolomics plays an important role in the study of metabolic processes and helps reveal the essence of life's activities [3]. It mainly studies small-molecule metabolites (molecular weight < 1000) produced by various metabolic pathways. So far, several methods have been established for untargeted metabolomic analysis of plant extracts, including gas chromatography coupled to mass spectrometry (GC-MS) [4], capillary electrophoresis mass spectrometry (CE-MS) [5], and liquid chromatography mass spectrometry (LC-MS) [6]. GC-MS is suitable for the analysis of thermally stable volatile compounds, …
Scheme 1. Schematic diagram of untargeted metabolomic analysis of Nicotiana tabacum L. (NTL) leaves based on liquid chromatography mass spectrometry (LC-MS). The LC-MS-based untargeted metabolomic analysis is divided into six parts: extraction of metabolites from NTL leaves of different maturity grades, LC-MS analysis of the extracted metabolites, preprocessing of the raw LC-MS data, dimensionality reduction and visualization based on the peak table, classification of samples of different maturity grades based on the peak table, and identification of differential metabolites.

Materials and Reagents
Forty-five samples from different maturity grades were provided by the Yunnan Academy of Tobacco Agricultural Sciences (Kunming, China). These NTL leaves were picked and weaved in the conventional harvest period for middle leaves in the local area, with moderate density, to ensure balanced and consistent leaf maturity and quality. The leaves were flue-cured in a local bulk curing barn using the most common curing mode in the region (Figure 1). Additionally, 100 to 120 NTL leaves were weaved onto each rod, and a total of three layers were set, with 150 to 170 rods in each layer.
Cured NTL leaves were used for metabolic profiling analyses, including 15 unripe samples, 15 ripe samples, and 15 overripe samples. Detailed information on the NTL leaves of different maturity stages is shown in Table 1.

Sample Preparation
The NTL leaves were ground to powder and passed through a 40-mesh sieve. A total of 100 mg of powdered sample was transferred to a 2 mL Eppendorf tube, and 1 mL of aqueous methanol solution was added. Samples were vortexed for 20 s, sonicated for 30 min, and centrifuged at 16,000 × g for 10 min at 4 °C. After centrifugation, the supernatant was collected and passed through a syringe filter (0.22 μm pore size). The solvent was then evaporated under a stream of N2 gas at room temperature. The dried sample was redissolved in 300 μL of extraction solvent before LC-MS analysis.
In addition, quality control (QC) samples were prepared by mixing equal amounts of all the analyzed samples. They were injected at the beginning of the sequence to equilibrate the system and then after every nine samples to monitor the stability of the analysis [16].
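The QC placement described above can be sketched as a small helper that builds an injection order. This is only an illustration: the function name, the sample labels, and the `n_lead_qc` parameter are hypothetical, and the number of leading QC injections used for equilibration is not stated in the text.

```python
def build_injection_sequence(samples, qc_every=9, n_lead_qc=1, qc_label="QC"):
    """Return an injection order with QC runs first and after every `qc_every` samples."""
    # Leading QC injections equilibrate the system (count is an assumption).
    sequence = [qc_label] * n_lead_qc
    for position, sample in enumerate(samples, start=1):
        sequence.append(sample)
        # Insert a QC injection after every block of `qc_every` study samples.
        if position % qc_every == 0:
            sequence.append(qc_label)
    return sequence


# 45 study samples with hypothetical labels S01..S45.
samples = [f"S{i:02d}" for i in range(1, 46)]
sequence = build_injection_sequence(samples)
print(sequence[:12])
```

Changing `n_lead_qc` adjusts only the equilibration injections; the spacing of the monitoring QCs stays tied to `qc_every`.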

LC-MS Analysis
The samples were analyzed with an LC-30AD UPLC system (Shimadzu, Tokyo, Japan) coupled with an IT-TOF MS (Shimadzu, Tokyo, Japan). The LC-MS system was controlled by the LCMS solution 3.70 software (Shimadzu, Tokyo, Japan).
The injection volume was 2 µL, and the column temperature was set to 40 °C. The ultra-performance liquid chromatography column was an ACQUITY UPLC BEH C18 (100 mm × 2.1 mm, 1.7 µm; Waters Corporation, Milford, MA, USA). The mobile phase consisted of acetonitrile acidified with 0.1% formic acid (eluent A) and water acidified …
The mass spectrometer was operated within the m/z range of 50-1000 for MS1 and in automatic multiple-stage fragmentation scan mode for MS/MS spectra. The CDL temperature and the heating block temperature were both set to 200 °C, the nebulizing gas (N2) flow rate was 1.5 L/min, the drying gas (N2) pressure was 100 kPa, the ion trap pressure was 1.8 × 10⁻⁵ kPa, and the ion accumulation time was 60.0 ms. The detector voltage was set at 1.62 kV. The RP vacuum was 85.0-92.0 Pa, the IT vacuum was 1.8 × 10⁻² Pa, and the TOF vacuum was 1.3 × 10⁻⁴ Pa. The collision energy was set at 50%.

Data Preprocessing
Raw data were exported in mzData format by LCMS Solution-Browser (Shimadzu, Tokyo, Japan). Prior to preprocessing, the exported data were converted to mzML files and centroided using OpenMS [33]. The mzML files were then read into R (version 4.0.2) using the MSnbase package (version 2.15.8), and the LC-MS data were preprocessed using the xcms package (version 3.11.3) [21]. The preprocessing consisted of chromatographic peak detection, peak alignment, and correspondence between samples, yielding a list of metabolic features with mass, retention time, and abundance. The aligned features were filtered by the "80% rule" [34]. Missing-value replacement and data normalization were performed with MetaboAnalystR (version 2.0.2) [35], and the peak table with labels and categories was exported for further analysis.
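As an illustration of the filtering step, the sketch below applies one common reading of the "80% rule": a feature is kept if it is detected (non-zero) in at least 80% of the samples of at least one group. The function names, the dictionary layout of the peak table, and the intensity values are hypothetical, and the exact variant used in the study is an assumption.

```python
def passes_80_rule(intensities, groups, min_fraction=0.8):
    """True if the feature is detected in >= min_fraction of samples of any one group."""
    by_group = {}
    for value, group in zip(intensities, groups):
        by_group.setdefault(group, []).append(value)
    for values in by_group.values():
        # A missing (None) or zero intensity counts as "not detected".
        detected = sum(1 for v in values if v is not None and v > 0)
        if detected / len(values) >= min_fraction:
            return True
    return False


def filter_peak_table(peak_table, groups, min_fraction=0.8):
    """Keep only the features (rows) that satisfy the rule."""
    return {
        feature: values
        for feature, values in peak_table.items()
        if passes_80_rule(values, groups, min_fraction)
    }


groups = ["unripe"] * 5 + ["ripe"] * 5
peak_table = {
    # Detected in 4 of 5 unripe samples: kept.
    "F1": [120.0, 98.0, 0.0, 110.0, 105.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    # Detected in 3 of 5 samples in both groups: removed.
    "F2": [50.0, 0.0, 0.0, 45.0, 60.0, 0.0, 30.0, 0.0, 42.0, 38.0],
}
kept = filter_peak_table(peak_table, groups)
print(sorted(kept))
```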

Machine Learning Models of Maturity Grades
The peak table was further processed in SIMCA-P (version 14.1, Umetrics AB, Umeå, Sweden) for multivariate data analysis [36]. PCA was employed to reduce dimensionality and evaluate data quality. t-SNE was employed to visualize the leaves of different maturity grades; a detailed description of the t-SNE method is given in Method S1. OPLS-DA models were built to predict the maturity grades of the leaves; the details of the OPLS-DA method are given in Method S2. Differential features were screened by variable importance in projection (VIP) values > 1 combined with a significant difference between grades in the t-test (p < 0.05, where p is the probability value). Hierarchical cluster analysis (HCA) was performed on the differential metabolites of different maturity grades using the heatmap package in R (version 4.0.2).
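The two-criterion screening can be sketched as a simple filter over precomputed statistics. The sketch assumes that VIP values and t-test p-values have already been exported (for example from SIMCA-P and R); the feature identifiers and numbers below are made up for illustration.

```python
def screen_features(features, vip_cutoff=1.0, p_cutoff=0.05):
    """Return the ids of features with VIP above the cutoff and p below the cutoff."""
    return [
        f["id"]
        for f in features
        if f["vip"] > vip_cutoff and f["p"] < p_cutoff
    ]


# Hypothetical per-feature statistics (not taken from the study).
features = [
    {"id": "F001", "vip": 1.82, "p": 0.003},   # passes both criteria
    {"id": "F002", "vip": 1.10, "p": 0.210},   # VIP is high but p is not significant
    {"id": "F003", "vip": 0.70, "p": 0.001},   # significant but low VIP
    {"id": "F004", "vip": 1.25, "p": 0.040},   # passes the default criteria only
]

default_hits = screen_features(features)
strict_hits = screen_features(features, vip_cutoff=1.5, p_cutoff=0.01)
print(default_hits, strict_hits)
```

Requiring both criteria keeps only features that contribute strongly to the multivariate model and also differ significantly in a univariate test.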

Identification of Metabolites
Features with VIP > 1.0 and p < 0.05 were selected as potential differential features. Because the coverage of metabolites in MS/MS libraries is not comprehensive, library search results are not always accurate and often require further verification. Plants of the same or closely related families contain chemical substances of the same or similar structures. Based on published metabolomic studies of Solanaceae plants [1,9,15-17,37-56], a user-defined library of plant metabolites was established. MS-FINDER [57] was used as the identification tool: the tandem mass spectra of the differential metabolites were stored in MSP format and imported into MS-FINDER for identification. The existing library of plant metabolites and the user-defined library were combined to identify these differential metabolites. The details of the MS-FINDER software are described in Method S3, and detailed information on the user-defined library is listed in Table S5.
In this study, the differential metabolites were identified by a three-stage procedure (Figure 2). At the first stage, metabolites were identified by comparing the MS/MS information of the features with in silico MS/MS spectra predicted from the molecular structures in the user-defined library. At the second stage, the MS/MS information of the features was matched against predicted or reported fragments from local databases in MS-FINDER, for example PlantCyc, LIPID MAPS, and KNApSAcK. At the third stage, metabolites were putatively identified by comparing the accurate m/z value of each feature with the metabolites in METLIN, and the candidate with the lowest mass difference in parts per million was selected.
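The third-stage selection by mass difference can be sketched as follows. The ppm error is computed as (observed − theoretical) / theoretical × 10⁶, and the candidate with the smallest absolute error is chosen. The candidate names and m/z values are placeholders, not METLIN entries.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def best_candidate(observed_mz, candidates):
    """Return (name, ppm) for the candidate with the smallest absolute ppm error."""
    scored = [(name, ppm_error(observed_mz, mz)) for name, mz in candidates.items()]
    return min(scored, key=lambda item: abs(item[1]))


# Placeholder candidates with theoretical m/z values (illustrative only).
candidates = {
    "candidate_A": 163.1230,
    "candidate_B": 163.1235,
    "candidate_C": 163.1190,
}
name, ppm = best_candidate(163.1232, candidates)
print(name, round(ppm, 2))
```

Because this stage relies on accurate mass alone, the resulting assignment is putative, as stated in the text.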

Extraction Solvent Optimization
The extraction procedure is important for the detection of small metabolites in NTL leaves. It is difficult to extract diverse metabolites with a single solvent. Aqueous methanol is an effective solvent system for extracting metabolites in large-scale plant metabolomic studies [7,58]. In this study, the methanol/water ratio was optimized, and six different solvent ratios (5:5, 6:4, 7:3, 8:2, 9:1, 10:0, v/v) were studied. The number and area of peaks were used as the criteria to evaluate extraction efficiency. Figure 3A,B shows that methanol/water (8:2, v/v) had the best efficiency.

Extraction Time Selection
Ultrasound can increase the swelling index, which reflects the absorption of water by the plant material during ultrasonic treatment. Compared with mechanical stirring, extraction under ultrasonic treatment is much more efficient. In some cases, increased swelling of the plant tissue can damage the cell wall, thereby facilitating the extraction of metabolites [59]. Here, five different extraction times (15, 30, 40, 50, and 65 min) were studied, with the number and area of peaks as the evaluation criteria. Figure 4A,B shows that the extraction efficiency at 65 min did not change significantly compared with 15 min. To ensure extraction quality and reduce random error in pretreatment [16], 30 min was selected as the extraction time.

Investigation of UPLC/IT-TOF MS Parameters
Flue-cured NTL leaves contain many types of metabolites, so the choice of chromatographic column is crucial for their separation. The ACQUITY UPLC BEH C18 column retains molecules reliably and with good repeatability. Its trifunctionally bonded BEH particles provide a wide usable pH range (pH 1-12), ultralow column bleed, and excellent separation. In addition, the optimized mass spectrometry parameters kept the analysis highly sensitive. Because the positive ion mode detected more features than the negative ion mode, it was selected to analyze the leaves of different maturity grades. Base peak chromatograms (BPC) of metabolites extracted from three typical leaves of different maturity grades are shown in Figure 5A-C, respectively. As can be seen from Figure 5, the relative areas of some peaks differ significantly.

Validation of the Analytical Method
To ensure the repeatability, accuracy, and precision of the extraction results, the analytical method was evaluated. As shown in Figure 6, the reproducibility of the extraction method, the instrument stability, the intraday precision, and the interday precision were verified. To determine whether the repeatability of the extraction method is acceptable, six parallel QC samples were prepared according to Section 2. After LC-MS analysis and data preprocessing, a table of metabolites with retention time, m/z, and abundance was obtained. The relative standard deviation (RSD) of each feature across the six QC samples was then calculated, and the number and area of features in different RSD ranges (0-10, 10-20, 20-30, >30%) were counted. Almost 89% of the features had an RSD within 20%, accounting for 90% of the total peak area. Therefore, the sample preparation method had good repeatability.
Instrument precision was also investigated. The samples were prepared according to the method in Section 2, and six consecutive LC-MS analyses of the same QC sample were performed. The number and area of features were computed in the same RSD ranges (0-10, 10-20, 20-30, >30%). Nearly 95% of the metabolic features had an RSD < 20%, accounting for 95% of the total peak area. The results showed that the instrument has excellent stability.
To investigate the intraday precision, six replicate QC samples were analyzed at 2, 4, 6, 8, 10, and 12 h of the same day, and the numbers and areas of the metabolic features were counted in the same RSD ranges. As shown in Figure 6, 94% of the metabolic features had an RSD within 20%, accounting for approximately 94% of the total peak area. To investigate the interday precision, six QC samples were analyzed over 4 days, and the numbers and areas of the metabolic features were computed in the same RSD ranges. Figure 6 shows that 88% of the metabolic features had an RSD within 20%, accounting for about 91% of the total peak area. These results show that the method has good intraday and interday precision.
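The RSD-based summaries used throughout this section can be sketched as below. RSD is taken as the sample standard deviation divided by the mean, in percent. The share of total peak area is computed from each feature's mean area, which is an assumption about how the area percentages were obtained; the replicate values are illustrative only.

```python
from statistics import mean, stdev


def rsd_percent(areas):
    """Relative standard deviation (%) of one feature across replicate injections."""
    return 100.0 * stdev(areas) / mean(areas)


def rsd_bin(rsd):
    """Assign an RSD value to one of the four ranges used in the text."""
    if rsd <= 10:
        return "0-10%"
    if rsd <= 20:
        return "10-20%"
    if rsd <= 30:
        return "20-30%"
    return ">30%"


def fraction_within(peak_table, max_rsd=20.0):
    """Return (fraction of features, fraction of total mean area) with RSD <= max_rsd."""
    total_area = ok_area = 0.0
    ok_count = 0
    for areas in peak_table.values():
        feature_area = mean(areas)
        total_area += feature_area
        if rsd_percent(areas) <= max_rsd:
            ok_count += 1
            ok_area += feature_area
    return ok_count / len(peak_table), ok_area / total_area


# Peak areas of three hypothetical features in six QC injections.
qc_table = {
    "F1": [100, 102, 98, 101, 99, 100],
    "F2": [50, 60, 40, 55, 45, 50],
    "F3": [10, 20, 5, 15, 8, 2],
}
bins = {feature: rsd_bin(rsd_percent(a)) for feature, a in qc_table.items()}
feature_share, area_share = fraction_within(qc_table)
print(bins, feature_share, area_share)
```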

Classification of the Leaves from Different Maturity Grades
The reliability of the acquired data should be evaluated before further statistical analysis. In this study, QC samples were inserted in the analysis sequence to monitor data quality, as described in Section 2. The first principal component of the 11 QC samples is shown in Figure 7. The results showed that the acquired data were stable during operation, so further statistical analysis could be performed to build discriminant models and screen the differential metabolites.
t-SNE converts high-dimensional data into a low-dimensional (two- or three-dimensional) embedding by minimizing the Kullback-Leibler divergence between the joint probabilities of the high-dimensional data and those of the embedding. It has been reported to produce significantly better visualizations than earlier techniques [26]. Therefore, t-SNE was used to reduce dimensionality and visualize the flue-cured NTL leaves of different maturity grades. In Figure 8, the two-dimensional t-SNE map shows that the samples were clearly separated into three groups according to their maturity grades.
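For reference, the objective minimized by t-SNE can be written out in its standard form. The symbols are not defined in the text and are introduced here for illustration: x_i are the high-dimensional feature vectors of the n samples, y_i their low-dimensional coordinates, and σ_i the per-sample bandwidth set by the perplexity.

```latex
\begin{align*}
p_{j|i} &= \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}, \\
q_{ij} &= \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
               {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}, \\
C &= \mathrm{KL}(P \parallel Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}.
\end{align*}
```

The heavy-tailed Student-t kernel in q_ij lets moderately distant samples sit far apart in the map, which helps distinct groups appear as separate clusters.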
Figure 8. The t-SNE visualization of the flue-cured NTL leaves from three different maturity grades. The samples clustered into three groups according to their maturity grades, indicating that the maturity grades are closely related to their metabolites.
To further investigate the differences between the flue-cured NTL leaves of different maturity grades, OPLS-DA models were established. OPLS-DA is an effective and interpretable discriminant method because it removes variation unrelated to the maturity grades. First, the OPLS-DA model between the unripe and ripe samples was established; its score plot is shown in Figure 9A. The samples of the two maturity grades were clearly separated along the PC1 axis. The results (R²Y = 0.939, Q² = 0.742) showed that the model is accurate and reliable. To check for over-fitting, a 200-times permutation test was applied, and the results in Figure 9B showed that the model was reliable. VIP > 1 and p < 0.05 were chosen as the criteria to screen out differential features. In this way, thirteen metabolites were found as differential features between the unripe and ripe leaves; their detailed information is listed in Table S1.
Similarly, OPLS-DA and the t-test were performed on the overripe and ripe samples. The OPLS-DA scores of the overripe and ripe samples are plotted in Figure 9C, and the samples of these two maturity grades can be clearly distinguished. The results (R²Y = 0.900, Q² = 0.847) showed that the model is effective and reliable. The model was also assessed by a 200-times permutation test, and Figure 9D indicates no over-fitting risk. According to VIP > 1 and p < 0.05, thirteen differential metabolites were found between the overripe and ripe samples; their detailed information is shown in Table S2.
Finally, the OPLS-DA model between the overripe and unripe samples was established. In the score plot shown in Figure 9E, the samples of these two maturity grades can be clearly distinguished. The results (R²Y = 0.972, Q² = 0.930) showed that the model is highly reliable and accurate. The model was also assessed by a 200-times permutation test, and Figure 9F indicates no over-fitting risk. Because this model yielded more differential features than the previous two, which would complicate the subsequent qualitative analysis, a stricter criterion (VIP > 1.5 and p < 0.01) was set, and twenty-nine differential features were obtained. Their detailed information is listed in Table S3.
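The two quality metrics reported for each model follow the usual definitions, given here as a reminder. The notation is introduced for illustration and does not come from the text: y_i is the class label of sample i, ŷ_i the value fitted by the full model, ŷ_(i) the value predicted when sample i is left out during cross-validation, and ȳ the mean label.

```latex
\begin{align*}
R^2Y &= 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2}, \\
Q^2  &= 1 - \frac{\mathrm{PRESS}}{\sum_i \left(y_i - \bar{y}\right)^2},
\qquad
\mathrm{PRESS} = \sum_i \left(y_i - \hat{y}_{(i)}\right)^2 .
\end{align*}
```

R²Y measures goodness of fit and Q² measures cross-validated predictive ability, so Q² is the lower of the two in all three models, as expected. In a permutation test the labels are shuffled repeatedly; a model is regarded as not over-fitted when the permuted models give clearly lower R²Y and Q² than the original.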
To analyze the changes in the differential metabolites, heat maps (Figure 10A-C) were used to display the relative distribution of each metabolite in each maturity grade. The leaves of the three maturity grades were well clustered in these figures, which supports the credibility of the analysis.
Separations 2021, 8, x FOR PEER REVIEW 13 of 19
Figure 10. Results of hierarchical cluster analysis (HCA) of the flue-cured NTL leaves from different maturity grades. (A) HCA of differential metabolites between overripe and ripe leaves; (B) HCA of differential metabolites between unripe and ripe leaves; (C) HCA of differential metabolites between unripe and overripe leaves.

Identification of Metabolites from Different Maturity Grades
As described in the sections above, forty-nine differential features were screened out by OPLS-DA and the t-test. One of them was common to all three comparisons of maturity grades, and four were common to two OPLS-DA models. The MSP file of each metabolite was imported into MS-FINDER and searched against the user-defined library, which identified six metabolites. Afterwards, the plant metabolite libraries in MS-FINDER were searched and in silico MS/MS fragments were matched, which annotated twenty-six differential metabolites. Eight metabolites were putatively annotated by searching the METLIN database. The remaining nine metabolites could not be annotated because of the limited number of molecules in spectral and structural libraries.
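The counts above can be reconciled with a short check. The sketch assumes that the four features shared by two models are separate from the one feature shared by all three; under that assumption the per-model counts reported earlier (13, 13, and 29) reduce to forty-nine unique features, matching the annotation tally. The dictionary keys are labels chosen for illustration.

```python
# Differential features reported for each pairwise OPLS-DA model.
per_model = {
    "unripe_vs_ripe": 13,
    "overripe_vs_ripe": 13,
    "overripe_vs_unripe": 29,
}
shared_by_three = 1  # counted three times in the sum, so subtract two copies
shared_by_two = 4    # each counted twice in the sum, so subtract one copy each

unique_features = sum(per_model.values()) - 2 * shared_by_three - shared_by_two

# Outcome of the three-stage identification procedure.
annotation = {
    "user_defined_library": 6,
    "ms_finder_plant_libraries": 26,
    "metlin_putative": 8,
    "not_annotated": 9,
}

print(unique_features, sum(annotation.values()))
```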

Discussions
According to the VIP value and the p-value, forty-nine significantly differential metabolites were found between overripe, unripe, and ripe samples (Table S4). The differential metabolites mainly include amines, lipids, and phenols.
Nitrogen-containing compounds in flue-cured NTL leaves include protein, alkaloids, etc. Nitrogen-containing compounds not only affect the characteristics of leaves and determine economic output, but also have an important impact on the quality of leaves [60]. Alkaloids are a class of secondary metabolites that contain nitrogen. Among these alkaloids, nicotine is the most important compound, accounting for more than 95% of the total alkaloids, followed by nor-nicotine, etc. In the identification of differential metabolites of fluecured leaves from different maturity grades, N-Octanoylnornicotine, Nicotine-1 -N-oxide (NNO), and 1-Methyl-9H-pyrido [3,4-b]indole (Harman) were detected ( Figure 11A-C). Nicotine is synthesized in the roots of multiple Nicotiana species and transports to the aerial part of the plant followed by its demethylation to nornicotine [61]. The content of Nicotine-1 -N-oxide in the leaves decreased with the increase in maturity, while the content of N-Octanoylnornicotine increased with the maturity of leaves. Nicotine-1 -N-oxide is an oxidation product of nicotine. Nornicotine was produced by enzymatic degradation of nicotine during senescence and conditioning of leaves. Therefore, the content of N-Octanoylnornicotine of overripe leaves increased significantly. Harman is a naturally occurring beta-carboline alkaloid and only a small amount exists in leaves. The content of Harman tended to stabilize as the leaves matured.

Identification of Metabolites from Different Maturity Grades
It can be seen from the above sections that forty-nine differential features were screened out by the OPLS-DA and t-test. One of them is common among three different maturity grades. Four of them were common differential metabolites in two OPLS-DA models. The MSP file of each metabolite was imported into MS-FINDER and searched in the user-defined library. Six metabolites were identified by the MS-FINDER and user-defined library. Afterwards, the libraries of plant metabolites in MS-FINDER were searched, and in silico MS/MS fragments were matched. Twenty-six differential metabolites were annotated. Eight metabolites were putative annotated by searching the METLIN database. In addition, nine metabolites had not been annotated because of the limited number of molecules in spectral or structural libraries.

Discussions
According to the VIP value and the p-value, forty-nine significantly differential metabolites were found between overripe, unripe, and ripe samples (Table S4). The differential metabolites mainly include amines, lipids, and phenols.
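A minimal sketch of this screening step is given below. It assumes that VIP values and p-values have already been computed for each pairwise OPLS-DA model; the cutoffs (VIP > 1 and p < 0.05) are conventional defaults assumed here, not values stated in this section, and all feature names and numbers are invented for illustration.

```python
def screen(features, vip_cutoff=1.0, p_cutoff=0.05):
    """Return the names of features passing both the VIP and p-value cutoffs.

    features: dict mapping feature name -> (vip, p_value).
    The default cutoffs are assumptions, not taken from the study.
    """
    return {
        name
        for name, (vip, p_value) in features.items()
        if vip > vip_cutoff and p_value < p_cutoff
    }


# Hypothetical (VIP, p) tables for the three pairwise models.
models = {
    "unripe_vs_ripe": {"m1": (1.8, 0.003), "m2": (0.7, 0.001), "m3": (1.2, 0.20)},
    "overripe_vs_ripe": {"m1": (1.5, 0.010), "m4": (2.1, 0.0004)},
    "overripe_vs_unripe": {"m1": (2.0, 0.001), "m4": (1.3, 0.020), "m5": (1.1, 0.04)},
}

selected = {name: screen(table) for name, table in models.items()}

# Count in how many models each selected feature appears.
all_features = set().union(*selected.values())
membership = {f: sum(f in s for s in selected.values()) for f in all_features}
shared_by_all = {f for f, n in membership.items() if n == 3}
shared_by_two = {f for f, n in membership.items() if n == 2}
```

Counting model membership in this way is one straightforward route to statements such as how many differential features are shared by two or by all three comparisons.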
Nitrogen-containing compounds in flue-cured NTL leaves include proteins, alkaloids, etc. Nitrogen-containing compounds not only affect the characteristics of leaves and determine economic output, but also have an important impact on the quality of leaves [60]. Alkaloids are a class of nitrogen-containing secondary metabolites. Among these alkaloids, nicotine is the most important compound, accounting for more than 95% of the total alkaloids, followed by nornicotine, etc. In the identification of differential metabolites of flue-cured leaves from different maturity grades, N-Octanoylnornicotine, Nicotine-1′-N-oxide (NNO), and 1-Methyl-9H-pyrido[3,4-b]indole (Harman) were detected (Figure 11A-C). Nicotine is synthesized in the roots of multiple Nicotiana species and transported to the aerial parts of the plant, where it is demethylated to nornicotine [61]. The content of Nicotine-1′-N-oxide, an oxidation product of nicotine, decreased in the leaves with increasing maturity, while the content of N-Octanoylnornicotine increased. Nornicotine is produced by enzymatic degradation of nicotine during senescence and conditioning of the leaves, which is consistent with the significant increase in N-Octanoylnornicotine in overripe leaves. Harman is a naturally occurring beta-carboline alkaloid, and only a small amount exists in the leaves. The content of Harman tended to stabilize as the leaves matured.
Figure 11. The differential alkaloids between different maturity grades: (A) Nicotine-1′-N-oxide; (B) Harman; (C) N-Octanoylnornicotine, in the methanol extracts of the flue-cured NTL leaves. All data represent the mean values ± standard errors. The asterisks represent significant differences: one asterisk, 0.05 ≥ p > 0.01; two asterisks, 0.01 ≥ p > 0.001; three asterisks, p ≤ 0.001.
Phenolic compounds have a variety of physiological functions, and almost all exist in the vacuole in the form of glycosides and esters [62]. The glycosides identified were mainly flavonoids (Figure 12A,B), such as Quercetin 3-rutinoside 7-galactoside and Kaempferol 3-rutinoside-4′-glucoside. Flavonoids are secondary metabolites that are widely found in plants [63]. Most of them are combined with sugars to form glycosides or carbon-linked sugar groups, and they include the glycosides of kaempferol and quercetin. The contents of Quercetin 3-rutinoside 7-galactoside and Kaempferol 3-rutinoside-4′-glucoside both increased with the maturity of the leaves and largely leveled off once the leaves reached moderate maturity. Generally, the suitable harvest period is when the content of phenols reaches its maximum. However, different parts of a leaf, different amounts of growth regulator substances, baking conditions, and mineral nutrients all lead to different levels of phenolic substances. Therefore, it is difficult to determine the most suitable harvest period based solely on phenolic compounds.
Figure 12. The differential phenols (A,B) and lipids (C,D) between different maturity grades, in the methanol extracts of the flue-cured NTL leaves. All data represent the mean values ± standard errors. The asterisks represent significant differences: one asterisk, 0.05 ≥ p > 0.01; two asterisks, 0.01 ≥ p > 0.001; three asterisks, p ≤ 0.001.
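The annotation convention used in the captions of Figures 11 and 12 can be written out explicitly. The sketch below maps a p-value to its asterisk label and computes a mean with its standard error; the function names are hypothetical, and the standard error is assumed to be the sample standard deviation divided by the square root of the number of replicates, which the captions do not specify.

```python
import math
import statistics


def significance_stars(p_value):
    """Map a p-value to the asterisk label used in the figure captions."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie between 0 and 1")
    if p_value <= 0.001:
        return "***"  # p <= 0.001
    if p_value <= 0.01:
        return "**"   # 0.01 >= p > 0.001
    if p_value <= 0.05:
        return "*"    # 0.05 >= p > 0.01
    return ""         # not significant


def mean_and_standard_error(values):
    """Return (mean, standard error) for a list of replicate measurements.

    Assumes the sample standard deviation (n - 1 denominator).
    """
    if len(values) < 2:
        raise ValueError("at least two replicates are required")
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return mean, standard_error


# Example with invented replicate intensities.
mean, sem = mean_and_standard_error([1.0, 2.0, 3.0])
label = significance_stars(0.004)
```

Boundary values follow the captions: a p-value of exactly 0.05 receives one asterisk, 0.01 receives two, and 0.001 receives three.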
Lipids include phospholipids, glycolipids, cholesterol, and cholesterol esters. Most of the identified differential lipids are diacylglycerols (DG), ceramides (Cer), phosphatidylcholines (PC), and phosphatidylethanolamines (PE). Phospholipids are the main components of biological membranes. As shown in Figure 12C,D, the content of Cer(d18:0/14:0) increased significantly when the leaves were overripe, and the content of DG(20:5(5Z,8Z,11Z,14Z)/14:0/0:0) decreased as the leaves matured. As the leaves matured, the glandular hair secretion increased continuously, and the content of lipid compounds also increased. However, lipids are also affected by the NTL plant's own metabolism during the maturation process. The modulation, fermentation, and aging of the leaves also affect their chemical composition. Therefore, judging the maturity of the leaves from the lipid content alone is neither accurate nor robust enough.

Conclusions
In this study, we developed a method to extract metabolites from flue-cured NTL leaves that has good repeatability and precision. The metabolites of samples from three different maturity grades were analyzed and compared by UPLC-IT-TOF/MS. OPLS-DA models were built to classify the leaves with good accuracy. Differential metabolites related to the three maturity grades were identified using the user-defined structure library, the existing plant metabolite libraries, and the MS-FINDER software. Forty-nine differential metabolites of the leaves were putatively identified, including amines, phenols, and lipids. These results indicate that UPLC-IT-TOF/MS-based metabolomics can be useful for discriminating leaves of different maturity grades, and that the user-defined structural library and computational tool have the potential to identify unknown metabolites.
Supplementary Materials: The following are available online at https://www.mdpi.com/2297-8739/8/1/9/s1, Method S1: Detailed description of the t-SNE algorithm, Method S2: Detailed description of the OPLS-DA algorithm, Method S3: Detailed description of the MS-FINDER software, Table S4: The full list of identified compounds of flue-cured NTL leaves from different maturity grades (unripe, ripe, and overripe), Table S5: Table of

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.