A GC-MS Protocol for Separating Endangered and Non-endangered Pterocarpus Wood Species

Pterocarpus santalinus and Pterocarpus tinctorius are commonly traded timber species of the genus Pterocarpus. P. santalinus is listed in Appendix II of the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES). P. tinctorius, a non-CITES species, is often mislabeled as P. santalinus because the two species share similar macroscopic and microscopic features. To discriminate between these easily confused species at the molecular level, xylarium heartwood specimens of the two species were extracted with three different solvents and analyzed using gas chromatography–mass spectrometry (GC-MS). Multivariate analyses were then applied to select marker compounds that distinguish P. santalinus from P. tinctorius. A total of twenty volatile compounds were detected and tentatively identified in the three kinds of extracts; these compounds comprised alcohols, stilbenoids, esters, aromatic hydrocarbons, ketones, phenols, flavonoids, and miscellaneous compounds. GC-MS analyses also revealed that the extracts obtained with ethanol and water (EW), ethyl acetate (EA), and benzene–ethanol (BE) differed between the two Pterocarpus species in both chemical components and relative contents, which allowed chemotaxonomic discrimination. After chemometric analyses, the EW extract displayed higher predictive accuracy (100%) than the EA extract (83.33%) and the BE extract (83.33%). Furthermore, spathulenol (17.58 min) and pterostilbene (23.65 min) were identified as the critical compounds for separating the EW extracts of P. santalinus and P. tinctorius. Thus, a protocol combining GC-MS and multivariate analyses was developed that successfully distinguishes P. santalinus from P. tinctorius.


Introduction
Illegal logging seriously affects global forest resources, causing forest destruction, loss of biodiversity, climate change, and environmental deterioration [1,2]. Seizing illegal wood products and prosecuting illegal logging crimes are important for restricting both illegal logging and the associated trade [3]. Therefore, the ability to identify timber to a level of certainty acceptable in a court of law plays a critical role in law enforcement for forest protection [4].

GC-MS Analysis of Heartwood Extracts of P. santalinus and P. tinctorius
P. santalinus and P. tinctorius samples were extracted with three different solvent systems: EW, EA, and BE. The extracts were then analyzed by GC-MS, with one sample of each species used as a representative. Typical GC-MS total ion chromatograms (TICs) of the heartwood extracts of P. santalinus and P. tinctorius are shown in Figure 1. The heartwood extracts of the two species differ markedly: more peaks are observed in the TICs of P. santalinus than in those of P. tinctorius. The TICs of the three kinds of P. tinctorius extracts are similar to one another (Figure 1), whereas those of the three kinds of P. santalinus extracts differ considerably.
Molecules 2019, 24 3 of 12
For the three extraction solvents, the distinction between P. santalinus and P. tinctorius resulted not only from differences in the number of detected compounds but also from differences in the relative content of the peaks. The peak area was used as the analytical signal for the relative content, which was calculated by area normalization and averaged over the three replicates. Peaks whose relative area was above 1% were tentatively identified by matching their mass spectra with those in the NIST 11 library and in the literature, as summarized in Table 1. A total of twenty volatile compounds were detected and tentatively identified in the three kinds of extracts; these compounds comprised alcohols, stilbenoids, esters, aromatic hydrocarbons, ketones, phenols, flavonoids, and miscellaneous compounds.
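The area-normalization step described above can be sketched as follows. This is only a minimal illustration and not the authors' processing script: the function name `relative_content`, the peak areas, and the 12.00 min peak are hypothetical, and treating a peak that is missing from a replicate as 0% is an assumption.

```python
from statistics import mean

def relative_content(replicates):
    """Return mean relative content (%) per retention time.

    `replicates` is a list of dicts mapping retention time (min) to raw
    peak area, one dict per replicate injection.
    """
    normalized = []
    for areas in replicates:
        total = sum(areas.values())
        # Area normalization: each peak as a percentage of the summed area.
        normalized.append({rt: 100.0 * a / total for rt, a in areas.items()})
    # Average over replicates; a peak missing from a replicate counts as 0
    # (an assumption made for this sketch).
    all_rts = sorted({rt for rep in normalized for rt in rep})
    return {rt: mean(rep.get(rt, 0.0) for rep in normalized) for rt in all_rts}

# Hypothetical peak areas for three replicate injections of one sample.
replicates = [
    {17.58: 600.0, 23.65: 390.0, 12.00: 10.0},
    {17.58: 580.0, 23.65: 410.0, 12.00: 10.0},
    {17.58: 620.0, 23.65: 375.0, 12.00: 5.0},
]
content = relative_content(replicates)
# Keep only peaks above the 1% threshold used for tentative identification.
reported = {rt: pct for rt, pct in content.items() if pct > 1.0}
```

With these made-up areas, the minor peak averages below 1% and is therefore excluded from `reported`.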

Multivariate Analyses
PCA was applied to the processed dataset to visualize clustering trends between the two Pterocarpus species. Samples with similar values for the variables explained by the principal components appear close together in the PCA score plot [28]. For the EW extract, the first principal component (PC1) explained 51.2% of the total variance and the second principal component (PC2) explained 15.5%. Along the PC1 axis, the P. tinctorius samples cluster on the left side of the plot and the P. santalinus samples on the right (Figure 2a). Similar results were observed for the EA and BE extracts (Figure 2b,c). This separation was attributed to the difference between the wood species. The loading plot further illustrates the variables responsible for the distinction between the groups; the loading plot of PC1 is shown in Figure 3. The peak at 15.12 min in the EW extract and the peak at 23 min in the EA extract contribute strongly to the classification (Figure 3a,b), although the relative content of these peaks also varies among the P. santalinus samples (Figure S1). This may explain why the P. santalinus samples are spread out in the PCA score plots of the EW and EA extracts. Furthermore, samples grown in different regions under different growth conditions could also affect the score plot. The BE extract appears to provide the best separation, as it shows the tightest sample distribution on the score plot. However, the loading plot of PC1 for the BE extract shows that, in addition to the peaks at 17.34, 17.58, 18.26, and 23.65 min, p-xylene (4.1 min), which derives from the benzene-ethanol solvent, also contributes strongly to the classification (Figure 3c). This is detrimental to wood identification in practice because it weakens the contribution of differences in the chemical composition of the wood itself.
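To make the relationship between scores, loadings, and explained variance concrete, the following sketch computes PC1 for a small table of relative peak contents. It is an illustration under stated assumptions and not the SIMCA-P procedure used in this study: the data are hypothetical, only mean-centering is applied (no further scaling is assumed), and PC1 is found by plain power iteration on the covariance matrix.

```python
import math

def first_pc(rows, iters=500):
    """Return (loadings, scores, explained_ratio) for PC1 of mean-centered data."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Power iteration converges to the leading eigenvector of cov.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance captured along v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    total_var = sum(cov[i][i] for i in range(p))
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores, eigenvalue / total_var

# Hypothetical relative contents (%) of three peaks in six samples:
# the first three rows stand for P. tinctorius, the last three for P. santalinus.
data = [
    [2.0, 1.0, 5.0], [2.5, 1.2, 5.1], [1.8, 0.9, 4.9],
    [8.0, 6.0, 5.0], [9.0, 6.5, 5.2], [7.5, 5.8, 4.8],
]
loadings, scores, explained = first_pc(data)
```

In this toy table the first two peaks differ between the groups while the third does not, so the two groups fall on opposite sides of zero along PC1 and the third peak receives a loading close to zero.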

OPLS-DA, a supervised multivariate analysis method, was used to further examine the differences between P. santalinus and P. tinctorius and to provide information on the correlations between specific markers and each wood species [29,30]. OPLS-DA separates the systematic variation in X into two parts, one that is linearly related to Y and one that is orthogonal to Y, which leads to better class resolution in a discriminant problem [31]. Classification models were established using all the samples from the training set (see Table 2). The OPLS-DA models for all three extraction solvents differentiated the two species well, with an explained fraction of class variance (R²Y) of 0.949-0.978 and a cross-validated fraction of class variance (Q²) of 0.944-0.97, which indicated acceptable predictability for the wood species. To further validate the models, all the samples from the test set were used to assess their predictive quality. The model built from the GC-MS data of the EW extract showed the highest predictive capacity (100%) for the test set. For the models based on the EA and BE extracts, one sample from the test set was classified incorrectly, and the predictive accuracy was only 83.33%. Owing to its low toxicity, easy availability, low cost, and highest predictive accuracy, EW was considered the most suitable solvent for identifying P. santalinus and P. tinctorius wood using GC-MS. Variable importance in projection (VIP) analysis was employed to rank the contribution of the variables to the separation of the clusters [29]. The contribution of a variable to the separation of the two groups increases with its VIP value [31].
Variables with a VIP value higher than 3 and a t-test p-value lower than 0.05 were selected as potential marker compounds that differ significantly between P. santalinus and P. tinctorius. Because it gave the highest predictive accuracy, only the EW extract was used for the selection of potential marker compounds. For the EW extract, the peaks at 17.58 min (VIP value of 6.49) and 23.65 min (VIP value of 3.96) were considered potential marker peaks. This indicated that spathulenol and pterostilbene were the marker compounds for discriminating between P. santalinus and P. tinctorius wood, which was consistent with the previous analysis results (Figure 4). The results suggest that GC-MS coupled with statistical analyses has high potential for development and application in the wood trade and wood technology. A protocol suitable for wood identification using GC-MS and multivariate analyses was developed in this study (Figure 5).
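The marker-selection rule and the reported accuracies can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the VIP values for 17.58 and 23.65 min are taken from the text, whereas the p-values, the two extra peaks at 10.00 and 20.00 min, and the function names are hypothetical placeholders.

```python
def predictive_accuracy(n_correct, n_total):
    """Percentage of test specimens assigned to the correct species."""
    return 100.0 * n_correct / n_total

def select_markers(peaks, vip_min=3.0, p_max=0.05):
    """Keep peaks with VIP above vip_min and t-test p-value below p_max."""
    return [rt for rt, (vip, p) in sorted(peaks.items())
            if vip > vip_min and p < p_max]

# Retention time (min) -> (VIP, p-value).
# VIP values for 17.58 and 23.65 min are from the text; every p-value and
# the peaks at 10.00 and 20.00 min are hypothetical, for illustration only.
peaks = {
    17.58: (6.49, 0.001),
    23.65: (3.96, 0.001),
    10.00: (2.10, 0.010),   # hypothetical: significant but VIP too low
    20.00: (3.50, 0.200),   # hypothetical: high VIP but not significant
}
markers = select_markers(peaks)
ew_accuracy = predictive_accuracy(6, 6)   # all six test specimens correct
ea_accuracy = predictive_accuracy(5, 6)   # one test specimen misclassified
```

With a test set of six specimens, a single misclassification gives 5/6, which rounds to the 83.33% reported for the EA and BE models.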

Materials and Chemicals
Twelve P. santalinus heartwood specimens and fourteen P. tinctorius heartwood specimens were obtained from curated xylaria (Table 3). All the specimens carry a botanical voucher ID or scientific validation. Of these, nine specimens of P. santalinus and eleven specimens of P. tinctorius were selected at random as the training set for building the classification models. The remaining six specimens were used as the test set for validation. Absolute ethanol, ethanol (95%), and benzene were purchased from Beijing Chemical Works (Beijing, China). Ethyl acetate was purchased from Fuchen Chemical Reagent Company (Tianjin, China).
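The random division into training and test sets can be sketched as below. The specimen labels, the function name `split_specimens`, and the fixed random seed are assumptions for illustration; the actual voucher IDs are those listed in Table 3, and the procedure the authors used to randomize is not described in this section.

```python
import random

def split_specimens(ids, n_train, rng):
    """Randomly pick n_train specimens for training; the rest form the test set."""
    train = rng.sample(ids, n_train)
    test = [s for s in ids if s not in train]
    return sorted(train), test

rng = random.Random(0)  # fixed seed so the split is reproducible (an assumption)
# Hypothetical specimen labels standing in for the real voucher IDs.
santalinus = [f"PS{i:02d}" for i in range(1, 13)]   # 12 specimens
tinctorius = [f"PT{i:02d}" for i in range(1, 15)]   # 14 specimens

ps_train, ps_test = split_specimens(santalinus, 9, rng)
pt_train, pt_test = split_specimens(tinctorius, 11, rng)
training_set = ps_train + pt_train   # 9 + 11 = 20 specimens
test_set = ps_test + pt_test         # 3 + 3 = 6 specimens
```

Splitting within each species keeps both species represented in the test set, which follows from the counts given above (12 - 9 = 3 and 14 - 11 = 3).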

Sample Preparation
To overcome the shortcomings of traditional wood identification methods and to develop a practical alternative, mild extraction conditions were used in this study. All heartwood specimens were dried at room temperature and ground into a fine powder using a 6770 Freezer/Mill (Spex SamplePrep, Metuchen, NJ, USA) with cycle conditions consisting of a 1 min precool, 2 min crush, and 1 min cool. Approximately 5 mg of heartwood powder was extracted ultrasonically with 1 mL of solvent for 30 min at 25 °C (ultrasonic power of 50 W). The mixture was then centrifuged at 750× g for 2 min. Finally, ~1 µL of the supernatant of each sample was used for GC-MS analysis. Three different solvents were used in this study: a 1:1 (v/v) mixture of ethanol and water (EW), ethyl acetate (EA), and benzene-ethanol (BE).

Apparatus and Chromatographic Conditions
GC-MS analyses were performed on a GC-MS system (Agilent 7890A, Santa Clara, CA, USA) equipped with a 5975C mass spectrometer (Avondale, PA, USA). An HP-5MS fused silica capillary column (30 m × 250 µm i.d., 0.25 µm film thickness) was used for separation, and helium (99.999%) was used as the carrier gas at a flow rate of 1 mL/min. The oven temperature program started at 60 °C, held for 2 min, then increased at 10 °C/min to 280 °C, and held at this temperature for 5 min. The injector temperature was 260 °C. A sample of 0.5 µL was injected in split mode. The mass spectrometric data were recorded in the range of 50 to 500 m/z. Three replicates were analyzed per sample.
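As a worked check of the oven program above, the following sketch computes the total run time and the nominal oven setpoint at a given retention time. The function names are hypothetical, and the calculation assumes an ideal oven that follows the setpoints exactly.

```python
def oven_temperature(t_min, start=60.0, hold1=2.0, rate=10.0, final=280.0):
    """Nominal oven temperature (deg C) at time t_min for a hold-ramp-hold program."""
    ramp_time = (final - start) / rate
    if t_min <= hold1:
        return start                            # initial isothermal hold
    if t_min <= hold1 + ramp_time:
        return start + rate * (t_min - hold1)   # linear ramp
    return final                                # final isothermal hold

def run_time(start=60.0, hold1=2.0, rate=10.0, final=280.0, hold2=5.0):
    """Total duration (min): initial hold + ramp + final hold."""
    return hold1 + (final - start) / rate + hold2

total = run_time()                         # 2 + 22 + 5 = 29 min
temp_at_17_58 = oven_temperature(17.58)    # 60 + 10 * 15.58 = 215.8 deg C
```

Under these assumptions the run lasts 29 min, and the peak at 17.58 min elutes while the oven is still on the ramp, at a nominal 215.8 °C.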

Determination of Chemical Compounds
Peak deconvolution is a critical step for resolving coeluting compounds from their constituent ions. The automated mass spectral deconvolution and identification system (AMDIS) is commonly used for the deconvolution of GC-MS data. The components eluting from the GC-MS were therefore extracted in AMDIS, and their mass spectral fragmentation patterns were compared with those stored in the National Institute of Standards and Technology (NIST, Gaithersburg, MD, USA) libraries and with mass spectra reported in the literature.
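The idea of comparing a measured spectrum against library entries can be illustrated with a simple similarity score. The sketch below uses plain cosine similarity, which is an assumption chosen for clarity and is not the match-factor algorithm implemented in AMDIS or the NIST search software; the spectra and compound names are hypothetical toy data.

```python
import math

def cosine_similarity(spec_a, spec_b):
    """Cosine similarity between two centroided spectra given as {m/z: intensity}."""
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    return dot / (norm_a * norm_b)

def best_match(query, library):
    """Return (name, score) of the library entry most similar to the query."""
    scored = ((name, cosine_similarity(query, ref)) for name, ref in library.items())
    return max(scored, key=lambda item: item[1])

# Hypothetical toy spectra; not real fragmentation data for any compound.
library = {
    "compound_A": {91: 100.0, 105: 40.0, 119: 20.0},
    "compound_B": {77: 30.0, 121: 100.0, 256: 80.0},
}
query = {77: 28.0, 121: 100.0, 256: 75.0}
name, score = best_match(query, library)
```

A score near 1 means the query and reference share the same ions in similar proportions; in practice such a match is still treated as tentative, as noted above.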

Multivariate Analyses
All the GC-MS raw files were converted into NetCDF format, and peak detection, identification, and alignment were then performed using MS-DIAL software (v. 2.74) [32]. Aligned peak area data based on the full GC-MS spectra were exported and normalized for the subsequent multivariate statistical analysis.
For the EW and EA extracts, a total of 78 GC-MS files each were used for the subsequent statistical analysis (60 files as the training set and 18 files as the test set). For the BE extract, 75 GC-MS files were used (57 files as the training set and 18 files as the test set) because three files from one P. santalinus sample were invalid.
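The file counts follow directly from the number of specimens and the three replicate injections per sample, as the short sketch below shows. The function name is hypothetical, and placing the three invalid BE files in the training set is inferred from the reported count of 57 training files.

```python
REPLICATES = 3  # three replicate injections per sample

def file_counts(n_train, n_test, n_invalid_train=0):
    """Return (training files, test files, total files) for one extract."""
    train_files = n_train * REPLICATES - n_invalid_train
    test_files = n_test * REPLICATES
    return train_files, test_files, train_files + test_files

ew_or_ea = file_counts(20, 6)                # 20 training + 6 test specimens
be = file_counts(20, 6, n_invalid_train=3)   # three invalid training files
```

This gives 60 + 18 = 78 files for the EW and EA extracts and 57 + 18 = 75 files for the BE extract.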
Principal component analysis (PCA) and OPLS-DA are widely applied unsupervised and supervised methods, respectively. These methods reduce the dimensionality of the raw data and provide a visual result that makes complicated raw data easier to interpret. PCA and OPLS-DA were conducted using SIMCA-P software (14.1, Umetrics, Umeå, Sweden). SPSS 22.0 (SPSS, Chicago, IL, USA) was used for Student's t-test to determine whether the data of the two species were significantly different.
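For readers unfamiliar with the test, the sketch below computes the two-sample Student's t statistic with pooled variance for one peak. It is a minimal illustration with hypothetical relative contents and does not reproduce the SPSS output; the equal-variance form is an assumption, and the p-value (which requires the t distribution) is deliberately not computed here.

```python
import math
from statistics import mean, variance

def student_t(a, b):
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df

# Hypothetical relative contents (%) of one peak in the two species.
santalinus = [4.0, 5.0, 6.0]
tinctorius = [1.0, 2.0, 3.0]
t_stat, df = student_t(santalinus, tinctorius)
```

The p-value is then obtained from the t distribution with `df` degrees of freedom, and a peak is treated as significantly different when that p-value is below 0.05.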

Conclusions
An approach combining GC-MS and multivariate analyses was developed to establish a protocol for discriminating between the endangered P. santalinus and the non-endangered P. tinctorius wood species, which could potentially be applied more widely in the field of wood identification. A total of twenty volatile compounds were detected and tentatively identified in the three kinds of extracts; these compounds comprised alcohols, stilbenoids, esters, aromatic hydrocarbons, ketones, phenols, flavonoids, and miscellaneous compounds. Both the number of detected compounds and their relative content differed significantly between P. santalinus and P. tinctorius. The 1:1 ethanol-water extract gave a higher predictive accuracy (100%) than the ethyl acetate and benzene-ethanol extracts. Spathulenol (17.58 min) and pterostilbene (23.65 min) were considered potential markers for characterizing and differentiating the 1:1 ethanol-water extracts of the two species. The results suggest that GC-MS is an effective analytical method for wood identification at the species level.
In further studies, a larger sample set and more extraction methods, including Soxhlet extraction, will be examined to investigate the effect of sample size and extraction method on the classification results.
Supplementary Materials: The supplementary materials are available online.
Author Contributions: M.Z. performed the experiments, analyzed the data, and wrote the manuscript; Y.Y. and G.Z. conceived and designed the experiments, and revised the manuscript; J.G. revised this work critically for the analysis and interpretation of data and put forward some suggestions for the manuscript; B.L. and X.J. revised this work critically for the collection data and put forward some suggestions for the manuscript; All authors commented on the manuscript and approved the final form of manuscript.