GC-MS Fingerprinting Combined with Chemometric Methods Reveals Key Bioactive Components in Acori Tatarinowii Rhizoma

This study aims to identify the key bioactive components in Acori Tatarinowii Rhizoma (ATR), a traditional Chinese medicine (TCM) with various bioactivities. Partial least squares regression (PLSR) was employed to describe the relationship between the radical scavenging activity and the contents of volatile components. The PLSR model was improved by outlier elimination and variable selection, and it was evaluated by 10-fold cross-validation and external validation. Based on the PLSR model, eleven chemical components were identified as the key bioactive components by variable importance in projection (VIP). The final PLS regression model built with these components has good predictive ability, with a Q2 of 0.8284 and a root mean square error of prediction of 2.9641. The results indicate that the eleven components could serve as a pattern to predict the radical scavenging activity of ATR. In addition, no specific relationship was found between the radical scavenging ability and the habitat of ATR. This study proposes an efficient strategy to predict bioactive components by combining quantitative chromatographic fingerprints with PLS regression, and it has potential for screening bioactive components in complex analytical systems such as TCM.


Introduction
Acori tatarinowii rhizoma (ATR) is a traditional Chinese medicine (TCM) that has been used for thousands of years because of its low toxicity and various bioactivities. Many studies have reported significant clinical effects of ATR in the treatment of diseases such as epilepsy [1,2], nervous disorders [3][4][5][6], depression [4], cancers [7], skin diseases [8], and Alzheimer's disease [9]. Research on ATR has mainly focused on its antioxidant activity. Some reports suggest that antioxidants can scavenge free radicals and reduce free-radical-induced lipid and protein peroxidation and DNA damage [10,11]. Antioxidants in food, plant materials, and TCMs have recently attracted increasing attention [12,13]. However, most studies have only evaluated the antioxidant activity of plant extracts or of individual chemical components, so whether these components are the key antioxidants in the extracts remains poorly understood. Many reports have demonstrated the antioxidant activity of the volatile oils of ATR [14]; however, the specific components responsible for this activity remain unknown.
Volatile oils are complex mixtures of chemical compounds [13], including phenylpropanoids, sesquiterpenes, oxygenated sesquiterpenes, monoterpenes, and oxygenated monoterpenes [15]. Chromatographic fingerprinting is used to characterize the complex chemical composition of TCMs [16,17]. This approach was introduced by the World Health Organization to control the quality of TCMs and to overcome the limitations of relying on only a few marker components [18,19]. Techniques such as gas chromatography-mass spectrometry (GC-MS) [20], high performance liquid chromatography-diode array detection/mass spectrometry (HPLC-DAD/MS) [21,22], and high performance capillary electrophoresis-mass spectrometry (CE-MS) [23] are commonly used to obtain the chromatographic fingerprints of TCMs. In this study, GC-MS fingerprinting was used to characterize the complex volatile components of ATR.
Evaluating the bioactivity of each component of ATR volatiles is difficult because of their complexity. Chemometrics has become popular because it can handle groups of variables and discover sets of related predictors. Partial least squares regression (PLSR) [24,25], network-induced supervised learning [26], support vector machines [27], and several penalized methods have been proposed for the analysis of data from complex analytical systems [28]. Multivariate calibration techniques are used to extract information from high-throughput analytical data. For example, Liu et al. revealed the antioxidant components in Turpiniae Folium (TF) by PLSR, using multi-wavelength HPLC fingerprints together with the antioxidant capacity of the TF samples [18]. These techniques can reveal the chemical features of TCMs with minimal sample preparation and with reasonable accuracy and precision, without the expensive and time-consuming preliminary separation steps usually required for complex systems [29][30][31].
This study aims to identify the key bioactive components in ATR responsible for its radical scavenging activity. The volatile components of 49 deliberately collected ATR samples were analyzed by GC-MS. The bioactivity of the ATR volatile oils was evaluated by the 2,2-diphenyl-1-picrylhydrazyl (DPPH) radical scavenging assay. A PLS regression model between the radical scavenging activity and the contents of volatile components was then established and evaluated by 10-fold cross-validation and external validation.

GC-MS Fingerprinting and DPPH Radical Scavenging Assay
The chromatographic fingerprints of the 49 ATR samples were obtained by GC-MS analysis and were used to reflect the complexity of the volatile chemical components. A representative total ion chromatogram (TIC) of ATR is shown in Figure 1. Eighty components were detected. All GC-MS data of each sample, including retention characteristics, peak intensities, and integrated mass spectra, were used for the qualitative and quantitative analyses. First, peak finding and deconvolution were performed with the Automated Mass Spectral Deconvolution and Identification System (AMDIS; National Institute of Standards and Technology, Gaithersburg, MD, USA). Components were identified by comparing the mass spectrum of each putative component with those in the NIST 2005 and NIST 2011 Mass Spectral Libraries, and the identifications were further confirmed by temperature-programmed retention indices. The detailed qualitative procedure is described in our previous study [15]. Seventy-four volatiles were identified; nine of them, namely methyleugenol, trans-methylisoeugenol, linalool, α-pinene, (-)-terpinen-4-ol, α-terpineol, bornyl acetate, α-asarone, and β-asarone, were confirmed with commercial reference standards. The relative concentration of each component was expressed as the ratio of its peak area to that of the internal standard (undecane) in the same TIC. The qualitative and quantitative results are shown in Table S1. The bioactivities of the ATR essential oils were determined by the DPPH radical scavenging assay, and the radical scavenging activity was expressed as the equivalent concentration of quercetin. The radical scavenging activities ranged from 11.60 to 41.99 µg/mL, with a mean of 25.61 µg/mL, and differed significantly among the 49 ATR samples.
Moreover, the samples with the strongest and the weakest activity were both cultivated in Sichuan Province, suggesting that there is no specific relationship between the radical scavenging ability and the habitat of ATR.
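The two quantities used above for identification and quantification can be sketched as follows. The retention-index function assumes the van den Dool and Kratz equation that is commonly used for temperature-programmed retention indices with an n-alkane series; the text does not state which equation or alkane standards were used, so treat this as an illustration rather than the authors' procedure. The retention times and peak areas in the usage lines are hypothetical.

```python
def linear_retention_index(t_x, t_n, t_n1, n_carbon):
    """Temperature-programmed retention index (van den Dool and Kratz form).

    t_x      : retention time of the analyte
    t_n      : retention time of the n-alkane eluting just before the analyte
    t_n1     : retention time of the (n+1)-alkane eluting just after the analyte
    n_carbon : carbon number of the earlier n-alkane
    """
    if not t_n <= t_x <= t_n1:
        raise ValueError("the analyte must elute between the two bracketing n-alkanes")
    return 100.0 * (n_carbon + (t_x - t_n) / (t_n1 - t_n))


def relative_concentration(peak_area, internal_standard_area):
    """Peak area of a component divided by the undecane (internal standard) area on the same TIC."""
    if internal_standard_area <= 0:
        raise ValueError("the internal-standard peak area must be positive")
    return peak_area / internal_standard_area


# Hypothetical values for illustration only
ri = linear_retention_index(t_x=12.5, t_n=12.0, t_n1=13.0, n_carbon=14)   # 1450.0
rel = relative_concentration(3.0e6, 1.5e6)                                # 2.0
```

The relative concentration is dimensionless, so values are comparable across samples only because every sample was spiked with the same internal standard.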

PLSR Model between the Radical Scavenging Activity and the Contents of Volatile Components
Partial least squares regression models were established between the contents of volatile components (matrix X) and the results of the DPPH radical scavenging assay (matrix Y) to describe the relationship between the radical scavenging activities and the chemical components. First, the full dataset of 49 samples (49 × 80, samples × variables) was used to build the PLS regression model. Thirty-nine samples were selected as the calibration set by the Kennard-Stone algorithm, and the remaining 10 samples were used as the validation set. For the calibration set, the maximum, minimum, and mean radical scavenging activities were 41.99, 11.60, and 25.97 µg/mL, respectively; for the validation set, they were 41.27, 14.96, and 24.21 µg/mL, respectively. Because the distributions of radical scavenging activity were similar in the two sets, the partition was considered reasonable. The first eight latent variables, as determined by 10-fold cross-validation, were used to establish the PLS regression model. The R2 and Q2 values of this model were 0.8284 and 0.7824, respectively (Table 1).
Monte Carlo cross-validation (MCCV) was then used to identify outliers in the dataset and to improve the predictive ability of the regression model. In each Monte Carlo sampling run, 40 ATR samples were randomly selected from the original data (49 × 80) to build a PLS regression model, and the remaining samples were used to validate it. After 2000 sampling runs, the distribution of the prediction errors for the DPPH radical scavenging values was obtained for each sample, and its mean and standard deviation were calculated (Figure 2). Samples 31, 32, 37, 38, 48, and 49 were identified as outliers along the X-axis and Y-axis directions. After these outliers were eliminated, a new dataset (43 × 80) was generated. Thirty-four samples were selected as the calibration set by the Kennard-Stone algorithm, and the remaining nine samples were used as the validation set. The distribution of radical scavenging activity values was similar before and after outlier elimination, and the new calibration and validation sets were reasonably partitioned.
The new PLS regression model, shown in Figure 4B, was established with the first six latent variables selected by 10-fold cross-validation (Figure 3A). The R2 and Q2 values of the model improved to 0.9090 and 0.8124, respectively (Table 1); RMSECV decreased from 5.0261 to 3.4109, and RMSEP decreased from 3.2210 to 3.0996. Other parameters of the PLS regression models are compared in Table 1. These results indicate that the performance of the model improved substantially after outlier elimination. Both cross-validation and external validation were applied before the final PLS model was selected, and the results showed no sign of overfitting.

Key Bioactive Components in ATR for Radical Scavenging Activity
The present study aims to identify the key components of the volatile oils that account for the radical scavenging activity of ATR. To this end, the regression coefficient (RC) and variable importance in projection (VIP) methods were employed. A higher absolute RC or VIP value indicates a larger contribution of the corresponding component. Because these values are derived from centered and scaled data, they are comparable across components. The RC and VIP values of the 80 components from the PLSR model (43 × 80) are plotted in Figure 4B,C. The combined effect of variables was also taken into account [32]: the predictive ability of different variable combinations was compared to select the best pattern of components and to define the threshold for variable selection (Figure 5A). With the VIP method, the PLSR model achieved the best Q2 when eleven variables were retained (Figure 5A and Table 1), and VIP gave better results than RC. Thus, the top eleven components ranked by VIP were identified as responsible for the radical scavenging activity of ATR volatile oil: estragole, methyleugenol, cis-methylisoeugenol, isoshyobunone, δ-cadinene, calacorene, γ-asarone, β-asarone, α-asarone, calamusenone, and isocalamendiol. Previous studies on ATR volatile oil have mainly focused on two components, α-asarone and β-asarone, which account for approximately 95% of ATR volatile oil [33]. Several publications have reported the antioxidant ability of β-asarone and α-asarone [34], and isoshyobunone, isocalamendiol, and calacorene are constituents of essential oils with antioxidant activity [35]. Antioxidant activity has not been reported for the other components, although they have other bioactivities. For example, γ-asarone exhibits fungitoxicity against Aspergillus flavus [36], and isocalamendiol in the Zibu Piyin recipe ameliorated scopolamine-induced memory dysfunction [37].
Calamusenone has insecticidal and repellent activities [38,39], and δ-cadinene in Psidium cattleianum Sabine has antimicrobial and antioxidant activities [40]. This study is the first to identify these eleven volatiles as a pattern for predicting the radical scavenging activity of ATR, and the findings provide a new focus for research on the bioactivities of ATR.

Extraction of Volatile Oil
All samples were dried at 40 °C for 2 h and then pulverized. Volatile oil was extracted according to the procedure described in the Chinese Pharmacopoeia [41]. Briefly, 80 g of the sample and 800 mL of distilled water were added to the standard extractor, and the sample was extracted for 4 h at a reflux rate of 4 D/min. An appropriate amount of anhydrous sodium sulfate was added to remove trace water. All volatile oils were stored in brown syringes at 4 °C until GC-MS analysis. Undecane was selected as the internal standard in this study.

DPPH Radical Scavenging Assay
The bioactivities of the 49 ATR samples were determined using the DPPH radical scavenging assay. The DPPH radical has a maximum absorbance at 517 nm, which decreases as the radical is reduced by antioxidant components in the sample. Consequently, the remaining absorbance at 517 nm is negatively correlated with the radical scavenging activity of the sample.
The DPPH assay was conducted as described by Pérez-Meseguer et al. [42], with some modifications. First, the extract was filtered through a 0.45 µm membrane and then diluted 10-fold with methanol. The assay mixture comprised ATR extract (0.2 mL) and DPPH solution (41 µg/mL in anhydrous ethanol, 3.8 mL). Quercetin was used as the positive control and was diluted with methanol to obtain a dilution series (5-50 µg/mL). The assay mixtures were kept in the dark at room temperature for 30 min, after which the absorbance was measured at 517 nm with a UV-Vis spectrophotometer. The radical scavenging activity was calculated from the quercetin standard curve. The DPPH assay and GC-MS analysis were conducted at the same time to avoid variations caused by sample degradation during storage. Each measurement was performed in triplicate, and the antioxidant capacity was expressed as the equivalent concentration of quercetin.
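The conversion from absorbance to a quercetin-equivalent concentration can be sketched as follows. The text says only that activity was "calculated from the quercetin standard curve"; this sketch assumes the common form in which percent scavenging, 100 × (A_control − A_sample)/A_control, is regressed linearly on quercetin concentration and the line is then inverted for each sample. The function names and all readings are hypothetical.

```python
import numpy as np


def percent_scavenging(a_control, a_sample):
    """DPPH scavenging (%) = 100 * (A_control - A_sample) / A_control at 517 nm."""
    return 100.0 * (a_control - np.asarray(a_sample, dtype=float)) / a_control


def fit_standard_curve(conc, scavenging):
    """Least-squares line: scavenging (%) = slope * quercetin concentration + intercept."""
    slope, intercept = np.polyfit(conc, scavenging, 1)
    return slope, intercept


def quercetin_equivalent(scavenging, slope, intercept):
    """Invert the standard curve to give a quercetin-equivalent concentration (µg/mL)."""
    return (np.asarray(scavenging, dtype=float) - intercept) / slope


# Hypothetical readings for illustration only
standards = np.array([5.0, 10.0, 20.0, 30.0, 40.0, 50.0])   # quercetin, µg/mL
standard_response = 1.5 * standards + 4.0                     # % scavenging
slope, intercept = fit_standard_curve(standards, standard_response)

a_control = 0.800                                  # DPPH + solvent, no antioxidant
triplicate = [0.408, 0.408, 0.408]                 # DPPH + diluted ATR extract
activity = quercetin_equivalent(percent_scavenging(a_control, triplicate).mean(), slope, intercept)
```

Averaging the triplicate before inverting the curve gives the same result as inverting each replicate and averaging, because the inversion is linear.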

GC-MS Analysis
GC-MS analyses were conducted on a Shimadzu GC-2010 gas chromatograph (Kyoto, Japan) coupled with a Shimadzu QP2010 mass spectrometer. The initial column temperature of 60 °C was held for 1 min. The column temperature was then programmed from 60 to 140 °C at 5 °C/min, held at 140 °C for 20 min, ramped at 10 °C/min to 220 °C, and held at 220 °C for 5 min. The sample (1.0 µL) was injected into an OV-1 fused-silica capillary column (30 m × 0.25 mm i.d., 0.25 µm film thickness) with a split ratio of 1:10. The flow rate of the high-purity helium carrier gas was 0.8 mL/min. The injector temperature was 250 °C, and the septum purge flow rate was 3 mL/min. The mass spectrometric conditions were as follows: interface temperature, 250 °C; ion source temperature, 200 °C; ionization voltage, 70 eV; detector voltage, 0.9 kV; solvent delay, 3 min; and full-scan mode over the m/z 35-800 range with a scan velocity of 0.2/s.
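As a worked check of the oven program above, the sketch below adds up the hold and ramp segments. It assumes each ramp starts from the previous target temperature and ignores cool-down and re-equilibration time, which the text does not give; `program_duration` is a hypothetical helper.

```python
def program_duration(initial_temp, initial_hold, ramps):
    """Total oven-program time in minutes.

    ramps: sequence of (rate in °C/min, target temperature in °C, hold time in min).
    """
    total = initial_hold
    temp = initial_temp
    for rate, target, hold in ramps:
        total += (target - temp) / rate + hold   # ramp time plus isothermal hold
        temp = target
    return total


# Oven program from the text: 60 °C (1 min), 5 °C/min to 140 °C (20 min), 10 °C/min to 220 °C (5 min)
oven_program = [(5.0, 140.0, 20.0), (10.0, 220.0, 5.0)]
run_time = program_duration(60.0, 1.0, oven_program)   # 1 + 16 + 20 + 8 + 5 = 50.0 min
```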
A pooled quality control (QC) sample was used in this study: 10 µL of volatile oil from each of the 49 ATR samples was pooled and vigorously vortexed for 1 min. Samples were injected in random order, with one QC sample injected after every four ATR samples.
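A minimal sketch of such a randomized injection sequence follows. It assumes a QC injection is placed after every fourth ATR sample and nowhere else; the text does not say whether QC injections also open or close the run. The sample labels, seed, and function name are hypothetical.

```python
import random


def injection_sequence(sample_ids, qc_every=4, seed=None):
    """Shuffle the samples and insert a pooled QC injection after every `qc_every` samples."""
    order = list(sample_ids)
    random.Random(seed).shuffle(order)
    sequence = []
    for position, sample in enumerate(order, start=1):
        sequence.append(sample)
        if position % qc_every == 0:
            sequence.append("QC")
    return sequence


samples = [f"ATR{i:02d}" for i in range(1, 50)]   # 49 hypothetical sample labels
run_order = injection_sequence(samples, qc_every=4, seed=2024)
```

With 49 samples this yields 12 QC injections and 61 injections in total; fixing the seed makes the randomized order reproducible.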

Partial Least Squares Regression (PLSR)
PLS extracts latent variables and uses them to build a multivariate linear model between two data matrices, X and Y [43]. The method is efficient, particularly for high-dimensional datasets. A PLSR model can be improved by optimizing the number of latent variables, and variable selection is also important for optimizing the model.
In this study, a regression model between the radical scavenging activities and the contents of volatile components was constructed by PLS. The data were centered before the PLS model was developed and were partitioned into calibration (80%) and validation (20%) sets using the Kennard-Stone algorithm [44]. The number of latent variables was optimized by 10-fold cross-validation on the calibration set. The possibility of overfitting was evaluated by 10-fold cross-validation and external validation. The established PLSR models were evaluated using the root mean square error of cross-validation (RMSECV), root mean square error of calibration (RMSEC), root mean square error of prediction (RMSEP), coefficient of determination for the calibration set (R2), and coefficient of determination for the validation set (Q2) [18,43].
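The Kennard-Stone partition and the evaluation statistics listed above can be sketched in NumPy as follows. The sketch assumes Euclidean distances in X space and computes Q2 as 1 − SS_res/SS_tot using the mean of the validation responses, which is one common convention; the text does not specify which variant was used. The function names are hypothetical.

```python
import numpy as np


def kennard_stone(X, n_select):
    """Split samples with the Kennard-Stone algorithm.

    Returns (calibration_indices, validation_indices); calibration indices are
    listed in the order in which they were selected.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    # Pairwise Euclidean distances between samples in X space
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Start from the two samples that are farthest apart
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_select:
        # Distance from each candidate to its nearest already-selected sample
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return np.array(selected), np.array(remaining, dtype=int)


def rmse(y_true, y_pred):
    """Root mean square error (RMSEC, RMSECV or RMSEP depending on the data passed)."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def determination_coefficient(y_true, y_pred):
    """1 - SS_res / SS_tot; R2 on the calibration set, Q2 on the validation set."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

For a 49 × 80 matrix, `kennard_stone(X, 39)` returns the 39/10 calibration/validation split described above. Because Kennard-Stone covers the X space uniformly, the calibration set tends to contain the most extreme samples, which is why the maximum activity (41.99 µg/mL) falls in the calibration set.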

Outlier Identification
Outliers are data points that fall outside the main population; they can be caused by measurement errors, sudden changes in experimental conditions, or unusual sample properties [45]. Outliers strongly influence the regression solution, and such data points might lead to considerable deviations from normality [46]. In this study, outliers were identified by Monte Carlo cross-validation (MCCV) [47]. The original data were resampled 2000 times; in each run, a fixed percentage (80%) of the samples was randomly selected to build the PLS regression model, and the remaining samples were used to validate it. After 2000 sampling runs, the distribution of prediction errors was obtained for each sample, and its mean and standard deviation were calculated. A scatter plot was then constructed with the mean value on the X-axis and the standard deviation on the Y-axis, as shown in Figure 5. Following the suggestion of Reference [38], thresholds of 2.5 times the average value (of the mean or of the standard deviation) were set on the X- and Y-axes to divide the plot into four zones. PLSR and MCCV were performed using MATLAB 7.10.0 (R2010a; The MathWorks Inc., Natick, MA, USA).

Conclusions
This study proposed an efficient strategy to reveal the key chemical components responsible for the bioactivity of ATR. The chemical components of ATR volatile oils were profiled by GC-MS, and a PLSR model between the radical scavenging activity and the contents of volatile components was established. After outliers were eliminated by MCCV and informative variables were selected by VIP and RC, the predictive ability of the PLSR model improved substantially, as confirmed by both cross-validation and external validation. Eleven components, namely estragole, methyleugenol, cis-methylisoeugenol, isoshyobunone, δ-cadinene, calacorene, γ-asarone, β-asarone, α-asarone, calamusenone, and isocalamendiol, were identified using the VIP method. Together, these components form a pattern that is probably responsible for the radical scavenging activity of ATR, and they warrant further study.