The Application of Vibrational Spectroscopy Techniques in the Qualitative Assessment of Material Traded as Ginseng

The name “ginseng” is collectively used to describe several plant species, including Panax ginseng (Asian/Oriental ginseng), P. quinquefolius (American ginseng), P. pseudoginseng (Pseudoginseng) and Eleutherococcus senticosus (Siberian ginseng), each with different applications in traditional medicine practices. The use of a generic name may lead to the interchangeable use or substitution of raw materials which poses quality control challenges. Quality control methods such as vibrational spectroscopy-based techniques are here proposed as fast, non-destructive methods for the distinction of four ginseng species and the identification of raw materials in commercial ginseng products. Certified ginseng reference material and commercial products were analysed using hyperspectral imaging (HSI), mid-infrared (MIR) and near-infrared (NIR) spectroscopy. Principal component analysis (PCA) and (orthogonal) partial least squares discriminant analysis models (OPLS-DA) were developed using multivariate analysis software. UHPLC-MS was used to analyse methanol extracts of the reference raw materials and commercial products. The holistic analysis of ginseng raw materials revealed distinct chemical differences using HSI, MIR and NIR. For all methods, Eleutherococcus senticosus displayed the greatest variation from the three Panax species that displayed closer chemical similarity. Good discrimination models with high R2X and Q2 cum vales were developed. These models predicted that the majority of products contained either /P. ginseng or P. quinquefolius. Vibrational spectroscopy and HSI techniques in tandem with multivariate data analysis tools provide useful alternative methods in the authentication of ginseng raw materials and commercial products in a fast, easy, cost-effective and non-destructive manner.


Introduction
Ginseng, an ancient herbal medicine with a long history of traditional use, is one of the most popular and best-selling herbal medicines in the world [1,2]. The word ginseng, literally meaning "man root", refers to the characteristic shape of the roots which resemble the human body and the legs of a man [3]. Traditionally, the term ginseng was applied to preparations formulated using Panax species, but nowadays it refers to any product said to have adaptogenic or restorative properties [2]. This gives rise to quality control problems due to botanical misidentification. The word "ginseng" refers to Panax ginseng C.A.Mey. (Asian/Oriental ginseng-Korean and Chinese), P. quinquefolius

Qualitative Differentiation of Ginseng Raw Materials Using HSI Spectroscopy
The use of HSI to determine chemical variation between the different ginseng species yielded the results presented in Figure 1A-D.  [1] score value; (B) Corresponding score image of t [1]; (C) pixel clusters coloured according to t [2] score value; (D) corresponding score image of t [2].
The PCA model demonstrated good calibration statistics following mean centering and standard normal variate (SNV) pre-treatment of the data set. Three principal components (PCs) were used to model 95.3% variance in the data with 90.8% variation explained along PC1 and PC2. The different PC combinations were assessed in the score plot and the combination that displayed the greatest variance, i.e., PC1 vs. PC2, was selected. The score scatter plot ( Figure 1A) shows two distinct clusters separated along PC1 with a variation of 84.8%. When coloured according to the t [1] (PC1) score value which corresponds to the score image ( Figure 1B), it is clear that E. senticosus (red-yellow) is chemically distinct from the other Panax species (blue) which are represented by lower score values (blue) in the score image ( Figure 1B). The pixel scatter plot thus clearly demonstrates chemical segregation of E. senticosus (PC1+) pixels from the pixel cluster representing the three Panax species (PC1−). Evaluation of PC2 displayed minimal chemical variation between the three Panax species where only 6% variation was observed. Figure 1C shows the pixel scatter plot coloured according to the t [2] (PC2) score value which identifies P. quinquefolius pixels (orange-yellow cluster) on PC2+, P. pseudoginseng (yellow-green cluster) overlapping with P. quinquefolius while P. ginseng (blue cluster) occupies PC2−. The small variation observed in the scatter plot can also be seen in the score image where P. quinquefolius powder has an orange-yellow score value in the t [2] image compared to other species ( Figure 1D). Variation modelled along PC3 did not further differentiate the three species significantly. The PCA model demonstrated good calibration statistics following mean centering and standard normal variate (SNV) pre-treatment of the data set. Three principal components (PCs) were used to model 95.3% variance in the data with 90.8% variation explained along PC1 and PC2. The different PC combinations were assessed in the score plot and the combination that displayed the greatest variance, i.e., PC1 vs. PC2, was selected. The score scatter plot ( Figure 1A) shows two distinct clusters separated along PC1 with a variation of 84.8%. When coloured according to the t [1] (PC1) score value which corresponds to the score image ( Figure 1B), it is clear that E. senticosus (red-yellow) is chemically distinct from the other Panax species (blue) which are represented by lower score values (blue) in the score image ( Figure 1B). The pixel scatter plot thus clearly demonstrates chemical segregation of E. senticosus (PC1+) pixels from the pixel cluster representing the three Panax species (PC1´). Evaluation of PC2 displayed minimal chemical variation between the three Panax species where only 6% variation was observed. Figure 1C shows the pixel scatter plot coloured according to the t [2] (PC2) score value which identifies P. quinquefolius pixels (orange-yellow cluster) on PC2+, P. pseudoginseng (yellow-green cluster) overlapping with P. quinquefolius while P. ginseng (blue cluster) occupies PC2´. The small variation observed in the scatter plot can also be seen in the score image where P. quinquefolius powder has an orange-yellow score value in the t [2] image compared to other species ( Figure 1D). Variation modelled along PC3 did not further differentiate the three species significantly. Following PCA, PLS-DA classification modelling was performed and Figure 2A shows the PLS-DA Y-image colour-coded according to the species. The corresponding pixel scatter plot is shown in Figure 2B demonstrating the chemical distinction of E. senticosus. Separation of the three Panax species along PLS factor 2 is minimal (5.9%) between P. ginseng (blue), P. quinquefolius (green) and P. pseudoginseng (red) as observed in the pixel scatter plot. Pixel cluster overlap between P. quinquefolius (green) and P. pseudoginseng (red) is observed demonstrating that the two species are chemically more similar. The use of the HSI technique has not only proven useful for differentiation of plant species misidentified due to morphological similarities [11,12] but differentiation of fungal [13] and algae species [14] has also been achieved. The developed PLS-DA model was then used to predict a test set that comprised of authentic reference samples and commercial products to identify raw material constituents present in commercial ginseng products. Y-image colour-coded according to the species. The corresponding pixel scatter plot is shown in Figure 2B demonstrating the chemical distinction of E. senticosus. Separation of the three Panax species along PLS factor 2 is minimal (5.9%) between P. ginseng (blue), P. quinquefolius (green) and P. pseudoginseng (red) as observed in the pixel scatter plot. Pixel cluster overlap between P. quinquefolius (green) and P. pseudoginseng (red) is observed demonstrating that the two species are chemically more similar. The use of the HSI technique has not only proven useful for differentiation of plant species misidentified due to morphological similarities [11,12] but differentiation of fungal [13] and algae species [14] has also been achieved. The developed PLS-DA model was then used to predict a test set that comprised of authentic reference samples and commercial products to identify raw material constituents present in commercial ginseng products.  Figure 3 shows the prediction image (test set) consisting of known reference material and commercial products (P1-P8) coloured according to the predicted classes. The predictive ability of the model in classifying new samples was confirmed as the predicted species for reference materials matched the known identity. Additionally, the high Q 2 value (0.78) also demonstrated that the PLS-DA model would accurately predict the raw material constituents of commercial products. Prediction of the commercial products classified four (P1, P5, P6, P8) of the eight products as containing P. ginseng, one (P2) containing P. pseudoginseng, two samples comprised of more than one species (P3 and P7) and for one sample (P4) the ingredients could not be matched to any of the ginseng reference materials used for calibration ( Figure 3). Table 1 provides a summary of the prediction results with the percentage pixel abundance corresponding to each predicted species for the commercial samples. In the present study, it was possible to detect different ginseng species present in commercial products from scanning the products and introducing these into a robust and well calibrated model. Quality control aspects that have become problematic to the herbal products industry include the misidentification, mislabelling, substitution and adulteration of raw materials and finished products. With the aid of HSI and multivariate data analysis techniques, these problems can be overcome by introducing quality control procedures that allow on-line bulk analysis of samples to visually assess the chemical composition and spatial distribution of ingredients where necessary.  Figure 3 shows the prediction image (test set) consisting of known reference material and commercial products (P1-P8) coloured according to the predicted classes. The predictive ability of the model in classifying new samples was confirmed as the predicted species for reference materials matched the known identity. Additionally, the high Q 2 value (0.78) also demonstrated that the PLS-DA model would accurately predict the raw material constituents of commercial products. Prediction of the commercial products classified four (P1, P5, P6, P8) of the eight products as containing P. ginseng, one (P2) containing P. pseudoginseng, two samples comprised of more than one species (P3 and P7) and for one sample (P4) the ingredients could not be matched to any of the ginseng reference materials used for calibration ( Figure 3). Table 1 provides a summary of the prediction results with the percentage pixel abundance corresponding to each predicted species for the commercial samples. In the present study, it was possible to detect different ginseng species present in commercial products from scanning the products and introducing these into a robust and well calibrated model. Quality control aspects that have become problematic to the herbal products industry include the misidentification, mislabelling, substitution and adulteration of raw materials and finished products. With the aid of HSI and multivariate data analysis techniques, these problems can be overcome by introducing quality control procedures that allow on-line bulk analysis of samples to visually assess the chemical composition and spatial distribution of ingredients where necessary.

Mid-Infrared Spectroscopy Demonstrated Spectral Differences in Ginseng Root Powders
HSI is a fairly new technique that is currently under investigation in the field of natural products research and the results must be complemented using other independent techniques. To confirm the spectral differences observed using HSI, MIR was employed to analyse the spectral features of the reference materials by comparison of the spectra using the quick-compare function in OPUS ® . The technique allowed for chemical correlation of each sample spectrum against the reference spectrum, i.e., P. ginseng. The spectral correlation results (Figure 4) clearly demonstrated high spectral/chemical correlation (>99%) between P. ginseng (A) and P. quinquefolius (B) as well as P. ginseng (A) and P. pseudoginseng (D). E. senticosus (C) was shown to have a lower correlation (80.92%) confirming its chemical distinctness. Clear distinct signals could be observed in the fingerprint region, 1450-500 cm −1 for E. senticosus while the signal patterns for the three Panax species appeared similar. This is not surprising as E. senticosus does not belong to the Panax genus and thus chemotaxonomically, it would exhibit a different chemical profile when compared to other species. To use this information for product analysis, multivariate analysis techniques were applied to the MIR data of replicate samples. Orthogonal projections to latent structures-discriminant analysis (OPLS-DA) models were developed and used to predict raw material constituents in the commercial ginseng products.

Mid-Infrared Spectroscopy Demonstrated Spectral Differences in Ginseng Root Powders
HSI is a fairly new technique that is currently under investigation in the field of natural products research and the results must be complemented using other independent techniques. To confirm the spectral differences observed using HSI, MIR was employed to analyse the spectral features of the reference materials by comparison of the spectra using the quick-compare function in OPUS ® . The technique allowed for chemical correlation of each sample spectrum against the reference spectrum, i.e., P. ginseng. The spectral correlation results (Figure 4) clearly demonstrated high spectral/chemical correlation (>99%) between P. ginseng (A) and P. quinquefolius (B) as well as P. ginseng (A) and P. pseudoginseng (D). E. senticosus (C) was shown to have a lower correlation (80.92%) confirming its chemical distinctness. Clear distinct signals could be observed in the fingerprint region, 1450-500 cm´1 for E. senticosus while the signal patterns for the three Panax species appeared similar. This is not surprising as E. senticosus does not belong to the Panax genus and thus chemotaxonomically, it would exhibit a different chemical profile when compared to other species. To use this information for product analysis, multivariate analysis techniques were applied to the MIR data of replicate samples.
Orthogonal projections to latent structures-discriminant analysis (OPLS-DA) models were developed and used to predict raw material constituents in the commercial ginseng products. The MIR data of authentic reference samples was initially analysed using PCA to observe any trends and variations within the data. As expected, some variation was observed within the data (results not shown). Following PCA, PLS-DA and OPLS-DA were applied to observe variation related to species differentiation and predict the raw materials present in each commercial product. Results of the OPLS-DA models are presented here due to ease of interpretation. Figure 5A is an OPLS-DA score plot demonstrating variation between the four ginseng reference materials. The greatest variation was observed along the first predictive component (Pp1) (88%) which clearly separated E. senticosus from the Panax species. Only 7% variation could be attributed to chemical separation of the Panax species along Pp2 where P. quinquefolius and P. pseudoginseng also demonstrated closer association while P. ginseng seemed more distinct. This correlates with the HSI results. The modelled variation responsible for separating the species using three predictive components was 98.3% (R 2 Xcum = 0.983). The model was built on pareto-scaled and SNV pre-treated data. The predictive ability of the model estimated by cross validation was 92.4% (Q 2 cum = 0.924), which demonstrates the reliability of the model in predicting future samples. Following discriminant The MIR data of authentic reference samples was initially analysed using PCA to observe any trends and variations within the data. As expected, some variation was observed within the data (results not shown). Following PCA, PLS-DA and OPLS-DA were applied to observe variation related to species differentiation and predict the raw materials present in each commercial product. Results of the OPLS-DA models are presented here due to ease of interpretation. Figure 5A is an OPLS-DA score plot demonstrating variation between the four ginseng reference materials. The greatest variation was observed along the first predictive component (Pp1) (88%) which clearly separated E. senticosus from the Panax species. Only 7% variation could be attributed to chemical separation of the Panax species along Pp2 where P. quinquefolius and P. pseudoginseng also demonstrated closer association while P. ginseng seemed more distinct.
This correlates with the HSI results. The modelled variation responsible for separating the species using three predictive components was 98.3% (R 2 Xcum = 0.983). The model was built on pareto-scaled and SNV pre-treated data. The predictive ability of the model estimated by cross validation was 92.4% (Q 2 cum = 0.924), which demonstrates the reliability of the model in predicting future samples. Following discriminant analysis, the commercial products were introduced into the OPLS-DA model as a test set and the classification results in relation to the reference material, are presented in Figure 5B. analysis, the commercial products were introduced into the OPLS-DA model as a test set and the classification results in relation to the reference material, are presented in Figure 5B. The proximity of the commercial products to the reference material in the score scatter plot is an indication of the likelihood of a product to contain ingredients belonging to that species. The majority of the commercial products (seven out of eight) were predicted to contain either P. ginseng (PG) or P. quinquefolius (PQ), while E. senticosus (ES) and P. pseudoginseng (PP) were unlikely raw material constituents in any of the commercial products according to MIR analysis. As can be observed from the score plot, products 1, 5, 7 and 8 showed close association with P. ginseng while products 3, 4 and 6 were closer to P. quinquefolius. Although product 2 is likely to be associated with P. quinquefolius, its occurrence as a strong outlier outside of model boundary affects the accuracy of the prediction. To statistically assess the observed relationship between the products and the reference materials, a classification list for discriminant analysis was obtained for the data and this is illustrated in Table 2.
The table was constructed based on supervised classification of the reference materials into The proximity of the commercial products to the reference material in the score scatter plot is an indication of the likelihood of a product to contain ingredients belonging to that species. The majority of the commercial products (seven out of eight) were predicted to contain either P. ginseng (PG) or P. quinquefolius (PQ), while E. senticosus (ES) and P. pseudoginseng (PP) were unlikely raw material constituents in any of the commercial products according to MIR analysis. As can be observed from the score plot, products 1, 5, 7 and 8 showed close association with P. ginseng while products 3, 4 and 6 were closer to P. quinquefolius. Although product 2 is likely to be associated with P. quinquefolius, its occurrence as a strong outlier outside of model boundary affects the accuracy of the prediction. To statistically assess the observed relationship between the products and the reference materials, a classification list for discriminant analysis was obtained for the data and this is illustrated in Table 2. The table was constructed based on supervised classification of the reference materials into classes (dummy Y-variables). Through cross validation each sample was taken out of the model and predicted for fitness into any of the four classes as demonstrated by the predicted Y values (YPredPS). Cross validation results for the reference materials show that the observations are predicted well into their respective classes as YpredPS > 0.65 in all cases. To determine class membership of the commercial products, the products were introduced into the model. By matching the predicted Y of each product to each class (species), it is clear that products 2, 3, 4 and 6 were predicted as P. quinquefolius while products 1, 5, 7 and 8 were predicted as P. ginseng (YPredPS > 0.65). Although product 3 is predicted as P. quinquefolius, according to the YPredPS value the product could also be P. ginseng since the value (0.46) falls within the borderline range (0.35 < YPredPS < 0.65). Using MIR spectroscopy, it was possible to clearly distinguish the four ginseng reference materials and further tentatively identify raw material constituents in eight commercial products.

Near-Infrared Spectroscopy Demonstrated Spectral Differences in Ginseng Root Powders
The usefulness of NIR spectroscopy in assessing variations among the ginseng reference materials was demonstrated. Results showed obvious differences between the species using PCA (results not shown) and OPLS-DA ( Figure 6A).
The total variation modelled using a three predictive component OPLS-DA model was 99.6% (R 2 X cum = 0.996) and the predictive ability of the model was 95.3% (Q 2 cum = 0.953). As with MIR, the NIR data was pareto-scaled and SNV filtered to produce the best model statistics. Figure 6A shows the score scatter plot displaying chemical variation between the species. E. senticosus was clearly separated from Panax species along Pp1 with modelled variation of 81.0%. Panax ginseng clustered on Pp2´, separate from P. quinquefolius and P. pseudoginseng that clustered in close proximity to each other on Pp2+. The variation modelled along Pp2 was 14.6%. The prediction of raw material constituents based on NIR showed products 1, 3, 6 and 8 in close proximity to P. quinquefolius, product 2 close to P. pseudoginseng and products 5 and 7 close to E. senticosus. Product 4 was a strong outlier in the direction PC2+ resulting in a skewed model thus for clear visualisation of the observations, product 4 was excluded from the score plot for reporting purposes. To confirm visual assessment of the score plot, the classification list was obtained and the results are presented in Table 3.
is predicted as borderline P. quinquefolius. Although some differences have been observed between NIR predictions and other spectroscopy techniques, it is clear that most products contain either P. ginseng or P. quinquefolius based on both NIR and MIR predictions. This could be due to the chemical similarities between P. ginseng and P. quinquefolius, as demonstrated using all three techniques. Thus, it was difficult for the models to conclusively classify the products as containing one of either two species and so borderline predictions were observed. Spectroscopy also suggests that E. senticosus is not a common ingredient in the ginseng products even though two of the products (products 3 and 6) listed E. senticosus as ingredients according to the package insert. This may be a case of mislabelling or misidentification of the raw material which could translate to a consumer taking a herbal product with the hope of attaining a specific therapeutic outcome (as per label ingredient) which will not be achieved as the expected ingredient will be absent in the product. This is one of the many quality concerns in the herbal products industry and hence the need to develop robust quality control protocols based on holistic approaches. The results thus demonstrate the need for more robust, fast, holistic approaches in the analysis of herbal products to ensure standardisation and quality products that deliver the desired effects to the consumer. The classification list shows that five of the eight products (1, 3, 4, 6 and 8) were classified as either P. quinquefolius or P. ginseng. However, NIR predicts products 1 and 4 to be either P. quinquefolius or P. ginseng as shown by YPredPS > 0.65 for both products. Although the model predicts products 6 and 8 as P. quinquefolius, it also classifies these as borderline P. ginseng (YPredPS < 0.65 but > 0.35). Product 2 is predicted as P. pseudoginseng while products 5 and 7 are predicted as E. senticosus. Product 3 is predicted as borderline P. quinquefolius. Although some differences have been observed between NIR predictions and other spectroscopy techniques, it is clear that most products contain either P. ginseng or P. quinquefolius based on both NIR and MIR predictions. This could be due to the chemical similarities between P. ginseng and P. quinquefolius, as demonstrated using all three techniques. Thus, it was difficult for the models to conclusively classify the products as containing one of either two species and so borderline predictions were observed. Spectroscopy also suggests that E. senticosus is not a common ingredient in the ginseng products even though two of the products (products 3 and 6) listed E. senticosus as ingredients according to the package insert. This may be a case of mislabelling or misidentification of the raw material which could translate to a consumer taking a herbal product with the hope of attaining a specific therapeutic outcome (as per label ingredient) which will not be achieved as the expected ingredient will be absent in the product. This is one of the many quality concerns in the herbal products industry and hence the need to develop robust quality control protocols based on holistic approaches. The results thus demonstrate the need for more robust, fast, holistic approaches in the analysis of herbal products to ensure standardisation and quality products that deliver the desired effects to the consumer.

UHPLC-MS Demonstrated Chromatographic Differences in Ginseng Root Powders
As chromatographic techniques are commonly used in quality analyses both in industry and research, UHPLC-MS was selected to corroborate the results obtained using the three vibrational spectroscopic techniques. As with MIR and NIR, an untargeted metabolomics approach was used to analyse the full chromatographic profile. PCA and OPLS-DA models were constructed and the resulting score plots were analysed for possible chemical variation in the methanol extracts of ginseng reference materials. The results for the PCA model were not shown as they were similar to the OPLS-DA plot. The total modelled variation related to species differentiation in the OPLS-DA model was 93.8% (R 2 Xcum = 0.938) and the predictive ability of the model was estimated at 98.6% (Q 2 cum = 0.986) which shows that the model can be used for the accurate prediction of commercial products. Figure 7A is the OPLS-DA score plot showing chemical distinction of the four species where variation along Pp1 mainly separated E. senticosus (Pp1´) and P. pseudoginseng (Pp1+).   According to the chromatographic profiles of the methanol extracts, P. ginseng and P. quinquefolius share similar chemical profiles as these could not be separated along both Pp1 and Pp2. A loadings line plot was used to identify putative biomarkers (retention time/mass pairs) responsible for the chemical differences between the species clusters. Table 4 lists retention time/mass pairs and corresponding tentative identities of the compounds.
According to the OPLS-DA model, all commercial products were predicted well within the model boundary. Due to the close proximity of the commercial products to the P. ginseng and P. quinquefolius cluster, it is clear that most of the products are predicted to contain either of the two species ( Figure 7B). With the aid of the classification list (Table 5), it was possible to confirm the prediction of product 6a as P. quinquefolius and product 8a as borderline P. quinquefolius. Five of the eight products were predicted as borderline P. ginseng (1a, 3a, 4a, 5a and 7a) and product 2a was predicted as borderline E. senticosus. Using UHPLC-MS it was not possible to obtain definite predictions for most of the products as the YPredPS values for these were borderline however, it is clear that the majority of the products contain either P. ginseng or P. quinquefolius. UHPLC-MS demonstrated similar clustering patterns for both raw material differentiation and prediction of commercial products compared to the spectroscopy results. Even though UHPLC-MS is a very sensitive and accurate analytical method, it is difficult to distinguish between the Panax species due to the presence of many ginsenosides in almost all the species. Vibrational spectroscopy can successfully offer the same functionality at a fraction of the price. Images were captured at a frame rate of 100 Hz and 3.5 ms exposure time as the samples entered the field of view. Images with a pixel size of 256ˆ320 and a pixel depth of 14 bits/pixel were acquired. Each sample composed of approximately 14750 pixels. Internal dark and white references were used for image calibration and to correct for variation in sample illumination.

Mid-Infrared (MIR) Spectroscopy Analysis
The MIR spectra of the reference material and commercial products and were recorded in the wavenumber range 4000-550 cm´1 on an Alpha-P Bruker FT-IR spectrometer mounted with an ATR diamond crystal (Bruker OPTIK GmbH, Ettlingen, Germany). The operating software used was OPUS ® 6.5 (Bruker OPTIK GmbH). Spectra were obtained by placing a small aliquot of the powder directly onto the surface of the ATR diamond crystal. A plunger was used to press down the powder so that it was in direct contact with the crystal. A total of 32 scans were accumulated and averaged to produce an average spectrum for each sample. All measurements were performed in the absorbance mode and a spectral resolution of 4 cm´1 was used [15]. Chemometric analysis of the data was performed using SIMCA ® -P+ 14.0 (Umetrics AB, Malmo, Sweden).

Near-Infrared Spectroscopy Analysis
The powdered ginseng reference and product samples were packed in Chromacol ® vials (Chromacol, Sigma-Aldrich, Johannesburg, South Africa) and placed in the sample holder of a NIRFlex N500 (solids cell) FT-NIR spectrophotometer (Buchi Labortechnik AG, Flawil, Switzerland). The reflectance spectra were recorded in the wavenumber region of 10,000-4000 cm´1 with a spectral resolution of 4 cm´1 using NIRWare software (Version 1.2, Buchi Labortechnik AG, Flawil, Switzerland). A total of 32 scans were accumulated and averaged to produce a spectrum for each sample. The reflectance readings were converted to absorbance using the formula A = log (1/R) where A = Absorbance and R = reflectance, before exporting the data into SIMCA ® -P+ 14.0 (Umetrics AB, Malmo, Sweden) for chemometric analyses [8].

Ultra-High Performance Liquid Chromatography-Mass Spectrometry Analysis
UHPLC analyses were performed on a Waters Acquity Ultra Performance Liquid Chromatographic system with PDA detector (Waters Corporation, Milford, MA, USA). To achieve chromatograms with better resolution in a short analysis time, the chromatographic conditions were optimised in a preliminary test. An Acquity UHPLC BEH C 18 column (150 mmˆ2.1 mm, i.d., 1.7 µm particle size, Waters) was used and maintained at 40˝C. The mobile phase consisted of 0.1% formic acid in water (solvent A) and acetonitrile (solvent B) at a flow rate of 0.3 mL/min; a gradient elution was as follow: 95% A: 5% B, keeping for 0.5 min, to 85% A: 15% B in 0.5 min, to 72% A: 28% B in 0.5 min, changed to 71% A: 29% B in 4 min, to 55% A: 45% B in 0.5 min, to 45% A: 55% B in 4 min, changed to 7% A: 93% B in 1 min, final changed to 100% B in 1.5 min, keeping for 1.5 min and back to initial ratio in 0.5 min. The total runtime was 16 min. The injection volume was 2.0 µL (full-loop injection). Data were collected by chromatographic software Masslynx ® 4.1 (Waters Corporation, Milford, MA, USA). The UHPLC system was interfaced with a Xevo G 2 QTof mass spectrometer (Waters). The same column, elution gradient and flow rate were used during the UHPLC-MS analysis. Both positive and negative ion modes were investigated, and the results showed that higher sensitivity and more information were obtained in the negative mode. Thus, the mass spectrometer was operated in negative ion electrospray mode. N 2 was used as the desolvation gas. The desolvation temperature was set to 350˝C at a flow rate of 600 L/h and the source temperature was 100˝C. The capillary and cone voltages were set to 2800 and 60 V, respectively. Data were collected between 100 and 1500 m/z [16]. Compound identification was performed through elementary composition to establish the possible molecular formula and comparison with literature.

Hyperspectral Imaging Data Analysis
The hypercubes were analysed using multivariate image analysis software (Evince ® Version 2.4.0, UmBio AB, Umea, Sweden). Following conversion of reflectance values to pseudo-absorbance, PCA was performed on the data. The largest variance was observed between the background and the samples and thus image clean-up was performed by removing pixels corresponding to the background and edge effects. To improve model statistics and obtain a better visualisation of chemical differences between the species, different spectral filtering methods such as standard normal variate (SNV), multiplicative scatter correction (MSC) and Savitzky-Golay, were investigated [17]. To determine the optimum number of principal components (PCs) for the model, PCs were manually added and the Q 2 value evaluated. The results of image analysis were presented in a score image, which displays chemical differences between species based on colour amplitude. The corresponding scatter plot indicates pixel distribution and can be coloured according to the amplitude or score values. Following PCA, image classification modelling using PLS-DA and OPLS-DA was performed by selecting the root image of each species in the PCA score image and assigning it to a class (Class 1: P. ginseng; Class 2: P. pseudoginseng; Class 3: P. quinquefolius; Class 4: E. senticosus). Full cross validation of the model was performed using the random method and the Q 2 value was evaluated. PLS factors were added to the model and the optimum number was determined by excluding factors that did not significantly increase the value of the Q 2 cum in the model. A PLS-DA score image (Y image) was then used for class prediction of the commercial products and test set reference samples. The PLS-DA model predicts class membership by correlating pixels in the test and commercial samples to any of the four classes as per training set. The prediction table was assessed to check the pixel percentage in each image that are correlated to a specific species and these were coloured according to the predicted class in the image.

Spectral and Chromatographic Data Analysis
Chemometric analysis of MIR, NIR and UHPLC-MS data was performed using SIMCA ® -P+ 14.0 (Umetrics AB, Malmo, Sweden). The holistic analysis of UHPLC-MS chromatographic data was performed through pre-processing of the data using MarkerLynx ® 4.1 software (Waters Corporation, Milford, MA, USA). Pre-processing involved peak detection, baseline correction, noise elimination and spectral alignment where molecular fragments with corresponding retention times were aligned across the whole sample set. The resulting aligned data were then listed together with the corresponding intensities across all samples and was subjected to cluster analysis in SIMCA-P ® + 14. As with HSI, PCA was initially applied to the data to investigate chemical variation between the species. Two-dimensional score plots were used as maps of observations to show the distribution of the species based on chemical differences or similarities between them. The loadings line plot was used to identify variables (retention time/mass pairs or wavenumbers) that contribute to the clustering patterns observed in the scores plot. Following PCA, PLS-DA and OPLS-DA were performed on the dataset after assigning a class to each species as previously described for PLS-DA of HSI. Based on these classification models, the botanical materials present in the commercial products were determined by class prediction of the commercial products. This was achieved by introducing the commercial samples as an external test set into the model. Using 2D prediction score plots, it was possible to visualise the predicted class of the commercial samples based on its distribution and proximity to the authentic reference sample. To statistically confirm the visual results, a classification table was obtained which showed a list of predicted Y-values (YPredPS) for each sample in each class/species. Class membership was defined as follows; YPredPS < 0.35; observation does not belong to the class; 0.35 < YPredPS < 0.65; observation is borderline; YPredPS > 0.65; observation belongs to the class [17].

Conclusions
The present study demonstrates the application of HSI, MIR and NIR spectroscopies as fast, non-destructive, holistic approaches in the qualitative differentiation of ginseng raw materials. The three techniques were able to distinguish between (1) E. senticosus and the three Panax species; and (2) between P. pseudoginseng and P. ginseng together with P. quinquefolius. P. ginseng is chemically most similar to P. quinquefolius. These results were consistent for all three vibrational spectroscopy techniques (NIR, MIR and HSI) as well as the UHPLC-MS results. It was possible to use any of the methods to distinguish the raw material of the four different botanical species. This provides fast techniques that in addition to their non-destructive nature, allow for bulk analysis of raw material, thus saving time. The prediction models classified all the commercial products as ginseng products containing either P. ginseng or P. quinquefolius in higher proportions, as did the UHPLC-MS method. This highlights the labelling error noted for products 3 and 6. It is evident that fast and accurate quality control protocols must be developed for botanical raw materials and commercial products.