Open Access Article

Discrimination of Gentiana and Its Related Species Using IR Spectroscopy Combined with Feature Selection and Stacked Generalization

by Tao Shen 1,2,3, Hong Yu 1,2,* and Yuan-Zhong Wang 4
1 Yunnan Herbal Laboratory, Institute of Herb Biotic Resources, School of Life and Sciences, Yunnan University, Kunming 650091, China
2 The International Joint Research Center for Sustainable Utilization of Cordyceps Bioresources in China (Yunnan) and Southeast Asia, Yunnan University, Kunming 650091, China
3 College of Chemistry, Biological and Environment, Yuxi Normal University, Yu’xi 653100, China
4 Medicinal Plants Research Institute, Yunnan Academy of Agricultural Sciences, Kunming 650200, China
* Author to whom correspondence should be addressed.
Academic Editors: Christian Huck and Krzysztof Bec
Molecules 2020, 25(6), 1442; https://doi.org/10.3390/molecules25061442
Received: 16 February 2020 / Revised: 15 March 2020 / Accepted: 20 March 2020 / Published: 23 March 2020
(This article belongs to the Special Issue Perspectives in Near Infrared Spectroscopy and Related Techniques)
Gentiana, one of the largest genera of Gentianoideae, comprises many species with potential pharmaceutical value that are used in local traditional medicine. Because of the phytochemical diversity of the genus and the differences in bioactive compounds among species, accurate identification of authentic Gentiana species is crucial. In this paper, we studied the feasibility of using infrared spectroscopy combined with chemometric analysis to identify Gentiana and its related species. A total of 180 batches of raw spectral fingerprints were obtained from 18 species of Gentiana and Tripterospermum by near-infrared (NIR: 10,000–4000 cm⁻¹) and Fourier transform mid-infrared (FT-MIR: 4000–600 cm⁻¹) spectroscopy. First, principal component analysis (PCA) was used to explore the natural grouping of the 180 samples. Second, random forest (RF), support vector machine (SVM), and K-nearest neighbors (KNN) models were built using the full spectra (1487 NIR variables and 1214 FT-MIR variables, respectively). Based on the results for the calibration and prediction sets, the MIR-SVM model achieved higher classification accuracy than the other models. Five feature selection strategies, VIP (variable importance in the projection), Boruta, GARF (genetic algorithm combined with random forest), GASVM (genetic algorithm combined with support vector machine), and Venn diagram calculation, were then used to reduce the dimensionality of the data and further reduce the number of variables used for modeling. In the end, 101 NIR and 73 FT-MIR bands were selected as feature variables. Third, stacking models were built on the optimal spectral dataset. Most of the stacking models performed better than the full-spectrum models. RF and SVM as base learners, combined with an SVM meta-classifier, formed the optimal stacked generalization strategy.
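The abstract does not say exactly how the Venn diagram calculation combines the bands kept by VIP, Boruta, GARF, and GASVM. A minimal sketch, assuming a simple voting rule in which a band is retained if at least `min_votes` methods selected it; setting `min_votes` to the number of methods gives the strict intersection at the center of the Venn diagram. The function name `consensus_features` and the toy band indices are hypothetical and are not taken from the study.

```python
from collections import Counter


def consensus_features(selections, min_votes):
    """Return band indices chosen by at least `min_votes` selection methods.

    `selections` maps a method name (e.g. "VIP", "Boruta") to the set of
    variable indices that method kept. The voting rule is an assumption
    made for illustration, not the authors' documented procedure.
    """
    votes = Counter()
    for chosen in selections.values():
        votes.update(set(chosen))  # each method casts at most one vote per band
    return sorted(band for band, n in votes.items() if n >= min_votes)


# Hypothetical toy selections over band indices; not data from the study.
selections = {
    "VIP": {3, 7, 12, 20, 31},
    "Boruta": {7, 12, 18, 31},
    "GARF": {2, 7, 12, 31, 40},
    "GASVM": {7, 9, 12, 20},
}
print(consensus_features(selections, min_votes=3))  # [7, 12, 31]
print(consensus_features(selections, min_votes=4))  # [7, 12]
```

Lowering `min_votes` trades a smaller, more conservative band set for a larger one that keeps bands favored by only some methods.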
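The abstract names the stacking configuration (RF and SVM base learners, SVM meta-classifier) but not its implementation. The sketch below shows only the general mechanics of stacked generalization in dependency-free Python: base learners produce out-of-fold predictions, and those predictions become the training features of a meta-classifier. To stay self-contained it swaps in a 3-nearest-neighbor classifier and a nearest-centroid classifier as base learners, and a k-NN meta-classifier, in place of RF and SVM. All function names, the interleaved fold scheme, and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_fit_predict(X_train, y_train, X_test, k=3):
    """Majority vote among the k nearest training points (Euclidean)."""
    preds = []
    for x in X_test:
        nearest = sorted(range(len(X_train)), key=lambda i: math.dist(x, X_train[i]))[:k]
        preds.append(Counter(y_train[i] for i in nearest).most_common(1)[0][0])
    return preds


def centroid_fit_predict(X_train, y_train, X_test):
    """Assign each sample to the class whose mean vector is closest."""
    groups = {}
    for x, label in zip(X_train, y_train):
        groups.setdefault(label, []).append(x)
    centroids = {
        label: [sum(col) / len(col) for col in zip(*rows)]
        for label, rows in groups.items()
    }
    return [min(centroids, key=lambda c: math.dist(x, centroids[c])) for x in X_test]


# Stand-ins for the RF and SVM base learners named in the abstract.
BASE_LEARNERS = [knn_fit_predict, centroid_fit_predict]


def one_hot(label, classes):
    return [1.0 if label == c else 0.0 for c in classes]


def meta_features(preds_per_learner, classes):
    """Concatenate the one-hot encoded base-learner predictions for each sample."""
    return [
        sum((one_hot(p, classes) for p in sample_preds), [])
        for sample_preds in zip(*preds_per_learner)
    ]


def stack_fit_predict(X, y, X_new, n_folds=3):
    classes = sorted(set(y))
    n = len(X)
    # 1) Out-of-fold predictions: every training sample is predicted by base
    #    models that never saw it, so the meta-classifier is not trained on
    #    overly optimistic in-sample predictions.
    oof = [[None] * n for _ in BASE_LEARNERS]
    for f in range(n_folds):
        fold = list(range(f, n, n_folds))  # interleaved folds partition 0..n-1
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        X_te = [X[i] for i in fold]
        for j, learner in enumerate(BASE_LEARNERS):
            for i, pred in zip(fold, learner(X_tr, y_tr, X_te)):
                oof[j][i] = pred
    Z = meta_features(oof, classes)
    # 2) Base learners use all training data to describe the new samples.
    Z_new = meta_features([learner(X, y, X_new) for learner in BASE_LEARNERS], classes)
    # 3) Meta-classifier (k-NN here, standing in for the SVM meta-classifier).
    return knn_fit_predict(Z, y, Z_new, k=3)


# Hypothetical 2-D toy "spectra" for two classes; not data from the study.
X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2), (0.0, 0.4), (0.2, 0.2),
     (3.0, 3.1), (3.2, 2.9), (2.9, 3.3), (3.1, 3.0), (3.3, 3.2), (2.8, 3.0)]
y = ["A"] * 6 + ["B"] * 6
print(stack_fit_predict(X, y, [(0.1, 0.1), (3.0, 3.0)]))  # ['A', 'B']
```

With scikit-learn, `sklearn.ensemble.StackingClassifier` with `estimators=[("rf", RandomForestClassifier()), ("svm", SVC())]` and `final_estimator=SVC()` packages the same scheme around the learners the abstract names; the paper's abstract does not state which software was used.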
For the SG-Ven-MIR-SVM model, the accuracy (ACC) of both the calibration set and the validation set was 100%. Sensitivity (SE), specificity (SP), efficiency (EFF), Matthews correlation coefficient (MCC), and Cohen’s kappa coefficient (K) were all 1, showing that the model achieved optimal authentication performance. These results indicate that stacked generalization combined with feature selection is probably an important technique for improving the predictive accuracy of classification models and avoiding overfitting. The results can provide a valuable reference for the safe and effective clinical application of medicinal Gentiana.
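The abstract reports ACC, SE, SP, EFF, MCC, and K without defining them. A minimal sketch of how these are commonly computed for a multi-class species problem: SE, SP, EFF, and MCC one-vs-rest for a single species, and ACC and Cohen's kappa over all classes. EFF is assumed here to be the geometric mean sqrt(SE × SP), a definition often used in chemometrics studies; the paper may define it, or average per-class values, differently. The function names and species labels are hypothetical.

```python
import math
from collections import Counter


def confusion_counts(y_true, y_pred, positive):
    """TP, TN, FP, FN when `positive` is one class and all others are 'rest'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


def one_vs_rest_metrics(y_true, y_pred, positive):
    """SE, SP, EFF and MCC for one class (assumes positives and negatives both occur)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    eff = math.sqrt(se * sp)  # assumed definition: geometric mean of SE and SP
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SE": se, "SP": sp, "EFF": eff, "MCC": mcc}


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Multi-class Cohen's kappa: observed agreement corrected for chance."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    if p_e == 1:  # only possible when a single class is always predicted correctly
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical species labels; not results from the study.
y_true = ["G1", "G1", "G2", "G2", "T1", "T1"]
y_pred = ["G1", "G2", "G2", "G2", "T1", "T1"]
print(accuracy(y_true, y_pred), cohen_kappa(y_true, y_pred))
print(one_vs_rest_metrics(y_true, y_pred, "G1"))
print(one_vs_rest_metrics(y_true, y_true, "G1"))  # perfect predictions: every value is 1.0
```

A perfect classifier, as reported for SG-Ven-MIR-SVM, gives 1 for every one-vs-rest metric and for kappa, while a single misassigned sample already lowers SE, EFF, MCC, and kappa below 1.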
Keywords: NIR; FT-MIR; species identification; Gentiana; chemometrics; feature selection; stacked generalization

Figure 1

MDPI and ACS Style

Shen, T.; Yu, H.; Wang, Y.-Z. Discrimination of Gentiana and Its Related Species Using IR Spectroscopy Combined with Feature Selection and Stacked Generalization. Molecules 2020, 25, 1442.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
