Research on Apple Origins Classification Optimization Based on Least-Angle Regression in Instance Selection
Abstract
1. Introduction
2. Samples and Spectra
3. Theory and Algorithm
3.1. LARIS
3.2. Selection Methods
3.3. Preprocessing, Decomposition and Outliers Methods
3.4. The Classifier
3.5. Evaluation Metrics
4. Results and Discussions
4.1. Spectral Analysis and Processing
4.2. Outliers
4.3. Set Split and Optimization
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
Preprocessing Methods | ACC_CV | F-Measure (Class 1) | F-Measure (Class 2) | F-Measure (Class 3) | F-Measure (Class 4) |
---|---|---|---|---|---|
None | 55.57% | 58.13% | 69.26% | 26.03% | 56.34% |
MSC | 65.65% | 65.58% | 80.46% | 48.52% | 98.02% |
S-G(0) | 56.09% | 58.55% | 69.12% | 23.61% | 59.31% |
S-G(0) + MSC | 66.49% | 66.25% | 80.46% | 48.52% | 69.26% |
S-G(1) | 57.46% | 60.20% | 67.42% | 14.50% | 69.71% |
S-G(1) + MSC | 71.85% | 69.70% | 89.69% | 44.72% | 78.42% |
S-G(2) | 46.43% | 53.48% | 54.92% | 11.93% | 45.36% |
S-G(2) + MSC | 64.92% | 64.08% | 96.22% | 38.07% | 42.07% |
PCA decomposition | 76.37% | 73.15% | 78.63% | 60.26% | 94% |
PCA decomposition + MSC | 29.41% | 58.83% | — | — | — |
PLS decomposition | 96.43% | 96.27% | 94.67% | 95.82% | 99.50% |
PLS decomposition + MSC | 100% | 100.00% | 100.00% | 100.00% | 100.00% |
Methods | Parameter | Samples per Class (1/2/3/4) | Set Size | ACC_CV | CH | IR |
---|---|---|---|---|---|---|
None | — | 210/175/177/137 | 699 | 90.13% | 38.15 | 1.53 |
LARIS | — | 155/134/122/100 | 511 | 90.32% | 31.71 | 1.55 |
RS | Train_size = 511/699 | 147/127/133/104 | 511 | 88.45% | 51.61 | 1.41 |
KS | Train_size = 0.2 × 699 | 33/51/39/17 | 140 | 89.29% | 11.77 | 3 |
SPXY | Train_size = 0.9 × 699 | 186/165/165/113 | 629 | 90.14% | 31.37 | 1.65 |
SSK | n_neighbors = 22 | 107/118/146/82 | 453 | 91.61% | 38.29 | 1.78 |
Testing set | — | 68/59/45/62 | 234 | — | 27.83 | 1.51 |
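
The set sizes and IR values in the table above can be cross-checked from the per-class counts. The sketch below is a minimal illustration, not the authors' code: it assumes that IR denotes the imbalance ratio, taken here as the largest class count divided by the smallest, which reproduces every IR value listed. The helper names `set_size` and `imbalance_ratio` are hypothetical.

```python
# Per-class sample counts (Class 1..4), copied from the table above.
class_counts = {
    "None": (210, 175, 177, 137),
    "LARIS": (155, 134, 122, 100),
    "RS": (147, 127, 133, 104),
    "KS": (33, 51, 39, 17),
    "SPXY": (186, 165, 165, 113),
    "SSK": (107, 118, 146, 82),
    "Testing set": (68, 59, 45, 62),
}


def set_size(counts):
    """Total number of samples across all classes."""
    return sum(counts)


def imbalance_ratio(counts):
    """Assumed definition: largest class count / smallest class count."""
    return max(counts) / min(counts)


for name, counts in class_counts.items():
    print(f"{name:12s} size={set_size(counts):4d} IR={imbalance_ratio(counts):.2f}")
```

Under this assumed definition, KS stands out: it keeps only 140 samples and has the highest ratio (51/17 = 3), while LARIS keeps 511 of the 699 samples with a ratio close to that of the full training set.
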
Training Set | ACC_P | F-Measure (Class 1) | F-Measure (Class 2) | F-Measure (Class 3) | F-Measure (Class 4) |
---|---|---|---|---|---|
None | 91.88% | 90.52% | 86.44% | 90.91% | 99.20% |
LARIS | 96.58% | 96.29% | 94.22% | 95.45% | 100% |
RS | 90.17% | 88.89% | 83.60% | 88.64% | 99.19% |
KS | 92.31% | 89.39% | 90.32% | 92.30% | 97.52% |
SPXY | 94.02% | 92.43% | 89.43% | 94.38% | 100% |
SSK | 89.32% | 86.61% | 83.08% | 89.89% | 98.36% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, B.; Wang, Y.; Li, L.; Liu, Y. Research on Apple Origins Classification Optimization Based on Least-Angle Regression in Instance Selection. Agriculture 2023, 13, 1868. https://doi.org/10.3390/agriculture13101868