A Machine-Learning-Based Prediction Model for Total Glycoalkaloid Accumulation in Yukon Gold Potatoes
Abstract
1. Introduction
- Explore the capability of SWIR hyperspectral imaging to estimate TGA levels in Yukon Gold (YG) potatoes non-destructively.
- Apply chemometric and regression techniques, such as PLSR and SVMR, to develop predictive models.
- Evaluate how effectively the feature selection methods CARS and IRIV improve model performance.
2. Materials and Methods
2.1. Chemicals and Reagents
2.2. Sample Collection and Preparation
2.3. Hyperspectral System and Image Acquisition
2.4. Extraction and Purification of Total Glycoalkaloids (TGA)
2.5. HPLC Analysis of TGA
2.6. Spectral Pre-Processing and Data Analysis
2.7. Dimensionality Reduction
2.7.1. CARS and Backward Elimination
2.7.2. Iteratively Retaining Informative Variables (IRIV)
2.7.3. Prediction and Model Metrics
3. Results and Discussion
3.1. Total Glycoalkaloids in Tuber Samples
3.2. Spectral Analysis
3.3. Full Spectrum Model
3.4. Model Development with Feature Wavelengths
Variable Selection Using CARS and BE
Variable Selection Using IRIV and BE
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Model | Pre-Processing | LV | R²CV | RMSEcv (ppm) |
---|---|---|---|---|
PLSR | None | 10 | 0.457 | 72.1 |
PLSR | SNV + 1st Derivative | 10 | 0.366 | 77.9 |
PLSR | 1st Derivative | 10 | 0.366 | 77.9 |
SVMR | None | - | 0.168 | 89.4 |
SVMR | SNV + 1st Derivative | - | 0.417 | 74.7 |
SVMR | 1st Derivative | - | 0.418 | 74.6 |
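
The pre-processing labels in the table above can be made concrete with a small sketch. It assumes SNV means centring and scaling each spectrum by its own mean and standard deviation, and that "SNV + 1st Derivative" applies SNV first. The derivative here is a plain central difference; derivative spectra are often computed with a Savitzky-Golay filter instead, and the source does not say which was used. The function names and the toy spectrum are hypothetical.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own stats."""
    mu = mean(spectrum)
    sd = stdev(spectrum)  # sample standard deviation (n - 1 denominator)
    return [(value - mu) / sd for value in spectrum]


def first_derivative(spectrum, step=1.0):
    """Central-difference derivative, one-sided at the two ends.

    Needs at least two points; `step` is the spacing between bands.
    """
    n = len(spectrum)
    deriv = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        deriv.append((spectrum[hi] - spectrum[lo]) / ((hi - lo) * step))
    return deriv


# Toy reflectance spectrum on an evenly spaced band grid (hypothetical values).
raw = [0.42, 0.45, 0.51, 0.58, 0.61, 0.59, 0.52]
snv_spectrum = snv(raw)
snv_then_derivative = first_derivative(snv_spectrum)
```

SNV removes per-spectrum offset and scale differences, while the derivative suppresses slowly varying baselines, which is why the two are commonly combined.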

Model | Pre-Processing Technique | No. of Selected Wavelengths | LV | R²CV | RMSEcv (%) | Wavelengths (nm) |
---|---|---|---|---|---|---|
CARS-PLSR | OSC + 1st Derivative | 32 | 15 | 0.707 | 52.9 | 1166.3, 1469.4, 1478.8, 1554.6, 1649.3, 1725.1, 1753.5, 1762.9, 1791.4, 1819.8, 1848.2, 1867.1, 1876.6, 1895.5, 1914.5, 1933.4, 1971.3, 2047.1, 2075.5, 2085, 2141.8, 2151.2, 2179.7, 2227, 2236.5, 2274.4, 2283.8, 2302.8, 2312.2, 2369.1, 2378.5, 2435.4 |
IRIV-PLSR | 1st Derivative | 23 | 15 | 0.571 | 64.1 | 1242.1, 1403.1, 1431.5, 1469.4, 1478.8, 1583, 1630.4, 1649.3, 1810.3, 1819.8, 1895.5, 1933.4, 1971.3, 2009.2, 2122.8, 2141.8, 2189.1, 2255.4, 2293.3, 2312.2, 2378.5, 2416.4, 2435.4 |

Method | Pre-Processing Technique | No. of Selected Wavelengths | Wavelengths (nm) |
---|---|---|---|
CARS | None | 14 | 1033.7, 1043.2, 1062.1, 1081.1, 1128.4, 1630.4, 1829.2, 1848.2, 1857.7, 2227, 2293.3, 2312.2, 2388, 2482.7 |
CARS | SNV + 1st Derivative | 31 | 1289.4, 1403.1, 1450.4, 1469.4, 1554.6, 1649.3, 1725.1, 1753.5, 1762.9, 1819.8, 1848.2, 1867.1, 1876.6, 1895.5, 1914.5, 1971.3, 2047.1, 2075.5, 2085, 2132.3, 2141.8, 2151.2, 2160.7, 2236.5, 2274.4, 2283.8, 2312.2, 2369.1, 2378.5, 2435.4, 2482.7 |
CARS | 1st Derivative | 31 | 1166.3, 1450.4, 1554.6, 1649.3, 1687.2, 1725.1, 1753.5, 1762.9, 1819.8, 1848.2, 1867.1, 1876.6, 1895.5, 1914.5, 1971.3, 2047.1, 2075.5, 2085, 2103.9, 2141.8, 2151.2, 2179.7, 2227, 2236.5, 2274.4, 2283.8, 2312.2, 2331.2, 2369.1, 2378.5, 2435.4 |
CARS | OSC + 1st Derivative | 26 | 1166.3, 1469.4, 1554.6, 1649.3, 1725.1, 1753.5, 1762.9, 1819.8, 1848.2, 1867.1, 1876.6, 1895.5, 1971.3, 2047.1, 2075.5, 2085, 2141.8, 2151.2, 2179.7, 2227, 2236.5, 2274.4, 2283.8, 2369.1, 2378.5, 2435.4 |
IRIV | None | 8 | 1431.5, 1488.3, 1781.9, 1791.4, 1810.3, 1829.2, 2227, 2274.4 |
IRIV | SNV + 1st Derivative | 14 | 1289.4, 1469.4, 1478.8, 1630.4, 1649.3, 1810.3, 1819.8, 1971.3, 2141.8, 2151.2, 2312.2, 2369.1, 2378.5, 2435.4 |
IRIV | 1st Derivative | 16 | 1242.1, 1403.1, 1431.5, 1469.4, 1478.8, 1583, 1630.4, 1649.3, 1810.3, 1819.8, 1971.3, 2141.8, 2189.1, 2312.2, 2378.5, 2435.4 |
IRIV | \|1st Derivative\| + SNV | 22 | 1033.7, 1166.3, 1223.1, 1242.1, 1289.4, 1374.7, 1412.5, 1450.4, 1649.3, 1791.4, 1933.4, 1952.4, 2047.1, 2085, 2122.8, 2141.8, 2160.7, 2198.6, 2293.3, 2350.1, 2378.5, 2388 |

Model | Pre-Processing | R²CV | RMSEcv (ppm) | R²C | RMSEc (ppm) | R²p | RMSEp (ppm) | RPD |
---|---|---|---|---|---|---|---|---|
PLSR | None | 0.603 | 61.7 | 0.67 | 55.8 | 0.568 | 62.8 | 1.57 |
PLSR | SNV + 1st Derivative | 0.714 | 52.3 | 0.817 | 41.55 | 0.661 | 54.82 | 1.79 |
PLSR | 1st Derivative | 0.723 | 51.5 | 0.822 | 40.84 | 0.671 | 54.12 | 1.81 |
PLSR | OSC + 1st Derivative | 0.721 | 51.7 | 0.824 | 40.88 | 0.642 | 40.88 | 1.76 |
SVMR | None | 0.478 | 70.7 | 0.791 | 44.4 | 0.227 | 80.96 | 1.21 |
SVMR | SNV + 1st Derivative | 0.698 | 53.7 | 0.807 | 42.63 | 0.618 | 58.55 | 1.67 |
SVMR | 1st Derivative | 0.693 | 54.2 | 0.764 | 47.1 | 0.649 | 56.41 | 1.73 |
SVMR | OSC + 1st Derivative | 0.711 | 52.6 | 0.768 | 46.6 | 0.644 | 57.10 | 1.71 |

Model | Pre-Processing | R²CV | RMSEcv (ppm) | R²C | RMSEc (ppm) | R²p | RMSEp (ppm) | RPD |
---|---|---|---|---|---|---|---|---|
PLSR | None | 0.481 | 70.5 | 0.537 | 66.3 | 0.444 | 70.7 | 1.39 |
PLSR | SNV + 1st Derivative | 0.598 | 62.0 | 0.674 | 55.6 | 0.553 | 62.9 | 1.56 |
PLSR | 1st Derivative | 0.60 | 61.9 | 0.688 | 54.45 | 0.554 | 62.9 | 1.56 |
PLSR | \|1st Derivative\| + SNV | 0.634 | 59.2 | 0.740 | 49.59 | 0.584 | 60.8 | 1.61 |
SVMR | None | 0.498 | 69.4 | 0.574 | 63.6 | 0.469 | 69.5 | 1.41 |
SVMR | SNV + 1st Derivative | 0.624 | 60 | 0.658 | 56.7 | 0.561 | 63.33 | 1.54 |
SVMR | 1st Derivative | 0.587 | 62.9 | 0.654 | 57.3 | 0.563 | 62.48 | 1.57 |
SVMR | OSC + 1st Derivative | 0.583 | 63.2 | 0.670 | 55.8 | 0.552 | 63.90 | 1.53 |
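
The metric columns in the tables above can be computed from paired reference and predicted values in a few lines. This sketch assumes the common chemometric definitions: RMSE as the root mean squared residual, R² as one minus the ratio of residual to total sum of squares, and RPD as the standard deviation of the reference values divided by the RMSE. The source does not spell out its formulas, and the function name and numbers below are hypothetical, not the study's data.

```python
from math import sqrt
from statistics import mean, stdev


def regression_metrics(measured, predicted):
    """Return R2, RMSE and RPD for paired reference and predicted values."""
    residual_ss = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    centre = mean(measured)
    total_ss = sum((m - centre) ** 2 for m in measured)
    rmse = sqrt(residual_ss / len(measured))
    return {
        "r2": 1.0 - residual_ss / total_ss,
        "rmse": rmse,
        "rpd": stdev(measured) / rmse,  # assumed definition: SD / RMSE
    }


# Hypothetical TGA values in ppm for a small prediction set.
measured = [80.0, 120.0, 150.0, 210.0, 260.0, 330.0]
predicted = [95.0, 110.0, 170.0, 190.0, 275.0, 300.0]
metrics = regression_metrics(measured, predicted)
print({name: round(value, 3) for name, value in metrics.items()})
```

Under this assumed definition, an RMSEp of 54.12 ppm paired with an RPD of 1.81 would correspond to a reference standard deviation of roughly 98 ppm.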
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ramalingam, S.; Singla, D.; Chowdhury, M.P.; Konschuh, M.; Singh, C.B. A Machine-Learning-Based Prediction Model for Total Glycoalkaloid Accumulation in Yukon Gold Potatoes. Foods 2025, 14, 3431. https://doi.org/10.3390/foods14193431