Data Augmentation and Machine Learning for Heavy Metal Detection in Mulberry Leaves Using Laser-Induced Breakdown Spectroscopy (LIBS) Spectral Data
Abstract
1. Introduction
2. Related Work
2.1. Regression Approach
2.2. Classification Methods
- SRD introduces rank-based evaluation criteria but is less interpretable and harder to generalize beyond specific use cases [9].
2.3. Gaps and Research Questions
- Can data augmentation techniques contribute to the development of simpler and more accurate models for LIBS data?
- Are such models interpretable and robust enough for real-world applications?
3. Materials and Methods
3.1. Dataset and Preprocessing
3.2. Data Augmentation Technique
3.3. Machine Learning Models
- To assess the separability of contamination classes using spectral features—an essential step for evaluating whether accurate regression is even feasible.
- To establish a baseline for future classification-oriented LIBS studies, especially since our review revealed no prior classification models targeting Cu or Cr in mulberry leaves.
4. Results and Discussion
4.1. Evaluation of Data Augmentation Effectiveness
4.2. Feature Selection and Model Interpretability
4.3. Model Performance and Comparative Analysis
4.4. Comparison of Real and Augmented Spectra
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
AAS | Atomic Absorption Spectroscopy |
ABS | Absolute Difference Between RMSEC and RMSEP |
CARS | Competitive Adaptive Reweighted Sampling |
CV | Cross-Validation |
CCFCV | Cross Computation Between Full and Characteristic Variables |
DBN | Deep Belief Networks |
GANs | Generative Adversarial Networks |
GA | Genetic Algorithms |
ICP-MS/AES | Inductively Coupled Plasma Mass/Atomic Emission Spectroscopy |
KNN | K-Nearest Neighbors |
KS | Kennard-Stone |
LIBS | Laser-Induced Breakdown Spectroscopy |
LASSO | Least Absolute Shrinkage and Selection Operator |
LS-SVM | Least Squares Support Vector Machine |
LDA | Linear Discriminant Analysis |
LRC | Linear Regression Classifier |
LVE | Low-Intensity Variable Elimination |
MAD | Median Absolute Deviation |
MLR | Multivariate Linear Regression |
NIST | National Institute of Standards and Technology |
PLS-DA | Partial Least Squares Discriminant Analysis |
PLSR | Partial Least Squares Regression |
PCA | Principal Component Analysis |
PCR | Principal Component Regression |
R2c | Coefficient of Determination for Calibration |
R2p | Coefficient of Determination for Prediction |
RF | Random Forest |
RFr | Random Frog |
RPD | Residual Predictive Deviation |
RMSE | Root Mean Square Error |
RMSEC | Root Mean Squared Error for Calibration |
RMSEP | Root Mean Squared Error for Prediction |
SOM | Self-Organizing Maps |
SLR | Simple Linear Regression |
SPA | Successive Projection Algorithm |
SRD | Sum of Ranking Differences |
SVM | Support Vector Machine |
SVR | Support Vector Regression |
TM | Threshold Method |
TV | Threshold Variables |
UVE | Uninformative Variable Elimination |
VIP | Variable Importance Projection |
VAEs | Variational Autoencoders |
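
The regression metrics abbreviated above (RMSE, R2, RPD, ABS) can be sketched in a few lines of Python. This is a minimal illustration that assumes the conventional definitions: RMSE as the root of the mean squared residual, R2 as 1 − SS_res/SS_tot, RPD as the sample standard deviation of the reference values divided by RMSE, and ABS as the absolute difference between RMSEC and RMSEP, as listed above. The function names and the predicted values are hypothetical; only the reference levels reuse the chromium concentration levels reported for the dataset.

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rpd(y_true, y_pred):
    """Residual predictive deviation (assumed convention: sample SD / RMSE)."""
    return statistics.stdev(y_true) / rmse(y_true, y_pred)


def abs_gap(rmsec, rmsep):
    """ABS: absolute difference between calibration and prediction RMSE."""
    return abs(rmsec - rmsep)


# Reference levels in mg/L; the predictions are invented for illustration.
y_ref = [0.0, 100.0, 500.0, 800.0, 1000.0]
y_hat = [10.0, 90.0, 510.0, 790.0, 1010.0]

print(f"RMSE = {rmse(y_ref, y_hat):.2f}")
print(f"R2   = {r2(y_ref, y_hat):.4f}")
print(f"RPD  = {rpd(y_ref, y_hat):.2f}")
print(f"ABS  = {abs_gap(75.5, 82.5):.2f}")
```

A small ABS indicates that calibration and prediction errors are close, which is usually read as a sign of limited overfitting; it says nothing about whether the errors themselves are small, so it is best interpreted together with RMSE and R2.
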
References
- Yao, M.; Yang, H.; Huang, L.; Chen, T.; Rao, G.; Liu, M. Detection of Heavy Metal Cd in Polluted Fresh Leafy Vegetables by Laser-Induced Breakdown Spectroscopy. Appl. Opt. 2017, 56, 4070.
- Fu, X.; Ma, S.; Li, G.; Guo, L.; Dong, D. Rapid Detection of Chromium in Different Valence States in Soil Using Resin Selective Enrichment Coupled with Laser-Induced Breakdown Spectroscopy: From Laboratory Test to Portable Instruments. Spectrochim. Acta Part B At. Spectrosc. 2020, 167, 105817.
- Ding, Y.; Xia, G.; Ji, H.; Xiong, X. Accurate Quantitative Determination of Heavy Metals in Oily Soil by Laser Induced Breakdown Spectroscopy (LIBS) Combined with Interval Partial Least Squares (IPLS). Anal. Methods 2019, 11, 3657–3664.
- Ferreira, D.S.; Babos, D.V.; Lima-Filho, M.H.; Castello, H.F.; Olivieri, A.C.; Verbi Pereira, F.M.; Pereira-Filho, E.R. Laser-Induced Breakdown Spectroscopy (LIBS): Calibration Challenges, Combination with Other Techniques, and Spectral Analysis Using Data Science. J. Anal. At. Spectrom. 2024, 39, 2949–2973.
- Yang, L.; Meng, L.; Gao, H.; Wang, J.; Zhao, C.; Guo, M.; He, Y.; Huang, L. Building a Stable and Accurate Model for Heavy Metal Detection in Mulberry Leaves Based on a Proposed Analysis Framework and Laser-Induced Breakdown Spectroscopy. Food Chem. 2021, 338, 127886.
- Su, L.; Shi, W.; Chen, X.; Meng, L.; Yuan, L.; Chen, X.; Huang, G. Simultaneously and Quantitatively Analyze the Heavy Metals in Sargassum fusiforme by Laser-Induced Breakdown Spectroscopy. Food Chem. 2021, 338, 127797.
- Wang, T.; He, M.; Shen, T.; Liu, F.; He, Y.; Liu, X.; Qiu, Z. Multi-Element Analysis of Heavy Metal Content in Soils Using Laser-Induced Breakdown Spectroscopy: A Case Study in Eastern China. Spectrochim. Acta Part B At. Spectrosc. 2018, 149, 300–312.
- Shen, T.; Kong, W.; Liu, F.; Chen, Z.; Yao, J.; Wang, W.; Peng, J.; Chen, H.; He, Y. Rapid Determination of Cadmium Contamination in Lettuce Using Laser-Induced Breakdown Spectroscopy. Molecules 2018, 23, 2930.
- Xu, Y.; Meng, L.; Chen, X.; Chen, X.; Su, L.; Yuan, L.; Shi, W.; Huang, G. A Strategy to Significantly Improve the Classification Accuracy of LIBS Data: Application for the Determination of Heavy Metals in Tegillarca granosa. Plasma Sci. Technol. 2021, 23, 085503.
- Li, S.; Zheng, Q.; Liu, X.; Liu, P.; Yu, L. Quantitative Analysis of Pb in Soil Using Laser-Induced Breakdown Spectroscopy Based on Signal Enhancement of Conductive Materials. Molecules 2024, 29, 3699.
- Baig, M.A.; Fayyaz, A.; Ahmed, R.; Umar, Z.A.; Asghar, H.; Liaqat, U.; Hedwig, R.; Kurniawan, K.H. Analytical Techniques for Elemental Analysis: LIBS, LA-TOF-MS, EDX, PIXE, and XRF: A Review. Proc. Pak. Acad. Sci. A Phys. Comput. Sci. 2024, 61, 99–112.
- Fayyaz, A.; Baig, M.A.; Waqas, M.; Liaqat, U. Analytical Techniques for Detecting Rare Earth Elements in Geological Ores: Laser-Induced Breakdown Spectroscopy (LIBS), MFA-LIBS, Thermal LIBS, Laser Ablation Time-of-Flight Mass Spectrometry, Energy-Dispersive X-Ray Spectroscopy, Energy-Dispersive X-Ray Fluorescence Spectrometer, and Inductively Coupled Plasma Optical Emission Spectroscopy. Minerals 2024, 14, 1004.
- Bhatt, C.R.; Sanghapi, H.K.; Yueh, F.Y.; Singh, J.P. LIBS Application to Powder Samples. In Laser-Induced Breakdown Spectroscopy; Elsevier: Amsterdam, The Netherlands, 2020; pp. 247–262.
- Xie, Z.; Meng, L.; Feng, X.; Chen, X.; Chen, X.; Yuan, L.; Shi, W.; Huang, G.; Yi, M. Identification of Heavy Metal-Contaminated Tegillarca granosa Using Laser-Induced Breakdown Spectroscopy and Linear Regression for Classification. Plasma Sci. Technol. 2020, 22, 085503.
- Kim, K.-R.; Kim, G.; Kim, J.-Y.; Park, K.; Kim, K.-W. Kriging Interpolation Method for Laser Induced Breakdown Spectroscopy (LIBS) Analysis of Zn in Various Soils. J. Anal. At. Spectrom. 2014, 29, 76–84.
- Huang, L.; Meng, L.; Yang, L.; Wang, J.; Li, S.; He, Y.; Wu, D. A Novel Method to Extract Important Features from Laser Induced Breakdown Spectroscopy Data: Application to Determine Heavy Metals in Mulberries. J. Anal. At. Spectrom. 2019, 34, 460–468.
- Etemadi, S.; Khashei, M. Etemadi Multiple Linear Regression. Measurement 2021, 186, 110080.
- Hesamian, G.; Torkian, F.; Johannssen, A.; Chukhrova, N. A Learning System-Based Soft Multiple Linear Regression Model. Intell. Syst. Appl. 2024, 22, 200378.
- Song, W.; Afgan, M.S.; Yun, Y.-H.; Wang, H.; Cui, J.; Gu, W.; Hou, Z.; Wang, Z. Spectral Knowledge-Based Regression for Laser-Induced Breakdown Spectroscopy Quantitative Analysis. Expert. Syst. Appl. 2022, 205, 117756.
- Han, B.; Chen, Z.; Feng, J.; Liu, Y. Identification and Classification of Metal Copper Based on Laser-Induced Breakdown Spectroscopy. J. Laser Appl. 2023, 35, 032011.
- Peng, J.; Ye, L.; Liu, Y.; Zhou, F.; Xu, L.; Zhu, F.; Huang, J.; Liu, F. Characterization of the Distribution of Mineral Elements in Chromium-Stressed Rice (Oryza sativa L.) Leaves Based on Laser-Induced Breakdown Spectroscopy and Data Augmentation. Spectrochim. Acta Part B At. Spectrosc. 2024, 222, 107072.
- Wu, Z.; Wu, J.; Guo, X.; Zhu, H.; Zhang, Y.; Su, X.; Chen, F.; Li, M.; Wang, R.; Xu, K.; et al. Laser-Induced Breakdown Spectroscopy Combined with Multi-Task Convolutional Neural Network for Analyzing Sm, Nd, and Gd Elements in Uranium Polymetallic Ore. Spectrochim. Acta Part B At. Spectrosc. 2025, 226, 107153.
- Zhao, Y.; Lamine Guindo, M.; Xu, X.; Sun, M.; Peng, J.; Liu, F.; He, Y. Deep Learning Associated with Laser-Induced Breakdown Spectroscopy (LIBS) for the Prediction of Lead in Soil. Appl. Spectrosc. 2019, 73, 565–573.
- Kim, G.; Kwak, J.; Kim, K.-R.; Lee, H.; Kim, K.-W.; Yang, H.; Park, K. Rapid Detection of Soils Contaminated with Heavy Metals and Oils by Laser Induced Breakdown Spectroscopy (LIBS). J. Hazard. Mater. 2013, 263, 754–760.
- Chen, G.; Zeng, Q.; Li, W.; Chen, X.; Yuan, M.; Liu, L.; Ma, H.; Wang, B.; Liu, Y.; Guo, L.; et al. Classification of Steel Using Laser-Induced Breakdown Spectroscopy Combined with Deep Belief Network. Opt. Express 2022, 30, 9428–9440.
- Ding, Y.; Zhao, M.; Shu, Y.; Hu, A.; Chen, J.; Chen, W.; Wang, Y.; Yang, L. Energy Value Measurement of Milk Powder Using Laser-Induced Breakdown Spectroscopy (LIBS) Combined with Long Short-Term Memory (LSTM). Anal. Methods 2023, 15, 4684–4691.
- Sokolova, M.; Japkowicz, N.; Szpakowicz, S. Beyond Accuracy, F-Score and ROC: A Family of Discriminant Measures for Performance Evaluation. In AI 2006: Advances in Artificial Intelligence; Springer: Berlin/Heidelberg, Germany, 2006; pp. 1015–1021.
- Kotsiantis, S.B.; Zaharakis, I.D.; Pintelas, P.E. Machine Learning: A Review of Classification and Combining Techniques. Artif. Intell. Rev. 2006, 26, 159–190.
- Yang, L.; Meng, L.; Gao, H.; Wang, J.; Zhao, C.; Guo, M.; He, Y.; Huang, L. Heavy Metal Detection in Mulberry Leaves: Laser-Induced Breakdown Spectroscopy Data. Data Brief 2020, 33, 106483.
- Devore, J.L. Probability and Statistics for Engineering and the Sciences, 9th ed.; Cengage Learning: Boston, MA, USA, 2025.
- Sansonetti, J.E.; Martin, W.C. Handbook of Basic Atomic Spectroscopic Data. J. Phys. Chem. Ref. Data 2005, 34, 1559–2259.
- Chicco, D.; Warrens, M.J.; Jurman, G. The Coefficient of Determination R-Squared Is More Informative than SMAPE, MAE, MAPE, MSE and RMSE in Regression Analysis Evaluation. PeerJ Comput. Sci. 2021, 7, e623.

Sample | Target | Best Model | No. Features | R2c | R2p | Ref. |
---|---|---|---|---|---|---|
Cabbage | Cd | SG-PLSR | Not reported | 0.982 | 0.996 | [1] |
Mulberry Leaves | Cu | SOM-SPA-PLSR | 15 | 0.99 | 0.954 | [5] |
Mulberry Leaves | Cr | SOM-UVE-PLSR | 480 | 0.986 | 0.959 | [5] |
Sargassum fusiforme | As | TV-PLSR | 8212 | 0.93 | 0.743 | [6] |
Sargassum fusiforme | Cd | UVE-PLSR | 353 | 0.987 | 0.794 | [6] |
Sargassum fusiforme | Cr | SPA-PLSR | 5 | 0.966 | 0.920 | [6] |
Sargassum fusiforme | Pb | SPA-PLSR | 12 | 0.948 | 0.894 | [6] |
Sargassum fusiforme | Zn | SPA-PLSR | 8 | 0.958 | 0.937 | [6] |
Sargassum fusiforme | Cu | SPA-PLSR | 7 | 0.944 | 0.920 | [6] |
Sargassum fusiforme | Hg | UVE-PLSR | 454 | 0.999 | 0.807 | [6] |
Soil | Cu | 5-CV-LASSO | Not reported | 0.95 | 0.94 | [7] |
Soil | Ni | 5-CV-PCR | Not reported | 0.95 | 0.93 | [7] |
Soil | Cr | 5-CV-PCR | Not reported | 0.96 | 0.91 | [7] |
Soil | Pb | 5-CV-LASSO | Not reported | 0.87 | 0.89 | [7] |
Lettuce | Cd | GA-PLSR | 22 | 0.98 | 0.972 | [8] |
Mulberry Leaves | Cr | CCFCV-LS-SVM | 10 | 0.9 | 0.933 | [16] |
Mulberry Leaves | Cu | CCFCV-LS-SVM | 15 | 0.959 | 0.954 | [16] |
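
Sample | Target Groups | Models Tested | Accuracy | No. Features | Ref. |
---|---|---|---|---|---|
Lettuce | Cd/healthy | PCA-KNN | 0.92 | 16 | [8] |
Lettuce | Cd/healthy | PCA-RF | 0.96 | 16 | [8] |
Tegillarca granosa | Pb/Zn/Cd/Pb-Zn-Cd/healthy | PCA-KNN | 0.3 | Not reported | [9] |
Tegillarca granosa | Pb/Zn/Cd/Pb-Zn-Cd/healthy | PCA-LDA | 0.4 | Not reported | [9] |
Tegillarca granosa | Pb/Zn/Cd/Pb-Zn-Cd/healthy | PCA-SVM | 0.417 | Not reported | [9] |
Tegillarca granosa | Pb/Zn/Cd/Pb-Zn-Cd/healthy | PCA-LRC | 0.75 | Not reported | [9] |
Tegillarca granosa | Pb/Zn/Cd/Pb-Zn-Cd/healthy | PCA-LRC-SRD | 0.93 | Not reported | [9] |
Tegillarca granosa | Pb/Zn/Cd/Pb-Zn-Cd/healthy | PCA-VIP-KNN | 0.440 | 690 | [14] |
Tegillarca granosa | Pb/Zn/Cd/Pb-Zn-Cd/healthy | PCA-VIP-LDA | 0.807 | 337 | [14] |
Tegillarca granosa | Pb/Zn/Cd/Pb-Zn-Cd/healthy | PCA-VIP-PLS-DA | 0.627 | 861 | [14] |
Tegillarca granosa | Pb/Zn/Cd/Pb-Zn-Cd/healthy | PCA-VIP-SVM | 0.887 | 582 | [14] |
Tegillarca granosa | Pb/Zn/Cd/Pb-Zn-Cd/healthy | PCA-VIP-LRC | 0.887 | 232 | [14] |
Tegillarca granosa | Pb/Zn/Cd/Pb-Zn-Cd/healthy | PCA-RFr-KNN | 0.427 | 8 | [14] |
Tegillarca granosa | Pb/Zn/Cd/Pb-Zn-Cd/healthy | PCA-RFr-LDA | 0.713 | 32 | [14] |
Tegillarca granosa | Pb/Zn/Cd/Pb-Zn-Cd/healthy | PCA-RFr-PLS-DA | 0.627 | 195 | [14] |
Tegillarca granosa | Pb/Zn/Cd/Pb-Zn-Cd/healthy | PCA-RFr-SVM | 0.813 | 95 | [14] |
Tegillarca granosa | Pb/Zn/Cd/Pb-Zn-Cd/healthy | PCA-RFr-LRC | 0.853 | 88 | [14] |
Tegillarca granosa | Pb/Zn/Cd/Pb-Zn-Cd/healthy | PCA-TM-KNN | 0.407 | 1530 | [14] |
Tegillarca granosa | Pb/Zn/Cd/Pb-Zn-Cd/healthy | PCA-TM-LDA | 0.787 | 1530 | [14] |
Tegillarca granosa | Pb/Zn/Cd/Pb-Zn-Cd/healthy | PCA-TM-PLS-DA | 0.667 | 77 | [14] |
Tegillarca granosa | Pb/Zn/Cd/Pb-Zn-Cd/healthy | PCA-TM-SVM | 0.853 | 269 | [14] |
Tegillarca granosa | Pb/Zn/Cd/Pb-Zn-Cd/healthy | PCA-TM-LRC | 0.907 | 265 | [14] |
Soil | Low content of Pb | PCA-PLS-DA | 0.793 | Not reported | [23] |
Soil | Low content of Pb | PCA-SVM | 0.826 | Not reported | [23] |
Soil | Low content of Pb | PCA-DBN | 0.841 | Not reported | [23] |
Soil | High content of Pb | PCA-PLS-DA | 0.901 | Not reported | [23] |
Soil | High content of Pb | PCA-SVM | 0.959 | Not reported | [23] |
Soil | High content of Pb | PCA-DBN | 0.962 | Not reported | [23] |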

Variable | Metal | Number of Samples | Concentration Levels (mg/L) | Mean Intensity | Std. Dev. of Intensity | Spectral Range | Spectral Points |
---|---|---|---|---|---|---|---|
Contamination concentration | Copper | 100 | 0, 500, 1000, 2000, 4000 | 8531.5 | 32,383.7 | 219–877 nm | 22,015 |
Contamination concentration | Chromium | 100 | 0, 100, 500, 800, 1000 | 8755.2 | 33,537.4 | 229–877 nm | 22,015 |
Spectra per sample | Both | 200 | 80 spectra (16 positions × 5 shots) | — | — | — | — |
Sample format | Both | — | Pellets (10 × 10 × 2 mm, 600 MPa) | — | — | — | — |

Models | Set | Accuracy (Cr) | Accuracy (Cu) |
---|---|---|---|
RF | Train set | 1.0 | 1.0 |
RF | Test set | 0.96 | 0.99 |
LRC | Train set | 0.93 | 1.0 |
LRC | Test set | 0.85 | 0.94 |
SVM | Train set | 0.97 | 1.0 |
SVM | Test set | 0.89 | 0.93 |

Models | Metric | Train Set | Test Set |
---|---|---|---|
RF | R2 | 0.962 | 0.954 |
RF | RMSE | 75.5 | 82.5 |
RF | ABS | 7.0 | |
SVM | R2 | 0.232 | 0.229 |
SVM | RMSE | 382.23 | 382.31 |
SVM | ABS | 0.049 | |

Models | Metric | Train Set | Test Set |
---|---|---|---|
RF | R2 | 0.9995 | 0.9994 |
RF | RMSE | 33.15 | 34.68 |
RF | ABS | 1.54 | |
SVM | R2 | −0.11603 | −0.11607 |
SVM | RMSE | 1494.012 | 1494.035 |
SVM | ABS | 0.0228 | |
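
As a quick arithmetic check on the two regression tables above, ABS can be recomputed from the tabulated RMSE values, taking ABS as the absolute difference between the train (calibration) and test (prediction) RMSE, as defined in the abbreviations. The sketch below is illustrative only: the dictionary layout and names are hypothetical, and the labels "first table" and "second table" refer simply to the order in which the tables appear.

```python
# (train RMSE, test RMSE, reported ABS), copied from the two regression tables.
reported = {
    ("first table", "RF"): (75.5, 82.5, 7.0),
    ("first table", "SVM"): (382.23, 382.31, 0.049),
    ("second table", "RF"): (33.15, 34.68, 1.54),
    ("second table", "SVM"): (1494.012, 1494.035, 0.0228),
}


def recompute_abs(rmse_train, rmse_test):
    """ABS as |RMSE(train) - RMSE(test)|."""
    return abs(rmse_train - rmse_test)


recomputed = {
    key: recompute_abs(rmse_train, rmse_test)
    for key, (rmse_train, rmse_test, _) in reported.items()
}

for key, (_, _, abs_reported) in reported.items():
    table, model = key
    print(f"{table:12s} {model:3s} recomputed={recomputed[key]:.4f} reported={abs_reported}")
```

The recomputed values come out at roughly 7.00, 0.08, 1.53, and 0.023. Three of the four agree with the reported ABS to within the rounding of the tabulated RMSEs. The SVM row of the first table (about 0.08 recomputed against 0.049 reported) differs by more than two-decimal rounding can account for, so the reported figure there was presumably derived from values not shown in the table.
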
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Castro Gutiérrez, H.; Robles-Algarín, C.; Polo, A. Data Augmentation and Machine Learning for Heavy Metal Detection in Mulberry Leaves Using Laser-Induced Breakdown Spectroscopy (LIBS) Spectral Data. Processes 2025, 13, 1688. https://doi.org/10.3390/pr13061688