This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

Synthetic Data Augmentation for Robust Classification of Diabetic vs. Non-Diabetic Blood FTIR Spectra

1 Center for MicroElectromechanical Systems (CMEMS-UMinho), University of Minho, 4800-058 Guimaraes, Portugal
2 2Ai, School of Technology, Polytechnic University of Cávado and Ave, 4750-810 Barcelos, Portugal
3 Beijing Institute of Technology (BIT), Zhuhai BIT, Zhuhai 519088, China
4 LABBELS–Associate Laboratory, 4710-057 Braga, Portugal
* Author to whom correspondence should be addressed.
Information 2026, 17(7), 638; https://doi.org/10.3390/info17070638
Submission received: 22 May 2026 / Revised: 26 June 2026 / Accepted: 26 June 2026 / Published: 29 June 2026
(This article belongs to the Special Issue Innovative Machine Learning Technologies and Applications)

Abstract

Early detection of diabetes mellitus (DM) is essential for preventing disease progression and improving clinical outcomes. However, developing robust machine learning (ML) models for diabetes diagnosis is often constrained by limited data availability, privacy regulations, and challenges with data sharing. This study investigates a privacy-preserving synthetic data augmentation framework for classifying diabetic and non-diabetic blood serum samples using Fourier Transform Infrared (FTIR) spectroscopy. Two deep generative approaches, Autoencoders (AEs) and Generative Adversarial Networks (GANs), were evaluated for their ability to generate realistic synthetic FTIR spectra while preserving the statistical and biochemical characteristics of the original dataset. Synthetic datasets generated by the AE and GAN models were assessed using six ML classifiers: Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbors (KNN), Gradient Boosting (GB), Logistic Regression (LoR), and Decision Tree (DT). Model performance was evaluated using accuracy, precision, recall, F1-score, Receiver Operating Characteristic (ROC) curves, and Area Under the Curve (AUC). Results showed that AE-generated spectra retained stronger discriminative characteristics and were more easily distinguished from the original spectra, whereas GAN-generated spectra exhibited lower classifier separability, suggesting closer alignment with the original data distribution and greater realism for privacy-oriented data augmentation. Correlation analysis demonstrated high spectral fidelity for both approaches. Compared with the original spectra, AE-generated spectra achieved r = 0.9990 and R² = 0.9999, whereas GAN-generated spectra achieved r = 0.9982 and R² = 0.9965. The most prominent diabetes-related spectral variations were observed in the carbohydrate (1000–1200 cm⁻¹), Amide I (~1650 cm⁻¹), and lipid-associated (3000–3500 cm⁻¹) regions.
To explore the transferability of the proposed framework, a preliminary experimental feasibility study was conducted using independently acquired whole blood FTIR spectra. The generated spectra showed strong agreement with the measured whole blood spectra, demonstrating the potential applicability of the framework under alternative sampling conditions. Because the experimental cohort included only one diabetic volunteer, this analysis was intended solely as a proof-of-concept assessment of spectral feasibility and methodological transferability, rather than as a validation of diabetes classification performance. Overall, the findings demonstrate that synthetic data generation can effectively augment limited FTIR datasets while preserving privacy and key spectral characteristics. The proposed framework provides a promising foundation for privacy-aware biomedical data augmentation and future development of robust FTIR-based diabetes screening systems. The results should be interpreted as methodological evidence of feasibility and synthetic data utility rather than as evidence of clinical diagnostic readiness, as the serum dataset remains modest in size and the independent whole-blood experiment was intentionally exploratory.
Keywords: Autoencoders; diabetes classification; FTIR spectroscopy; generative adversarial networks; synthetic data generation
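
The abstract reports spectral fidelity as a Pearson correlation coefficient (r) and a coefficient of determination (R²) between original and generated spectra. The following is a minimal sketch of how two such metrics could be computed for a pair of spectra sampled on the same wavenumber grid. It is not the authors' implementation: the function names `pearson_r` and `r_squared`, the definition of R² as 1 − SS_res/SS_tot, and the toy absorbance values are all assumptions made for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def r_squared(reference, estimate):
    """Coefficient of determination, 1 - SS_res / SS_tot.

    Assumption: `estimate` is compared point by point against `reference`,
    with no regression line fitted between them.
    """
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


# Hypothetical absorbance values on a shared wavenumber grid (not study data).
original = [0.10, 0.35, 0.80, 0.55, 0.20, 0.05]
synthetic = [0.12, 0.33, 0.78, 0.57, 0.21, 0.06]

r = pearson_r(original, synthetic)
r2 = r_squared(original, synthetic)
print(f"r = {r:.4f}, R2 = {r2:.4f}")
```

Under this definition R² penalizes offsets and scale differences that r ignores, so the two values need not coincide; other definitions of R² (for example, the square of r from a fitted regression) would give different numbers.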

Share and Cite

MDPI and ACS Style

Fadlelmoula, A.; Boldyrev, K.N.; Gonçalves, M.; Torres, H.; Catarino, S.O.; Minas, G.; Carvalho, V. Synthetic Data Augmentation for Robust Classification of Diabetic vs. Non-Diabetic Blood FTIR Spectra. Information 2026, 17, 638. https://doi.org/10.3390/info17070638

