Open Access Article
Machine Learning-Based Prediction of Autism Spectrum Disorder and Discovery of Related Metagenomic Biomarkers with Explainable AI
by Mustafa Temiz 1,*, Burcu Bakir-Gungor 2, Nur Sebnem Ersoz 3 and Malik Yousef 4,5,*
1 Department of Management Information Systems, Faculty of Economics and Administrative Sciences, Sivas Cumhuriyet University, Sivas 58140, Türkiye
2 Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri 38080, Türkiye
3 Department of Bioengineering, Graduate School of Engineering and Science, Abdullah Gul University, Kayseri 38080, Türkiye
4 Department of Information Systems, Zefat Academic College, Zefat 13206, Israel
5 Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat 1320611, Israel
* Authors to whom correspondence should be addressed.
Appl. Sci. 2025, 15(16), 9214; https://doi.org/10.3390/app15169214 (registering DOI)
Submission received: 18 July 2025 / Revised: 15 August 2025 / Accepted: 20 August 2025 / Published: 21 August 2025
Abstract
Background: Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder characterized by social communication deficits and repetitive behaviors. Recent studies suggest that the gut microbiota may play a role in the pathophysiology of ASD. This study aims to develop a classification model for ASD diagnosis and to identify ASD-associated biomarkers by analyzing metagenomic data at the taxonomic level. Methods: We tested the performance of five methods: (i) SVM-RCE, (ii) RCE-IFE, (iii) microBiomeGSM, (iv) different feature selection methods, and (v) a union method. The union method builds a feature set from the union of the features with importance scores greater than 0.5 identified by the best-performing feature selection methods. Results: In our 10-fold Monte Carlo cross-validation experiments on ASD-associated metagenomic data, the best performance (an AUC of 0.99) was obtained using the union feature set (17 features) and the AdaBoost classifier. In other words, high classification performance was achieved with only a few features. Additionally, SHAP, an explainable artificial intelligence method, was applied to the union feature set and identified Prevotella sp. 109 as the most important microorganism for ASD development. Conclusions: These findings suggest that the proposed method may be a promising approach for uncovering microbial patterns associated with ASD and may inform future research in this area. This study should be regarded as exploratory, based on preliminary findings and intended for hypothesis generation.
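The union step and the repeated random splitting described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the example importance scores, the taxon labels, and the 20% test fraction are all hypothetical, and the sketch assumes that each feature selection method reports importance scores on a common 0 to 1 scale and that "10-fold Monte Carlo cross-validation" means ten independent random train/test splits.

```python
import random


def union_feature_set(scores_by_method, threshold=0.5):
    """Union of features whose importance exceeds the threshold in any method."""
    selected = set()
    for scores in scores_by_method.values():
        selected.update(name for name, score in scores.items() if score > threshold)
    return sorted(selected)


def monte_carlo_splits(n_samples, n_iterations=10, test_fraction=0.2, seed=0):
    """Yield (train_idx, test_idx) pairs from repeated random splits.

    Unlike k-fold CV, each iteration reshuffles all samples, so test sets
    from different iterations may overlap.
    """
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    for _ in range(n_iterations):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        yield sorted(indices[n_test:]), sorted(indices[:n_test])


# Hypothetical importance scores from two feature selection methods.
example_scores = {
    "selector_a": {"taxon_1": 0.9, "taxon_2": 0.4, "taxon_3": 0.7},
    "selector_b": {"taxon_1": 0.6, "taxon_2": 0.55, "taxon_4": 0.2},
}

union_features = union_feature_set(example_scores)
splits = list(monte_carlo_splits(n_samples=20, n_iterations=10, test_fraction=0.2))
```

In this sketch a feature enters the union if any one method scores it above 0.5, so taxon_2 is kept because of selector_b even though selector_a scores it below the threshold. A classifier such as AdaBoost would then be trained and scored on each split using only the columns in the union set.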