A Machine Learning Framework for Cognitive Impairment Screening from Speech with Multimodal Large Models
Abstract
1. Introduction
2. Methods
2.1. Study Participants
2.2. Ethical Considerations
2.3. Automated Multilingual Cognitive Assessment and Speech-Based Classification Pipeline
2.4. Speech Data Collection
2.5. Feature Extraction and Speaker Identification Using the CosyVoice2 Audio Module
2.6. Establishment and Validation of Machine Learning Models
2.7. Feature Importance Assessment
2.8. Statistical Analysis Environment
3. Results
3.1. Patient Characteristics
3.2. Mel-Spectrogram and Spectral Analysis of Patient Speech
3.3. Performance Evaluation of Classification Models
3.4. Feature Importance Analysis
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References

| Variable | Total (N = 1098) | HC (N = 179) | MCI (N = 470) | AD (N = 449) | p-Value |
|---|---|---|---|---|---|
| Age (y) | 73.56 (11.6) | 73.00 (13) | 74.00 (11) | 73.00 (10) | 0.573 a |
| Female, N (%) | 738 (67.213) | 119 (66.48) | 307 (65.31) | 312 (69.49) | 0.394 b |
| Education (y) | 8.62 (6.222) | 9.000 (6.0) | 9.000 (6.3) | 9.000 (6.0) | 0.632 a |
| MMSE | 26.05 (12.52) | 28.00 (2) | 22.00 (12) | 19.00 (15) | <0.001 a* |
| ADAS-Cog | 13.0574 (17.4433) | 8.300 (5.9) | 15.315 (16.08) | 18.7100 (26.03) | <0.001 a* |
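
The p-value in the "Female, N (%)" row can be checked by hand from the counts in the table. The sketch below assumes that the "b" marker denotes a Pearson chi-square test without continuity correction; the table itself does not define the marker, so this is an assumption. Male counts are derived as group size minus female count, and the names `groups` and `chi_square_sex_by_group` are illustrative only.

```python
import math

# Female counts and group sizes copied from the "Female, N (%)" row of the table.
# Assumption: the "b" footnote refers to a Pearson chi-square test.
groups = {"HC": (119, 179), "MCI": (307, 470), "AD": (312, 449)}


def chi_square_sex_by_group(groups):
    """Pearson chi-square statistic for a 2 x k table of female/male counts."""
    total_n = sum(n for _, n in groups.values())
    total_female = sum(f for f, _ in groups.values())
    female_rate = total_female / total_n  # pooled proportion under the null

    chi2 = 0.0
    for female, n in groups.values():
        male = n - female  # derived, not reported in the table
        exp_female = n * female_rate
        exp_male = n - exp_female
        chi2 += (female - exp_female) ** 2 / exp_female
        chi2 += (male - exp_male) ** 2 / exp_male
    return chi2


chi2 = chi_square_sex_by_group(groups)

# Degrees of freedom = (2 - 1) * (3 - 1) = 2, and for 2 degrees of freedom
# the chi-square survival function has the closed form exp(-x / 2).
p_value = math.exp(-chi2 / 2)

print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

Under these assumptions the statistic is about 1.86 and the p-value agrees with the reported 0.394 to three decimals, which is consistent with the sex distribution not differing significantly across the HC, MCI, and AD groups.
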

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Chen, S.; Tan, Y.; Hu, W.; Chen, Y.; Chen, L.; He, Y.; Yu, W.; Lü, Y. A Machine Learning Framework for Cognitive Impairment Screening from Speech with Multimodal Large Models. Bioengineering 2026, 13, 73. https://doi.org/10.3390/bioengineering13010073