Efficient Pause Extraction and Encode Strategy for Alzheimer’s Disease Detection Using Only Acoustic Features from Spontaneous Speech
Abstract
1. Introduction
2. Materials and Methods
2.1. System Framework
2.2. Dataset Descriptions
2.2.1. Public Dataset
- Dataset ADReSS: 70% of the ADReSS2020 data were used as the training set, and 30% of the data were used as the test set;
- Dataset ADReSSo: 70% of the ADReSSo2021 data were used as the training set, and 30% of the data were used as the test set. Table 1 shows the composition and distribution of the datasets we used.
2.2.2. Local Dataset
2.3. Preprocessing
2.4. Feature Extraction
2.4.1. VAD Pause Feature
1. For each sub-band, we made the assumptions that:
2. For each sub-band, the probability that it belonged to the silent state was calculated based on the Gaussian mixture model (GMM) of the silent state:
3. The log-likelihood ratio $L_i$ for each sub-band and the total log-likelihood ratio $L_t$ were calculated according to the formula:
4. The log-likelihood ratios were compared with the thresholds $T_\tau$ and $T_a$ to determine whether the audio frame was audible or silent:
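The steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the formulas are not reproduced in this text, so the sketch assumes a common GMM-based VAD layout in which each sub-band has one GMM for silence and one for speech, the per-band ratio is the log of the two densities, the total ratio is a weighted sum over bands, and a frame is audible when any per-band ratio exceeds a local threshold or the total exceeds a global threshold. All parameter values, the names `gmm_pdf` and `frame_is_audible`, and the mapping of the two thresholds to "local" and "global" roles are hypothetical.

```python
import math

EPS = 1e-300  # guards against log(0) if a density underflows


def gmm_pdf(x, weights, means, stds):
    """Density of a one-dimensional Gaussian mixture evaluated at x."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in zip(weights, means, stds)
    )


def frame_is_audible(band_features, silence_gmms, speech_gmms,
                     band_weights, local_threshold, global_threshold):
    """Return (is_audible, per_band_llr, total_llr) for one audio frame."""
    llrs = []
    for x, sil, spk in zip(band_features, silence_gmms, speech_gmms):
        p_sil = gmm_pdf(x, *sil)  # likelihood under the silence model
        p_spk = gmm_pdf(x, *spk)  # likelihood under the speech model
        llrs.append(math.log((p_spk + EPS) / (p_sil + EPS)))
    # Assumed combination rule: weighted sum of the per-band ratios.
    total = sum(w * l for w, l in zip(band_weights, llrs))
    audible = any(l > local_threshold for l in llrs) or total > global_threshold
    return audible, llrs, total


# Hypothetical toy setup: three sub-bands that share the same two-component
# GMMs over a log-energy feature. Each tuple is (weights, means, stds).
SILENCE_GMM = ((0.5, 0.5), (1.0, 2.0), (1.0, 1.0))
SPEECH_GMM = ((0.5, 0.5), (6.0, 8.0), (1.5, 1.5))
N_BANDS = 3
silence_gmms = [SILENCE_GMM] * N_BANDS
speech_gmms = [SPEECH_GMM] * N_BANDS
band_weights = [1.0] * N_BANDS

quiet_frame = [1.2, 1.5, 1.8]  # low log-energy in every band
loud_frame = [7.0, 6.5, 7.5]   # high log-energy in every band

for name, frame in (("quiet", quiet_frame), ("loud", loud_frame)):
    audible, llrs, total = frame_is_audible(
        frame, silence_gmms, speech_gmms, band_weights,
        local_threshold=2.0, global_threshold=3.0,
    )
    print(name, "audible" if audible else "silent", round(total, 2))
```

With these toy values the quiet frame is labeled silent and the loud frame audible. Consecutive silent frames could then be merged into pause segments, although how pauses are encoded as features is not specified in the steps above.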
2.4.2. Common Acoustic Feature Sets
2.5. Ensemble Classification and Voting
2.6. Evaluation
2.6.1. Classification Metrics
2.6.2. Statistical Analysis
3. Results
3.1. Comparison of VAD Pauses in AD and Non-AD Subjects
3.2. Quantitative Results in Classic Machine-Learning Methods
3.3. Statistical Analysis of Classification Methods
3.4. Experimental Results on a Local Dataset
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Derby, C.A. Trends in the public health significance, definitions of disease, and implications for prevention of Alzheimer’s disease. Curr. Epidemiol. Rep. 2020, 7, 68–76.
2. Alzheimer’s Disease International. World Alzheimer Report 2019: Attitudes to Dementia; Alzheimer’s Disease International: London, UK, 2019.
3. Mahajan, P.; Baths, V. Acoustic and language based deep learning approaches for Alzheimer’s dementia detection from spontaneous speech. Front. Aging Neurosci. 2021, 13, 623607.
4. Mueller, K.D.; Koscik, R.L.; Hermann, B.P.; Johnson, S.C.; Turkstra, L.S. Declines in connected language are associated with very early mild cognitive impairment: Results from the Wisconsin registry for Alzheimer’s prevention. Front. Aging Neurosci. 2018, 9, 437.
5. Mesulam, M.; Wicklund, A.; Johnson, N.; Rogalski, E.; Léger, G.C.; Rademaker, A.; Weintraub, S.; Bigio, E.H. Alzheimer and frontotemporal pathology in subsets of primary progressive aphasia. Ann. Neurol. 2008, 63, 709–719.
6. Meghanani, A.; Anoop, C.; Ramakrishnan, A.G. Recognition of Alzheimer’s dementia from the transcriptions of spontaneous speech using fastText and CNN models. Front. Comput. Sci. 2021, 3, 624558.
7. Yuan, J.; Cai, X.; Bian, Y.; Ye, Z.; Church, K. Pauses for detection of Alzheimer’s disease. Front. Comput. Sci. 2021, 2, 624488.
8. Agbavor, F.; Liang, H. Artificial Intelligence-Enabled End-To-End Detection and Assessment of Alzheimer’s Disease Using Voice. Brain Sci. 2023, 13, 28.
9. Luz, S. Longitudinal monitoring and detection of Alzheimer’s type dementia from spontaneous speech data. In Proceedings of the 2017 IEEE 30th International Symposium on Computer-Based Medical Systems (CBMS), Thessaloniki, Greece, 22–24 June 2017; pp. 45–46.
10. Eyben, F.; Weninger, F.; Gross, F.; Schuller, B. Recent developments in openSMILE, the Munich open-source multimedia feature extractor. In Proceedings of the 21st ACM International Conference on Multimedia, Barcelona, Spain, 21–25 October 2013; pp. 835–838.
11. Eyben, F.; Scherer, K.R.; Schuller, B.W.; Sundberg, J.; André, E.; Busso, C.; Devillers, L.Y.; Epps, J.; Laukka, P.; Narayanan, S.S. The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing. IEEE Trans. Affect. Comput. 2015, 7, 190–202.
12. Nasrolahzadeh, M.; Rahnamayan, S.; Haddadnia, J. Alzheimer’s disease diagnosis using genetic programming based on higher order spectra features. Mach. Learn. Appl. 2022, 7, 100225.
13. Lopez-de-Ipiña, K.; Alonso-Hernández, J.; Solé-Casals, J.; Travieso-González, C.M.; Ezeiza, A.; Faundez-Zanuy, M.; Calvo, P.M.; Beitia, B. Feature selection for automatic analysis of emotional response based on nonlinear speech modeling suitable for diagnosis of Alzheimer’s disease. Neurocomputing 2015, 150, 392–401.
14. Nasrolahzadeh, M.; Haddadnia, J.; Rahnamayan, S. Multi-objective optimization of wavelet-packet-based features in pathological diagnosis of Alzheimer using spontaneous speech signals. IEEE Access 2020, 8, 112393–112406.
15. Ash, S.; Moore, P.; Vesely, L.; Gunawardena, D.; McMillan, C.; Anderson, C.; Avants, B.; Grossman, M. Non-fluent speech in frontotemporal lobar degeneration. J. Neurolinguist. 2009, 22, 370–383.
16. Ash, S.; Xie, S.X.; Gross, R.G.; Dreyfuss, M.; Boller, A.; Camp, E.; Morgan, B.; O’Shea, J.; Grossman, M. The organization and anatomy of narrative comprehension and expression in Lewy body spectrum disorders. Neuropsychology 2012, 26, 368.
17. Wilson, S.M.; Henry, M.L.; Besbris, M.; Ogar, J.M.; Dronkers, N.F.; Jarrold, W.; Miller, B.L.; Gorno-Tempini, M.L. Connected speech production in three variants of primary progressive aphasia. Brain 2010, 133, 2069–2088.
18. Lindsay, H.; Tröger, J.; König, A. Language impairment in Alzheimer’s disease—Robust and explainable evidence for AD-related deterioration of spontaneous speech through multilingual machine learning. Front. Aging Neurosci. 2021, 228, 642033.
19. Pistono, A.; Jucla, M.; Barbeau, E.J.; Saint-Aubert, L.; Lemesle, B.; Calvet, B.; Köpke, B.; Puel, M.; Pariente, J. Pauses during autobiographical discourse reflect episodic memory processes in early Alzheimer’s disease. J. Alzheimer’s Dis. 2016, 50, 687–698.
20. Yuan, J.; Xu, X.; Lai, W.; Liberman, M. Pauses and pause fillers in Mandarin monologue speech: The effects of sex and proficiency. In Proceedings of the Speech Prosody 2016, Boston, MA, USA, 31 May–3 June 2016; pp. 1167–1170.
21. Shea, C.; Leonard, K. Evaluating measures of pausing for second language fluency research. Can. Mod. Lang. Rev. 2019, 75, 216–235.
22. Ogata, J.; Goto, M.; Itou, K. The use of acoustically detected filled and silent pauses in spontaneous speech recognition. In Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan, 19–24 April 2009; pp. 4305–4308.
23. Vincze, V.; Szatlóczki, G.; Tóth, L.; Gosztolya, G.; Pákáski, M.; Hoffmann, I.; Kálmán, J. Telltale silence: Temporal speech parameters discriminate between prodromal dementia and mild Alzheimer’s disease. Clin. Linguist. Phon. 2021, 35, 727–742.
24. Pistono, A.; Pariente, J.; Bézy, C.; Lemesle, B.; Le Men, J.; Jucla, M. What happens when nothing happens? An investigation of pauses as a compensatory mechanism in early Alzheimer’s disease. Neuropsychologia 2019, 124, 133–143.
25. Pastoriza-Domínguez, P.; Torre, I.G.; Diéguez-Vide, F.; Gomez-Ruiz, I.; Gelado, S.; Bello-López, J.; Ávila-Rivera, A.; Matias-Guiu, J.A.; Pytel, V.; Hernández-Fernández, A. Speech pause distribution as an early marker for Alzheimer’s disease. Speech Commun. 2022, 136, 107–117.
26. Gayraud, F.; Lee, H.-R.; Barkat-Defradas, M. Syntactic and lexical context of pauses and hesitations in the discourse of Alzheimer patients and healthy elderly subjects. Clin. Linguist. Phon. 2011, 25, 198–209.
27. Ditthapron, A.; Lammert, A.C.; Agu, E.O. Continuous TBI Monitoring From Spontaneous Speech Using Parametrized Sinc Filters and a Cascading GRU. IEEE J. Biomed. Health Inform. 2022, 26, 3517–3528.
28. Feenaughty, L.; Basilakos, A.; Bonilha, L.; Fridriksson, J. Speech timing changes accompany speech entrainment in aphasia. J. Commun. Disord. 2021, 90, 106090.
29. Luz, S.; Haider, F.; de la Fuente, S.; Fromm, D.; MacWhinney, B. Alzheimer’s dementia recognition through spontaneous speech: The ADReSS challenge. arXiv 2020, arXiv:2004.06833.
30. Luz, S.; Haider, F.; de la Fuente, S.; Fromm, D.; MacWhinney, B. Detecting cognitive decline using speech only: The ADReSSo Challenge. arXiv 2021, arXiv:2104.09356.
31. Becker, J.T.; Boller, F.; Lopez, O.L.; Saxton, J.; McGonigle, K.L. The natural history of Alzheimer’s disease: Description of study cohort and accuracy of diagnosis. Arch. Neurol. 1994, 51, 585–594.
32. Goodglass, H.; Kaplan, E.; Weintraub, S. BDAE: The Boston Diagnostic Aphasia Examination; Lippincott Williams & Wilkins: Philadelphia, PA, USA, 2001.
33. Jack Jr, C.R.; Albert, M.; Knopman, D.S.; McKhann, G.M.; Sperling, R.A.; Carillo, M.; Thies, W.; Phelps, C.H. Introduction to revised criteria for the diagnosis of Alzheimer’s disease: National Institute on Aging and the Alzheimer Association Workgroups. Alzheimer’s Dement. J. Alzheimer’s Assoc. 2011, 7, 257.
34. Eyben, F.; Wöllmer, M.; Schuller, B. openSMILE: The Munich versatile and fast open-source audio feature extractor. In Proceedings of the 18th ACM International Conference on Multimedia, New York, NY, USA; 2010; pp. 1459–1462.
35. Koo, J.; Lee, J.H.; Pyo, J.; Jo, Y.; Lee, K. Exploiting multi-modal features from pre-trained networks for Alzheimer’s dementia recognition. arXiv 2020, arXiv:2009.04070.
36. Cummins, N.; Pan, Y.; Ren, Z.; Fritsch, J.; Nallanthighal, V.S.; Christensen, H.; Blackburn, D.; Schuller, B.W.; Magimai-Doss, M.; Strik, H. A comparison of acoustic and linguistics methodologies for Alzheimer’s dementia recognition. In Proceedings of the Interspeech 2020, Shanghai, China, 25–29 October 2020; pp. 2182–2186.
37. Rohanian, M.; Hough, J.; Purver, M. Multi-modal fusion with gating using audio, lexical and disfluency features for Alzheimer’s dementia recognition from spontaneous speech. arXiv 2021, arXiv:2106.09668.
38. Pappagari, R.; Cho, J.; Moro-Velazquez, L.; Dehak, N. Using State of the Art Speaker Recognition and Natural Language Processing Technologies to Detect Alzheimer’s Disease and Assess its Severity. In Proceedings of the Interspeech 2020, Shanghai, China, 25–29 October 2020; pp. 2177–2181.
39. Edwards, E.; Dognin, C.; Bollepalli, B.; Singh, M.K.; Analytics, V. Multiscale System for Alzheimer’s Dementia Recognition Through Spontaneous Speech. In Proceedings of the Interspeech 2020, Shanghai, China, 25–29 October 2020; pp. 2197–2201.
40. Balagopalan, A.; Novikova, J. Comparing Acoustic-based Approaches for Alzheimer’s Disease Detection. arXiv 2021, arXiv:2106.01555.
41. Pan, Y.; Mirheidari, B.; Harris, J.M.; Thompson, J.C.; Jones, M.; Snowden, J.S.; Blackburn, D.; Christensen, H. Using the Outputs of Different Automatic Speech Recognition Paradigms for Acoustic-and BERT-Based Alzheimer’s Dementia Detection Through Spontaneous Speech. In Proceedings of the Interspeech 2021, Brno, Czech Republic, 30 August–3 September 2021; pp. 3810–3814.
42. Pérez-Toro, P.A.; Bayerl, S.P.; Arias-Vergara, T.; Vásquez-Correa, J.C.; Klumpp, P.; Schuster, M.; Nöth, E.; Orozco-Arroyave, J.R.; Riedhammer, K. Influence of the Interviewer on the Automatic Assessment of Alzheimer’s Disease in the Context of the ADReSSo Challenge. In Proceedings of the Interspeech 2021, Brno, Czech Republic, 30 August–3 September 2021; pp. 3785–3789.
43. Pappagari, R.; Cho, J.; Joshi, S.; Moro-Velázquez, L.; Zelasko, P.; Villalba, J.; Dehak, N. Automatic Detection and Assessment of Alzheimer Disease Using Speech and Language Technologies in Low-Resource Scenarios. In Proceedings of the Interspeech 2021, Brno, Czech Republic, 30 August–3 September 2021; pp. 3825–3829.
44. Chen, J.; Ye, J.; Tang, F.; Zhou, J. Automatic detection of Alzheimer’s disease using spontaneous speech only. In Proceedings of the Interspeech 2021, Brno, Czech Republic, 30 August–3 September 2021; p. 3830.
45. Daneman, M. Working memory as a predictor of verbal fluency. J. Psycholinguist. Res. 1991, 20, 445–464.
46. Arciuli, J.; Mallard, D.; Villar, G. “Um, I can tell you’re lying”: Linguistic markers of deception versus truth-telling in speech. Appl. Psycholinguist. 2010, 31, 397–411.
47. Laws, K.R.; Duncan, A.; Gale, T.M. ‘Normal’ semantic–phonemic fluency discrepancy in Alzheimer’s disease? A meta-analytic study. Cortex 2010, 46, 595–601.
48. López-de-Ipiña, K.; Alonso, J.-B.; Travieso, C.M.; Solé-Casals, J.; Egiraun, H.; Faundez-Zanuy, M.; Ezeiza, A.; Barroso, N.; Ecay-Torres, M.; Martinez-Lage, P. On the selection of non-invasive methods based on speech analysis oriented to automatic Alzheimer disease diagnosis. Sensors 2013, 13, 6730–6745.
49. Tran, T.; Toshniwal, S.; Bansal, M.; Gimpel, K.; Livescu, K.; Ostendorf, M. Parsing speech: A neural approach to integrating lexical and acoustic-prosodic information. arXiv 2017, arXiv:1704.07287.
50. Mignard, P.; Cave, C.; Lagrue, B.; Meynadier, Y.; Viallet, F. Silent pauses in Parkinsonian patients during spontaneous speech and reading: An instrumental study. Rev. De Neuropsychol. 2001, 11, 39–63.
51. Potagas, C.; Nikitopoulou, Z.; Angelopoulou, G.; Kasselimis, D.; Laskaris, N.; Kourtidou, E.; Constantinides, V.C.; Bougea, A.; Paraskevas, G.P.; Papageorgiou, G.; et al. Silent Pauses and Speech Indices as Biomarkers for Primary Progressive Aphasia. Medicina 2022, 58, 1352.
52. Imre, N.; Balogh, R.; Gosztolya, G.; Tóth, L.; Hoffmann, I.; Várkonyi, T.; Lengyel, C.; Pákáski, M.; Kálmán, J. Temporal Speech Parameters Indicate Early Cognitive Decline in Elderly Patients With Type 2 Diabetes Mellitus. Alzheimer Dis. Assoc. Disord. 2022, 36, 148.
53. Lu, X.; Shi, D.; Liu, Y.; Yuan, J. Speech depression recognition based on attentional residual network. Front. Biosci.-Landmark 2021, 26, 1746–1759.
54. Le, D.; Licata, K.; Provost, E.M. Automatic Quantitative Analysis of Spontaneous Aphasic Speech. Speech Commun. 2018, 100, 1–12.

Table 1. Composition and distribution of the datasets.

Dataset | Split | AD | Non-AD | Total
---|---|---|---|---
Dataset ADReSS | Training | 54 | 54 | 108
Dataset ADReSS | Test | 24 | 24 | 48
Dataset ADReSS | Total | 78 | 78 | 156
Dataset ADReSSo | Training | 87 | 79 | 166
Dataset ADReSSo | Test | 35 | 36 | 71
Dataset ADReSSo | Total | 122 | 115 | 237

Group | Sample | MMSE | Age | Gender
---|---|---|---|---
CN | 1 | 26 | 77 | male
CN | 2 | 27 | 73 | male
CN | 3 | 27 | 69 | female
CN | 4 | 29 | 74 | male
CN | 5 | 29 | 74 | male
AD | 6 | 11 | 70 | female
AD | 7 | 18 | 60 | female
AD | 8 | 21 | 84 | female
AD | 9 | 17 | 80 | female
AD | 10 | 11 | 79 | female

Data | Study | Extracted Features | Classifier | ACC (%) | DL Used
---|---|---|---|---|---
Dataset ADReSS | Koo et al. [35] | VGGish | Uni-CRNN | 72.9 | Yes
Dataset ADReSS | Cummins et al. [36] | Log-Mel spectrograms | SiameseNet | 70.8 | Yes
Dataset ADReSS | Rohanian et al. [37] | COVAREP | LSTM | 66.6 | Yes
Dataset ADReSS | Pappagari et al. [38] | X-vectors and silence features | PLDA | 66.7 | Yes
Dataset ADReSS | Edwards et al. [39] | ComParE | LDA | 60.4 | No
Dataset ADReSS | Luz et al. (Baseline) [29] | ComParE | LDA | 62.5 | No
Dataset ADReSS | Our method | VAD Pause feature and ComParE | TB | 70.0 | No
Dataset ADReSSo | Balagopalan et al. [40] | Conventional acoustic features and wav2vec2.0 pre-trained acoustic embeddings | SVM | 67.6 | Yes
Dataset ADReSSo | Pan et al. [41] | Wav2vec2.0 pre-trained acoustic embeddings | TB | 74.7 | Yes
Dataset ADReSSo | Pérez-Toro et al. [42] | X-vectors and dominance embeddings | RBF-SVM | 67.6 | Yes
Dataset ADReSSo | Pappagari et al. [43] | X-vectors, x-vectors (250 ms) and encoder–decoder ASR embeddings | LR | 74.7 | Yes
Dataset ADReSSo | Chen et al. [44] | MFCC, GeMAPS, eGeMAPS, ComParE and IS10-Paralinguistics | LR | 67.6 | No
Dataset ADReSSo | Luz et al. (Baseline) [30] | eGeMAPS | SVM | 64.8 | No
Dataset ADReSSo | Our method | VAD Pause feature and eGeMAPS | TB | 70.7 | No

True label | Sample | LDA | DT | KNN | SVM | TB
---|---|---|---|---|---|---
CN (0) | 1 | 1 | 0 | 0 | 1 | 1
CN (0) | 2 | 0 | 0 | 0 | 0 | 0
CN (0) | 3 | 1 | 1 | 0 | 1 | 1
CN (0) | 4 | 0 | 0 | 0 | 0 | 0
CN (0) | 5 | 1 | 0 | 1 | 0 | 1
AD (1) | 6 | 1 | 1 | 1 | 1 | 1
AD (1) | 7 | 1 | 1 | 1 | 1 | 1
AD (1) | 8 | 1 | 1 | 1 | 1 | 1
AD (1) | 9 | 1 | 1 | 0 | 1 | 1
AD (1) | 10 | 1 | 0 | 1 | 1 | 1
Accuracy (%) | | 70 | 80 | 80 | 80 | 70
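
The accuracy row of the table above can be recomputed from the per-sample predictions. The short Python check below transcribes the table's predictions and counts, for each classifier, the fraction of the ten local samples whose predicted label equals the true label (0 for CN, 1 for AD). The variable and function names are illustrative only.

```python
# True labels for samples 1-10: five CN (0) followed by five AD (1).
true_labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Predictions per classifier, transcribed column by column from the table.
predictions = {
    "LDA": [1, 0, 1, 0, 1, 1, 1, 1, 1, 1],
    "DT":  [0, 0, 1, 0, 0, 1, 1, 1, 1, 0],
    "KNN": [0, 0, 0, 0, 1, 1, 1, 1, 0, 1],
    "SVM": [1, 0, 1, 0, 0, 1, 1, 1, 1, 1],
    "TB":  [1, 0, 1, 0, 1, 1, 1, 1, 1, 1],
}


def accuracy_percent(y_true, y_pred):
    """Percentage of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * correct / len(y_true)


accuracies = {name: accuracy_percent(true_labels, preds)
              for name, preds in predictions.items()}
for name, acc in accuracies.items():
    print(f"{name}: {acc:.0f}%")
```

The recomputed values agree with the reported row: 70% for LDA and TB, 80% for DT, KNN, and SVM. In this sample, every misclassification by LDA, SVM, and TB is a CN subject predicted as AD.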
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Citation
Liu, J.; Fu, F.; Li, L.; Yu, J.; Zhong, D.; Zhu, S.; Zhou, Y.; Liu, B.; Li, J. Efficient Pause Extraction and Encode Strategy for Alzheimer’s Disease Detection Using Only Acoustic Features from Spontaneous Speech. Brain Sci. 2023, 13, 477. https://doi.org/10.3390/brainsci13030477