Artificial Intelligence-Based Voice Assessment of Patients with Parkinson’s Disease Off and On Treatment: Machine vs. Deep-Learning Comparison
Abstract
1. Introduction
2. Materials
2.1. Dataset
2.2. Audio Pre-Processing
3. Methods
3.1. Traditional Machine Learning Approach
- Feature extraction;
- Feature selection;
- Model training.
3.1.1. Feature Extraction
3.1.2. Feature Selection
3.1.3. Model Training
- For the SVM classifier, the optimizer selected the kernel (linear or radial basis function), as well as the values of C and gamma;
- For the kNN classifier, the optimizer selected the distance/similarity metric among Euclidean, Manhattan, Chebyshev, Hamming, cosine, correlation and Mahalanobis distances;
- For the NB classifier, the optimizer performed a kernel density estimation procedure to choose the kernel function (Gaussian, triangular or Epanechnikov) and its width.
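The choice among distance metrics described for the kNN classifier can be illustrated with a small, self-contained sketch. It assumes a plain exhaustive search scored by leave-one-out accuracy, which is a simpler stand-in and not the optimizer used in the study. The function names, the toy data and the value k = 3 are hypothetical, and only four of the listed metrics are implemented.

```python
import math
from collections import Counter

# Four of the metrics named in the text; all function names are illustrative.
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0

METRICS = {"euclidean": euclidean, "manhattan": manhattan,
           "chebyshev": chebyshev, "cosine": cosine}

def knn_predict(train_x, train_y, query, metric, k=3):
    """Majority vote among the k training points closest to `query`."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: metric(p[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loo_accuracy(xs, ys, metric, k=3):
    """Leave-one-out accuracy: each sample is predicted from all the others."""
    hits = 0
    for i, (x, y) in enumerate(zip(xs, ys)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        hits += knn_predict(rest_x, rest_y, x, metric, k) == y
    return hits / len(xs)

def select_metric(xs, ys, k=3):
    """Score every candidate metric; return the best name and all scores."""
    scores = {name: loo_accuracy(xs, ys, fn, k) for name, fn in METRICS.items()}
    return max(scores, key=scores.get), scores

# Toy two-class data (made up for illustration, not taken from the study).
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.9, 1.0], [1.0, 0.9], [0.95, 1.1]]
y = ["HC", "HC", "HC", "PD", "PD", "PD"]
best, scores = select_metric(X, y)
```

On this toy set the two classes differ in magnitude rather than direction, so a magnitude-blind metric such as cosine distance scores worse than the others, which is the kind of difference a metric search is meant to expose.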
3.2. Deep Learning Approach
- Time stretching: slows down or speeds up the signal at a random rate between 0.6 and 1.4;
- Pitch shifting: shifts the pitch of the signal up or down by a random amount between 1 and 3 semitones;
- Noise addition: adds Gaussian noise to the original signal with an amplitude equal to 10% of the RMS value of the signal;
- Room simulation: simulates the frequency response of a large, reverberant room;
- Time masking: covers part of the spectrogram along the time axis with rectangular monochromatic boxes;
- Frequency masking: covers part of the spectrogram's frequency range with rectangular monochromatic boxes.
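Some of the augmentations listed above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the "amplitude" of the noise is read as the standard deviation of zero-mean Gaussian noise, the spectrogram is a list of rows (frequency bins) by columns (time frames), and the mask width and fill value are hypothetical. Time stretching and pitch shifting are not implemented here (only their random parameters are drawn), and room simulation is left out.

```python
import math
import random

def rms(signal):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))

def add_gaussian_noise(signal, fraction=0.10, rng=random):
    """Add zero-mean Gaussian noise whose standard deviation is
    `fraction` of the signal RMS (10% in the text; an interpretation)."""
    sigma = fraction * rms(signal)
    return [s + rng.gauss(0.0, sigma) for s in signal]

def draw_stretch_and_pitch(rng=random):
    """Draw the random parameters named in the text: a stretch rate in
    [0.6, 1.4] and a pitch shift of 1 to 3 semitones, up or down."""
    rate = rng.uniform(0.6, 1.4)
    semitones = rng.choice([-1, 1]) * rng.uniform(1.0, 3.0)
    return rate, semitones

def time_mask(spec, width, rng=random, fill=0.0):
    """Overwrite `width` consecutive time frames (columns) with `fill`."""
    n_frames = len(spec[0])
    start = rng.randint(0, n_frames - width)
    return [[fill if start <= t < start + width else v
             for t, v in enumerate(row)] for row in spec]

def freq_mask(spec, width, rng=random, fill=0.0):
    """Overwrite `width` consecutive frequency bins (rows) with `fill`."""
    start = rng.randint(0, len(spec) - width)
    return [[fill] * len(row) if start <= f < start + width else list(row)
            for f, row in enumerate(spec)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
tone = [math.sin(2 * math.pi * 5 * n / 200) for n in range(200)]
noisy = add_gaussian_noise(tone, rng=rng)
spec = [[1.0] * 8 for _ in range(6)]  # 6 frequency bins x 8 time frames
masked_t = time_mask(spec, width=2, rng=rng)
masked_f = freq_mask(spec, width=2, rng=rng)
rate, semitones = draw_stretch_and_pitch(rng)
```

Passing an explicit `random.Random` instance keeps each augmentation reproducible, which helps when the same augmented copies must be regenerated across cross-validation folds.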
4. Results
4.1. Traditional ML Classification Approach
4.2. Comparison between Classic ML and CNN Models
4.3. Vocal Biomarkers
5. Discussion
6. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Yu, K.-H.; Beam, A.L.; Kohane, I.S. Artificial intelligence in healthcare. Nat. Biomed. Eng. 2018, 2, 719–731.
- Saggio, G.; Quitadamo, L.R.; Albero, L. Development and evaluation of a novel low-cost sensor-based knee flexion angle measurement system. Knee 2014, 21, 896–901.
- Costantini, G.; Casali, D.; Paolizzo, F.; Alessandrini, M.; Micarelli, A.; Viziano, A.; Saggio, G. Towards the enhancement of body standing balance recovery by means of a wireless audio-biofeedback system. Med. Eng. Phys. 2018, 54, 74–81.
- Saggio, G.; Tombolini, F.; Ruggiero, A. Technology-Based Complex Motor Tasks Assessment: A 6-DOF Inertial-Based System Versus a Gold-Standard Optoelectronic-Based One. IEEE Sens. J. 2021, 21, 1616–1624.
- Suppa, A.; Asci, F.; Saggio, G.; Di Leo, P.; Zarezadeh, Z.; Ferrazzano, G.; Ruoppolo, G.; Berardelli, A.; Costantini, G. Voice Analysis with Machine Learning: One Step Closer to an Objective Diagnosis of Essential Tremor. Mov. Disord. 2021, 36, 1401–1410.
- Smartphone Subscriptions Worldwide 2027 | Statista. Available online: https://www.statista.com/statistics/330695/number-of-smartphone-users-worldwide/ (accessed on 20 October 2022).
- How Many People Have Smartphones Worldwide. 2022. Available online: https://www.bankmycell.com/blog/how-many-phones-are-in-the-world (accessed on 2 December 2022).
- Milling, M.; Pokorny, F.B.; Bartl-Pokorny, K.D.; Schuller, B.W. Is Speech the New Blood? Recent Progress in AI-Based Disease Detection From Audio in a Nutshell. Front. Digit. Health 2022, 4, 886615. Available online: https://www.frontiersin.org/articles/10.3389/fdgth.2022.886615 (accessed on 17 January 2023).
- Amato, F.; Borzì, L.; Olmo, G.; Orozco-Arroyave, J.R. An algorithm for Parkinson’s disease speech classification based on isolated words analysis. Health Inf. Sci. Syst. 2021, 9, 32.
- Poewe, W.; Seppi, K.; Tanner, C.M.; Halliday, G.M.; Brundin, P.; Volkmann, J.; Schrag, A.-E.; Lang, A.E. Parkinson disease. Nat. Rev. Dis. Prim. 2017, 3, 1–21.
- Hlavnička, J.; Čmejla, R.; Tykalová, T.; Šonka, K.; Růžička, E.; Rusz, J. Automated analysis of connected speech reveals early biomarkers of Parkinson’s disease in patients with rapid eye movement sleep behaviour disorder. Sci. Rep. 2017, 7, 12.
- Defazio, G.; Guerrieri, M.; Liuzzi, D.; Gigante, A.F.; di Nicola, V. Assessment of voice and speech symptoms in early Parkinson’s disease by the Robertson dysarthria profile. Neurol. Sci. 2016, 37, 443–449.
- Massano, J.; Bhatia, K.P. Clinical approach to Parkinson’s disease: Features, diagnosis, and principles of management. Cold Spring Harb. Perspect. Med. 2012, 2, a008870.
- Ricci, M.; Di Lazzaro, G.; Pisani, A.; Mercuri, N.B.; Giannini, F.; Saggio, G. Assessment of Motor Impairments in Early Untreated Parkinson’s Disease Patients: The Wearable Electronics Impact. IEEE J. Biomed. Health Inform. 2020, 24, 120–130.
- Gómez-García, J.A.; Moro-Velázquez, L.; Godino-Llorente, J.I. On the design of automatic voice condition analysis systems. Part I: Review of concepts and an insight to the state of the art. Biomed. Signal Process. Control 2019, 51, 181–199.
- Amato, F.; Borzì, L.; Olmo, G.; Artusi, C.A.; Imbalzano, G.; Lopiano, L. Speech Impairment in Parkinson’s Disease: Acoustic Analysis of Unvoiced Consonants in Italian Native Speakers. IEEE Access 2021, 9, 166370–166381.
- Ma, A.; Lau, K.K.; Thyagarajan, D. Voice changes in Parkinson’s disease: What are they telling us? J. Clin. Neurosci. 2020, 72, 1–7.
- Kim, S.; Kwon, N.; O’Connell, H.; Fisk, N.; Ferguson, S.; Bartlett, M. “How are you?” Estimation of anxiety, sleep quality, and mood using computational voice analysis. In Proceedings of the 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Montreal, QC, Canada, 20–24 July 2020; pp. 5369–5373.
- Jacobi, J.; Rebernik, T. The effect of levodopa on vowel articulation in Parkinson’s disease. In Proceedings of the 19th International Congress of Phonetic Sciences, Melbourne, Australia, 5–9 August 2019; pp. 1069–1073.
- Costantini, G.; Cesarini, V.; Robotti, C.; Benazzo, M.; Pietrantonio, F.; Girolamo, S.; Pisani, A.; Canzi, P.; Mauramati, S.; Bertino, G.; et al. Deep learning and machine learning-based voice analysis for the detection of COVID-19: A proposal and comparison of architectures. Knowl.-Based Syst. 2022, 253, 109539.
- Suppa, A.; Costantini, G.; Asci, F.; Di Leo, P.; Al-Wardat, M.; Di Lazzaro, G.; Scalise, S.; Pisani, A.; Saggio, G. Voice in Parkinson’s Disease: A Machine Learning Study. Front. Neurol. 2022, 13, 831428.
- Robotti, C.; Costantini, G.; Saggio, G.; Cesarini, V.; Calastri, A.; Maiorano, E.; Piloni, D.; Perrone, T.; Sabatini, U.; Ferretti, V.; et al. Machine Learning-based Voice Assessment for the Detection of Positive and Recovered COVID-19 Patients. J. Voice 2021.
- Asci, F.; Costantini, G.; Di Leo, P.; Zampogna, A.; Ruoppolo, G.; Berardelli, A.; Saggio, G.; Suppa, A. Machine-Learning Analysis of Voice Samples Recorded through Smartphones: The Combined Effect of Ageing and Gender. Sensors 2020, 20, 5022.
- Costantini, G.; Parada-Cabaleiro, E.; Casali, D.; Cesarini, V. The Emotion Probe: On the Universality of Cross-Linguistic and Cross-Gender Speech Emotion Recognition via Machine Learning. Sensors 2022, 22, 2461.
- Saggio, G.; Costantini, G. Worldwide Healthy Adult Voice Baseline Parameters: A Comprehensive Review. J. Voice 2022, 36, 637–649.
- Fabbri, M.; Guimarães, I.; Cardoso, R.; Coelho, M.; Guedes, L.C.; Rosa, M.M.; Godinho, C.; Abreu, D.; Gonçalves, N.; Antonini, A.; et al. Speech and Voice Response to a Levodopa Challenge in Late-Stage Parkinson’s Disease. Front. Neurol. 2017, 8, 432.
- Im, H.; Adams, S.; Abeyesekera, A.; Pieterman, M.; Gilmore, G.; Jog, M. Effect of Levodopa on Speech Dysfluency in Parkinson’s Disease. Mov. Disord. Clin. Pract. 2019, 6, 150–154.
- Pah, N.D.; Motin, M.A.; Kempster, P.; Kumar, D.K. Detecting Effect of Levodopa in Parkinson’s Disease Patients Using Sustained Phonemes. IEEE J. Transl. Eng. Health Med. 2021, 9, 1–9.
- Pinho, P.; Monteiro, L.; Soares, M.F.d.P.; Tourinho, L.; Melo, A.; Nóbrega, A.C. Impact of levodopa treatment in the voice pattern of Parkinson’s disease patients: A systematic review and meta-analysis. Codas 2018, 30, e20170200.
- Baumann, A.; Nebel, A.; Granert, O.; Giehl, K.; Wolff, S.; Schmidt, W.; Baasch, C.; Schmidt, G.; Witt, K.; Deuschl, G.; et al. Neural Correlates of Hypokinetic Dysarthria and Mechanisms of Effective Voice Treatment in Parkinson Disease. Neurorehabil. Neural Repair 2018, 32, 1055–1066.
- Ishikawa, K.; Rao, M.B.; MacAuslan, J.; Boyce, S. Application of a Landmark-Based Method for Acoustic Analysis of Dysphonic Speech. J. Voice 2020, 34, 645.e11–e645.e18.
- Costantini, G.; Di Leo, P.; Asci, F.; Zarezadeh, Z.; Marsili, L.; Errico, V.; Suppa, A.; Saggio, G. Machine Learning based Voice Analysis in Spasmodic Dysphonia: An Investigation of Most Relevant Features from Specific Vocal Tasks. In Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies BIOSTEC, Vienna, Austria, 11–13 February 2021.
- Cesarini, V.; Casiddu, N.; Porfirione, C.; Massazza, G.; Saggio, G.; Costantini, G. A Machine Learning-Based Voice Analysis for the Detection of Dysphagia Biomarkers. In Proceedings of the 2021 IEEE International Workshop on Metrology for Industry 4.0 IoT (MetroInd4.0 IoT), Rome, Italy, 7–9 June 2021; pp. 407–411.
- Anthes, E. Alexa, do I have COVID-19? Nature 2020, 586, 22–25.
- Alam, M.Z.; Simonetti, A.; Brillantino, R.; Tayler, N.; Grainge, C.; Siribaddana, P.; Nouraei, S.A.R.; Batchelor, J.; Rahman, M.S.; Mancuzo, E.V.; et al. Predicting Pulmonary Function from the Analysis of Voice: A Machine Learning Approach. Front. Digit. Health 2022, 4, 750226.
- Aftab, A.; Morsali, A.; Ghaemmaghami, S.; Champagne, B. Light-SERNet: A lightweight fully convolutional neural network for speech emotion recognition. arXiv 2021, arXiv:2110.03435. Available online: http://arxiv.org/abs/2110.03435 (accessed on 17 February 2022).
- Gómez-Vilda, P.; Gómez-Rodellar, A.; Palacios-Alonso, D.; Rodellar-Biarge, V.; Álvarez-Marquina, A. The Role of Data Analytics in the Assessment of Pathological Speech—A Critical Appraisal. Appl. Sci. 2022, 12, 11095.
- Anand, A.; Haque, M.A.; Alex, J.S.R.; Venkatesan, N. Evaluation of Machine learning and Deep learning algorithms combined with dimentionality reduction techniques for classification of Parkinson’s Disease. In Proceedings of the 2018 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), Louisville, KY, USA, 6–8 December 2018; pp. 342–347.
- Kuresan, H.; Samiappan, D.; Jeevan, A.; Gupta, S. Performance Study of ML Models and Neural Networks for Detection of Parkinson Disease using Dysarthria Symptoms. Eur. J. Mol. Clin. Med. 2021, 8, 767–779.
- Ul Haq, A.; Li, J.; Memon, M.H.; Khan, J.; Din, S.U.; Ahad, I.; Sun, R.; Lai, Z. Comparative Analysis of the Classification Performance of Machine Learning Classifiers and Deep Neural Network Classifier for Prediction of Parkinson Disease. In Proceedings of the 2018 15th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), Chengdu, China, 14–16 December 2018; pp. 101–106.
- Caliskan, A.; Badem, H.; Basturk, A.; Yüksel, M. Diagnosis of the Parkinson disease by using deep neural network classifier. Istanb. Univ. - J. Electr. Electron. Eng. 2017, 17, 3311–3318.
- Gunduz, H. Deep Learning-Based Parkinson’s Disease Classification Using Vocal Feature Sets. IEEE Access 2019, 7, 115540–115551.
- Pramanik, M.; Pradhan, R.; Nandy, P.; Bhoi, A.K.; Barsocchi, P. Machine Learning Methods with Decision Forests for Parkinson’s Detection. Appl. Sci. 2021, 11, 581.
- Sahu, L.; Sharma, R.; Sahu, I.; Das, M.; Sahu, B.; Kumar, R. Efficient detection of Parkinson’s disease using deep learning techniques over medical data. Expert Syst. 2022, 39, e12787.
- Varalakshmi, P.; Priya, B.T.; Rithiga, B.A.; Bhuvaneaswari, R. Parkinson Disease Detection Based on Speech Using Various Machine Learning Models and Deep Learning Models. In Proceedings of the 2021 International Conference on System, Computation, Automation and Networking (ICSCAN), Puducherry, India, 30–31 July 2021; pp. 1–6.
- Yousif, N.R.; Balaha, H.M.; Haikal, A.Y.; El-Gendy, E.M. A generic optimization and learning framework for Parkinson disease via speech and handwritten records. J. Ambient Intell. Humaniz. Comput. 2022, 1–21.
- Zahid, L.; Maqsood, M.; Durrani, M.Y.; Bakhtyar, M.; Baber, J.; Jamal, H.; Mehmood, I.; Song, O.-Y. A Spectrogram-Based Deep Feature Assisted Computer-Aided Diagnostic System for Parkinson’s Disease. IEEE Access 2020, 8, 35482–35495.
- Anudeep, P.; Mourya, P.; Anandhi, T. Parkinson’s Disease Detection Using Machine Learning Techniques. In Advances in Electronics, Communication and Computing; Springer Nature: Singapore, 2021; Available online: https://www.springerprofessional.de/en/parkinson-s-disease-detection-using-machine-learning-techniques/18809718 (accessed on 18 January 2023).
- Majda-Zdancewicz, E.; Potulska-Chromik, A.; Jakubowski, J.; Nojszewska, M.; Kostera-Pruszczyk, A. Deep learning vs feature engineering in the assessment of voice signals for diagnosis in Parkinson’s disease. Bull. Pol. Acad. Sciences. Tech. Sci. 2021, 69, e137347.
- Quan, C.; Ren, K.; Luo, Z.; Chen, Z.; Ling, Y. End-to-end deep learning approach for Parkinson’s disease detection from speech signals. Biocybern. Biomed. Eng. 2022, 42, 556–574.
- Goyal, J.; Khandnor, P.; Aseri, T.C. A Hybrid Approach for Parkinson’s Disease diagnosis with Resonance and Time-Frequency based features from Speech signals. Expert Syst. Appl. 2021, 182, 115283.
- Amato, F.; Saggio, G.; Cesarini, V.; Olmo, G.; Costantini, G. Machine Learning- and Statistical-based Voice Analysis of Parkinson’s Disease Patients: A Survey. Expert Syst. Appl. 2023, 219, 119651.
- Jeancolas, L.; Mangone, G.; Petrovska-Delacrétaz, D.; Benali, H.; Benkelfat, B.-E.; Arnulf, I.; Corvol, J.-C.; Vidailhet, M.; Lehéricy, S. Voice characteristics from isolated rapid eye movement sleep behavior disorder to early Parkinson’s disease. Park. Relat. Disord. 2022, 95, 86–91.
- Hireš, M.; Gazda, M.; Drotár, P.; Pah, N.D.; Motin, M.A.; Kumar, D.K. Convolutional neural network ensemble for Parkinson’s disease detection from voice recordings. Comput. Biol. Med. 2022, 141, 105021.
- Er, M.B.; Isik, E.; Isik, I. Parkinson’s detection based on combined CNN and LSTM using enhanced speech signals with Variational mode decomposition. Biomed. Signal Process. Control 2021, 70, 103006.
- Govindu, A.; Palwe, S. Early detection of Parkinson’s disease using machine learning. Procedia Comput. Sci. 2023, 218, 249–261.
- Carrón, J.; Campos-Roca, Y.; Madruga, M.; Pérez, C.J. A mobile-assisted voice condition analysis system for Parkinson’s disease: Assessment of usability conditions. Biomed. Eng. Online 2021, 20, 114.
- Postuma, R.B.; Gagnon, J.-F.; Bertrand, J.-A.; Génier Marchand, D.; Montplaisir, J.Y. Parkinson risk in idiopathic REM sleep behavior disorder. Neurology 2015, 84, 1104–1113.
- Asci, F.; Costantini, G.; Saggio, G.; Suppa, A. Fostering Voice Objective Analysis in Patients with Movement Disorders. Mov. Disord. 2021, 36, 1041.
- Suppa, A.; Asci, F.; Saggio, G.; Marsili, L.; Casali, D.; Zarezadeh, Z.; Ruoppolo, G.; Berardelli, A.; Costantini, G. Voice analysis in adductor spasmodic dysphonia: Objective diagnosis and response to botulinum toxin. Park. Relat. Disord. 2020, 73, 23–30.
- Boersma, P.; Weenink, D. Praat: Doing Phonetics by Computer [Computer Program]. 2021. Version 6.20.06. Available online: https://www.praat.org. (accessed on 23 January 2022).
- Zawawi, S.A.; Hamzah, A.A.; Majlis, B.Y.; Mohd-Yasin, F. A Review of MEMS Capacitive Microphones. Micromachines 2020, 11, 484.
- Marsano-Cornejo, M.-J.; Roco-Videla, Á. Comparison of the Acoustic Parameters Obtained with Different Smartphones and a Professional Microphone. Acta Otorrinolaringol. (Engl. Ed.) 2022, 73, 51–55.
- Fahed, V.S.; Doheny, E.P.; Busse, M.; Hoblyn, J.; Lowery, M.M. Comparison of Acoustic Voice Features Derived from Mobile Devices and Studio Microphone Recordings. J. Voice 2022.
- Pohjalainen, J.; Ringeval, F.; Zhang, Z.; Schuller, B. Spectral and Cepstral Audio Noise Reduction Techniques in Speech Emotion Recognition. In Proceedings of the 24th ACM International Conference on Multimedia; ACM: Amsterdam, The Netherlands, 2016; pp. 670–674.
- Chen, B.; Kou, H.; Hou, B.; Zhou, Y. Music Feature Extraction Method Based on Internet of Things Technology and Its Application. Comput. Intell. Neurosci. 2022, 2022, e8615152.
- Student. The Probable Error of a Mean. Biometrika 1908, 6, 1–25.
- Pearson’s Correlation Coefficient. In Encyclopedia of Public Health; Kirch, W. (Ed.) Springer: Dordrecht, The Netherlands, 2008; pp. 1090–1091.
- Tsanas, A. Accurate telemonitoring of Parkinson’s disease symptom severity using nonlinear speech signal processing and statistical machine learning. Ph.D. Thesis, Oxford University, Oxford, UK, 2016. Available online: https://ora.ox.ac.uk/objects/uuid:2a43b92a-9cd5-4646-8f0f-81dbe2ba9d74 (accessed on 20 August 2022).
- Tsanas, A.; Little, M.; Mcsharry, P.; Ramig, L. New nonlinear markers and insights into speech signal degradation for effective tracking of Parkinson’s disease symptom severity. In Proceedings of the International Symposium on Nonlinear Theory and Its Applications (NOLTA), Krakow, Poland, 5–8 September 2010.
- Tsanas, A.; Little, M.A.; McSharry, P.E.; Ramig, L.O. Nonlinear speech analysis algorithms mapped to a standard metric achieve clinically useful quantification of average Parkinson’s disease symptom severity. J. R. Soc. Interface 2011, 8, 842–855.
- Brückl, M. Vocal Tremor Measurement Based on Autocorrelation of Contours. In Proceedings of the ISCA’s 13th Annual Conference, Portland, OR, USA, 9–13 September 2012.
- Brückl, M. Measurement of Tremor in the Voices of Speakers with Parkinson’s Disease. In Proceedings of the International Conference on Natural Language and Speech Processing, Algiers, Algeria, 18–19 October 2015.
- Brückl, M. Acoustic Tremor Measurement: Comparing Two Systems. In Proceedings of the International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA 2017), Firenze, Italy, 13–15 December 2017.
- Jadoul, Y.; Thompson, B.; de Boer, B. Introducing Parselmouth: A Python interface to Praat. J. Phon. 2018, 71, 1–15.
- Wang, M.; Wen, Y.; Mo, S.; Yang, L.; Chen, X.; Luo, M.; Yu, H.; Xu, F.; Zou, X. Distinctive acoustic changes in speech in Parkinson’s disease. Comput. Speech Lang. 2022, 75, 101384.
- Antoniadou, I.; Manson, G.; Dervilis, N.; Barszcz, T.; Staszewski, W.J.; Worden, K. Use of the Teager-Kaiser energy operator for condition monitoring of a wind turbine gearbox. In Proceedings of the ISMA2012 including USD2012, Leuven, Belgium, 17–19 September 2012.
- Jović, A.; Brkić, K.; Bogunović, N. A review of feature selection methods with applications. In Proceedings of the 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 25–29 May 2015; pp. 1200–1205.
- Hall, M. Correlation-Based Feature Selection for Machine Learning. Ph.D. Thesis, Department of Computer Science, The University of Waikato, Hamilton, New Zealand, 2000.
- Dechter, R.; Pearl, J. Generalized best-first search strategies and the optimality of A*. J. ACM 1985, 32, 505–536.
- Ding, C.; Peng, H. Minimum redundancy feature selection from microarray gene expression data. J. Bioinform. Comput. Biol. 2005, 3, 185–205.
- Tsanas, A.; Little, M.A.; McSharry, P.E. A methodology for the analysis of medical data. In Handbook of Systems and Complexity in Health; Springer: New York, NY, USA, 2013.
- Mei, J.; Desrosiers, C.; Frasnelli, J. Machine Learning for the Diagnosis of Parkinson’s Disease: A Review of Literature. Front. Aging Neurosci. 2021, 13, 633752.
- Mockus, J. Bayesian Approach to Global Optimization: Theory and Applications; Kluwer Academic Publishers: Amsterdam, The Netherlands, 1989.
- Gelbart, M.A.; Snoek, J.; Adams, R.P. Bayesian Optimization with Unknown Constraints. arXiv 2014.
- Huzaifah, M. Comparison of Time-Frequency Representations for Environmental Sound Classification using Convolutional Neural Networks. arXiv 2017, arXiv:1706.07156. Available online: http://arxiv.org/abs/1706.07156 (accessed on 2 December 2022).
- Salamon, J.; Bello, J.P. Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification. IEEE Signal Process. Lett. 2017, 24, 279–283.
- Monson, B.B.; Hunter, E.J.; Lotto, A.J.; Story, B.H. The perceptual significance of high-frequency energy in the human voice. Front. Psychol. 2014, 5, 587.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- McFee, B.; Raffel, C.; Liang, D.; Ellis, D.; McVicar, M.; Battenberg, E.; Nieto, O. librosa: Audio and Music Signal Analysis in Python. In Proceedings of the 14th Python in Science Conference, Austin, TX, USA, 6–12 July 2015; pp. 18–24.
- Gómez-García, J.A.; Moro-Velázquez, L.; Arias-Londoño, J.D.; Godino-Llorente, J.I. On the design of automatic voice condition analysis systems. Part III: Review of acoustic modelling strategies. Biomed. Signal Process. Control 2021, 66, 102049.
- Biagetti, G.; Crippa, P.; Falaschetti, L.; Tanoni, G.; Turchetti, C. A comparative study of machine learning algorithms for physiological signal classification. Procedia Comput. Sci. 2018, 126, 1977–1984.
- Hasan, H.; Shafri, H.Z.M.; Habshi, M. A Comparison between Support Vector Machine (SVM) and Convolutional Neural Network (CNN) Models for Hyperspectral Image Classification. IOP Conf. Ser. Earth Environ. Sci. 2019, 357, 012035.
Study | Dataset | Classification Approach | Reported Results (ACC) | Notes and Limitations |
---|---|---|---|---|
Jeancolas et al., 2022 [53] | 256 (117 PD) | SVM | 79.5% | Also takes into account RBD patients (ACC = 63%). The extracted features are not detailed and, overall, form too small a subset. |
Hireš et al., 2022 [54] | 100 (50 PD) | CNN | 99% (vowel /a/) | Small dataset (PC-GITA). Only vowel tasks are considered, with /a/ being reported as the most effective. |
Er et al., 2021 [55] | 100 (50 PD) | CNN and LSTM | 98.5% | Small dataset (PC-GITA). Several pre-trained nets are employed, especially ResNet variants. |
Govindu et al., 2023 [56] | 149 (100 PD) | SVM, linear regression, Random Forest, KNN | 91.8% (Random Forest) | Small, unbalanced dataset consisting of just a few speech features (no audio). Upsampling was used to address imbalance and wrangling was used to infer missing attributes. |
Carrón et al., 2021 [57] | UEX (60 total, 30 PD) and mPower (1060 PD) | Gradient Boosting, Logistic Regression, Passive Aggressive, MLP, Random Forest, SVM | 92% (UEX), 71% (mPower) | The mPower dataset is crowdsourced, non-validated and self-reported. The proposed UEX dataset, in contrast, is very small (30 PD). Only 33 features are used, including the sex of the subject. |
Feature Family | Number of Features | Brief Description | ID |
---|---|---|---|
Fundamental Frequency | 2 | Lowest frequency of the quasi-periodic vocal signal, which represents the vibration frequency of the vocal folds | F0 |
Jitter | 22 | Variability/perturbation of the fundamental frequency | Jitter |
Shimmer | 22 | Voice amplitude perturbation | Shimmer |
HNR/NHR | 4 | Harmonics-to-noise ratio and its inverse, the noise-to-harmonics ratio | HNR, NHR |
Mel Frequency Cepstral Coefficients | 82 | Cepstral coefficients that estimate the filtering effects of the vocal tract on the sustained emission | MFCC |
Vocal Formants | 96 | Vocal tract resonance frequencies, which are related to tongue position and vocal tract morphology | F1, F2, F3, F4, F5 |
Detrended Fluctuation Analysis | 1 | An estimate of the turbulent air-flow that traverses the vocal tract | DFA |
Recurrence Period Density Entropy | 1 | This measures the stability of the oscillation produced by the vocal folds evaluating the periodicity of the signal | RPDE |
Pitch Period Entropy | 1 | This evaluates the stability of the intonation (pitch) during the emission of a sustained vowel without being confused by the microtremor present even in healthy voices | PPE |
Wavelet Decomposition Measures | 182 | Signal decomposition through the discrete wavelet transform (DWT) for the purposes of calculating the energy present in the various frequency sub-bands | WavDec_det (detailed coefficient) WavDec_app (approximate coefficient) |
Empirical Mode Decomposition Excitation Ratio (EMD-ER) | 6 | This decomposes the signal through the intrinsic mode functions (IMF) and analyzes them to quantify the noise due to an incomplete glottal closure through entropy and SNR measurements | IMF |
Glottis Quotient | 3 | A measure of the aperiodicity of the glottal cycle | GQ |
Glottal-to-Noise Excitation Ratio | 6 | An estimate of the noise caused by the incomplete closure of the vocal folds calculated by cross-correlating the envelopes of the glottal cycles | GNE |
Vocal Fold Excitation Ratio | 7 | This estimates noise unrelated to the vocal emission, similarly to GNE | VFER |
Low-Frequency Vocal Tremor | 18 | Parameters related to the unintentional low-frequency oscillations of the vocal folds and their amplitude | Trem |
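To make the jitter and shimmer rows of the feature table concrete, the sketch below computes the basic "local" variant of each: the mean absolute difference between consecutive cycles divided by the mean value. It is a minimal illustration under assumptions: glottal periods and per-cycle peak amplitudes are taken as already extracted, the numbers are made up, and the function names are hypothetical. The table counts 22 variants per family, and this shows only the simplest one.

```python
def local_perturbation(values):
    """Mean absolute difference between consecutive values,
    divided by the mean value (a relative, unitless measure)."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))

def local_jitter(periods_s):
    """Cycle-to-cycle variability of the glottal period (in seconds)."""
    return local_perturbation(periods_s)

def local_shimmer(peak_amplitudes):
    """Cycle-to-cycle variability of the per-cycle peak amplitude."""
    return local_perturbation(peak_amplitudes)

# Made-up glottal cycles with periods near 8 ms, for illustration only.
periods = [0.0080, 0.0081, 0.0079, 0.0080, 0.0082]
amps = [0.50, 0.52, 0.49, 0.51, 0.50]
jitter = local_jitter(periods)
shimmer = local_shimmer(amps)
mean_f0 = len(periods) / sum(periods)  # reciprocal of the mean period, in Hz
```

Both measures are ratios, so they are commonly reported as percentages; a perfectly periodic signal gives zero for each.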
Comparison | Selection Method | Number of Features | Classification Model | Accuracy |
---|---|---|---|---|
1. Mid-Advanced PD vs. HC | CFS | 12 | KNN | 0.80 ± 0.008 |
| IG | 100 | SVM | 0.74 ± 0.04 |
| mRMRS | 50 | SVM | 0.77 ± 0.008 |
2. Early PD vs. HC | CFS | 17 | NB | 0.82 ± 0.007 |
| IG | 30 | SVM | 0.78 ± 0.16 |
| mRMRS | 70 | SVM | 0.83 ± 0.02 |
3. Mid-Advanced PD vs. Early PD | CFS | 17 | KNN | 0.85 ± 0.02 |
| IG | 30 | NB | 0.79 ± 0.02 |
| mRMRS | 10 | NB | 0.78 ± 0.01 |
4. Mid-Advanced PD ON vs. OFF L-dopa | CFS | 10 | KNN | 0.79 ± 0.005 |
| IG | 10 | NB | 0.66 ± 0.03 |
| mRMRS | 10 | NB | 0.69 ± 0.016 |
5. Mid-Advanced PD vs. Early PD vs. HC | CFS | 21 | KNN | 0.61 ± 0.03 |
| IG | 70 | KNN | 0.60 ± 0.01 |
| mRMRS | 10 | SVM | 0.60 ± 0.01 |
6. Mid-Advanced PD ON vs. OFF L-dopa vs. HC | CFS | 21 | NB | 0.58 ± 0.01 |
| IG | 100 | SVM | 0.54 ± 0.04 |
| mRMRS | 70 | KNN | 0.54 ± 0.03 |
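The CFS rows rest on a subset "merit" score that rewards features correlated with the class and penalises features correlated with each other. For a subset of k features with mean feature-class correlation r_cf and mean feature-feature correlation r_ff, the merit is k·r_cf / sqrt(k + k(k−1)·r_ff). The sketch below evaluates that score on toy data. It assumes absolute Pearson correlation as the correlation measure and numeric class codes, uses hypothetical names and values, and omits the search over subsets.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def cfs_merit(features, labels):
    """features: list of feature columns; labels: numeric class codes."""
    k = len(features)
    r_cf = sum(abs(pearson(f, labels)) for f in features) / k
    pairs = list(combinations(features, 2))
    r_ff = (sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs)) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

# Toy data, made up for illustration.
labels = [0, 0, 0, 1, 1, 1]
f1 = [0.1, 0.2, 0.1, 0.9, 0.8, 1.0]  # informative feature
f2 = [0.2, 0.4, 0.2, 1.8, 1.6, 2.0]  # f1 scaled by 2, fully redundant
f3 = [0.5, 0.1, 0.9, 0.4, 0.8, 0.2]  # nearly unrelated to the class
merit_single = cfs_merit([f1], labels)
merit_redundant = cfs_merit([f1, f2], labels)
merit_noise = cfs_merit([f1, f3], labels)
```

Adding the redundant copy leaves the merit unchanged, and adding the uninformative feature lowers it, which is why CFS tends to return compact subsets such as the 10 to 21 features listed in the table.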
Classification Type | Comparison | CFS | IG | mRMRS |
---|---|---|---|---|
Binary Classifications | 1. Mid-Advanced PD vs. HC | 0.78 ± 0.09 | 0.73 ± 0.04 | 0.75 ± 0.05 |
| 2. Early PD vs. HC | 0.80 ± 0.05 | 0.74 ± 0.02 | 0.78 ± 0.04 |
| 3. Mid-Advanced PD vs. Early PD | 0.84 ± 0.01 | 0.75 ± 0.02 | 0.75 ± 0.02 |
| 4. Mid-Advanced PD ON vs. OFF L-dopa | 0.72 ± 0.05 | 0.56 ± 0.1 | 0.63 ± 0.06 |
| Average | 0.78 ± 0.05 | 0.70 ± 0.09 | 0.73 ± 0.07 |
Multiclass Classifications | 5. Mid-Advanced PD vs. Early PD vs. HC | 0.61 ± 0.01 | 0.57 ± 0.02 | 0.59 ± 0.02 |
| 6. Mid-Advanced PD ON vs. OFF L-dopa vs. HC | 0.55 ± 0.03 | 0.50 ± 0.04 | 0.50 ± 0.03 |
| Average | 0.57 ± 0.05 | 0.54 ± 0.05 | 0.55 ± 0.06 |
Classification Type | Comparison | KNN | SVM | NB |
---|---|---|---|---|
Binary Classifications | 1. Mid-Advanced PD vs. HC | 0.75 ± 0.04 | 0.75 ± 0.02 | 0.74 ± 0.03 |
| 2. Early PD vs. HC | 0.79 ± 0.04 | 0.79 ± 0.04 | 0.78 ± 0.04 |
| 3. Mid-Advanced PD vs. Early PD | 0.79 ± 0.05 | 0.79 ± 0.05 | 0.78 ± 0.03 |
| 4. Mid-Advanced PD ON vs. OFF L-dopa | 0.69 ± 0.08 | 0.66 ± 0.08 | 0.67 ± 0.07 |
| Average | 0.76 ± 0.04 | 0.75 ± 0.06 | 0.74 ± 0.05 |
Multiclass Classifications | 5. Mid-Advanced PD vs. Early PD vs. HC | 0.59 ± 0.03 | 0.60 ± 0.03 | 0.59 ± 0.02 |
| 6. Mid-Advanced PD ON vs. OFF L-dopa vs. HC | 0.53 ± 0.03 | 0.51 ± 0.02 | 0.54 ± 0.03 |
| Average | 0.56 ± 0.04 | 0.57 ± 0.05 | 0.56 ± 0.03 |
Comparison | Model | Acc | PPV | NPV | Sen | Spec | AUC | F1 Score |
---|---|---|---|---|---|---|---|---|
1. Mid-Advanced PD vs. HC | KNN | 0.80 ± 0.01 | 0.79 ± 0.03 | 0.80 ± 0.02 | 0.80 ± 0.01 | 0.79 ± 0.02 | 0.87 ± 0.04 | 0.80 ± 0.03 |
| CNN | 0.82 ± 0.07 | 0.87 ± 0.05 | 0.78 ± 0.06 | 0.75 ± 0.04 | 0.87 ± 0.05 | 0.83 ± 0.05 | 0.79 ± 0.05 |
2. Early PD vs. HC | SVM | 0.83 ± 0.02 | 0.81 ± 0.01 | 0.83 ± 0.01 | 0.83 ± 0.03 | 0.82 ± 0.02 | 0.88 ± 0.05 | 0.82 ± 0.01 |
| CNN | 0.70 ± 0.06 | 0.72 ± 0.04 | 0.75 ± 0.03 | 0.73 ± 0.04 | 0.66 ± 0.07 | 0.73 ± 0.02 | 0.70 ± 0.03 |
3. Mid-Advanced PD vs. Early PD | KNN | 0.85 ± 0.02 | 0.77 ± 0.05 | 0.86 ± 0.02 | 0.83 ± 0.02 | 0.81 ± 0.03 | 0.91 ± 0.06 | 0.80 ± 0.04 |
| CNN | 0.74 ± 0.09 | 0.75 ± 0.05 | 0.76 ± 0.06 | 0.69 ± 0.08 | 0.75 ± 0.07 | 0.75 ± 0.05 | 0.68 ± 0.05 |
4. Mid-Advanced PD ON vs. OFF L-dopa | KNN | 0.79 ± 0.01 | 0.71 ± 0.02 | 0.87 ± 0.05 | 0.84 ± 0.01 | 0.75 ± 0.02 | 0.82 ± 0.03 | 0.77 ± 0.03 |
| CNN | 0.53 ± 0.08 | 0.53 ± 0.06 | 0.57 ± 0.08 | 0.69 ± 0.05 | 0.37 ± 0.08 | 0.58 ± 0.05 | 0.65 ± 0.06 |
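The column headers of the binary comparison table follow the usual confusion-matrix definitions, which the sketch below spells out. The counts and scores are made up for illustration and the function names are hypothetical. AUC is computed here with the rank-based (pairwise) formulation, which may differ from the procedure the authors used.

```python
def binary_metrics(tp, fp, tn, fn):
    """Confusion-matrix metrics named in the table header.
    No guard against empty classes: every denominator must be non-zero."""
    sen = tp / (tp + fn)    # sensitivity (recall of the positive class)
    spec = tn / (tn + fp)   # specificity
    ppv = tp / (tp + fp)    # positive predictive value (precision)
    npv = tn / (tn + fn)    # negative predictive value
    acc = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sen / (ppv + sen)
    return {"Acc": acc, "PPV": ppv, "NPV": npv,
            "Sen": sen, "Spec": spec, "F1": f1}

def auc_from_scores(pos_scores, neg_scores):
    """AUC as the probability that a random positive sample receives a
    higher score than a random negative one (ties count as one half)."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# Illustrative counts: 45 positives and 55 negatives.
m = binary_metrics(tp=40, fp=10, tn=45, fn=5)
auc = auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

Reading Sen and Spec together matters for rows such as the CNN on comparison 4, where a sensitivity of 0.69 alongside a specificity of 0.37 indicates a classifier biased toward one class rather than one that separates the two.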
Comparison | Model | Macro-Acc | Macro-PPV | Macro-Sen | Macro-F1 Score |
---|---|---|---|---|---|
5. Mid-Advanced PD vs. Early PD vs. HC | KNN | 0.61 ± 0.03 | 0.61 ± 0.02 | 0.61 ± 0.03 | 0.60 ± 0.03 |
| CNN | 0.62 ± 0.03 | 0.58 ± 0.03 | 0.57 ± 0.04 | 0.56 ± 0.03 |
6. Mid-Advanced PD ON vs. OFF L-dopa vs. HC | NB | 0.58 ± 0.01 | 0.56 ± 0.01 | 0.57 ± 0.02 | 0.54 ± 0.01 |
| CNN | 0.60 ± 0.03 | 0.58 ± 0.04 | 0.49 ± 0.05 | 0.53 ± 0.04 |
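Macro-averaged metrics, as reported for the three-class comparisons, are obtained by computing each metric per class and then taking the unweighted mean across classes. The sketch below shows this for PPV, sensitivity and F1 on a made-up 3 × 3 confusion matrix with hypothetical names. Macro-Acc is left out because its exact definition is not stated in the table.

```python
def macro_metrics(confusion):
    """confusion[i][j] = number of samples of true class i predicted as j."""
    n = len(confusion)
    precisions, recalls, f1s = [], [], []
    for c in range(n):
        tp = confusion[c][c]
        predicted = sum(confusion[r][c] for r in range(n))  # column total
        actual = sum(confusion[c])                           # row total
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return {"Macro-PPV": sum(precisions) / n,
            "Macro-Sen": sum(recalls) / n,
            "Macro-F1": sum(f1s) / n}

# Illustrative counts for three classes with 10 samples each.
cm = [[8, 1, 1],
      [2, 6, 2],
      [1, 3, 6]]
macro = macro_metrics(cm)
```

Because every class carries the same weight, a macro average is not dominated by the largest class, which makes it a reasonable summary when group sizes differ.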
Comparison | Rank | CFS | mRMRS | IG |
---|---|---|---|---|
1. Mid-Advanced PD vs. HC | 1 | MFCC_std_8thDelta_delta | MFCC_std_8thDelta_delta | MFCC_std_10thDelta_delta |
| 2 | MFCC_std_11thDelta | WavDec_det_TKEO_mean_1_coef | MFCC_mean_5thDelta_delta |
| 3 | MFCC_std_1stCoef | MFCC_mean_deltaDeltaLogEnergy | MFCC_std_8thDelta_delta |
| 4 | VFER_SNR_TKEO | Shimmer__F0_abs_dif | MFCC_std_8thDelta |
| 5 | MFCC_std_10thCoef | GNE__std | MFCC_mean_6thDelta |
2. Early PD vs. HC | 1 | MFCC_std_4thDelta | WavDec_app_entropy_log_2_coef | Trem_ATrPS |
| 2 | WavDec_app_LT_entropy_log_9_coef | MFCC_mean__4thCoef | WavDec_det_Ed2_1_coef |
| 3 | IMF_NSR_entropy | WavDec_det_LT_TKEO_mean_3_coef | WavDec_app_entropy_log_6_coef |
| 4 | Trem_FTrCIP | WavDec_det_entropy_shannon_1_coef | WavDec_app_LT_entropy_shannon_1_coef |
| 5 | MFCC_std_1stDeltaDelta | MFCC_std_3rdCoef | WavDec_det_LT_entropy_shannon_1_coef |
3. Mid-Advanced PD vs. Early PD | 1 | MFCC_std_10thDelta | MFCC_std_10thDelta | MFCC_std_8thDelta |
| 2 | MFCC_std_10thDelta_delta | MFCC_mean_7thDelta_delta | MFCC_std_10thDelta |
| 3 | MFCC_std__10thCoef | GNE__std | MFCC_std_10thDelta_delta |
| 4 | GNE__std | Shimmer__F0_PQ3_generalised_Schoentgen | WavDec_app_LT_TKEO_mean_3_coef |
| 5 | MFCC_std__7thCoef | F0_slopeLinFit | MFCC_std_9thDelta |
4. Mid-Advanced PD ON vs. OFF L-dopa | 1 | Jitter__F0_PQ5_classical_Baken | MFCC_mean__6thCoef | WavDec_app_LT_TKEO_std_6_coef |
| 2 | F1_TKEO_mean | F0_slopeLinFit | Trem_AMoN |
| 3 | F5_rangePerc | F5_TKEO_perc95 | MFCC_std_11thDelta |
| 4 | WavDec_det_LT_entropy_shannon_2_coef | MFCC_std_2ndDelta | F4_perc5 |
| 5 | mean_MFCC_6thCoef | F1_perc5 | Jitter__F0_PQ11_classical_Schoentgen |
5. Mid-Advanced PD vs. Early PD vs. HC | 1 | MFCC_std_10thDelta_delta | MFCC_std_10thDelta | MFCC_std_10thDelta_delta |
| 2 | MFCC_std_10thDelta | MFCC_mean_7thDelta_delta | MFCC_std_8thDelta |
| 3 | GNE__std | F3__TKEO_slopeLinFit | MFCC_std_3rdDelta |
| 4 | WavDec_app_LT_TKEO_mean_3_coef | Shimmer__F0_PQ3_generalised_Schoentgen | MFCC_std_8thDelta_delta |
| 5 | Shimmer__F0_PQ3_generalised_Schoentgen | GNE__std | WavDec_app_LT_entropy_log_7_coef |
6. Mid-Advanced PD ON vs. OFF L-dopa vs. HC | 1 | Shimmer__F0_DB | Shimmer__F0_PQ11_classical_Schoentgen | MFCC_std_8thDelta_delta |
| 2 | Shimmer__F0_PQ5_classical_Schoentgen | MFCC_mean_2ndDelta_delta | MFCC_std_11thDelta |
| 3 | F0__TKEO_perc25 | Shimmer__F0_PQ11_classical_Baken | MFCC_std_10thDelta_delta |
| 4 | Shimmer__F0_abs_dif | Jitter__F0_TKEO_prc25 | WavDec_det_TKEO_std_1_coef |
| 5 | Shimmer__F0_TKEO_prc75 | Shimmer__F0_TKEO_prc75 | MFCC_std_8thDelta |
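Several of the ranked feature IDs carry a TKEO tag, which refers to the Teager-Kaiser energy operator cited in the reference list. In discrete time the operator is ψ[n] = x[n]² − x[n−1]·x[n+1]. The sketch below applies it to a synthetic sinusoid and summarises the result with a mean and standard deviation. Mapping those statistics to suffixes such as `_TKEO_mean` and `_TKEO_std` is an assumption about the naming scheme, and the signal and function names are hypothetical.

```python
import math

def tkeo(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].
    The first and last samples are dropped because they lack a neighbour."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

def mean_std(values):
    """Mean and population standard deviation of a sequence."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    return m, math.sqrt(var)

# For a pure sinusoid A*sin(w*n) the operator is constant: A^2 * sin(w)^2.
A, w = 2.0, 0.3
x = [A * math.sin(w * n) for n in range(100)]
energy = tkeo(x)
tkeo_mean, tkeo_std = mean_std(energy)
```

The output depends on both amplitude and frequency, so fluctuations in the TKEO of a contour (F0, a formant track or a wavelet sub-band) reflect instability in either quantity.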
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Costantini, G.; Cesarini, V.; Di Leo, P.; Amato, F.; Suppa, A.; Asci, F.; Pisani, A.; Calculli, A.; Saggio, G. Artificial Intelligence-Based Voice Assessment of Patients with Parkinson’s Disease Off and On Treatment: Machine vs. Deep-Learning Comparison. Sensors 2023, 23, 2293. https://doi.org/10.3390/s23042293