Engineering Proceedings
  • Proceeding Paper
  • Open Access

26 November 2024

Artificial Intelligence-Based Effective Detection of Parkinson’s Disease Using Voice Measurements †

1 Department of Information and Communication Technology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal 576104, Karnataka, India
2 School of Computer Science and Engineering, VIT-AP University, Amaravati 522237, Andhra Pradesh, India
3 School of Electronics Engineering, VIT-AP University, Amaravati 522237, Andhra Pradesh, India
4 Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram 522502, Andhra Pradesh, India
This article belongs to the Proceedings of The 11th International Electronic Conference on Sensors and Applications

Abstract

Parkinson’s disease (PD) is a neurodegenerative illness that affects the central nervous system and leads to a gradual degeneration of neurons, resulting in slowness of movement, mental health problems, speaking difficulties, and other symptoms. In the past 25 years, the prevalence of PD has doubled, and global estimates indicate that over 8.5 million cases have been identified so far. Thus, early and accurate detection of PD is crucial for treatment. Traditional detection methods are subjective and prone to delays, as they rely on clinical evaluation and imaging. Alternatively, artificial intelligence (AI) has recently emerged as a transformative technology in the healthcare sector, showing promising results. However, the most effective algorithm for accurately predicting a particular disease still needs to be identified. Thus, this paper explores the ability of different machine learning algorithms to detect PD effectively. A total of 26 algorithms were implemented using the Scikit-Learn library on the Oxford PD detection dataset, a collection of 195 voice measurements recorded from 31 individuals, of whom 23 have PD. The implemented algorithms are logistic regression, decision tree, k-nearest neighbors, random forest, support vector machine, Gaussian naïve Bayes, multi-layered perceptron (MLP), extreme gradient boosting, adaptive boosting, stochastic gradient descent, gradient boosting machine, extra tree classifier, light gradient boosting machine, categorical boosting, Bernoulli naïve Bayes, complement naïve Bayes, multinomial naïve Bayes, histogram-based gradient boosting, nearest centroid, radius neighbors classifier, logistic regression with elastic net regularization, extreme learning machine, ridge classifier, Huber classifier, perceptron classifier, and voting classifier. Among them, MLP outperformed the other algorithms with a testing accuracy of 95%, precision of 94%, sensitivity of 100%, F1 score of 97%, and AUC of 98%. It thus successfully discriminates healthy individuals from those with PD, supporting accurate early detection of PD in new patients from their voice measurements.

1. Introduction

Parkinson’s disease (PD) is a chronic degenerative disorder that progressively damages the brain, leading to the gradual deterioration of nerve cells. It is characterized by a combination of motor and non-motor symptoms. The most common motor symptoms include tremors (involuntary shaking) and muscle rigidity (stiffness), among others. These symptoms can significantly hinder a person’s capacity to perform everyday tasks, diminishing their overall quality of life. PD is also linked to a variety of non-motor symptoms, including difficulties with thinking and memory, emotional disorders, sleep problems, fatigue, and speech difficulties. While PD is most commonly diagnosed in older adults, it can also affect younger individuals, a condition referred to as early-onset PD. Men are more likely to develop PD than women. Genetic factors play a role as well; individuals with a family history of PD are more likely to develop the disease. Environmental factors also contribute: prolonged exposure to air pollution, pesticides, and certain solvents may increase the likelihood of developing PD. Over the last 25 years, the prevalence of PD has doubled, making it the fastest-growing neurological disorder worldwide. As the global population continues to age, the number of cases is expected to rise, highlighting the need for effective methods of early detection, diagnosis, and management to mitigate the disease’s impact on individuals and healthcare systems.
Artificial intelligence (AI) is reshaping the healthcare sector by enhancing accuracy, efficiency, and accessibility in medical care. Machine learning (ML) algorithms analyze vast amounts of complex medical data, such as wearable sensor data, patient demographic information, and clinical trial data, to detect patterns and generate predictions. This capability enables earlier disease detection, more accurate diagnoses, and customized treatment plans that improve patient outcomes. Moreover, AI-powered tools reduce healthcare professionals’ workloads by automating routine tasks, streamlining administrative processes, and optimizing resource allocation, allowing providers to focus more on direct patient care. AI is commonly categorized by functionality and capability into Narrow AI (or Weak AI), General AI (or Artificial General Intelligence, AGI), and Super-Intelligent AI. Recent advancements in AI include Attention Mechanisms, Explainable AI (XAI), Federated Learning, and Generative AI, which encompasses models such as GANs, VAEs, Transformer-based models, Diffusion models, Autoregressive models, and Large Language Models (LLMs) [1,2]. As AI continues to evolve, it holds immense potential to bridge gaps in healthcare access, reduce costs, and enhance the quality of care on a global scale.
This paper explores the effectiveness of different ML algorithms in predicting PD, emphasizing early diagnosis and the development of a robust detection technique. These ML algorithms were implemented using the Scikit-Learn library on the Oxford Parkinson’s Disease detection dataset [3]. The rest of the paper is organized as follows. Section 2 discusses related works on the early diagnosis and prediction of PD. Section 3 describes the execution flow for the entire work presented in the paper. Section 4 presents the experimental results, and Section 5 covers the conclusions and future directions.

3. Methodology

The methodology for this research involves several steps, including data splitting, model training, evaluation, and hyperparameter tuning, as depicted in Figure 1. The dataset used is the Oxford Parkinson’s Disease detection dataset, which includes 22 features across 195 instances, containing biomedical voice measurements of individuals with and without PD. The target variable (‘status’) indicates the presence (1) or absence (0) of PD, and the features represent a comprehensive range of voice characteristics typically affected by PD, thereby facilitating effective disease detection. The dataset was divided randomly, with 80% allocated for training and 20% reserved for testing. The training set was used to train the ML models, while the testing set was used to evaluate their performance. A total of 26 ML algorithms were implemented using the Scikit-Learn library in Python, with each algorithm applied to the training data to predict the target variable. To optimize the models’ performance, hyperparameter tuning was performed using techniques such as grid search and random search, which systematically test different combinations of model parameters to determine the configuration with the best predictive performance on the validation set. The effectiveness of each ML model was assessed using five evaluation metrics (accuracy, precision, sensitivity, area under the curve, and F1 score), providing a comprehensive evaluation of the models’ ability to detect PD accurately and minimize false positives.
Figure 1. Experimental workflow for PD detection.
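The workflow in this section (random 80/20 split, training, grid search, and evaluation with the five metrics) can be sketched in Scikit-Learn as follows. This is a minimal illustration, not the authors' code. It assumes a synthetic stand-in dataset with the same shape as the Oxford data (195 rows, 22 features), and the class balance, feature scaling step, parameter grid, 5-fold cross-validation, and random seeds are all assumptions that the paper does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the voice data (assumption): 195 rows, 22 features.
# The class weights are arbitrary and only make label 1 the majority class.
X, y = make_classification(n_samples=195, n_features=22, n_informative=10,
                           weights=[0.25, 0.75], random_state=0)

# Random 80/20 split, as described in the text (the seed is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Scaling before the MLP is an assumption; the paper does not mention it.
pipeline = make_pipeline(StandardScaler(),
                         MLPClassifier(max_iter=2000, random_state=0))

# Hypothetical grid; the values searched in the paper are not reported.
param_grid = {
    "mlpclassifier__hidden_layer_sizes": [(16,), (32,)],
    "mlpclassifier__alpha": [1e-4, 1e-2],
}

# Grid search with cross-validation on the training set only.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out test set.
y_pred = search.predict(X_test)
y_score = search.predict_proba(X_test)[:, 1]
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "sensitivity": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "auc": roc_auc_score(y_test, y_score),
}
print(search.best_params_)
print(metrics)
```

To try this on the real data, `X` and `y` would be replaced by the 22 voice features and the ‘status’ column of the dataset file. The numbers printed for the synthetic data are not comparable to those reported in Table 1.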

4. Results and Discussion

The results obtained from the various algorithms provide insight into the detection of PD and are detailed as follows. Logistic regression (LR) was implemented twice: first directly on the dataset, and then with ElasticNet regularization (LR + EN), where the penalty was set to ‘elasticnet’ and the l1_ratio was set to 0.5, indicating an equal balance between L1 and L2 regularization. Decision tree (DT) and random forest (RF) were each implemented with a maximum depth of 5. K-nearest neighbors (KNN) produced a testing accuracy of 95% with the number of nearest neighbors set to 5. The radius neighbors classifier (RNC) and nearest centroid (NC) did not perform well. The support vector machine (SVM) showed decent performance, with training and testing accuracies of 90% each. Four variants of naïve Bayes were used in this experiment: Gaussian naïve Bayes (GNB), Bernoulli naïve Bayes (BNB), complement naïve Bayes (CNB), and multinomial naïve Bayes (MNB). However, none of them performed well on the dataset. Among multi-layered perceptron (MLP), perceptron classifier (PC), and extreme learning machine (ELM), MLP outperformed the others with a testing accuracy of 95%. The ridge classifier (RC) produced a testing accuracy of 92%. However, the Huber classifier (HC) and stochastic gradient descent (SGD) did not perform well and were not suitable for operational use in detecting PD. Among the boosting algorithms, extreme gradient boosting (XGB), gradient boosting machine (GBM), light gradient boosting machine (LGBM), categorical boosting (CB), and histogram-based gradient boosting (HistGB) outperformed most other classifiers.
LR, DT, RF, KNN, SVM, MLP, XGB, adaptive boosting (ADB), SGD, GBM, extra tree classifier (ETC), LGBM, CB, HistGB, RNC, RC, and LR + EN were combined to build the voting classifier (VC), which resulted in a testing accuracy of 95%, a precision of 94%, a sensitivity of 100%, an F1 score of 97%, and an AUC of 92%. It can be observed from Table 1 that MLP, XGB, GBM, ETC, LGBM, and CB produced nearly identical results. They all achieved a training accuracy of 100%, a testing accuracy of 95%, a precision of 94%, a sensitivity of 100%, and an F1 score of 97%. The AUC, however, distinguished them, as shown in Figure 2. XGB and GBM achieved an AUC of 93%, ETC and LGBM achieved an AUC of 95%, CB achieved an AUC of 97%, and MLP achieved an AUC of 98%. Generally, any classifier with an AUC of more than 80% can be considered good enough to contribute to the core findings of the research, and all six of these classifiers achieved an AUC of more than 90%. The highest AUC, 98%, was achieved by MLP, with 32 true positives, 5 true negatives, 0 false negatives, and 2 false positives, as shown in Figure 3.
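The settings named in this section (an ‘elasticnet’ penalty with l1_ratio of 0.5, a maximum depth of 5 for DT and RF, and 5 neighbors for KNN) and the idea of a voting classifier can be illustrated with the short sketch below. It is an illustration under stated assumptions rather than the configuration used in the paper: only four of the seventeen base learners are included, soft voting is an assumption (the paper does not state the voting mode), the `saga` solver, scaling step, and seeds are assumptions, and the data are synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 195 rows, 22 features.
X, y = make_classification(n_samples=195, n_features=22, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# LR + EN: elastic net penalty with an equal L1/L2 mix, as stated in the text.
# The 'saga' solver is chosen here because it supports this penalty.
lr_en = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="elasticnet", solver="saga",
                       l1_ratio=0.5, max_iter=5000),
)

# DT and RF with a maximum depth of 5, as stated in the text.
dt = DecisionTreeClassifier(max_depth=5, random_state=0)
rf = RandomForestClassifier(max_depth=5, random_state=0)

# KNN with 5 nearest neighbors, as stated in the text (scaling is an assumption).
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# Soft voting averages the predicted class probabilities of the base learners.
voting = VotingClassifier(
    estimators=[("lr_en", lr_en), ("dt", dt), ("rf", rf), ("knn", knn)],
    voting="soft",
)
voting.fit(X_train, y_train)

test_accuracy = voting.score(X_test, y_test)
print(f"voting classifier test accuracy: {test_accuracy:.2f}")
```

Soft voting requires every base learner to expose `predict_proba`, which is why this reduced ensemble leaves out learners such as the ridge classifier. A hard-voting variant (`voting="hard"`) would accept them but would not yield probabilities for an AUC calculation.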
Table 1. Results achieved with various ML classifiers.
Figure 2. AUC of various classifiers with (a) MLP; (b) XGB; (c) GBM; (d) ETC; (e) LGBM; (f) CB.
Figure 3. Confusion Matrix for MLP, XGB, GBM, ETC, LGBM, and CB classifiers.
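As a consistency check, the headline MLP metrics can be recomputed from the confusion-matrix counts reported above (32 true positives, 5 true negatives, 2 false positives, 0 false negatives). The helper name `confusion_metrics` is hypothetical, and specificity, which the paper does not report, is included only as the usual companion to sensitivity.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Return binary-classification metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Counts reported for the MLP classifier on the test set.
mlp = confusion_metrics(tp=32, tn=5, fp=2, fn=0)
for name, value in mlp.items():
    print(f"{name}: {value:.1%}")
# accuracy: 94.9%
# precision: 94.1%
# sensitivity: 100.0%
# specificity: 71.4%
# f1: 97.0%
```

Rounded to whole percentages, these values match the reported accuracy of 95%, precision of 94%, sensitivity of 100%, and F1 score of 97%. The four counts sum to 39, which is 20% of the 195 instances and so agrees with the 80/20 split.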

5. Conclusions

Early and precise detection of PD is critically important due to its progressive nature and substantial impact on patients’ quality of life. This paper demonstrates the potential of machine learning algorithms as effective tools for detecting PD using non-invasive voice measurements. A total of 26 ML algorithms were implemented on the Oxford PD detection dataset, comprising 195 voice measurements from 31 individuals. Among these algorithms, the multi-layered perceptron (MLP) demonstrated superior performance, achieving a testing accuracy of 95%, a precision of 94%, a sensitivity of 100%, an F1 score of 97%, and an AUC of 98%. These metrics indicate that the MLP algorithm is highly effective in distinguishing between healthy individuals and those with PD based on voice data. The results highlight the significant potential of artificial intelligence (AI)-driven approaches in enhancing the early detection of PD. The use of voice measurements provides a non-invasive, cost-effective, and efficient means of screening large populations, which is particularly valuable given the increasing prevalence of the disease worldwide. Future work can validate these findings on larger, more diverse datasets and investigate the integration of such AI models into clinical settings.

Author Contributions

Conceptualization, G.P.R.; methodology, G.P.R., D.R. and Y.V.P.K.; software, K.P.P.; formal analysis, M.S.; funding acquisition, Y.V.P.K.; investigation, G.P.R. and D.R.; resources, K.P.P.; data curation, Y.V.P.K., M.S. and K.P.P.; supervision, Y.V.P.K.; validation, M.S.; visualization, D.R.; project administration, Y.V.P.K.; writing—original draft, G.P.R. and D.R.; writing—review and editing, Y.V.P.K., M.S. and K.P.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original data presented in the study are openly available in Oxford Parkinson’s Disease Detection Dataset at https://archive.ics.uci.edu/dataset/174/parkinsons (accessed on 26 November 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Reddy, G.P.; Pavan Kumar, Y.V. A Beginner’s Guide to Federated Learning. In Proceedings of the 2023 Intelligent Methods, Systems, and Applications (IMSA), Giza, Egypt, 15 July 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 557–562.
  2. Reddy, G.P.; Pavan Kumar, Y.V.; Prakash, K.P. Hallucinations in Large Language Models (LLMs). In Proceedings of the 2024 IEEE Open Conference of Electrical, Electronic and Information Sciences (eStream), Vilnius, Lithuania, 25 April 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–6.
  3. Little, M.A.; McSharry, P.E.; Hunter, E.J.; Spielman, J.; Ramig, L.O. Suitability of Dysphonia Measurements for Telemonitoring of Parkinson’s Disease. IEEE Trans. Biomed. Eng. 2009, 56, 1015–1022.
  4. Solana-Lavalle, G.; Galán-Hernández, J.-C.; Rosas-Romero, R. Automatic Parkinson Disease Detection at Early Stages as a Pre-Diagnosis Tool by Using Classifiers and a Small Set of Vocal Features. Biocybern. Biomed. Eng. 2020, 40, 505–516.
  5. Soumaya, Z.; Drissi Taoufiq, B.; Benayad, N.; Yunus, K.; Abdelkrim, A. The Detection of Parkinson Disease Using the Genetic Algorithm and SVM Classifier. Appl. Acoust. 2021, 171, 107528.
  6. Karan, B.; Sahu, S.S.; Mahto, K. Parkinson Disease Prediction Using Intrinsic Mode Function Based Features from Speech Signal. Biocybern. Biomed. Eng. 2020, 40, 249–264.
  7. Zhang, T.; Zhang, Y.; Sun, H.; Shan, H. Parkinson Disease Detection Using Energy Direction Features Based on EMD from Voice Signal. Biocybern. Biomed. Eng. 2021, 41, 127–141.
  8. Hireš, M.; Gazda, M.; Drotár, P.; Pah, N.D.; Motin, M.A.; Kumar, D.K. Convolutional Neural Network Ensemble for Parkinson’s Disease Detection from Voice Recordings. Comput. Biol. Med. 2022, 141, 105021.
  9. Srinivasan, S.; Ramadass, P.; Mathivanan, S.K.; Panneer Selvam, K.; Shivahare, B.D.; Shah, M.A. Detection of Parkinson Disease Using Multiclass Machine Learning Approach. Sci. Rep. 2024, 14, 13813.
  10. Dadu, A.; Satone, V.; Kaur, R.; Hashemi, S.H.; Leonard, H.; Iwaki, H.; Makarious, M.B.; Billingsley, K.J.; Bandres-Ciga, S.; Sargent, L.J.; et al. Identification and Prediction of Parkinson’s Disease Subtypes and Progression Using Machine Learning in Two Cohorts. NPJ Park. Dis. 2022, 8, 172.
  11. Hällqvist, J.; Bartl, M.; Dakna, M.; Schade, S.; Garagnani, P.; Bacalini, M.-G.; Pirazzini, C.; Bhatia, K.; Schreglmann, S.; Xylaki, M.; et al. Plasma Proteomics Identify Biomarkers Predicting Parkinson’s Disease up to 7 Years Before Symptom Onset. Nat. Commun. 2024, 15, 4759.
  12. Ahmadi Rastegar, D.; Ho, N.; Halliday, G.M.; Dzamko, N. Parkinson’s Progression Prediction Using Machine Learning and Serum Cytokines. NPJ Park. Dis. 2019, 5, 14.
  13. Yang, Y.; Yuan, Y.; Zhang, G.; Wang, H.; Chen, Y.-C.; Liu, Y.; Tarolli, C.G.; Crepeau, D.; Bukartyk, J.; Junna, M.R.; et al. Artificial Intelligence-Enabled Detection and Assessment of Parkinson’s Disease Using Nocturnal Breathing Signals. Nat. Med. 2022, 28, 2207–2215.
  14. Balaji, E.; Brindha, D.; Balakrishnan, R. Supervised Machine Learning Based Gait Classification System for Early Detection and Stage Classification of Parkinson’s Disease. Appl. Soft Comput. 2020, 94, 106494.
  15. Loh, H.W.; Ooi, C.P.; Palmer, E.; Barua, P.D.; Dogan, S.; Tuncer, T.; Baygin, M.; Acharya, U.R. GaborPDNet: Gabor Transformation and Deep Neural Network for Parkinson’s Disease Detection Using EEG Signals. Electronics 2021, 10, 1740.
  16. Oliveira, G.C.; Ngo, Q.C.; Passos, L.A.; Papa, J.P.; Jodas, D.S.; Kumar, D. Tabular Data Augmentation for Video-Based Detection of Hypomimia in Parkinson’s Disease. Comput. Methods Programs Biomed. 2023, 240, 107713.
  17. Allebawi, M.F.; Dhieb, T.; Neji, M.; Farhat, N.; Smaoui, E.; Hamdani, T.M.; Damak, M.; Mhiri, C.; Neji, B.; Beyrouthy, T.; et al. Parkinson’s Disease Detection From Online Handwriting Based on Beta-Elliptical Approach and Fuzzy Perceptual Detector. IEEE Access 2024, 12, 56936–56950.
  18. Quan, C.; Ren, K.; Luo, Z. A Deep Learning Based Method for Parkinson’s Disease Detection Using Dynamic Features of Speech. IEEE Access 2021, 9, 10239–10252.
  19. Wang, W.; Lee, J.; Harrou, F.; Sun, Y. Early Detection of Parkinson’s Disease Using Deep Learning and Machine Learning. IEEE Access 2020, 8, 147635–147646.
  20. Noaman Kadhim, M.; Al-Shammary, D.; Sufi, F. A Novel Voice Classification Based on Gower Distance for Parkinson Disease Detection. Int. J. Med. Inform. 2024, 191, 105583.
