Stacked Ensemble Learning for Classification of Parkinson’s Disease Using Telemonitoring Vocal Features
Abstract
1. Introduction
Research Gap and Contribution
- Inappropriate validation methodologies: Most studies suffer from data leakage issues where multiple recordings from the same subject appear in both training and testing sets, leading to artificially inflated and clinically irrelevant performance estimates.
- Recording-wise vs. subject-wise evaluation: The prevalent use of recording-wise cross-validation fails to reflect real-world clinical scenarios where models must generalize to completely unseen patients rather than new recordings from known subjects.
- Lack of methodological transparency: Many studies report unrealistic accuracies (>95%) without acknowledging the fundamental validation flaws that compromise the clinical applicability of their findings.
- Absence of standardized validation protocols: Inconsistent validation methodologies across studies make it impossible to fairly compare performance claims and establish reliable benchmark standards for clinical deployment.
- Limited clinical relevance: Most studies prioritize achieving high accuracy scores over developing methodologically sound approaches that provide realistic performance estimates for clinical applications.
- Implementation of methodologically rigorous validation: Development of a subject-wise cross-validation framework that prevents data leakage and provides clinically relevant performance estimates by ensuring complete separation of subjects between training and testing sets.
- Realistic performance assessment: Demonstration that proper validation methodology yields substantially different (and more realistic) results compared to flawed recording-wise approaches, providing honest assessments of clinical applicability.
- Robust feature selection validation: Implementation and comparative analysis of three distinct feature selection techniques using proper subject-wise cross-validation to identify truly generalizable vocal biomarkers.
- Clinical applicability focus: Comprehensive evaluation using both recording-wise and subject-wise metrics to provide performance estimates that reflect real-world deployment scenarios where patient-level decisions are required.
- Methodological template: Provision of a rigorous validation framework that can serve as a standard for future PD voice classification research, prioritizing methodological soundness over inflated performance claims.
2. Methodology
2.1. Data Acquisition and Characteristics
2.2. Subject-Wise Data Splitting and Preprocessing
- 16 PD subjects contributing 102 recordings
- 6 healthy control subjects contributing 34 recordings
- Total training instances: 136
- 7 PD subjects contributing 45 recordings
- 2 healthy control subjects contributing 14 recordings
- Total testing instances: 59
- 102 original PD recordings + 68 synthetic PD recordings = 170 PD instances
- 34 original control recordings + 136 synthetic control recordings = 170 control instances
- Total balanced training instances: 340
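The subject-wise split and SMOTE balancing described above can be sketched in Python. This is a minimal illustration, not the study's exact pipeline: the data, subject IDs, and held-out subject list are toy stand-ins mirroring the paper's 22/9 subject split, and the hand-rolled `oversample` helper is a SMOTE-style interpolator (the study used SMOTE itself, for which `imblearn.over_sampling.SMOTE` would normally be used).

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy stand-in for the voice dataset: one row per recording, each tagged
# with the subject it came from (subjects 0-22 "PD", 23-30 "control").
n_recordings = 195
subjects = rng.integers(0, 31, size=n_recordings)
X = rng.normal(size=(n_recordings, 22))            # 22 vocal features
y = (subjects <= 22).astype(int)                   # 1 = PD, 0 = control

# Subject-wise split: every recording from a subject lands on one side only
# (7 PD + 2 control subjects held out, mirroring the paper's split).
test_subjects = [16, 17, 18, 19, 20, 21, 22, 29, 30]
test_mask = np.isin(subjects, test_subjects)
X_train, y_train = X[~test_mask], y[~test_mask]
X_test, y_test = X[test_mask], y[test_mask]

def oversample(X, y, minority=0):
    """SMOTE-style balancing: synthesize minority points by linear
    interpolation between random pairs of minority samples."""
    X_min = X[y == minority]
    n_needed = (y != minority).sum() - len(X_min)
    i = rng.integers(0, len(X_min), size=n_needed)
    j = rng.integers(0, len(X_min), size=n_needed)
    lam = rng.random((n_needed, 1))
    X_syn = X_min[i] + lam * (X_min[j] - X_min[i])
    return np.vstack([X, X_syn]), np.concatenate([y, np.full(n_needed, minority)])

# Balance the training set only: no synthetic point can leak into testing.
X_bal, y_bal = oversample(X_train, y_train)
```

Because the synthetic points are generated only from training-set recordings, the held-out subjects never influence the balanced training data.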
2.3. Subject-Wise Cross-Validation and Feature Selection
2.4. Classification of PD Using Stacked Ensemble Learning
- I. Support Vector Machine (SVM): Configured with a linear kernel and C = 1. SVM finds the hyperplane that best separates the data into classes, maximizing the margin around the support vectors.
- II. K-Nearest Neighbors (KNN): Configured with 3 nearest neighbors. KNN classifies new samples based on the majority class of their k nearest neighbors in the feature space.
- III. Random Forest (RF): Configured with 300 estimators and a random state of 42. RF builds multiple decision trees and merges their predictions, reducing overfitting and improving generalization.
- IV. Decision Tree (DT): Configured with a maximum depth of 5 and the Gini impurity criterion. DT predicts the target variable by learning simple decision rules from the features.
- SVM: linear kernel (effective for high-dimensional data), C = 1 (balanced regularization)
- KNN: k = 3 (provides robustness without oversmoothing decision boundaries)
- RF: n_estimators = 300 (sufficient diversity without excessive computational cost), random_state = 42 (reproducibility)
- DT: max_depth = 5 (prevents overfitting), criterion = "gini" (standard impurity measure)
- LR: max_iter = 1000 (ensures convergence), solver = "lbfgs" (efficient quasi-Newton solver; scikit-learn's default)
Algorithm 1. Stacked Ensemble for Prediction
Step 1: Split the dataset into attributes (X) and the target (Y = Status)
Step 2: Balance the dataset using SMOTE
Step 3: Split the dataset into training and testing sets (x_train, x_test, y_train, y_test; stratified on y, test_size = 0.3)
Step 4: Import the stacking classifier from the sklearn library
Step 5: Import the base classifiers, also from the sklearn library (SVM, K-neighbors classifier, LR, RF classifier)
Step 6: Initialize the hyperparameters of the classifiers
Step 7: Stack the classifiers with LR as the final estimator
Step 8: Train the stacked ensemble on x_train and y_train
Step 9: Predict using x_test
Step 10: Print the confusion matrix, classification report, and accuracy score
Step 11: End
Abbreviations—SMOTE: Synthetic Minority Oversampling Technique; SVM: Support Vector Machine; LR: Logistic Regression; RF: Random Forest
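Algorithm 1 maps directly onto scikit-learn's `StackingClassifier`. The sketch below is illustrative rather than the study's exact code: the `make_classification` data stands in for the balanced vocal-feature matrix, while the base learners and the LR meta-learner use the hyperparameters listed above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder data standing in for the balanced vocal-feature matrix.
X, y = make_classification(n_samples=340, n_features=22, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Base learners with the hyperparameters listed above (Steps 5-6).
base = [
    ("svm", SVC(kernel="linear", C=1)),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
    ("rf", RandomForestClassifier(n_estimators=300, random_state=42)),
    ("dt", DecisionTreeClassifier(max_depth=5, criterion="gini")),
]

# Stack the classifiers with LR as the final estimator (Step 7).
stack = StackingClassifier(
    estimators=base,
    final_estimator=LogisticRegression(max_iter=1000, solver="lbfgs"),
)
stack.fit(X_train, y_train)                     # Step 8
acc = accuracy_score(y_test, stack.predict(X_test))  # Steps 9-10
```

`StackingClassifier` trains the meta-learner on out-of-fold predictions of the base learners, which is what keeps the stacked ensemble from simply memorizing its base models' training-set outputs.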
- I. Subject-wise split into training (22 subjects) and testing (9 subjects)
- II. SMOTE applied only to the training set
- III. Feature selection performed using subject-wise cross-validation within the training set
- IV. Hyperparameter optimization using subject-wise cross-validation within the training set
- V. Final model training on the complete balanced training set
- VI. Final evaluation on the held-out test set (unseen subjects)
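The subject-wise cross-validation in steps III and IV can be expressed with scikit-learn's `GroupKFold`, which keeps every recording from one subject in the same fold. The data, subject IDs, and logistic-regression stand-in below are illustrative assumptions, not the study's exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)
subjects = rng.integers(0, 22, size=136)     # 22 training subjects
X = rng.normal(size=(136, 22))               # 22 vocal features per recording
y = (subjects < 16).astype(int)              # subjects 0-15 play the PD class

# GroupKFold never splits a subject's recordings across folds, so every
# validation fold contains only subjects the model has not seen.
cv = GroupKFold(n_splits=5)
for tr, va in cv.split(X, y, groups=subjects):
    assert set(subjects[tr]).isdisjoint(subjects[va])   # no subject leakage

scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         groups=subjects, cv=cv)
```

Plain `KFold` on the same data would scatter a subject's recordings across both sides of each split, which is exactly the recording-wise leakage the paper argues against.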
2.5. Implementation and Experimental Setup
2.6. Performance Evaluation
- (i) Accuracy: Measures the overall effectiveness of the developed system, expressed as a percentage (%); it reflects how accurately the PD and control cases are classified, which in turn determines how reliable the model's predictions will be. It is given mathematically by Equation (1), Accuracy = (TP + TN)/(TP + TN + FP + FN), where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.
- (ii) Recall: The ratio of the number of positive cases classified correctly to the total number of positive cases. It is given mathematically by Equation (2), Recall = TP/(TP + FN).
- (iii) Precision: The proportion of predicted positive cases that truly belong to the positive class. It is given mathematically by Equation (3), Precision = TP/(TP + FP).
- (iv) F1 score: The harmonic mean of recall and precision. It is given mathematically by Equation (4), F1 = 2 × (Precision × Recall)/(Precision + Recall).
- (v) Subject-wise accuracy: The percentage of subjects correctly classified, using a majority vote over each subject's recordings.
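Subject-wise accuracy (v) can be computed with a simple majority vote per subject. The helper below is a hypothetical sketch (`subject_wise_accuracy` is our name, not the paper's); note that `argmax` on a tied vote breaks toward the lower label.

```python
import numpy as np

def subject_wise_accuracy(subjects, y_true, y_pred):
    """Majority-vote each subject's recording-level predictions, then
    compare against that subject's true label (constant per subject)."""
    ids = np.unique(subjects)
    correct = 0
    for s in ids:
        m = subjects == s
        vote = np.bincount(y_pred[m]).argmax()   # majority predicted label
        correct += int(vote == y_true[m][0])
    return correct / len(ids)

# Worked example: 3 subjects, 3 recordings each; subject 2's recordings
# are majority-misclassified, so 2 of 3 subjects are correct.
subjects = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])
y_true   = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0])
y_pred   = np.array([1, 1, 0, 1, 0, 0, 0, 0, 0])
acc = subject_wise_accuracy(subjects, y_true, y_pred)   # 2/3
```

This is why subject-wise accuracy can differ sharply from recording-wise accuracy: here 7 of 9 recordings (77.8%) are correct, but only 2 of 3 subjects.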
3. Results
3.1. Subject-Wise Validation Results
- Cross-validation accuracy: 89.2 ± 4.3%
- Cross-validation precision: 88.7 ± 5.1%
- Cross-validation recall: 90.1 ± 3.8%
- Cross-validation F1-score: 89.4 ± 4.2%
3.2. Determination of the Optimal Features with Subject-Wise Validation
3.3. Held-Out Test Set Results (Unseen Subjects)
3.4. Systematic Analysis of Feature Selection Impact
3.5. Classifier Performance Comparison
3.6. Clinical Validation: Subject-Level Analysis
4. Discussion
Comparison with Recent Studies
5. Limitations of the Study
- I. Methodological constraint: Subject-wise validation, while clinically appropriate, significantly reduces the available training and testing data compared to recording-wise approaches.
- II. Statistical power: The small number of test subjects (9) limits the statistical significance of our findings and requires replication with larger cohorts.
- III. Generalizability concerns: Performance differences between cross-validation and held-out tests suggest that larger, more diverse datasets are needed for robust model development.
- IV. Class imbalance at the subject level: The uneven distribution of subjects between classes (particularly in the test set) affects the reliability of performance estimates.
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Yoo, I.; Alafaireet, P.; Marinov, M.; Pena-Hernandez, K.; Gopidi, R.; Chang, J.F.; Hua, L. Data mining in healthcare and biomedicine: A survey of the literature. J. Med. Syst. 2012, 36, 2431–2448.
- Dev, S.; Wang, H.; Nwosu, C.S.; Jain, N.; Veeravalli, B.; John, D. A predictive analytics approach for stroke prediction using machine learning and neural networks. Healthc. Anal. 2022, 2, 100032.
- Olawade, D.B.; Aderinto, N.; David-Olawade, A.C.; Egbon, E.; Adereni, T.; Popoola, M.R.; Tiwari, R. Integrating AI-driven wearable devices and biometric data into stroke risk assessment: A review of opportunities and challenges. Clin. Neurol. Neurosurg. 2024, 249, 108689.
- Olawade, D.B.; David-Olawade, A.C.; Wada, O.Z.; Asaolu, A.J.; Adereni, T.; Ling, J. Artificial intelligence in healthcare delivery: Prospects and pitfalls. J. Med. Surg. Public Health 2024, 3, 100108.
- Olawade, D.B.; Aderinto, N.; Olatunji, G.; Kokori, E.; David-Olawade, A.C.; Hadi, M. Advancements and applications of Artificial Intelligence in cardiology: Current trends and future prospects. J. Med. Surg. Public Health 2024, 3, 100109.
- Challa, K.N.R.; Pagolu, V.S.; Panda, G.; Majhi, B. An improved approach for prediction of Parkinson’s disease using machine learning techniques. In Proceedings of the 2016 International Conference on Signal Processing, Communication, Power and Embedded System (SCOPES), Paralakhemundi, India, 3–5 October 2016.
- Wingate, J.; Kollia, I.; Bidaut, L.; Kollias, S. Unified deep learning approach for prediction of Parkinson’s disease. IET Image. Process. 2020, 14, 1980–1989.
- Tsanas, A.; Little, M.A.; McSharry, P.E.; Spielman, J.; Ramig, L.O. Novel speech signal processing algorithms for high-accuracy classification of Parkinson’s disease. IEEE Trans. Biomed. Eng. 2012, 59, 1264–1271.
- Vaiciukynas, E.; Verikas, A.; Gelzinis, A.; Bacauskiene, M. Detecting Parkinson’s disease from sustained phonation and speech signals. PLoS ONE 2017, 12, e0185613.
- Sakar, B.E.; Serbes, G.; Sakar, C.O. Analyzing the effectiveness of vocal features in early telediagnosis of Parkinson’s disease. PLoS ONE 2019, 14, e0214362.
- Saeed, F.; Al-Sarem, M.; Al-Mohaimeed, M.; Emara, A.; Boulila, W.; Alasli, M.; Ghabban, F. Enhancing Parkinson’s Disease Prediction Using Machine Learning and Feature Selection Methods. Comput. Mater. Contin. 2022, 71, 5639–5657.
- Velmurugan, T.; Dhinakaran, J. A Novel Ensemble Stacking Learning Algorithm for Parkinson’s Disease Prediction. Math. Probl. Eng. 2022, 2022, 9209656.
- Shibina, V.; Thasleema, T.M. Voice feature-based diagnosis of Parkinson’s disease using nature inspired squirrel search algorithm with ensemble learning classifiers. Iran J. Comput. Sci. 2025, 8, 393–406.
- Ouhmida, A.; Saleh, S.; Ammar, A.; Raihani, A.; Cherradi, B. HEFS-MLDR: A novel hybrid ensemble feature selection framework for improved deep neural network architecture in the diagnosis of Parkinson’s disease. Multimed. Tools Appl. 2024, 83, 11235–11254.
- Hadjaidji, E.; Korba, M.C.A.; Khelil, K. Improving detection of Parkinson’s disease with acoustic feature optimization using particle swarm optimization and machine learning. Mach. Learn. Sci. Technol. 2025, 6, 015026.
- Dhanalakshmi, S.; Das, S.; Senthil, R. Speech features-based Parkinson’s disease classification using combined SMOTE-ENN and binary machine learning. Health Technol. 2024, 14, 393–406.
- Pardo-Moreno, T.; García-Morales, V.; Suleiman-Martos, S.; Rivas-Domínguez, A.; Mohamed-Mohamed, H.; Ramos-Rodríguez, J.J.; Melguizo-Rodríguez, L.; González-Acedo, A. Current Treatments and New, Tentative Therapies for Parkinson’s Disease. Pharmaceutics 2023, 15, 770.
- Kobylecki, C. Update on the diagnosis and management of Parkinson’s disease. Clin. Med. 2020, 20, 393–398.
- Al Imran, A.; Rahman, A.; Kabir, M.H.; Rahim, M.S. The Impact of Feature Selection Techniques on the Performance of Predicting Parkinson’s Disease. Int. J. Inf. Technol. Comput. Sci. 2018, 11, 14–29.
- Joloudari, J.H.; Hussain, S.; Nematollahi, M.A.; Bagheri, R.; Fazl, F.; Alizadehsani, R.; Lashgari, R.; Talukder, A. BERT-Deep CNN: State-of-the-Art for Sentiment Analysis of COVID-19 Tweets. Soc. Netw. Anal. Min. 2023, 13, 99.
- Noroozi, Z.; Orooji, A.; Erfannia, L. Analyzing the impact of feature selection methods on machine learning algorithms for heart disease prediction. Sci. Rep. 2023, 13, 22588.
- Potharlanka, J.L.; M, N.B. Feature importance feedback with Deep Q process in ensemble-based metaheuristic feature selection algorithms. Sci. Rep. 2024, 14, 2923.
- Singha, S.; Shenoy, P.P. An adaptive heuristic for feature selection based on complementarity. Mach. Learn. 2018, 107, 2027–2071.
- Lee, S.; Kc, B.; Choeh, J.Y. Comparing performance of ensemble methods in predicting movie box office revenue. Heliyon 2020, 6, e04260.
- Zhang, Z.; Meng, Y.; Xiao, D. Prediction techniques of movie box office using neural networks and emotional mining. Sci. Rep. 2024, 14, 21209.
- Polat, K.; Nour, M. Parkinson disease classification using one against all based support vector machine classifier. Med. Hypotheses 2020, 140, 109678.
- Rana, B.; Juneja, A.; Agarwal, M.; Sinha, A.; Grobert, J.; Singh, H. Feature selection based machine learning classification of Parkinson’s disease. J. Med. Syst. 2019, 43, 302.
- Stamate, C.; Magoulas, G.D.; Küppers, S.; Nomikou, E.; Daskalopoulos, I.; Luchini, M.U.; Moussouri, T.; Roussos, G. The impact of the choice of cross-validation techniques on the results of machine learning-based diagnostic applications. Healthc. Inform. Res. 2021, 27, 189–202.
- Rahman, S.M.S.B.; Maniar, H.; Datta, A.; Sharma, R. Machine Learning-based Early Diagnosis of Parkinson’s Disease using Voice Features. Proc. Int. Conf. Adv. Comput. Commun. Syst. 2021, 1016–1020.
- Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
- Fernández, A.; García, S.; Galar, M.; Prati, R.C.; Krawczyk, B.; Herrera, F. Learning from Imbalanced Data Sets; Springer: Berlin/Heidelberg, Germany, 2018.
- Guyon, I.; Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003, 3, 1157–1182.
- Saeys, Y.; Inza, I.; Larrañaga, P. A review of feature selection techniques in bioinformatics. Bioinformatics 2007, 23, 2507–2517.
- Kohavi, R.; John, G.H. Wrappers for feature subset selection. Artif. Intell. 1997, 97, 273–324.
- Naderalvojoud, B.; Hernandez-Boussard, T. Improving machine learning with ensemble learning on observational healthcare data. AMIA Annu. Symp. Proc. 2024, 2023, 521–529.
- Kuzudisli, C.; Bakir-Gungor, B.; Bulut, N.; Qaqish, B.; Yousef, M. Review of feature selection approaches based on grouping of features. PeerJ 2023, 11, e15666.
- Soladoye, A.A.; Olawade, D.B.; Adeyanju, I.A.; Akpa, O.M.; Aderinto, N.; Owolabi, M.O. Optimizing stroke prediction using gated recurrent unit and feature selection in Sub-Saharan Africa. Clin. Neurol. Neurosurg. 2025, 249, 108761.
- Chen, L.; Zhang, Y.; Song, G. Automated screening for Parkinson’s disease through acoustic analysis with artificial intelligence methods: A systematic review. Biomed. Signal Process. Control 2021, 70, 103001.
- Karan, B.; Sahu, S.S.; Mahto, K. Parkinson disease prediction using intrinsic mode function based features from speech signal. Biocybern. Biomed. Eng. 2020, 40, 249–264.
- Hoops, S.; Nazem, S.; Siderowf, A.D.; Duda, J.E.; Xie, S.X.; Stern, M.B.; Weintraub, D. Validity of the MoCA and MMSE in the detection of MCI and dementia in Parkinson disease. Neurology 2009, 73, 1738–1745.
- Müller, B.; Assmus, J.; Herlofson, K.; Larsen, J.P.; Tysnes, O.B. Importance of motor vs. non-motor symptoms for health-related quality of life in early Parkinson’s disease. Park. Relat. Disord. 2013, 19, 1027–1032.
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
- Dietterich, T.G. Ensemble methods in machine learning. In Multiple Classifier Systems; Springer: Berlin/Heidelberg, Germany, 2000; pp. 1–15.
- Wolpert, D.H. Stacked generalization. Neural Netw. 1992, 5, 241–259.
- Zhou, Z.H. Ensemble Methods: Foundations and Algorithms; CRC Press: Boca Raton, FL, USA, 2012.
- Ali, L.; Javeed, A.; Noor, A.; Rauf, H.T.; Kadry, S.; Gandomi, A.H. Parkinson’s disease detection based on features refinement through L1 regularized SVM and deep neural network. Sci. Rep. 2024, 14, 1333.
- Cantürk, İ.; Karabiber, F. A machine learning system for the diagnosis of Parkinson’s disease from speech signals and its application to multiple speech signal types. Arab. J. Sci. Eng. 2016, 41, 5049–5059.
- Rusz, J.; Hlavnicka, J.; Tykalova, T.; Novotny, M.; Dusek, P.; Sonka, K.; Ruzicka, E. Smartphone allows capture of speech abnormalities associated with high risk of developing Parkinson’s disease. IEEE Trans. Neural Syst. Rehabil. Eng. 2018, 26, 1495–1507.
- Suppa, A.; Costantini, G.; Asci, F.; Di Leo, P.; Al-Wardat, M.S.; Di Lazzaro, G.; Scalise, S.; Pisani, A.; Saggio, G. Voice in Parkinson’s disease: A machine learning study. Front. Neurol. 2022, 15, 831428.
- Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 4765–4774.
- Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why should I trust you?”: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144.
- Collins, G.S.; Reitsma, J.B.; Altman, D.G.; Moons, K.G. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): The TRIPOD statement. BMJ 2015, 350, g7594.
- Luo, W.; Phung, D.; Tran, T.; Gupta, S.; Rana, S.; Karmakar, C.; Shilton, A.; Yearwood, J.; Dimitrova, N.; Ho, T.B.; et al. Guidelines for developing and reporting machine learning predictive models in biomedical research: A multidisciplinary view. J. Med. Internet Res. 2016, 18, e323.
- Kuncheva, L.I. Combining Pattern Classifiers: Methods and Algorithms, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2014.
- Polikar, R. Ensemble based systems in decision making. IEEE Circuits Syst. Mag. 2006, 6, 21–45.
- Tang, J.; Alelyani, S.; Liu, H. Feature selection for classification: A review. In Data Classification: Algorithms and Applications; CRC Press: Boca Raton, FL, USA, 2014; pp. 37–64.
- Li, J.; Cheng, K.; Wang, S.; Morstatter, F.; Trevino, R.P.; Tang, J.; Liu, H. Feature selection: A data perspective. ACM Comput. Surv. 2017, 50, 1–45.
- He, H.; Bai, Y.; Garcia, E.A.; Li, S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, China, 1–8 June 2008; pp. 1322–1328.
- Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
- Poldrack, R.A.; Huckins, G.; Varoquaux, G. Establishment of best practices for evidence for prediction: A review. JAMA Psychiatry 2020, 77, 534–540.
- Bzdok, D.; Altman, N.; Krzywinski, M. Statistics versus machine learning. Nat. Methods 2018, 15, 233–234.
- Tsanas, A.; Little, M.A.; McSharry, P.E.; Ramig, L.O. Nonlinear speech analysis algorithms mapped to a standard metric achieve clinically useful quantification of average Parkinson’s disease symptom severity. J. R. Soc. Interface 2011, 8, 842–855.
- Rusz, J.; Tykalova, T.; Klempir, J.; Cmejla, R.; Ruzicka, E. Effects of dopaminergic replacement therapy on motor speech disorders in Parkinson’s disease: Longitudinal follow-up study on previously untreated patients. J. Neural Transm. 2016, 123, 379–387.
- Arlot, S.; Celisse, A. A survey of cross-validation procedures for model selection. Stat. Surv. 2010, 4, 40–79.
- Varoquaux, G.; Raamana, P.R.; Engemann, D.A.; Hoyos-Idrobo, A.; Schwartz, Y.; Thirion, B. Assessing and tuning brain decoders: Cross-validation, caveats, and guidelines. Neuroimage 2017, 145, 166–179.
- Rajkomar, A.; Dean, J.; Kohane, I. Machine learning in medicine. N. Engl. J. Med. 2019, 380, 1347–1358.
- Chen, J.H.; Asch, S.M. Machine learning and prediction in medicine—Beyond the peak of inflated expectations. N. Engl. J. Med. 2017, 376, 2507–2509.
- Carron, J.; Campos-Roca, Y.; Madruga, M.; Pérez, C.J. A mobile-assisted voice condition analysis system for Parkinson’s disease: Assessment of usability conditions. Biomed. Eng. Online 2021, 20, 114.
- Beam, A.L.; Kohane, I.S. Big data and machine learning in health care. JAMA 2018, 319, 1317–1318.
S/N | Attributes | Description | Data Type |
---|---|---|---|
1 | MDVP:Fo (Hz) | Average vocal fundamental frequency | Numeric |
2 | MDVP:Fhi (Hz) | Maximum vocal fundamental frequency | Numeric |
3 | MDVP:Flo (Hz) | Minimum vocal fundamental frequency | Numeric |
4 | MDVP:Jitter (%) | MDVP jitter as percentage | Numeric |
5 | MDVP:Jitter (Abs) | MDVP jitter as absolute value in microseconds | Numeric |
6 | MDVP:RAP | MDVP Relative Amplitude Perturbation | Numeric |
7 | MDVP:PPQ | MDVP Period Perturbation Quotient | Numeric |
8 | Jitter:DDP | Difference of differences between cycles, divided by the average period | Numeric |
9 | MDVP:Shimmer | MDVP local shimmer | Numeric |
10 | MDVP:Shimmer (dB) | MDVP local shimmer in decibels | Numeric |
11 | Shimmer:APQ3 | 3 Point Amplitude Perturbation Quotient | Numeric |
12 | Shimmer:APQ5 | 5 Point Amplitude Perturbation Quotient | Numeric |
13 | MDVP:APQ | MDVP Amplitude Perturbation Quotient | Numeric |
14 | Shimmer:DDA | Average absolute difference between consecutive differences and the amplitude of consecutive period | Numeric |
15 | NHR | Noise to Harmonic Ratio | Numeric |
16 | HNR | Harmonics to Noise Ratio | Numeric |
17 | RPDE | Recurrence Period Density Entropy | Numeric |
18 | DFA | Detrended Fluctuation Analysis | Numeric |
19 | spread1 | Non-Linear measure of fundamental frequency | Numeric |
20 | spread2 | Non-Linear measure of fundamental frequency | Numeric |
21 | D2 | Correlation Dimension | Numeric |
22 | PPE | Pitch Period Entropy | Numeric |
23 | Status | Health Status: 1—Parkinson, 0—Healthy | Nominal |
Dataset Component | Subjects | PD Subjects | Control Subjects | Total Recordings |
---|---|---|---|---|
Training Set | 22 | 16 | 6 | 340 (with SMOTE) |
Testing Set | 9 | 7 | 2 | 59 (original) |
Total | 31 | 23 | 8 | 399 |
Ranking | Gain Ratio (Subject-Wise CV) | Kruskal–Wallis (Subject-Wise CV) | Performance Impact |
---|---|---|---|
1 | MDVP:Flo (Hz) | spread1 | High |
2 | spread1 | PPE | High |
3 | MDVP:APQ | MDVP:APQ | Moderate |
4 | PPE | spread2 | Moderate |
5 | NHR | MDVP:Jitter (Abs) | Moderate |
Metric | Stacked Ensemble | Individual Best (KNN) | Performance Difference |
---|---|---|---|
Recording-wise accuracy | 84.7% | 81.4% | +3.3% |
Subject-wise accuracy | 77.8% | 66.7% | +11.1% |
Precision | 82.2% | 78.9% | +3.3% |
Recall | 86.7% | 84.4% | +2.3% |
F1-score | 84.4% | 81.6% | +2.8% |
Feature Selection | Features Used | CV Accuracy | Test Accuracy | Clinical Interpretability |
---|---|---|---|---|
Top 5 (gain ratio) | 5 | 86.1 ± 3.2% | 82.2% | High |
Top 10 (gain ratio) | 10 | 89.2 ± 4.3% | 84.7% | High |
Top 5 (Kruskal–Wallis) | 5 | 84.8 ± 4.1% | 79.7% | Moderate |
Top 10 (Kruskal–Wallis) | 10 | 87.5 ± 3.8% | 83.1% | Moderate |
Forward search | Variable | 88.7 ± 3.9% | 83.9% | Moderate |
All features | 22 | 87.3 ± 4.7% | 82.5% | Low |
Algorithm | CV Accuracy | Test Accuracy | Test Precision | Test Recall | Test F1-Score |
---|---|---|---|---|---|
Stacked ensemble | 89.2 ± 4.3% | 84.7% | 82.2% | 86.7% | 84.4% |
K-nearest neighbor | 86.8 ± 5.1% | 81.4% | 78.9% | 84.4% | 81.6% |
Random forest | 85.2 ± 4.8% | 79.7% | 76.3% | 82.2% | 79.1% |
Support vector machine | 82.4 ± 6.2% | 76.3% | 73.7% | 77.8% | 75.7% |
Logistic regression | 79.1 ± 5.9% | 72.9% | 69.2% | 75.6% | 72.3% |
Decision tree | 77.6 ± 7.1% | 71.2% | 67.9% | 73.3% | 70.5% |
Subject Category | Subjects (n) | Correctly Classified | Accuracy | Clinical Confidence |
---|---|---|---|---|
PD Subjects | 7 | 6 | 85.7% | High |
Control subjects | 2 | 1 | 50.0% | Low |
Overall | 9 | 7 | 77.8% | Moderate |
Study | Year | Validation Method | Performance | Dataset |
---|---|---|---|---|
Current study (corrected) | 2025 | Subject-wise CV | 84.7% | UCI Parkinson’s |
Ali et al. [46] | 2024 | Subject-wise validation | 100% LOSO, 97.5% k-fold | Voice recordings |
Cantürk and Karabiber [47] | 2016 | Leave-One-Subject-Out | 57.5% LOSO | Multiple speech types |
Rusz et al. [48] | 2021 | Subject-wise validation | 82.4% subject-wise | mPower smartphone |
Suppa et al. [49] | 2022 | Clinical validation | 85.2% AUC | Professional recordings |
Typical studies (recording-wise) | Various | Recording-wise split | 95%+ | Various |
Omodunbi, B.A.; Olawade, D.B.; Awe, O.F.; Soladoye, A.A.; Aderinto, N.; Ovsepian, S.V.; Boussios, S. Stacked Ensemble Learning for Classification of Parkinson’s Disease Using Telemonitoring Vocal Features. Diagnostics 2025, 15, 1467. https://doi.org/10.3390/diagnostics15121467