PDD-ET: Parkinson’s Disease Detection Using ML Ensemble Techniques and Customized Big Dataset
Abstract
1. Introduction
- To the best of our knowledge, we are the first to construct a customized big dataset encompassing diverse attributes from both individuals affected by Parkinson’s disease and healthy individuals. Furthermore, we introduce the notion of a customized big dataset into PD detection and classification.
- To identify PD in its early stages in affected individuals, so that its progression can be managed, we employ premotor, cerebrospinal fluid (CSF), and SPECT indicators to achieve efficient and precise detection of PD.
- We employ ensemble techniques (ET) to categorize the distinct levels of PD.
- We extensively compare the latest machine learning and deep learning techniques with our proposed ensemble technique-driven PDD-ET model. The assessed methods encompass shallow machine learning (SML), deep learning (DL), and ensemble learning (EL) approaches.
2. Customized Big PD Dataset and EL Technique
2.1. Customized Big PD Dataset (CBPDD)
- Demographic Information: Basic patient details such as age, gender, and ethnicity are often considered, as they can provide insights into potential risk factors.
- Medical History: Previous medical conditions, surgeries, medication history, and family history of PD or related neurological disorders are essential for understanding a patient’s overall health.
- Symptom Profiles: Detailed descriptions of motor symptoms like tremors, rigidity, bradykinesia (slowness of movement), and postural instability are fundamental indicators of PD.
- Non-Motor Symptoms: These include cognitive impairments, sleep disturbances, mood changes, loss of smell (anosmia), and autonomic dysfunctions.
- UPDRS Assessment: The Unified Parkinson’s Disease Rating Scale (UPDRS) is a widely used instrument for evaluating the severity of PD symptoms. It covers both motor and non-motor symptoms.
- Gait Analysis: Gait abnormalities are common in PD. Analyzing gait patterns and abnormalities can aid in early detection.
- Speech Patterns: PD often affects speech, leading to changes in volume, pitch, articulation, and rhythm. Speech analysis can provide valuable diagnostic insights.
- Fine Motor Skills: Assessments of handwriting, finger tapping speed, and dexterity can reveal motor impairments indicative of PD.
- Reaction Time and Movement Speed: Slowed reaction times and reduced movement speed can be early indicators of PD.
- Response to Levodopa: Observing how a patient responds to levodopa, a common PD medication, can help confirm the diagnosis.
- Self-Reported Questionnaires: Patients’ self-reported questionnaires about their quality of life, daily activities, and emotional state can contribute to behavioral data.
- Neuropsychological Testing: Assessments of cognitive functions like memory, attention, and executive functions can provide additional diagnostic information.
- Electrophysiological Data: Electroencephalography (EEG), electromyography (EMG), and other electrophysiological tests can reveal abnormal brain activity and muscle responses.
- Imaging Data: Neuroimaging techniques such as MRI, DAT scans, and PET scans can detect structural and functional changes in the brain associated with PD.
- MDVP:Fo(Hz): average vocal fundamental frequency;
- MDVP:Fhi(Hz): maximum vocal fundamental frequency;
- MDVP:Flo(Hz): minimum vocal fundamental frequency;
- MDVP:Jitter(%), MDVP:Jitter(Abs), MDVP:RAP, MDVP:PPQ, Jitter:DDP: several measures of variation in fundamental frequency;
- MDVP:Shimmer, MDVP:Shimmer(dB), Shimmer:APQ3, Shimmer:APQ5, MDVP:APQ, Shimmer:DDA: several measures of variation in amplitude;
- NHR, HNR: two measures of the ratio of noise to tonal components in the voice;
- status: health status of the subject, (one) Parkinson’s, (zero) healthy;
- RPDE, D2: two nonlinear dynamical complexity measures;
- DFA: signal fractal scaling exponent;
- spread1, spread2, PPE: three nonlinear measures of fundamental frequency variation.
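To make the jitter and shimmer attributes listed above more concrete, the following is a minimal sketch of how such perturbation measures are commonly defined: jitter as the cycle-to-cycle variation of the glottal period, and shimmer as the cycle-to-cycle variation of the peak amplitude. The function names and the toy input values are illustrative assumptions; the MDVP software that produced the dataset values may differ in detail.

```python
import math


def jitter_abs(periods):
    """Mean absolute difference between consecutive glottal periods (seconds)."""
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return sum(diffs) / len(diffs)


def jitter_percent(periods):
    """Absolute jitter relative to the mean period, expressed in percent."""
    mean_period = sum(periods) / len(periods)
    return 100.0 * jitter_abs(periods) / mean_period


def shimmer_db(amplitudes):
    """Mean absolute log-ratio of consecutive peak amplitudes, in dB."""
    ratios = [
        abs(20.0 * math.log10(b / a))
        for a, b in zip(amplitudes, amplitudes[1:])
    ]
    return sum(ratios) / len(ratios)


# Hypothetical measurements for one sustained vowel (not taken from the dataset).
periods = [0.0080, 0.0082, 0.0079, 0.0081]   # seconds per glottal cycle
amplitudes = [1.00, 0.95, 1.02, 0.98]        # peak amplitude per cycle

print(jitter_abs(periods), jitter_percent(periods), shimmer_db(amplitudes))
```

A perfectly periodic voice signal would give zero for all three measures; larger values indicate less stable phonation.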
2.2. Ensemble Learning Technique (ELT)
3. System Architecture of Our Proposed PDD-ET Model
4. Proposed PDD-ET Model
- Initially, the customized big PD dataset is normalized using the z-score normalization technique. Subsequently, a pre-processing step removes any missing or invalid values from the training dataset before the training of the proposed Parkinson’s Disease Detection Ensemble Technique (PDD-ET) model begins.
- After that, we divide the customized big dataset into training, testing, and validation datasets (80%:10%:10%) and feed them into the PDD-ET model.
- With this setup, we start the training procedure of the PDD-ET model.
- Finally, we evaluate the results by comparing the predicted outcomes with the observed outcomes.
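The preparation steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the helper names, the toy data, and the fixed random seed are assumptions. The sketch also drops incomplete rows before normalizing, which is an assumption made here so that the column statistics can be computed at all.

```python
import math
import random
import statistics


def drop_incomplete(rows):
    """Remove rows that contain missing (None) or non-finite values."""
    return [
        row for row in rows
        if all(v is not None and math.isfinite(v) for v in row)
    ]


def zscore_columns(rows):
    """Rescale every column to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # A constant column has zero spread; divide by 1.0 instead of 0.0.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(v - m) / s for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


def split_80_10_10(rows, seed=0):
    """Shuffle and split rows into training, testing, and validation parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_test = len(shuffled) // 10
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation


# Hypothetical toy data: two numeric features per sample.
raw = [[float(i), float(2 * i + 1)] for i in range(100)]
raw.append([None, 3.0])          # missing value, will be dropped
raw.append([float("nan"), 1.0])  # invalid value, will be dropped

clean = drop_incomplete(raw)
normalized = zscore_columns(clean)
train, test, validation = split_80_10_10(normalized)
print(len(train), len(test), len(validation))  # prints: 80 10 10
```

In practice, the normalization statistics are often computed on the training portion only and then reused for the other two portions, so that no information from the held-out data leaks into training.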
4.1. Construction of the PDD-ET Model
4.2. Training of the PDD-ET Model
4.3. Deployment of the PDD-ET Model
- Adaptive Boosting algorithm [40]:
- Adaptive boosting algorithm is an ensemble technique used to improve weak models’ performance.
- Here, we designed three adaptive boosting algorithms with different models. The first boosting algorithm deals with the classification tree. The second and third boosting algorithms are concerned with different linear models.
- Random Forest algorithm [41]:
- Random forest (RF) also belongs to the class of ML ensemble techniques. It aggregates the results of a collection of classification trees. RF trains each tree on a bootstrap sample and de-correlates the trees via random splits during training.
- SVR algorithm [29]:
- SVR is derived from the principles of support vector machines (SVM). It uses kernel functions to map features from a lower-dimensional space to a higher-dimensional one, where it constructs the optimal regression hyperplane.
- LSTM algorithm [26]:
- LSTM is a variant of the recurrent neural network (RNN) that retains information over time in its memory cells [42]. It works efficiently by using input, forget, and output gates at each layer of the hierarchy. The output of every layer is treated as the input of the next layer during training.
- GRU algorithm [33]:
- GRU is a modified version of the LSTM neural network.
- It reduces the complexity of the standard LSTM gate architecture by using only update and reset gates.
- Stacked LSTM algorithm [31]:
- Stacked LSTM (SLSTM) is a special kind of ensemble technique to enhance the performance of a standard LSTM neural network. Inside this SLSTM neural network, the memory states (i.e., cells) are reset at each layer to obtain an accurate result at every step.
- The stacking architecture makes the model deeper, with the aim of improving performance.
- The SLSTM neural network also reduces the accumulation and propagation errors for long-term detection.
- It also minimizes the computational complexity of the iterative strategy for long-term multi-step prognosis.
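To illustrate the update and reset gates mentioned for the GRU above, here is a minimal sketch of a single GRU step for a scalar input and a scalar hidden state. The parameter names (`w_z`, `u_z`, and so on) and the toy values are assumptions for illustration only, and the sign convention of the final interpolation varies between references; this is not the configuration used in the PDD-ET model.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h_prev, p):
    """One GRU time step for scalar input x and scalar hidden state h_prev."""
    # Update gate: how much of the candidate state replaces the old state.
    z = sigmoid(p["w_z"] * x + p["u_z"] * h_prev + p["b_z"])
    # Reset gate: how much of the old state feeds into the candidate state.
    r = sigmoid(p["w_r"] * x + p["u_r"] * h_prev + p["b_r"])
    # Candidate state computed from the input and the reset-scaled old state.
    h_cand = math.tanh(p["w_h"] * x + p["u_h"] * (r * h_prev) + p["b_h"])
    # Interpolate between the old state and the candidate state.
    return (1.0 - z) * h_prev + z * h_cand


# Hypothetical parameters, chosen only to make the example run.
params = {
    "w_z": 0.5, "u_z": 0.1, "b_z": 0.0,
    "w_r": 0.3, "u_r": 0.2, "b_r": 0.0,
    "w_h": 0.8, "u_h": 0.4, "b_h": 0.0,
}

h = 0.0
for x in [0.2, -0.1, 0.4]:   # a short toy input sequence
    h = gru_step(x, h, params)
print(h)
```

Compared with an LSTM cell, this step has two gates instead of three and no separate cell state, which is where the reduction in complexity comes from.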
5. Result Analysis and Discussion
5.1. Implementation Details
5.2. Experimental Setup
5.3. Model Evaluation
Algorithm 1. PDD-ET: Parkinson’s Disease Detection using ML Ensemble Techniques and Customized Big Dataset
Algorithm 2. Early PD Detection Criteria.
5.4. Performance Analysis
5.5. PD Detection of PDD-ET Model
5.6. Comparing the PD Detection with Other ML/DL Models
5.7. Impact and Application of the PDD-ET Model
- Early and Accurate Diagnosis:
- ML ensemble techniques can lead to more accurate and dependable detection of Parkinson’s disease, including its initial phases when symptoms might be subtle.
- Early diagnosis enables timely intervention and treatment, potentially slowing the advancement of the condition and enhancing the quality of life for patients.
- Personalized Treatment:
- Accurate diagnosis allows for customized treatment strategies tailored to the unique condition of each individual patient.
- Healthcare professionals can prescribe targeted therapies and medications, reducing unnecessary side effects and optimizing treatment outcomes.
- Reduced Misdiagnosis:
- ML ensemble techniques can significantly decrease the rate of misdiagnosis, which is common due to the complexity of Parkinson’s symptoms.
- This process reduces patient frustration and the risk of inappropriate treatments.
- Improved Monitoring and Progression Tracking:
- ML-based ensemble models can monitor patients’ symptoms and disease progression continuously.
- This process enables medical professionals to make informed adjustments to treatment plans as needed.
- Advancing Medical Research:
- Our customized big dataset contributes to a more comprehensive understanding of Parkinson’s disease.
- The dataset can be valuable for researchers investigating the disease’s genetic, environmental, and clinical factors.
- Facilitating Research Collaboration:
- Sharing our customized big dataset and methodologies can foster collaboration among researchers, enabling them to advance the field of Parkinson’s disease research collectively.
- Enhancing Medical Expertise:
- ML models can complement the expertise of medical professionals, providing them with an additional tool for accurate diagnosis.
- Medical professionals can focus more on patient care and treatment decisions.
- Data-Driven Insights:
- Analyzing our customized big dataset using ensemble techniques can reveal insights into the disease that might not be apparent through traditional methods.
- These insights could lead to new hypotheses and avenues of research.
- Public Health Impact:
- Improved diagnosis and treatment can positively impact public health by reducing the burden of Parkinson’s disease on healthcare systems and improving patients’ overall well-being.
- Ethical Considerations:
- Our work raises awareness about the ethical considerations of using AI in healthcare, encouraging discussions on data privacy, patient consent, and responsible AI deployment.
5.8. Discussion
- Early Detection and Diagnosis: The developed model can contribute to the early identification of Parkinson’s disease, allowing for timely intervention and treatment planning. This has the potential to improve patient outcomes and quality of life.
- Personalized Treatment: The PDD-ET model’s accurate detection can lead to increased customization of treatment strategies uniquely adapted to the individual patient’s condition and needs.
- Monitoring Disease Progression: The PDD-ET model can monitor disease progression over time, assisting clinicians in adjusting treatment strategies as needed.
- Assisting Medical Professionals: Clinicians can use the PDD-ET model’s predictions as an additional diagnostic tool, helping them make more informed decisions in conjunction with their expertise.
- Telemedicine and Remote Monitoring: The PDD-ET model can be integrated into telemedicine platforms, enabling remote monitoring of patients’ PD status and providing healthcare professionals with valuable insights for remote consultations.
- Clinical Trials and Research: The PDD-ET model can contribute to the recruitment and stratification of clinical trial participants, leading to more accurate research outcomes.
- Improved Performance: Further optimization and fine-tuning of ensemble models can enhance their accuracy and reliability in detecting early-stage PD.
- Incorporating Multi-Modal Data: Integration of multiple data sources, such as imaging, genetic, and wearable sensor data, can provide a more comprehensive understanding of PD and improve detection accuracy.
- Longitudinal Monitoring: Developing models that analyze changes in patient data over time can aid in predicting disease progression and treatment responses.
- Explainable AI: Enhancing the interpretability of the model’s predictions can increase its clinical acceptance by providing insights into the features driving the detection.
- Real-Time Monitoring: Creating models that can operate in real time can enable continuous monitoring of PD symptoms, allowing for rapid adjustments to treatment plans.
- Global Deployment: Scaling the model’s deployment across different healthcare systems and regions can ensure a broader impact on PD detection and patient care.
- Collaboration with Clinicians: Continued collaboration with medical professionals is essential for refining the model’s clinical relevance, validation, and integration into clinical workflows.
- Ethical Considerations: Addressing ethical concerns related to patient data privacy, model bias, and responsible AI usage is critical for the sustainable application of these techniques in clinical settings.
- Patient Empowerment: Developing patient-friendly tools that leverage the model’s predictions can empower individuals to monitor their PD status actively.
- Expanding the model’s capabilities to detect other neurological disorders can make it a versatile tool in clinical neurology.
6. Conclusions
7. Future Work
- Interdisciplinary Collaboration: Engage in collaborative efforts between data scientists, machine learning experts, and clinical professionals. This collaboration can help ensure that the model’s development aligns with clinical realities and requirements.
- Simplified Reporting: Create an intermediate layer that translates the model’s complex predictions into more understandable and actionable insights for clinicians. This layer could provide explanations for the model’s decisions and present the results in a format that clinicians are familiar with.
- User-Friendly Interface: Develop a user-friendly interface that simplifies the interaction with the model. This could involve a dashboard or application that presents the model’s output in an easily interpretable manner.
- Clinical Guidelines: Develop guidelines for clinicians on how to interpret the model’s results and incorporate them into their decision-making process. This could include recommendations on when and how to use the model’s predictions alongside traditional diagnostic methods.
- Training for Clinicians: Provide training sessions for clinicians to understand the underlying concepts of the model and its practical application. This can help them gain confidence in utilizing the model effectively.
- Gradual Implementation: Introduce the model in a phased manner, starting with specific use cases where the model’s predictions can provide valuable insights. Gradually expand its usage as clinicians become more comfortable with its application.
- Feedback Loop: Establish a feedback loop where clinicians can provide input on the model’s performance, usability, and areas for improvement. This iterative process can lead to a model that better aligns with clinical needs.
- Real-World Testing: Conduct pilot studies or simulations within a controlled clinical environment to observe how the model’s predictions integrate into the workflow and impact decision making.
- Validation Studies: Conduct validation studies that compare the model’s predictions with established diagnostic methods. This can help establish the model’s clinical validity and reliability.
- Ethical and Regulatory Considerations: Ensure compliance with ethical guidelines and regulatory requirements for medical devices and AI applications in healthcare.
- Patient Involvement: Involve patients in the implementation process. Their feedback can provide insights into the practical implications of using the model in clinical care.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Prashanth, R.; Roy, S.D. Early detection of Parkinson’s disease through patient questionnaire and predictive modelling. Int. J. Med. Inform. 2018, 119, 75–87.
- Singh, N.; Pillay, V.; Choonara, Y.E. Advances in the treatment of Parkinson’s disease. Prog. Neurobiol. 2007, 81, 29–44.
- Gunduz, H. Deep learning-based Parkinson’s disease classification using vocal feature sets. IEEE Access 2019, 7, 115540–115551.
- Tsanas, A.; Little, M.A.; McSharry, P.E.; Ramig, L.O. Accurate telemonitoring of Parkinson’s disease progression by noninvasive speech tests. IEEE Trans. Biomed. Eng. 2010, 57, 884–893.
- Lahmiri, S.; Shmuel, A. Detection of Parkinson’s disease based on voice patterns ranking and optimized support vector machine. Biomed. Signal Process. Control 2019, 49, 427–433.
- Braga, D.; Madureira, A.M.; Coelho, L.; Ajith, R. Automatic detection of Parkinson’s disease based on acoustic analysis of speech. Eng. Appl. Artif. Intell. 2019, 77, 148–158.
- Zhao, A.; Qi, L.; Li, J.; Dong, J.; Yu, H. A hybrid spatio-temporal model for detection and severity rating of Parkinson’s disease from gait data. Neurocomputing 2018, 315, 1–8.
- Bilgin, S. The impact of feature extraction for the classification of amyotrophic lateral sclerosis among neurodegenerative diseases and healthy subjects. Biomed. Signal Process. Control 2017, 31, 288–294.
- Silveira-Moriyama, L.; Petrie, A.; Williams, D.R.; Evans, A.; Katzenschlager, R.; Barbosa, E.R.; Lees, A.J. The use of a color coded probability scale to interpret smell tests in suspected parkinsonism. Mov. Disord. 2009, 24, 1144–1153.
- Valenza, G.; Orsolini, S.; Diciotti, S.; Citi, L.; Scilingo, E.P.; Guerrisi, M.; Danti, S.; Lucetti, C.; Tessa, C.; Barbieri, R.; et al. Assessment of spontaneous cardiovascular oscillations in Parkinson’s disease. Biomed. Signal Process. Control 2016, 26, 80–89.
- Illner, V.; Sovka, P.; Rusz, J. Validation of freely available pitch detection algorithms across various noise levels in assessing speech captured by smartphone in Parkinson’s disease. Biomed. Signal Process. Control 2020, 58, 101831.
- Solana-Lavalle, G.; Galán-Hernández, J.-C.; Rosas-Romero, R. Automatic Parkinson disease detection at early stages as a prediagnosis tool by using classifiers and a small set of vocal features. Biocybern. Biomed. Eng. 2020, 40, 505–516.
- El Maachi, I.; Bilodeau, G.-A.; Bouachir, W. Deep 1D-convnet for accurate Parkinson disease detection and severity prediction from gait. Expert Syst. Appl. 2020, 143, 113075.
- Gupta, U.; Bansal, H.; Joshi, D. An improved sex-specific and age dependent classification model for Parkinson’s diagnosis using hand writing measurement. Comput. Methods Programs Biomed. 2020, 189, 105305.
- Wagner, A.; Fixler, N.; Resheff, Y.S. A wavelet-based approach to monitoring Parkinson’s disease symptoms. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017; pp. 5980–5984.
- Arroyo-Gallego, T.; Ledesma-Carbayo, M.J.; Sanchez-Ferro, A.; Butterworth, I.; Mendoza, C.S.; Matarazzo, M.; Montero, P.; Lopez-Blanco, R.; Puertas-Martin, V.; Trincado, R.; et al. Detection of motor impairment in Parkinson’s disease via mobile touchscreen typing. IEEE Trans. Biomed. Eng. 2017, 64, 1994–2002.
- Dinov, I.D.; Heavner, B.; Tang, M.; Glusman, G.; Chard, K.; Darcy, M.; Madduri, R.; Pa, J.; Spino, C.; Kesselman, C.; et al. Predictive big data analytics: A study of Parkinson’s disease using large, complex, heterogeneous, incongruent, multi-source and incomplete observations. PLoS ONE 2016, 11, e0157077.
- Drotár, P.; Mekyska, J.; Rektorová, I.; Masarová, L.; Smékal, Z.; Faundez-Zanuy, M. Decision support framework for Parkinson’s disease based on novel handwriting markers. IEEE Trans. Neural Syst. Rehabil. Eng. 2015, 23, 508–516.
- Drotár, P.; Mekyska, J.; Rektorová, I.; Masarová, L.; Smékal, Z.; Faundez-Zanuy, M. Evaluation of handwriting kinematics and pressure for differential diagnosis of Parkinson’s disease. Artif. Intell. Med. 2016, 67, 39–46.
- Harrou, F.; Sun, Y.; Hering, A.S.; Madakyaru, M. Statistical Process Monitoring Using Advanced Data-Driven and Deep Learning Approaches: Theory and Practical Applications; Elsevier: Amsterdam, The Netherlands, 2020.
- Xing, W.; Bei, Y. Medical Health Big Data Classification Based on KNN Classification Algorithm. IEEE Access 2020, 8, 28808–28819.
- Senior, A.W.; Evans, R.; Jumper, J.; Kirkpatrick, J.; Sifre, L.; Green, T.; Qin, C.; Žídek, A.; Nelson, A.W.R.; Bridgland, A.; et al. Improved protein structure prediction using potentials from deep learning. Nature 2020, 577, 706–710.
- Zhao, R.; Yan, R.; Chen, Z.; Mao, K.; Wang, P.; Gao, R.X. Deep learning and its applications to machine health monitoring. Mech. Syst. Signal Process. 2019, 115, 213–237.
- Little, M.A.; McSharry, P.E.; Hunter, E.J.; Spielman, J.; Ramig, L.O. Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease. IEEE Trans. Biomed. Eng. 2009, 56, 1015–1022.
- Das, R. A comparison of multiple classification methods for diagnosis of parkinson disease. Expert Syst. Appl. 2010, 37, 1568–1572.
- Ashour, A.S.; El-Attar, A.; Dey, N.; El-Kader, H.A.; El-Naby, M.M.A. Long short term memory based patient-dependent model for FOG detection in Parkinson’s disease. Pattern Recognit. Lett. 2020, 131, 23–29.
- Govindu, A.; Palwe, S. Early detection of Parkinson’s disease using machine learning. Procedia Comput. Sci. 2023, 218, 249–261.
- Prashanth, R.; Roy, S.D.; Mandal, P.K.; Ghosh, S. High-accuracy detection of early Parkinson’s disease through multimodal features and machine learning. Int. J. Med. Inform. 2016, 90, 13–21.
- Roh, S.-B.; Oh, S.-K.; Pedrycz, W.; Fu, Z. Dynamically Generated Hierarchical Neural Networks Designed With the Aid of Multiple Support Vector Regressors and PNN Architecture with Probabilistic Selection. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 1385–1399.
- Khatamino, P.; Canturk, I.; Ozyilmaz, L. A deep learning-CNN based system for medical diagnosis: An application on Parkinson’s disease handwriting drawings. In Proceedings of the 2018 6th International Conference on Control Engineering & Information Technology (CEIT), Istanbul, Turkey, 25–27 October 2018; pp. 1–6.
- Chakraborty, R.; Hasija, Y. Predicting MicroRNA Sequence Using CNN and LSTM Stacked in Seq2Seq Architecture. IEEE/ACM Trans. Comput. Biol. Bioinform. 2020, 17, 2183–2188.
- Parziale, A.; Della, C.A.; Senatore, R.; Marcelli, A. A decision tree for automatic diagnosis of Parkinson’s disease from offline drawing samples: Experiments and findings. In Proceedings of the Image Analysis and Processing, Trento, Italy, 9–13 September 2019; Springer: Cham, Switzerland, 2019; pp. 196–206.
- Moetesum, M.; Siddiqi, I.; Javed, F.; Masroor, U. Dynamic Handwriting Analysis for Parkinson’s Disease Identification using C-BiGRU Model. In Proceedings of the 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR), Dortmund, Germany, 8–10 September 2020; pp. 115–120.
- Nõmm, S.; Zarembo, S.; Medijainen, K.; Taba, P.; Toomela, A. Deep CNN based classification of the archimedes spiral drawing tests to support diagnostics of the Parkinson’s disease. IFAC-PapersOnLine 2020, 53, 260–264.
- Johri, A.; Tripathi, A. Parkinson disease detection using deep neural networks. In Proceedings of the 2019 Twelfth International Conference on Contemporary Computing (IC3), Noida, India, 8–10 August 2019; pp. 1–4.
- Folador, J.P.; Santos, M.C.S.; Luiz, L.M.D.; Souza, L.A.P.D.; Vieira, M.F.; Pereira, A.A.; Andrade, A.D.O. On the use of histograms of oriented gradients for tremor detection from sinusoidal and spiral handwritten drawings of people with Parkinson’s disease. Med. Biol. Eng. Comput. 2021, 59, 195–214.
- Parisi, L.; Neagu, D.; Ma, R.; Campean, F. Quantum ReLU activation for convolutional neural networks to improve diagnosis of Parkinson’s disease and COVID-19. Expert Syst. Appl. 2022, 187, 115892.
- Available online: https://www.kaggle.com/datasets/debasisdotcom/parkinson-disease-detection (accessed on 15 January 2023).
- Available online: https://en.wikipedia.org/wiki/Ensemble_learning (accessed on 15 January 2023).
- Gu, X.; Angelov, P.P. Multiclass Fuzzily Weighted Adaptive-Boosting-Based Self-Organizing Fuzzy Inference Ensemble Systems for Classification. IEEE Trans. Fuzzy Syst. 2022, 30, 3722–3735.
- Tie, J.; Lei, X.; Pan, Y. Metabolite-disease association prediction algorithm combining DeepWalk and random forest. Tsinghua Sci. Technol. 2022, 27, 58–67.
- Jun, K.; Lee, D.-W.; Lee, K.; Lee, S.; Kim, M.S. Feature Extraction Using an RNN Autoencoder for Skeleton-Based Abnormal Gait Recognition. IEEE Access 2020, 8, 19196–19207.
- Fang, Z. Improved KNN algorithm with information entropy for the diagnosis of Parkinson’s disease. In Proceedings of the 2022 International Conference on Machine Learning and Knowledge Engineering (MLKE), Guilin, China, 25–27 February 2022; pp. 98–101.
- Wang, W.; Lee, J.; Harrou, F.; Sun, Y. Early Detection of Parkinson’s Disease Using Deep Learning and Machine Learning. IEEE Access 2020, 8, 147635–147646.
- Available online: https://chat.openai.com/ (accessed on 25 August 2023).
Models | Advantages | Disadvantages |
---|---|---|
SVR [29] | (i) It has capability to handle non-linear relationships between input features and the target output. (ii) It can effectively capture complex patterns and variations in the data, making it suitable for situations where the underlying relationships are not linear. | (i) Its sensitivity to the selection of hyperparameters, such as the kernel type and regularization parameters. (ii) Poorly tuned hyperparameters can lead to overfitting or underfitting, impacting the model’s generalization performance. |
CNN [30] | (i) Its capacity to autonomously discern pertinent attributes from unprocessed data, like images, without the need for manual feature engineering. (ii) CNNs are particularly adept at capturing spatial patterns and hierarchies of features, making them well-suited for image-based data like brain scans or medical images commonly used in PD diagnosis. | (i) Its susceptibility to overfitting, especially when dealing with limited training data. CNNs contain a large number of learnable parameters, and without enough diverse data, the model might generalize poorly to new, unseen examples. (ii) Regularization techniques and data augmentation can mitigate this issue, but careful attention to dataset size and quality is necessary. |
Stacked-LSTM [31] | (i) Its proficiency in capturing and learning long-term dependencies within sequential data. (ii) It can effectively handle complex temporal patterns in time series data related to PD symptoms. | (i) Its susceptibility to overfitting, especially when dealing with limited training data. |
LSTM [26] | (i) Its ability to capture long-range correlations and sequential patterns within time series data. | (i) Its sensitivity to hyperparameter tuning. (ii) It struggles when dealing with very short sequences and their temporal dependencies. |
Decision Tree, RF, SVR [32] | (i) Diverse Learning, (ii) Bias Reduction, (iii) Robustness, (iv) Capturing Complex Patterns, and (v) High Accuracy. | (i) Complexity, (ii) Hyperparameter Tuning, (iii) Computational Intensity, (iv) Data Requirements, and (v) Risk of Overfitting. |
GRU [33] | (i) It has the capacity to encompass distant connections within sequential data. | (i) It might not capture very long-term dependencies as effectively as more complex models like LSTMs. |
Alex Net [34] | (i) Its capacity to effectively extract features from images and visual data. (ii) It can automatically learn hierarchical features, making it suitable for processing visual data such as brain scans for PD diagnosis. | (i) Its relatively large number of parameters, which can result in higher computational requirements and increased training times. (ii) Alex Net may not be as efficient in capturing intricate spatial patterns for complex image recognition tasks. |
Deep Neural Network [35] | (i) Feature Learning, (ii) Hierarchy of Features, (iii) Versatility, (iv) Performance, and (v) Transfer Learning. | (i) Data Hunger, (ii) Complexity, (iii) Hyperparameter Tuning, (iv) Black Box Nature, and (v) Computational Intensity. |
HOG [36] | (i) Efficient Feature Extraction, (ii) Robustness to Illumination and Color Variations, (iii) Interpretability, (iv) Low-Dimensional Representation, and (v) Applicability to Different Data Modalities. | (i) Limited Representation of Complex Patterns, (ii) Limited Contextual Information, (iii) Dependency on Image Quality, (iv) Feature Engineering Required, and (v) Limited Application to Non-Visual Data. |
Quantum ReLU Activator [37] | (i) Quantum Advantage, (ii) Feature Transformation, and (iii) Non-linearity. | (i) Complexity, (ii) Hardware and Infrastructure, (iii) Lack of Quantum Data, (iv) Interpretability, and (v) Limited Adoption. |
Proposed PDD-ET model | (i) Improved Accuracy and Robustness, (ii) Reduced Overfitting, (iii) Capturing Diverse Patterns, (iv) Handling Noisy Data, (v) Model Flexibility, (vi) Interpretable Insights, (vii) Scalability, Parallelism, and Adaptability. | (i) Architecture of the model is Complex. (ii) Training of the model is complex and time consuming. |
Selected ML Algorithms for Ensemble | Accuracy | Sensitivity | Precision | F1-Score |
---|---|---|---|---|
Adaptive boosting, RF | 65.93% | 67.23% | 68.69% | 69.23% |
Adaptive boosting, RF, and KNN | 75.223% | 75.98% | 76.23% | 78.23% |
Adaptive boosting, RF, KNN, and LSTM | 80.12% | 80.26% | 80.32% | 80.95% |
Adaptive boosting, RF, Stacked RF, GRU, and KNN | 85.12% | 86.26% | 86.32% | 86.95% |
Adaptive boosting, RF, Stacked RF, GRU, and Improved KNN | 90.0123% | 89.926% | 89.932% | 89.95% |
Adaptive boosting, RF, Stacked KNN, Stacked LSTM, and SVR | 90.12% | 91.26% | 91.32% | 92.95% |
Adaptive boosting, RF, SVR, LSTM, GRU, and Stacked LSTM | 96.12% | 97.26% | 98.32% | 98.05% |
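The four metrics reported in the table above have standard definitions in terms of a binary confusion matrix. The following is a minimal sketch of those definitions; the function name and the example counts are hypothetical and are not taken from the experiments.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary metrics from confusion-matrix counts.

    tp: PD cases predicted as PD        fp: healthy cases predicted as PD
    tn: healthy cases predicted healthy fn: PD cases predicted as healthy
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # also called recall
    precision = tp / (tp + fp)
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

Because F1 is a harmonic mean, it always lies between precision and sensitivity when all three are computed from the same confusion matrix, which gives a quick consistency check for any reported set of metrics.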
Hyperparameter | Value |
---|---|
Loss | MSE and MAE |
Optimizer | stochastic gradient descent (SGD) |
Batch size | 68 |
Time step | 1 |
Epochs | 1000 |
Learning Rate | 0.0003 |
Dropout | 0.03 |
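As a rough illustration of how the loss functions and optimizer named in the table above fit together, the sketch below defines MSE and MAE and performs a single stochastic gradient descent update on a one-feature linear model. The dictionary keys, the linear model, and the toy sample are assumptions made for illustration; the actual PDD-ET training loop is not described here.

```python
# Values copied from the hyperparameter table; the key names are illustrative.
HYPERPARAMS = {
    "batch_size": 68,
    "time_step": 1,
    "epochs": 1000,
    "learning_rate": 0.0003,
    "dropout": 0.03,
}


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def sgd_step(w, b, x, y, lr=HYPERPARAMS["learning_rate"]):
    """One SGD update of y_hat = w * x + b under squared error on one sample."""
    error = (w * x + b) - y
    grad_w = 2.0 * error * x   # derivative of error**2 with respect to w
    grad_b = 2.0 * error       # derivative of error**2 with respect to b
    return w - lr * grad_w, b - lr * grad_b


# Hypothetical single training sample.
w, b = 0.0, 0.0
x, y = 1.0, 1.0
loss_before = mse([y], [w * x + b])
w, b = sgd_step(w, b, x, y)
loss_after = mse([y], [w * x + b])
print(loss_before, loss_after)
```

With a learning rate as small as 0.0003, each update changes the parameters only slightly, which is consistent with training for a large number of epochs.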
PD Status (117,541 Units of Samples) | PD Positive | PD Negative |
---|---|---|
Training Set | 25% | 10.6% |
Validation Set | 5% | 6% |
Testing Set | 45.4% | 8% |
Total Set | 75.4% | 24.6% |
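The percentages in the table above can be converted into approximate sample counts and checked against the reported totals. The sketch below does this with the values from the table; the variable and function names are illustrative, and the counts are approximations because the table reports rounded percentages.

```python
TOTAL_SAMPLES = 117_541

# Percentages of the whole dataset, copied from the table above.
SPLIT_PERCENT = {
    "training": {"positive": 25.0, "negative": 10.6},
    "validation": {"positive": 5.0, "negative": 6.0},
    "testing": {"positive": 45.4, "negative": 8.0},
}


def approx_counts(total, split_percent):
    """Convert percentage shares into approximate sample counts."""
    return {
        subset: {
            label: round(total * pct / 100.0)
            for label, pct in shares.items()
        }
        for subset, shares in split_percent.items()
    }


def class_totals(split_percent):
    """Sum the percentage share of each class over all subsets."""
    totals = {}
    for shares in split_percent.values():
        for label, pct in shares.items():
            totals[label] = totals.get(label, 0.0) + pct
    return totals


counts = approx_counts(TOTAL_SAMPLES, SPLIT_PERCENT)
totals = class_totals(SPLIT_PERCENT)
print(counts)
print({label: round(pct, 1) for label, pct in totals.items()})
```

The per-class sums reproduce the "Total Set" row of the table (75.4% PD positive and 24.6% PD negative).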
Model | (%) | (%) | (%) | (%) | (%) | Loss (%) | AUC (%) |
---|---|---|---|---|---|---|---|
SVR [29] | 58.236 | 59.986 | 50.236 | 50.6921 | 55.236 | 40.23 | 78.23 |
CNN [30] | 65.329 | 66.159 | 60.956 | 65.2369 | 65.659 | 30.26 | 85.655 |
Stacked-LSTM [31] | 75.415 | 76.652 | 75.236 | 74.489 | 75.658 | 20.365 | 80.369 |
LSTM [26] | 68.968 | 68.265 | 68.266 | 67.856 | 68.236 | 30.256 | 80.235 |
GRU [33] | 70.968 | 70.265 | 70.256 | 71.256 | 71.658 | 26.23 | 82.365 |
Alex Net [34] | 77.235 | 76.235 | 76.556 | 76.698 | 76.569 | 30.123 | 83.569 |
Decision Tree, RF, SVR [32] | 76.652 | 76.231 | 76.359 | 76.589 | 76.658 | 26.256 | 82.698 |
Deep Neural Network [35] | 68.568 | 68.236 | 68.356 | 68.432 | 68.569 | 32.123 | 84.236 |
HOG [36] | 70.236 | 70.569 | 70.165 | 70.3215 | 70.213 | 30.215 | 81.023 |
Quantum ReLU Activator [37] | 72.123 | 72.369 | 72.456 | 72.325 | 72.658 | 25.369 | 78.256 |
Improved KNN [43] | 50.9658 | 50.3256 | 50.4568 | 50.231 | 50.562 | 45.236 | 75.9869 |
Adaptive Boosting [40] | 55.658 | 55.956 | 55.7858 | 55.9831 | 55.7562 | 40.136 | 78.169 |
RF [41] | 53.1258 | 53.366 | 53.556 | 53.111 | 53.1262 | 32.236 | 79.1169 |
Deep Learning Model [44] | 79.918 | 79.561 | 79.116 | 79.569 | 79.662 | 30.106 | 79.969 |
Proposed PDD-ET model | 95.325 | 95.265 | 95.955 | 95.1225 | 95.925 | 19.325 | 88.00 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Chatterjee, K.; Kumar, R.P.; Bandyopadhyay, A.; Swain, S.; Mallik, S.; Li, A.; Ray, K. PDD-ET: Parkinson’s Disease Detection Using ML Ensemble Techniques and Customized Big Dataset. Information 2023, 14, 502. https://doi.org/10.3390/info14090502