Sensors
  • Article
  • Open Access

13 February 2023

Bayesian Optimization with Support Vector Machine Model for Parkinson Disease Classification

1 Computer Science Department, Faculty of Computers and Information, Suez University, Suez 43512, Egypt
2 Faculty of Artificial Intelligence, Kafrelsheikh University, Kafrelsheikh 33516, Egypt
3 Deanship of Scientific Research, Umm Al-Qura University, Makkah 21955, Saudi Arabia
4 Information Systems Department, Faculty of Computers and Information, Mansoura University, Mansoura 35561, Egypt
This article belongs to the Special Issue Machine Learning and AI for Medical Data Analysis

Abstract

Parkinson’s disease (PD) has become widespread all over the world. PD affects the human nervous system and, through it, many parts of the body that are connected by nerves. To classify people who suffer from PD and people who do not, this paper presents a model called Bayesian Optimization-Support Vector Machine (BO-SVM). Bayesian Optimization (BO) is a hyperparameter tuning technique for optimizing the hyperparameters of machine learning models in order to obtain better accuracy. In this paper, BO is used to optimize the hyperparameters of six machine learning models, namely, Support Vector Machine (SVM), Random Forest (RF), Logistic Regression (LR), Naive Bayes (NB), Ridge Classifier (RC), and Decision Tree (DT). The dataset used in this study consists of 23 features and 195 instances. The target feature is binary: 1 indicates that the person suffers from PD and 0 indicates that the person does not. Four evaluation metrics, namely, accuracy, F1-score, recall, and precision, were computed to evaluate the performance of the classification models used in this paper. The performance of the six machine learning models was tested on the dataset before and after hyperparameter tuning. The experimental results demonstrated that the SVM model achieved the best results among the compared models both before and after hyperparameter tuning, with an accuracy of 92.3% obtained using BO.

1. Introduction

Parkinson’s disease (PD) is a recognized clinical illness with a variety of etiologies and clinical manifestations. According to current definitions, PD is defined as the presence of bradykinesia together with rest tremor, stiffness, or both. In the majority of populations, genetic factors connected to known PD genes account for 3–5% of PD, which is referred to as monogenic PD. In contrast, 90 genetic risk variants account for 16–36% of the heritable risk of non-monogenic PD. Constipation, being a non-smoker, having a relative with PD or tremor, and other causative factors each at least double the chance of PD. There is currently no treatment that can slow or stop the course of PD; however, new knowledge about its genetic origins and the processes of neuronal death is being developed [1].

1.1. Problem Statement

The use of machine learning (ML) algorithms is becoming increasingly common in the medical industry. As its name indicates, ML enables software to learn from data and build strong representations in a semi-automatic manner. For the purpose of diagnosing Parkinson’s disease (PD), several data formats have been used with ML approaches. ML also makes it possible to combine data from many imaging systems in order to identify PD. Relevant characteristics that are not typically utilized in the diagnosis of PD are discovered through the application of ML algorithms, so that these measures can be relied on for diagnosing PD in preclinical phases or atypical presentations. This allows PD to be diagnosed at earlier stages. In recent years, the number of publications discussing the use of ML to diagnose PD has increased. Earlier studies did investigate the use of ML in the diagnosis and assessment of PD, but they were only able to evaluate inputs from sensing devices and motor and kinematic symptoms [2]. Machine learning algorithms are computer-based statistical methods that may be trained to look for recurring patterns in large volumes of data. Clinicians can use machine learning techniques to classify patients based on several criteria at once [3].

1.2. Objectives

It is possible to use model-based and model-free strategies to predict certain medical outcomes or diagnostic characteristics. Generalized linear models are an example of model-based techniques. One of the most frequently used model-based techniques is logistic regression, which is useful when the output parameters are assessed on a binary scale (e.g., failure/success) and follow a Bernoulli distribution. Classification may then be performed on the basis of the predicted probabilities. The investigators must thoroughly examine and verify the model assumptions and choose the right link functions. Because the statistical assumptions may not always hold in real-world circumstances, particularly for large volumes of incongruent data, model-based procedures may not be applicable or may provide biased conclusions. Model-free approaches, on the other hand, make fewer assumptions and accommodate the underlying characteristics of the data without having to specify a model in advance. Model-free, non-parametric approaches such as Random Forest, Support Vector Machines, AdaBoost, Neural Networks, XGBoost, and SuperLearner are capable of building flexible representations from difficult data without oversimplifying the problem. Since these algorithms do not always provide ideal classification/regression outcomes, they benefit from ongoing learning or retraining. Nevertheless, model-free ML algorithms offer significant promise for tackling real-world issues when properly maintained, trained, and reinforced effectively [4]. The accurate and early identification of PD is critical because it can reveal valuable information that can be used to slow down the course of the disease [5].
Classification is used in PD identification to reduce time and improve treatment effectiveness. The challenge is to find the classification method that is most effective for PD detection; a review of the relevant literature reveals that many different classification techniques have been employed to obtain better outcomes. The difficulty in choosing the best classification method is that it must be validated on the local dataset at hand.

1.3. Paper Contribution

In this study, Bayesian Optimization is used to optimize the hyperparameters of six machine learning models, namely, Random Forest (RF), Support Vector Machine (SVM), Naive Bayes (NB), Logistic Regression (LR), Ridge Classifier (RC), and Decision Tree (DT), in order to determine the classification method that is both the most effective and the most precise for PD. The dataset used consists of 23 features and 195 instances. The experimental results demonstrated that the SVM model achieved the best outcomes when compared with the other ML models before and after hyperparameter tuning, with an accuracy of 92.3% obtained using BO.

1.4. Paper Organization

The remaining sections of the paper are arranged as follows. Section 2 provides a comprehensive summary of published studies that used machine learning techniques in the diagnosis and classification of PD, covering data sources, sample sizes, ML techniques, associated outcomes, and benefits and limitations. Section 3 presents the proposed BO approach for ML models in PD classification. Section 4 evaluates the proposed approach and compares it with various ML approaches in the classification of Parkinson’s disease. Section 5 presents the conclusions of this study.

3. Materials and Methods

These days, Parkinson’s disease (PD) is very prevalent all over the world. PD affects the human nervous system and the numerous bodily components that are linked to it by nerves. The development of ML models that can aid in disease prediction is therefore extremely important for early prediction. In this study, we use a common dataset and several machine learning techniques to classify Parkinson’s patients. Hyperparameter optimization enables fine-tuning before the performance of a model is assessed. The Bayesian Optimization (BO) approach is utilized to generate samples of hyperparameter values in order to discover the optimal values. For each hyperparameter configuration, the classification approaches are trained on a training set and tested on a test set. The ideal parameter setup is the one that provides the highest overall accuracy. In the following phase, each model is trained on the original training set using the optimal hyperparameters, and its accuracy is assessed on the test set. In this study, the hyperparameters of six machine learning models, namely, SVM, RF, LR, NB, RC, and DT, are optimized using Bayesian Optimization (BO). Twenty-three features and 195 instances make up the dataset used in this study. Accuracy, recall, F1-score, and precision were computed as evaluation measures to assess the effectiveness of the proposed classification models. The performance of the six machine learning models was evaluated on the dataset both before and after the hyperparameter tuning procedure, and the experiments showed that the support vector machine is the best classifier among those tested. Figure 1 shows the proposed Bayesian Optimization for various machine learning (ML) models in Parkinson’s disease categorization.
Figure 1. The proposed BO-ML models for Parkinson’s disease classification.
In this research, we use a real-world dataset to develop a hybrid BO-SVM model for classifying patients with Parkinson’s disease. Separate portions of the dataset are used for training and testing. Classifier models are built using the training data, and the resulting models are then scored by how well they classify the test data. In order to construct an effective classification model for Parkinson’s disease, SVM is combined with Bayesian Optimization for tuning its hyperparameters. Identifying the variables to use as predictors and the outcome to predict are the initial steps in developing an SVM classification model. The next step is to run searches to fine-tune the SVM’s hyperparameters.
Finally, the tuned hyperparameters of SVM are used in classification, and the model’s performance is evaluated on the test data. According to the experimental results, SVM achieved 89.6% accuracy before hyperparameter tuning compared to 80.9%, 82.1%, 85.7%, 85.3%, and 87.2% for RC, NB, DT, LR, and RF, respectively. After hyperparameter tuning, the two tuned hyperparameters of SVM were Kernel = rbf and regularization parameter (C) = 0.4. The SVM with BO achieved 92.3% compared to 83.3%, 84.6%, 88.5%, 87.2%, and 89.7% for BO with RC, NB, DT, LR, and RF, respectively. Therefore, the SVM method is worth utilizing for Parkinson’s disease classification in comparison with the other ML algorithms.

3.1. Min-Max Normalization

A crucial step in any analysis that compares data from multiple domains is normalization. Normalization maps values from a given domain into a target range, such as (0, 1). Numerous techniques exist for normalizing data, such as decimal scaling, min-max, Z-score, median-MAD, mean-MAD, and norm normalization techniques [38]. The min-max normalization approach rescales a feature from its original domain to a new range, such as (0, 1). The approach is defined as follows:
$f(n) = \dfrac{n - \min(n)}{\max(n) - \min(n)}$
where f(n) is the normalized feature value and n is the input feature value. max(n) and min(n) are the highest and lowest values of the input feature.
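As a small illustration, a minimal sketch of this rescaling in Python (assuming the features are held in a NumPy array; scikit-learn's MinMaxScaler implements the same formula):

```python
import numpy as np

def min_max_normalize(x):
    """Rescale a 1-D feature vector to the (0, 1) range using min-max normalization."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

# Illustrative feature values only.
feature = np.array([119.992, 122.400, 116.682, 116.676, 116.014])
print(min_max_normalize(feature))
```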

3.2. Bayesian Optimization

Hyperparameters are a group of settings used during training and testing to guide the learning process. The learning rate, number of iterations, batch size, number of hidden layers, momentum, regularization, and activation functions are examples of hyperparameters. A hyperparameter might be an integer, categorical, or continuous variable with values ranging between a lower and an upper bound. Hyperparameters remain fixed throughout the training process; choosing them well improves model accuracy while simultaneously reducing memory usage and training time. Different models use different hyperparameters depending on the problem description. There are no optimal hyperparameter values that apply to all models [39].
Bayesian Optimization (BO) is a method for sequentially optimizing the parameters of any black-box function f(x). BO integrates prior belief to build a surrogate of the response surface, f̂(x); uses f̂(x) to choose the next configuration x_n to try; evaluates f(x_n) using the true f(x); updates the posterior belief with the observed performance f(x_n); and continues this procedure sequentially until a stopping criterion is reached, thereby tuning the hyperparameters toward better classification [40]. The framework of BO is shown in Figure 2.
Figure 2. The framework of BO.
Bayes’ theorem forms the basis of BO [41,42]. BO establishes a prior over the optimization function and uses information collected from previously sampled points to update the posterior over the optimization function [43]. Equation (2), stated for a model A and an observation B, is the foundation of the optimization process based on Bayes’ theorem [44].
$P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$
where P(A|B) denotes the posterior probability of A given B, P(B|A) represents the likelihood of B given A, P(A) indicates the prior probability of A, and P(B) signifies the marginal probability of B. Bayesian Optimization is utilized to determine the minimum of a function over a bounded set [39].
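To make the sequential loop concrete, the following is a minimal sketch, not the authors' code, of Bayesian Optimization of a black-box function with the scikit-optimize library; the toy objective and the search bounds are illustrative assumptions:

```python
from skopt import gp_minimize

# Toy black-box objective: BO only sees the returned value, not the formula.
def objective(params):
    x, = params
    return (x - 2.0) ** 2 + 0.5

# Sequentially propose points, evaluate the true objective, and update the
# Gaussian-process surrogate until the evaluation budget (stopping criterion) is reached.
result = gp_minimize(
    objective,
    dimensions=[(-5.0, 5.0)],  # bounds of the single continuous parameter
    n_calls=20,                # number of sequential evaluations
    random_state=0,
)
print("best x:", result.x, "best objective:", result.fun)
```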

3.3. Machine Learning Models Using Bayesian Hyperparameter Optimization

We present several ML models for Parkinson’s disease categorization. The Bayesian Optimization approach is used to fine-tune hyperparameters for six popular ML models: SVM, RF, LR, NB, RC, and DT. The UCI Machine Learning Repository dataset [45] was used to assess the classifiers’ efficiency. BO is a hyperparameter tuning method for improving the accuracy of machine learning models. BO seeks to collect observations that disclose as much information as possible about the function and the position of its optimal value. With Bayesian Optimization, the optimal value can be discovered using relatively few samples, and no explicit formulation of the function is needed, in contrast to conventional optimization techniques. Therefore, Bayesian Optimization is well suited to hyperparameter tuning. Accordingly, BO is applied to tune hyperparameters for the Support Vector Machine (SVM) algorithm [46], Random Forest (RF) [47], Logistic Regression (LR) [48], Naive Bayes (NB) [49], Ridge Classifier (RC) [50], and Decision Tree (DT) [51].
SVM is a popular supervised machine learning technique used for both classification and regression tasks; it is based on the kernel method [52]. We therefore set out to optimize the SVM hyperparameters in search of the kernel function and parameters that would provide the most reliable model [53,54]. Starting from a random point in the hyperparameter space, the Bayesian technique iteratively assesses prospective hyperparameter configurations in light of the current surrogate model to see whether any of them improve the model. Based on the experimental results presented in this work, the proposed Bayesian Optimization-Support Vector Machine (BO-SVM) achieves the greatest accuracy for the classification task. The pseudocode of the proposed approach is presented in Algorithm 1.
Algorithm 1. Bayesian Optimization-Support Vector Machine (BO-SVM)
Input: Dataset D, hyperparameter space Θ, target score function T(θ), maximum number of evaluations m_max.
Split D randomly into a training set and a test set.
Build a model on the training set using the SVM approach.
Choose a starting configuration θ_0 ∈ Θ.
Assess the initial score y_0 = T(θ_0).
Initialize S_0 = {(θ_0, y_0)}.
For m = 1, …, m_max do
Choose a new hyperparameter configuration θ_m ∈ Θ by maximizing the acquisition function U:
θ_m = arg max_{θ ∈ Θ} U(θ, S_{m−1}).
Evaluate the model at θ_m to obtain a new score y_m = T(θ_m).
Augment the data: S_m = S_{m−1} ∪ {(θ_m, y_m)}.
Update the surrogate model.
End for
Extract the optimized hyperparameters.
Build the SVM model using the tuned hyperparameters and apply it to the test set.
Evaluate the accuracy and save it.
Output: Mean classification accuracy.
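As a concrete illustration only, not the authors' original code, the tuning loop of Algorithm 1 could be realized with scikit-learn and scikit-optimize roughly as follows; the local file name, the search bounds, the split ratio, and the use of five-fold cross-validation as the score T(θ) are assumptions:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from skopt import BayesSearchCV
from skopt.space import Real, Categorical

# Assumed local copy of the UCI Parkinsons dataset (23 attributes, 195 rows).
data = pd.read_csv("parkinsons.data")
X = data.drop(columns=["name", "status"])
y = data["status"]

# Min-max normalization followed by a train/test split.
X = MinMaxScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Bayesian Optimization over the SVM hyperparameter space Θ.
search = BayesSearchCV(
    SVC(),
    search_spaces={
        "C": Real(0.1, 100.0, prior="log-uniform"),
        "kernel": Categorical(["linear", "rbf", "poly"]),
    },
    n_iter=30,       # maximum number of evaluations m_max
    cv=5,            # score T(θ) = mean cross-validated accuracy
    random_state=42,
)
search.fit(X_train, y_train)

print("Tuned hyperparameters:", search.best_params_)
print("Test accuracy:", search.score(X_test, y_test))
```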

4. Experimental Results

4.1. Dataset Description

The dataset used in this paper is available at [45]. The dataset consists of 23 features and 195 instances. The data were originally created by Max Little in a collaboration between Oxford University and the National Centre for Voice and Speech. They include 195 sustained vowel phonations collected from 31 male and female subjects, 23 of whom were diagnosed with PD. The subjects ranged in age from 46 to 85 years (65.8 ± 9.8). The time since diagnosis ranged from 0 to 28 years. For each subject, a range of biomedical voice recordings was captured, each lasting from 1 to 36 s. The data were recorded in an IAC sound-treated booth using an AKG C420 microphone positioned about 8 cm from the patient’s lips, and the voice signals were transferred directly to a computer via the Kay CSL 4300B. All voice signals were sampled at 44.2 kHz with 16-bit resolution. The amplitudes were normalized, which affects the calibration, so the study mainly focused on changes rather than absolute sound pressure levels. The data are in ASCII CSV format. Each column represents a specific voice measure, while each row represents one recording from a subject. For each subject, there are roughly six recordings with different voice measures. The first column in the dataset refers to the subject’s name. Table 2 details the dataset features. The statistical analysis of the dataset is given in Table 3.
Table 2. Dataset features description.
Table 3. Statistical analysis for the dataset.
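For illustration, a brief sketch of loading the dataset and producing this kind of summary with pandas; the file URL below is an assumption based on the UCI repository layout and may change:

```python
import pandas as pd

# Assumed direct link to the UCI "Parkinsons" data file [45]; adjust if it moves.
URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data"

df = pd.read_csv(URL)
print(df.shape)                     # expected (195, 24): the name column plus 23 attributes
print(df["status"].value_counts())  # class label: 1 = PD, 0 = healthy
print(df.describe().T[["mean", "std", "min", "max"]])  # per-feature statistics (cf. Table 3)
```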
The heatmap analysis of the dataset features is shown in Figure 3. Heatmap analysis is a commonly used technique in data analysis to visualize the relationship between variables in a dataset. It is used to identify strong and weak relationships between features and to understand how features are correlated with one another. In this figure, the vertical and horizontal axes list the applied features. The correlation values in the heatmap are normalized from 0 to 1; bright cells indicate values close to 1 and dark cells indicate values close to 0. The diagonal values are 1, meaning each feature is perfectly correlated with itself; as the values decrease, the correlation between features decreases. This statistical view helps in diagnosing and prognosing PD from the heatmap. Figure 4 shows the box plot visualization per category label for the dataset features.
Figure 3. Heatmap analysis for the dataset features.
Figure 4. Box plot visualization per category label analysis for the dataset features.
Figure 5 shows a box plot for the distribution analysis of the features. The box plot is a useful tool for visualizing the distribution of numerical data; when analyzing a dataset, a box plot can display the distribution of each feature. In this figure, we visualize the 23 features of the applied PD dataset. Box plots split the data into portions that each contain around 25% of the data in the set. Box plots are valuable because they give a visual overview of the data, allowing us to easily determine central values, dataset dispersion, and skewness.
Figure 5. Box plot for distribution analysis of the features.
Figure 6 shows the histograms for the distribution analysis of the features. A histogram is a graphical representation of the distribution of a dataset, showing the frequency of data points within different intervals, and is a useful tool for visualizing the distribution of numerical data. In this figure, we explore the histograms of the features; the histogram is a standard graphing tool for discrete and continuous data recorded on an interval scale and is frequently used to depict the key aspects of a data distribution in a convenient format.
Figure 6. Histogram for distribution analysis of the features.
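As a hedged sketch of how such plots could be generated (assuming the dataframe df loaded in the earlier sketch; the seaborn/matplotlib choices and figure sizes are illustrative):

```python
import matplotlib.pyplot as plt
import seaborn as sns

features = df.drop(columns=["name"])  # keep the numerical voice measures and the status label

# Correlation heatmap of the features (cf. Figure 3).
plt.figure(figsize=(12, 10))
sns.heatmap(features.corr(), cmap="viridis")
plt.title("Feature correlation heatmap")

# Box plots of each feature split by the class label (cf. Figure 4).
melted = features.melt(id_vars="status", var_name="feature", value_name="value")
plt.figure(figsize=(14, 6))
sns.boxplot(data=melted, x="feature", y="value", hue="status")
plt.xticks(rotation=90)

# Histograms of the feature distributions (cf. Figure 6).
features.hist(figsize=(14, 12), bins=20)
plt.tight_layout()
plt.show()
```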

4.2. Evaluation Metrics

The experiments were executed using Jupyter Notebook version 6.4.6. Jupyter Notebook is a popular tool for data analysis and visualization in Python; it allows code, visualizations, and documentation to be combined in one place, runs in a web browser, and supports many programming languages, including Python 3.8. The experiments were run on a computer with an Intel Core i5 processor and 16 GB of RAM running the Microsoft Windows 10 operating system. In this paper, Bayesian Optimization is used to optimize the hyperparameters of six machine learning classification models, namely, Support Vector Machine (SVM), Random Forest (RF), Logistic Regression (LR), Naive Bayes (NB), Ridge Classifier (RC), and Decision Tree (DT). The performance of the BO-SVM model was compared with the other machine learning models. The performance of the classification models was measured using four metrics: accuracy, recall, precision, and F1-score. Accuracy is calculated using Equation (3):
$\text{Accuracy} = \dfrac{TP + TN}{TP + FP + FN + TN}$
where TP is true positive, TN is true negative, FP is false positive, and FN is false negative. Recall is calculated using Equation (4):
$\text{Recall} = \dfrac{TP}{TP + FN}$
Precision is calculated using Equation (5):
$\text{Precision} = \dfrac{TP}{TP + FP}$
F1 score is computed using Equation (6):
$\text{F1 score} = \dfrac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}}$
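For completeness, a small sketch of computing these four metrics with scikit-learn (assuming y_test and the tuned model from the earlier BO-SVM sketch):

```python
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score

y_pred = search.predict(X_test)  # tuned BO-SVM model from the earlier sketch

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("F1 score :", f1_score(y_test, y_pred))
```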
The hyperparameters of the classification models in the experiments were optimized using the Bayesian Optimization approach. The best hyperparameters for each model are listed in Table 4 (a sketch of how such search spaces can be declared follows the list), where:
Table 4. Hyperparameters tuning for the classification models using the Bayesian Optimization approach.
  • Random Forest (RF): The best number of estimators was 10, using the “gini” criterion.
  • Ridge Classifier (RC): The best alpha was 0.4, with “copy_X” set to false, “fit_intercept” set to true, “normalize” set to false, and using the “lsqr” solver with a tolerance of 0.01.
  • Decision Tree (DT): The best criterion was “entropy” and the best splitter was “random”.
  • Naive Bayes (NB): The best alpha was 0.1 and the best value for “var_smoothing” was 0.00001.
  • Logistic Regression (LR): The best penalty was “l2” and the best solver was “lbfgs”.
  • Support Vector Machine (SVM): The best kernel was “rbf” and the best value for the regularization parameter (C) was 0.4.
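As an illustration only, not the authors' exact configuration, search spaces of this kind could be declared and tuned per model with scikit-optimize roughly as follows; the value ranges, the model subset, and the reuse of the earlier train/test split are assumptions:

```python
from skopt import BayesSearchCV
from skopt.space import Real, Integer, Categorical
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Illustrative (estimator, search space) pairs for three of the six classifiers.
candidates = {
    "SVM": (SVC(), {"C": Real(0.1, 10.0, prior="log-uniform"),
                    "kernel": Categorical(["linear", "rbf", "poly"])}),
    "RF": (RandomForestClassifier(), {"n_estimators": Integer(10, 200),
                                      "criterion": Categorical(["gini", "entropy"])}),
    "LR": (LogisticRegression(max_iter=1000), {"C": Real(0.01, 10.0, prior="log-uniform"),
                                               "solver": Categorical(["lbfgs", "liblinear"])}),
}

for name, (estimator, space) in candidates.items():
    opt = BayesSearchCV(estimator, space, n_iter=25, cv=5, random_state=42)
    opt.fit(X_train, y_train)
    print(name, opt.best_params_, "test accuracy:", opt.score(X_test, y_test))
```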
Table 5 shows the performance of each model using the Bayesian Optimization approach in terms of accuracy, F1-score, recall, and precision. The model with the highest accuracy, F1-score, recall, and precision is BO-SVM, with an accuracy of 92.3%, an F1-score of 92.1%, a recall of 92.3%, and a precision of 92.1%. The lowest results among the models are seen for BO-RC, with an accuracy of 83.3%, an F1-score of 82.2%, a recall of 83.3%, and a precision of 82%. Figure 7 represents the accuracy of the classification models using the Bayesian Optimization approach.
Table 5. Performance of the classification models using the Bayesian Optimization approach.
Figure 7. Representation of the models in terms of accuracy using the Bayesian Optimization approach.
Table 6 shows the performance of the classification models in terms of accuracy using the default parameters. The use of default values simplifies the modeling process, as it eliminates the need for manual tuning of hyperparameters. The default values specified by the scikit-learn library are chosen based on general best practices and have been found to work well in a variety of situations. From the results in Table 6, it can be seen that the SVM model has the highest accuracy among the models, at 89.6%. The Random Forest model comes second with an accuracy of 87.2%. On the other hand, the Ridge Classifier model has the lowest accuracy among the models, at 80.9%. It is important to note that these results are based on the default parameters of each model and may be improved through hyperparameter tuning, as demonstrated in Table 5.
Table 6. Performance of the classification models in terms of accuracy using the default parameters.
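As a minimal baseline sketch (assuming the train/test split prepared in the earlier sketch; this is an illustration, not the authors' code), evaluating the six classifiers with scikit-learn's default hyperparameters could look like this:

```python
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Baseline: each model trained with scikit-learn defaults, no hyperparameter tuning.
models = {
    "SVM": SVC(),
    "RF": RandomForestClassifier(),
    "LR": LogisticRegression(),  # may need a larger max_iter to converge on some data
    "NB": GaussianNB(),
    "RC": RidgeClassifier(),
    "DT": DecisionTreeClassifier(),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```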
It can be concluded that hyperparameter tuning through Bayesian Optimization significantly improves the performance of the models compared to their default parameters. The Bayesian Optimization approach helps to optimize the hyperparameters and results in better accuracy for each of the models. Figure 8 represents the accuracy of the classification models using the default hyperparameters.
Figure 8. Representation of the models in terms of accuracy using the default parameters.
The results of the study were further evaluated using the confusion matrices presented in Figure 9. These matrices help to evaluate the performance of each classifier more thoroughly. The results indicate that BO-SVM had the best performance, outperforming the other classifiers.
Figure 9. Confusion matrix for classifiers (a) BO-SVM, (b) BO-RF, (c) BO-LR, (d) BO-DT, (e) BO-NB, and (f) BO-RC.

4.3. Discussion

This paper proposes a novel method for distinguishing between those who have Parkinson’s disease (PD) and those who do not, based on the Bayesian Optimization-Support Vector Machine (BO-SVM). Bayesian Optimization (BO), a hyperparameter tuning technique, is used to optimize the hyperparameters of six distinct machine learning models, namely, Support Vector Machine (SVM), Random Forest (RF), Logistic Regression (LR), Naive Bayes (NB), Ridge Classifier (RC), and Decision Tree (DT). The dataset utilized in this study has 23 features and 195 instances, and the models’ performance was measured using four metrics: accuracy, F1-score, recall, and precision.
The findings revealed that the SVM model performed the best among all models, both before and after hyperparameter tuning, with an accuracy of 92.3% achieved using BO. The paper therefore presents a useful contribution to machine learning and its applications in healthcare. For diagnosing speech deficits in patients at the early stages of central nervous system disorders (CNSD), Lauraitis et al. [55] used a Bidirectional Long Short-Term Memory (BiLSTM) neural network and a Wavelet Scattering Transform with Support Vector Machine (WST-SVM) classifier. The study included 339 voice samples obtained from 15 participants: 7 with early-stage CNSD (3 Huntington, 1 Parkinson, 1 cerebral palsy, 1 post stroke, 1 early dementia) and 8 healthy subjects. Their speech data were collected using a voice recorder from the Neural Impairment Test Suite (NITS) mobile application. Features were extracted from pitch contours, mel-frequency cepstral coefficients (MFCC), gammatone cepstral coefficients (GTCC), Gabor (analytic Morlet) wavelets, and auditory spectrograms. Ultimately, 94.50% (BiLSTM) and 96.3% (WST-SVM) accuracy was achieved for the healthy vs. impaired classification problem. The developed method can be applied for automated CNSD patient health state monitoring and clinical decision support systems, and as a part of the Internet of Medical Things (IoMT). In this work, we utilized BO with SVM. The questions here are: although there are several hyperparameter optimization (HPO) tools, why the choice of BO? Does BO carry any distinct advantages when compared with other HPO methods? Will the ML algorithms give better results when optimized using other methods? In answer to these questions, BO has several advantages compared to other HPO methods:
  • Model-based approach: BO uses a probabilistic model to represent the relationship between the hyperparameters and the performance of the model. This allows BO to make informed decisions about which hyperparameters to try next based on the results of previous trials.
  • Handling of noisy objectives: BO can handle noisy or stochastic objective functions, such as those that may be encountered in real-world machine learning applications.
  • Incorporation of prior knowledge: BO allows for the incorporation of prior knowledge about the objective function through the use of a prior distribution over the hyperparameters.
  • Efficient exploration–exploitation trade-off: BO balances exploration (trying new, potentially better hyperparameters) and exploitation (using the current best hyperparameters) in an efficient manner, allowing for faster convergence to the optimal hyperparameters.
In leave-one-out cross-validation (LOOCV), a single observation from the original dataset is used as the validation set (also known as the test set), and the remaining observations constitute the training set. This procedure is repeated N times, with each observation serving as the validation set once. The LOOCV approach has been used to measure classifier performance on unseen instances in separate and pooled datasets, where performance is defined as the proportion of correct classifications over the N repetitions. To ensure that the training set’s attributes, and thus the trained classifier, are not influenced by the validation sample, the test subject is removed from the initial dataset before training (leaving N−1 samples) in order to obtain the subject scores required to train the classifier. The classifier is then used to determine the test subject’s label [29]. In this work, we did not need to use LOOCV because we utilized BO, and the achieved results are promising compared with other results, as shown in Algorithm 1 and Figure 2.
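For reference only, a minimal sketch of the LOOCV scheme described above, applied to an SVM classifier with scikit-learn (assuming the feature matrix X and labels y prepared in the earlier sketches; this is not part of the authors' pipeline):

```python
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

# Each of the N instances is held out once as the validation set; performance is
# the proportion of correct classifications over the N repetitions.
loo = LeaveOneOut()
scores = cross_val_score(SVC(), X, y, cv=loo, scoring="accuracy")
print("LOOCV accuracy:", scores.mean())
```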
We conducted a comparison study using the same standard dataset published in the UCI repository [45] to compare the proposed model with recent techniques. Li et al. [56] reported NB, 3-NN, SVM-linear, and SVM-poly with respective accuracies of 66.31%, 67.73%, 53.91%, and 55.41%. Sajal et al. [57] provided a method based on KNN, SVM, and NB, with accuracies of 90.50%, 87.00%, and 81.00% for five levels of classification in tremor analysis. Furthermore, Haritha et al. [58] obtained 76.20%, 86.71%, 91.83%, 82.90%, and 87.03% accuracy utilizing NB, DT, RF, MLP, and LR, respectively. Abayomi-Alli et al. [59] applied a Bidirectional Long Short-Term Memory (BiLSTM) network to the UCI PD dataset, and their model achieved an accuracy of 82.86% with the original data. Fang and Liang [60] applied optimization algorithms such as Particle Swarm Optimization (PSO), the Whale Optimization Algorithm (WOA), the Grasshopper Optimization Algorithm (GOA), Binary PSO (BPSO), and Binary GOA (BGOA) to the UCI Parkinson’s disease dataset and compared them with the Nonlinear Binary Grasshopper Whale Optimization Algorithm (NL-BGWOA); the results showed that NL-BGWOA achieved 91.30%, higher than the other optimization algorithms. Figure 10 presents a comparative study of the proposed BO-SVM method and the mentioned methods on the same standard PD dataset.
Figure 10. Comparison between the proposed model and the recent approaches [56,57,58,59,60] using the same standard PD dataset.

5. Conclusions and Future Work

A Bayesian Optimization-Support Vector Machine (BO-SVM) model was proposed for classifying Parkinson’s disease (PD) patients and non-patients in this study. The dataset used consisted of 195 instances with 23 features and the target feature was binary, with 1 indicating PD and 0 indicating no PD. Six machine learning models (SVM, Random Forest, Logistic Regression, Naive Bayes, Ridge Classifier, and Decision Tree) were evaluated using four metrics (accuracy, F1-score, recall, and precision) both before and after hyperparameter tuning using BO. The results showed that SVM outperformed the other models, achieving an accuracy of 92.3% after BO tuning. Future work for this study could include expanding the dataset used to classify Parkinson’s disease to include more diverse and representative sample populations. Additionally, incorporating more advanced machine learning techniques, such as deep learning, could lead to even better results in terms of accuracy and performance. Another area for improvement could be exploring different types of feature selection methods to identify the most important features for the classification task. Finally, validating the results on a separate independent dataset could provide further confidence in the robustness and generalizability of the proposed BO-SVM model. The future direction of this study could be generalization to a larger population and hence potential integration into a larger healthcare system using the Internet of Medical Things and fog computing.

Author Contributions

Conceptualization, A.M.E. (Ahmed M. Elshewey), M.Y.S., Z.T., S.M.S., N.E.-R. and A.M.E. (Abdelghafar M. Elhady); methodology, A.M.E. (Ahmed M. Elshewey), M.Y.S., Z.T., S.M.S. and N.E.-R.; software, M.Y.S., A.M.E. (Ahmed M. Elshewey), Z.T., S.M.S. and N.E.-R.; validation, M.Y.S., A.M.E. (Ahmed M. Elshewey), Z.T., S.M.S., A.M.E. (Abdelghafar M. Elhady) and N.E.-R.; formal analysis, M.Y.S., A.M.E. (Ahmed M. Elshewey), Z.T., S.M.S. and A.M.E. (Abdelghafar M. Elhady); investigation, M.Y.S., A.M.E. (Ahmed M. Elshewey), Z.T., S.M.S., A.M.E. (Abdelghafar M. Elhady) and N.E.-R.; resources, M.Y.S., A.M.E. (Ahmed M. Elshewey), Z.T., S.M.S., A.M.E. (Abdelghafar M. Elhady) and N.E.-R.; data curation, M.Y.S., A.M.E. (Ahmed M. Elshewey), Z.T., S.M.S., A.M.E. (Abdelghafar M. Elhady) and N.E.-R.; writing—original draft preparation, M.Y.S., A.M.E. (Ahmed M. Elshewey) and Z.T.; writing—review and editing, S.M.S., A.M.E. (Abdelghafar M. Elhady) and N.E.-R.; visualization, S.M.S., A.M.E. (Abdelghafar M. Elhady) and N.E.-R.; supervision, M.Y.S.; project administration, M.Y.S., A.M.E. (Ahmed M. Elshewey), Z.T., S.M.S., N.E.-R. and A.M.E. (Abdelghafar M. Elhady); funding acquisition, A.M.E. (Abdelghafar M. Elhady). All authors have read and agreed to the published version of the manuscript.

Funding

The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code: (23UQU4331164DSR001).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The dataset used in this study is taken from a publicly available database [45].

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bloem, B.R.; Okun, M.S.; Klein, C. Parkinson’s Disease. Lancet 2021, 397, 2284–2303. [Google Scholar] [CrossRef] [PubMed]
  2. Mei, J.; Desrosiers, C.; Frasnelli, J. Machine Learning for the Diagnosis of Parkinson’s Disease: A Review of Literature. Front. Aging Neurosci. 2021, 13, 633752. [Google Scholar] [CrossRef] [PubMed]
  3. Landolfi, A.; Ricciardi, C.; Donisi, L.; Cesarelli, G.; Troisi, J.; Vitale, C.; Barone, P.; Amboni, M. Machine Learning Approaches in Parkinson’s Disease. Curr. Med. Chem. 2021, 28, 6548–6568. [Google Scholar] [CrossRef] [PubMed]
  4. Gao, C.; Sun, H.; Wang, T.; Tang, M.; Bohnen, N.I.; Müller, M.L.T.M.; Herman, T.; Giladi, N.; Kalinin, A.; Spino, C. Model-Based and Model-Free Machine Learning Techniques for Diagnostic Prediction and Classification of Clinical Outcomes in Parkinson’s Disease. Sci. Rep. 2018, 8, 7129. [Google Scholar] [CrossRef] [PubMed]
  5. Wang, W.; Lee, J.; Harrou, F.; Sun, Y. Early Detection of Parkinson’s Disease Using Deep Learning and Machine Learning. IEEE Access 2020, 8, 147635–147646. [Google Scholar] [CrossRef]
  6. Berg, D. Biomarkers for the Early Detection of Parkinson’s and Alzheimer’s Disease. Neurodegener. Dis. 2008, 5, 133–136. [Google Scholar] [CrossRef]
  7. Becker, G.; Müller, A.; Braune, S.; Büttner, T.; Benecke, R.; Greulich, W.; Klein, W.; Mark, G.; Rieke, J.; Thümler, R. Early Diagnosis of Parkinson’s Disease. J. Neurol. 2002, 249, iii40–iii48. [Google Scholar] [CrossRef]
  8. Sveinbjornsdottir, S. The Clinical Symptoms of Parkinson’s Disease. J. Neurochem. 2016, 139, 318–324. [Google Scholar] [CrossRef]
  9. DeMaagd, G.; Philip, A. Parkinson’s Disease and Its Management. Pharm. Ther. 2015, 40, 504–532. [Google Scholar]
  10. Shams, M.Y.; Elzeki, O.M.; Abouelmagd, L.M.; Hassanien, A.E.; Abd Elfattah, M.; Salem, H. HANA: A Healthy Artificial Nutrition Analysis Model during COVID-19 Pandemic. Comput. Biol. Med. 2021, 135, 104606. [Google Scholar] [CrossRef]
  11. Salem, H.; Shams, M.Y.; Elzeki, O.M.; Abd Elfattah, M.; Al-Amri, J.F.; Elnazer, S. Fine-Tuning Fuzzy KNN Classifier Based on Uncertainty Membership for the Medical Diagnosis of Diabetes. Appl. Sci. 2022, 12, 950. [Google Scholar] [CrossRef]
  12. Pham, H.; Do, T.; Chan, K.; Sen, G.; Han, A.; Lim, P.; Cheng, T.; Quang, N.; Nguyen, B.; Chua, M. Multimodal Detection of Parkinson Disease Based on Vocal and Improved Spiral Test. In Proceedings of the 2019 International Conference on System Science and Engineering (ICSSE), Dong Hoi, Vietnam, 20–21 July 2019; pp. 279–284. [Google Scholar]
  13. Pereira, C.R.; Pereira, D.R.; Rosa, G.H.; Albuquerque, V.H.C.; Weber, S.A.T.; Hook, C.; Papa, J.P. Handwritten Dynamics Assessment through Convolutional Neural Networks: An Application to Parkinson’s Disease Identification. Artif. Intell. Med. 2018, 87, 67–77. [Google Scholar] [CrossRef] [PubMed]
  14. Choi, H.; Ha, S.; Im, H.J.; Paek, S.H.; Lee, D.S. Refining Diagnosis of Parkinson’s Disease with Deep Learning-Based Interpretation of Dopamine Transporter Imaging. NeuroImage Clin. 2017, 16, 586–594. [Google Scholar] [CrossRef] [PubMed]
  15. Castillo-Barnes, D.; Ramírez, J.; Segovia, F.; Martínez-Murcia, F.J.; Salas-Gonzalez, D.; Górriz, J.M. Robust Ensemble Classification Methodology for I123-Ioflupane SPECT Images and Multiple Heterogeneous Biomarkers in the Diagnosis of Parkinson’s Disease. Front. Neuroinform. 2018, 12, 53. [Google Scholar] [CrossRef]
  16. Nuvoli, S.; Spanu, A.; Fravolini, M.L.; Bianconi, F.; Cascianelli, S.; Madeddu, G.; Palumbo, B. [(123)I] Metaiodobenzylguanidine (MIBG) Cardiac Scintigraphy and Automated Classification Techniques in Parkinsonian Disorders. Mol. Imaging Biol. 2020, 22, 703–710. [Google Scholar] [CrossRef]
  17. Adeli, E.; Shi, F.; An, L.; Wee, C.-Y.; Wu, G.; Wang, T.; Shen, D. Joint Feature-Sample Selection and Robust Diagnosis of Parkinson’s Disease from MRI Data. NeuroImage 2016, 141, 206–219. [Google Scholar] [CrossRef]
  18. Nunes, A.; Silva, G.; Duque, C.; Januário, C.; Santana, I.; Ambrósio, A.F.; Castelo-Branco, M.; Bernardes, R. Retinal Texture Biomarkers May Help to Discriminate between Alzheimer’s, Parkinson’s, and Healthy Controls. PLoS ONE 2019, 14, e0218826. [Google Scholar] [CrossRef] [PubMed]
  19. Das, R. A Comparison of Multiple Classification Methods for Diagnosis of Parkinson Disease. Expert Syst. Appl. 2010, 37, 1568–1572. [Google Scholar] [CrossRef]
  20. Åström, F.; Koker, R. A Parallel Neural Network Approach to Prediction of Parkinson’s Disease. Expert Syst. Appl. 2011, 38, 12470–12474. [Google Scholar] [CrossRef]
  21. Bhattacharya, I.; Bhatia, M.P.S. SVM Classification to Distinguish Parkinson Disease Patients. In Proceedings of the 1st Amrita ACM-W Celebration on Women in Computing in India, New York, NY, USA, 16 September 2010; pp. 1–6. [Google Scholar]
  22. Chen, H.-L.; Huang, C.-C.; Yu, X.-G.; Xu, X.; Sun, X.; Wang, G.; Wang, S.-J. An Efficient Diagnosis System for Detection of Parkinson’s Disease Using Fuzzy k-Nearest Neighbor Approach. Expert Syst. Appl. 2013, 40, 263–271. [Google Scholar] [CrossRef]
  23. Li, D.-C.; Liu, C.-W.; Hu, S.C. A Fuzzy-Based Data Transformation for Feature Extraction to Increase Classification Performance with Small Medical Data Sets. Artif. Intell. Med. 2011, 52, 45–52. [Google Scholar] [CrossRef] [PubMed]
  24. Eskidere, Ö.; Ertaş, F.; Hanilçi, C. A Comparison of Regression Methods for Remote Tracking of Parkinson’s Disease Progression. Expert Syst. Appl. 2012, 39, 5523–5528. [Google Scholar] [CrossRef]
  25. Nilashi, M.; Ibrahim, O.; Ahani, A. Accuracy Improvement for Predicting Parkinson’s Disease Progression. Sci. Rep. 2016, 6, 34181. [Google Scholar] [CrossRef] [PubMed]
  26. Peterek, T.; Dohnálek, P.; Gajdoš, P.; Šmondrk, M. Performance Evaluation of Random Forest Regression Model in Tracking Parkinson’s Disease Progress. In Proceedings of the 13th International Conference on Hybrid Intelligent Systems (HIS 2013), Gammarth, Tunisia, 4–6 December 2013; pp. 83–87. [Google Scholar]
  27. Karimi-Rouzbahani, H.; Daliri, M. Diagnosis of Parkinson’s Disease in Human Using Voice Signals. Basic Clin. Neurosci. 2011, 2, 12. [Google Scholar]
  28. Ma, A.; Lau, K.K.; Thyagarajan, D. Voice Changes in Parkinson’s Disease: What Are They Telling Us? J. Clin. Neurosci. Off. J. Neurosurg. Soc. Australas. 2020, 72, 1–7. [Google Scholar] [CrossRef]
  29. Mudali, D.; Teune, L.K.; Renken, R.J.; Leenders, K.L.; Roerdink, J.B.T.M. Classification of Parkinsonian Syndromes from FDG-PET Brain Data Using Decision Trees with SSM/PCA Features. Comput. Math. Methods Med. 2015, 2015, 136921. [Google Scholar] [CrossRef]
  30. Tsanas, A.; Little, M.A.; McSharry, P.E.; Ramig, L.O. Enhanced Classical Dysphonia Measures and Sparse Regression for Telemonitoring of Parkinson’s Disease Progression. In Proceedings of the 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, TX, USA, 14–19 March 2010; pp. 594–597. [Google Scholar]
  31. Shahid, A.H.; Singh, M.P. A Deep Learning Approach for Prediction of Parkinson’s Disease Progression. Biomed. Eng. Lett. 2020, 10, 227–239. [Google Scholar] [CrossRef]
  32. Fernandes, C.; Fonseca, L.; Ferreira, F.; Gago, M.; Costa, L.; Sousa, N.; Ferreira, C.; Gama, J.; Erlhagen, W.; Bicho, E. Artificial Neural Networks Classification of Patients with Parkinsonism Based on Gait. In Proceedings of the 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), New York, NY, USA, 3–6 December 2018. [Google Scholar]
  33. Ghassemi, N.H.; Marxreiter, F.; Pasluosta, C.F.; Kugler, P.; Schlachetzki, J.; Schramm, A.; Eskofier, B.M.; Klucken, J. Combined Accelerometer and EMG Analysis to Differentiate Essential Tremor from Parkinson’s Disease. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. 2016, 2016, 672–675. [Google Scholar] [CrossRef]
  34. Abiyev, R.H.; Abizade, S. Diagnosing Parkinson’s Diseases Using Fuzzy Neural System. Comput. Math. Methods Med. 2016, 2016, 1267919. [Google Scholar] [CrossRef]
  35. Vlachostergiou, A.; Tagaris, A.; Stafylopatis, A.; Kollias, S. Multi-Task Learning for Predicting Parkinson’s Disease Based on Medical Imaging Information. In Proceedings of the 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7–10 October 2018; pp. 2052–2056. [Google Scholar]
  36. Liu, L.; Wang, Q.; Adeli, E.; Zhang, L.; Zhang, H.; Shen, D. Feature Selection Based on Iterative Canonical Correlation Analysis for Automatic Diagnosis of Parkinson’s Disease. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2016: 19th International Conference, Athens, Greece, October 17–21 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 1–8. [Google Scholar]
  37. Drotár, P.; Mekyska, J.; Rektorová, I.; Masarová, L.; Smékal, Z.; Faundez-Zanuy, M. Analysis of In-Air Movement in Handwriting: A Novel Marker for Parkinson’s Disease. Comput. Methods Programs Biomed. 2014, 117, 405–411. [Google Scholar] [CrossRef]
  38. Eesa, A.S.; Arabo, W.K. A Normalization Methods for Backpropagation: A Comparative Study. Sci. J. Univ. Zakho 2017, 5, 319–323. [Google Scholar] [CrossRef]
  39. Victoria, A.H.; Maragatham, G. Automatic Tuning of Hyperparameters Using Bayesian Optimization. Evol. Syst. 2021, 12, 217–223. [Google Scholar] [CrossRef]
  40. Cho, H.; Kim, Y.; Lee, E.; Choi, D.; Lee, Y.; Rhee, W. Basic Enhancement Strategies When Using Bayesian Optimization for Hyperparameter Tuning of Deep Neural Networks. IEEE Access 2020, 8, 52588–52608. [Google Scholar] [CrossRef]
  41. Hussain, L.; Malibari, A.A.; Alzahrani, J.S.; Alamgeer, M.; Obayya, M.; Al-Wesabi, F.N.; Mohsen, H.; Hamza, M.A. Bayesian Dynamic Profiling and Optimization of Important Ranked Energy from Gray Level Co-Occurrence (GLCM) Features for Empirical Analysis of Brain MRI. Sci. Rep. 2022, 12, 15389. [Google Scholar] [CrossRef] [PubMed]
  42. Eltahir, M.M.; Hussain, L.; Malibari, A.A.; Nour, M.K.; Obayya, M.; Mohsen, H.; Yousif, A.; Ahmed Hamza, M. A Bayesian Dynamic Inference Approach Based on Extracted Gray Level Co-Occurrence (GLCM) Features for the Dynamical Analysis of Congestive Heart Failure. Appl. Sci. 2022, 12, 6350. [Google Scholar] [CrossRef]
  43. Wu, J.; Chen, X.-Y.; Zhang, H.; Xiong, L.-D.; Lei, H.; Deng, S.-H. Hyperparameter Optimization for Machine Learning Models Based on Bayesian Optimization. J. Electron. Sci. Technol. 2019, 17, 26–40. [Google Scholar]
  44. Kramer, O.; Ciaurri, D.E.; Koziel, S. Derivative-Free Optimization. In Computational Optimization, Methods and Algorithms; Springer: Berlin/Heidelberg, Germany, 2011; pp. 61–83. [Google Scholar]
  45. UCI Machine Learning Repository. Parkinsons Data Set. 2007. Available online: http://archive.ics.uci.edu/ml/datasets/Parkinsons (accessed on 30 December 2022).
  46. Jakkula, V. Tutorial on Support Vector Machine (Svm). Sch. EECS Wash. State Univ. 2006, 37, 3. [Google Scholar]
  47. Cutler, A.; Cutler, D.R.; Stevens, J.R. Random Forests. In Ensemble Machine Learning; Springer: Berlin/Heidelberg, Germany, 2012; pp. 157–175. [Google Scholar]
  48. Nick, T.G.; Campbell, K.M. Logistic Regression. Top. Biostat. 2007, 404, 273–301. [Google Scholar]
  49. Zhang, H. The Optimality of Naive Bayes. Aa 2004, 1, 1–6. [Google Scholar]
  50. Xingyu, M.A.; Bolei, M.A.; Qi, F. Logistic Regression and Ridge Classifier. 2022. Available online: https://www.cis.uni-muenchen.de/~stef/seminare/klassifikation_2021/referate/LogisticRegressionRidgeClassifier.pdf (accessed on 28 January 2023).
  51. Kotsiantis, S.B. Decision Trees: A Recent Overview. Artif. Intell. Rev. 2013, 39, 261–283. [Google Scholar] [CrossRef]
  52. Najwa Mohd Rizal, N.; Hayder, G.; Mnzool, M.; Elnaim, B.M.; Mohammed, A.O.Y.; Khayyat, M.M. Comparison between Regression Models, Support Vector Machine (SVM), and Artificial Neural Network (ANN) in River Water Quality Prediction. Processes 2022, 10, 1652. [Google Scholar] [CrossRef]
  53. Alabdulkreem, E.; Alzahrani, J.S.; Eltahir, M.M.; Mohamed, A.; Hamza, M.A.; Motwakel, A.; Eldesouki, M.I.; Rizwanullah, M. Cuckoo Optimized Convolution Support Vector Machine for Big Health Data Processing. Comput. Mater. Contin. 2022, 73, 3039–3055. [Google Scholar] [CrossRef]
  54. Al Duhayyim, M.; Mohamed, H.G.; Alotaibi, S.S.; Mahgoub, H.; Mohamed, A.; Motwakel, A.; Zamani, A.S.; Eldesouki, M. Hyperparameter Tuned Deep Learning Enabled Cyberbullying Classification in Social Media. Comput. Mater. Contin. 2022, 73, 5011–5024. [Google Scholar] [CrossRef]
  55. Lauraitis, A.; Maskeliūnas, R.; Damaševičius, R.; Krilavičius, T. Detection of Speech Impairments Using Cepstrum, Auditory Spectrogram and Wavelet Time Scattering Domain Features. IEEE Access 2020, 8, 96162–96172. [Google Scholar] [CrossRef]
  56. Li, D.-C.; Hu, S.C.; Lin, L.-S.; Yeh, C.-W. Detecting Representative Data and Generating Synthetic Samples to Improve Learning Accuracy with Imbalanced Data Sets. PLoS ONE 2017, 12, e0181853. [Google Scholar] [CrossRef] [PubMed]
  57. Sajal, M.; Rahman, S.; Ehsan, M.; Vaidyanathan, R.; Wang, S.; Aziz, T.; Mamun, K.A.A. Telemonitoring Parkinson’s Disease Using Machine Learning by Combining Tremor and Voice Analysis. Brain Inform. 2020, 7, 12. [Google Scholar] [CrossRef] [PubMed]
  58. Haritha, K.; Judy, M.V.; Papageorgiou, K.; Georgiannis, V.C.; Papageorgiou, E. Distributed Fuzzy Cognitive Maps for Feature Selection in Big Data Classification. Algorithms 2022, 15, 383. [Google Scholar] [CrossRef]
  59. Abayomi-Alli, O.O.; Damaševičius, R.; Maskeliūnas, R.; Abayomi-Alli, A. BiLSTM with Data Augmentation Using Interpolation Methods to Improve Early Detection of Parkinson Disease. In Proceedings of the 2020 15th Conference on Computer Science and Information Systems (FedCSIS), Sofia, Bulgaria, 6–9 September 2020; pp. 371–380. [Google Scholar]
  60. Fang, L.; Liang, X. A Novel Method Based on Nonlinear Binary Grasshopper Whale Optimization Algorithm for Feature Selection. J. Bionic Eng. 2022, 20, 237–252. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
