Parkinson’s Disease Detection from Resting-State EEG Signals Using Common Spatial Pattern, Entropy, and Machine Learning Techniques

Parkinson’s disease (PD) is a very common brain abnormality that affects people all over the world. Early detection of such abnormality is critical in clinical diagnosis in order to prevent disease progression. Electroencephalography (EEG) is one of the most important PD diagnostic tools since this disease is linked to the brain. In this study, novel efficient common spatial pattern-based approaches for detecting Parkinson’s disease in two cases, off–medication and on–medication, are proposed. First, the EEG signals are preprocessed to remove major artifacts before spatial filtering using a common spatial pattern. Several features are extracted from spatially filtered signals using different metrics, namely, variance, band power, energy, and several types of entropy. Machine learning techniques, namely, random forest, linear/quadratic discriminant analysis, support vector machine, and k-nearest neighbor, are investigated to classify the extracted features. The impacts of frequency bands, segment length, and reduction number on the results are also investigated in this work. The proposed methods are tested using two EEG datasets: the SanDiego dataset (31 participants, 93 min) and the UNM dataset (54 participants, 54 min). The results show that the proposed methods, particularly the combination of common spatial patterns and log energy entropy, provide competitive results when compared to methods in the literature. The achieved results in terms of classification accuracy, sensitivity, and specificity in the case of off-medication PD detection are around 99%. In the case of on-medication PD, the results range from 95% to 98%. The results also reveal that features extracted from the alpha and beta bands have the highest classification accuracy.


Introduction
With age, the number of connections between brain cells decreases and neurons shrink. Nerve cells, unlike muscle, skin, and bone cells, cannot regenerate. Neurons die or become damaged as people age [1]. Parkinson's disease (PD) is a neurodegenerative disease in which neurons in the substantia nigra of the brain become damaged. These neurons are responsible for producing dopamine, a chemical that acts as a messenger between neurons in the brain. Dopamine helps the brain send messages to various regions of the body so that it can function properly, particularly with regard to body movements and speech. PD symptoms appear when a large number of dopaminergic neurons are destroyed or the quantity of dopamine in the brain is abnormal [2]. According to the World Health Organization, around 10 million individuals are affected by this disease [3,4]. PD becomes more common with age, with people in their fifties and older being the most affected. Approximately 4% of people with PD are diagnosed before the age of 50, and men are 1.5 times more likely than women to have the disease [5,6]. Early symptoms may be minor and
The aim of the present study is to address these gaps found in previous studies by presenting uncomplicated feature extraction and classification methods while maintaining high classification accuracy and validating them using two public datasets (the UNM and SanDiego datasets). It is worth mentioning that the classification accuracy is influenced not just by the classifier used but also by the preprocessing of the signal and the feature extraction method. In our recent study [35], a CSP-based diagnostic method for identifying epilepsy and ASDs was developed, and the results were promising.
These results motivated us to investigate whether the CSP approach can yield good biomarkers from the resting-state EEGs of PD patients, allowing them to be distinguished from those of healthy people.
Accordingly, in the present study, novel, simple, and effective CSP-based methods are proposed for the detection of PD in two conditions, namely, off-medication PD vs. HC and on-medication PD vs. HC. To the best of our knowledge, we are the first group to present CSP-based methods for the detection of PD. Unlike the traditional CSP-variance approach, CSP is combined here with additional metrics, including energy and band power (BP), to improve classification accuracy. In addition, unlike [35], CSP is also combined with log energy entropy, norm entropy, sure entropy, and Shannon entropy to provide good biomarkers for PD EEGs. Several linear and nonlinear classifiers are applied to separate the resulting PD features from normal ones. The effects of the frequency band, reduction number, and segment length on classification accuracy are also investigated.
The rest of this paper is laid out as follows. Section 2 describes the EEG data used and the EEG signal-processing methods: preprocessing, feature extraction, and classification techniques. Section 3 contains the results as well as a discussion. In Section 4, the conclusion is presented, together with potential future work.

Methods
The proposed methods for processing EEG signals are described in this section, which covers data description, preprocessing, feature extraction, and classification. Figure 1 provides a high-level overview of the stages through which EEGs from Parkinson's patients and healthy controls (HC) are analyzed and then classified. The raw EEG signals are read first and preprocessed to remove artifacts, and then band-pass filtered to isolate the frequency band of interest. The filtered EEG signals are split into non-overlapping segments of equal duration. Each segment is spatially filtered using CSP, after which the PD/HC features are extracted using a variety of metrics, including variance, band power, energy, log energy entropy, norm entropy, threshold entropy, sure entropy, and Shannon entropy. Finally, to distinguish off/on PD features from HC ones, various classifiers are used: random forest (RF), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), support vector machine (SVM), and k-nearest neighbors (KNN). The subsections that follow describe each stage of the block diagram in further detail.

Data Description and Pre-Processing
In this study, the proposed methods are tested using two public EEG datasets. The first dataset is from the University of California, San Diego [36] and is referred to as the "SanDiego dataset." During data collection, the subjects were asked to sit comfortably and relax while fixating on a cross on a screen. There are two groups in the dataset. The first group consists of the EEGs of 16 healthy subjects (9 females and 7 males) with an age of 63.5 ± 9.6 years (mean ± standard deviation). The second group consists of the EEGs of 15 PD patients (8 females and 7 males) with an age of 63.2 ± 8.2 years. The PD patients were closely matched to the HC in right-handedness, gender, age, and cognition, the latter as determined by the Mini-Mental State Exam and the North American Adult Reading Test. All of the patients had mild to severe Parkinson's disease (Hoehn and Yahr scale II and III), with an average disease duration of 4.5 ± 3.5 years. To obtain EEG both on and off medication, data from PD patients were recorded on two separate days. For the on-medication session, the participants continued their usual medication regimen. For the off-medication session, the patients had been off their medications for about 12 h. The healthy subjects were recorded only once. EEG signals were recorded for at least 3 min at a sampling frequency of 512 Hz using a 32-channel BioSemi ActiveTwo EEG system. In addition to the 32 EEG channels, each recording has eight EXG channels. The preprocessing was conducted in MATLAB using EEGLAB by removing the mean of each channel and re-referencing all of the data to the common average (excluding excessively noisy electrodes). To reduce low-frequency drift, high-pass filtering at 0.5 Hz was applied. Eye blinks and movements, muscle activity, electrical noise, and other types of noise were manually examined and removed.
The details of this dataset, including signal acquisition and preprocessing, are given in [37].

The second dataset comes from a study conducted at the University of New Mexico (UNM; Albuquerque, NM, USA). For simplicity, this dataset is referred to as the UNM dataset in the present study. It contains the EEGs of 27 Parkinson's disease patients and 27 healthy subjects with a matched gender distribution (17 females and 10 males). The age (mean ± standard deviation) is 69.52 ± 8.56 years for the PD group and 69.52 ± 9.27 years for the HC group. Control subjects and PD patients were demographically matched in terms of age and sex, and no differences in education or premorbid IQ were found. The PD group visited the lab twice, seven days apart, the first time while on medication and the second time after a 15-h overnight withdrawal from their individual dopaminergic prescriptions. As a result, the dataset includes recordings from 27 Parkinson's disease patients both on and off medication. For each patient and control subject, two minutes of data were collected: one minute with eyes closed, followed by one minute with eyes open. Sintered Ag/AgCl electrodes were used, and a total of 68 channels were recorded: 64 EEG channels, one VEOG channel, and three XYZ accelerometer channels on the hand (left or right, varying across subjects). The sampling rate was 500 samples per second. The Brain Vision data collection system was used with an online CPz reference and an AFz ground. The paper [38] describes the data acquisition in greater detail. Table 2 provides a summary of both the SanDiego and the UNM datasets.
In this study, certain superfluous channels were eliminated from the SanDiego and UNM datasets. The 8 EXG channels (non-EEG channels) were removed from the SanDiego dataset. As a result, each individual has only 32 EEG channels, as shown in Figure 2. The VEOG channel and the XYZ accelerometer have also been removed from the UNM dataset. Figure 3 depicts electrode maps and EEG power spectral density (on a logarithmic scale) for off-PD, on-PD, and HC EEGs. The electrode map is shown for three distinct arbitrary frequencies: 6, 10, and 22 Hz. In general, the power density of the low-frequency spectrum is higher than that of the high-frequency spectrum. Different power spectral density patterns can be seen when comparing the three maps.
For further preprocessing, the EEG signals are divided into M segments with a size of (ch × N), where ch denotes the number of channels and N specifies the number of EEG samples per channel in a given time interval T. The segmented signals are then filtered with a fifth-order band-pass Butterworth filter to remove the interference and noise caused by the electrodes and magnetic fields. The choice of the segmentation time interval T and the frequency band of the filter will be investigated later in this paper.
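The filtering and segmentation step can be illustrated with a minimal Python sketch. It is not the authors' implementation (the preprocessing above was done in MATLAB). The function names `bandpass` and `segment`, the zero-phase filtering, and the synthetic input are assumptions made for illustration. Note also that SciPy's `butter` with `order=5` and a band-pass design yields a tenth-order filter overall, so how "fifth-order" maps onto this argument is itself an assumption.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass(eeg, low_hz, high_hz, fs, order=5):
    """Zero-phase Butterworth band-pass along the time axis of a (ch x samples) array."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, eeg, axis=-1)


def segment(eeg, fs, seconds):
    """Split a (ch x samples) array into M non-overlapping (ch x N) segments.

    Samples that do not fill a whole segment at the end are dropped.
    """
    n = int(round(fs * seconds))          # N: samples per channel per segment
    m = eeg.shape[1] // n                 # M: number of whole segments
    return eeg[:, : m * n].reshape(eeg.shape[0], m, n).transpose(1, 0, 2)


# Synthetic stand-in for one recording: 32 channels, 60 s at 512 Hz.
fs = 512
rng = np.random.default_rng(0)
eeg = rng.standard_normal((32, 60 * fs))

filtered = bandpass(eeg, 0.5, 32.0, fs)   # 0.5-32 Hz band
segments = segment(filtered, fs, seconds=10)
print(segments.shape)                     # (M, ch, N)
```

In this sketch the whole recording is filtered before it is cut into segments, which avoids filter edge effects at every segment boundary; filtering each segment separately would be an equally simple change.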

Common Spatial Pattern
For discriminating between the off/on PD class and the HC class, the CSP algorithm is employed as a spatial filter that leads to peak variances [39]. For simplicity, in what follows, these two classes will be denoted by PD and HC, respectively. A set of CSP filters make up the projection matrix $W_{CSP}$, which is computed only once using the entire training dataset. This is done by first calculating the normalized spatial covariance for both classes as follows

$$C_{PD} = \frac{E_{PD} E_{PD}^{T}}{\mathrm{trace}(E_{PD} E_{PD}^{T})} \quad \text{and} \quad C_{HC} = \frac{E_{HC} E_{HC}^{T}}{\mathrm{trace}(E_{HC} E_{HC}^{T})} \tag{1}$$

where $E_{PD}$ and $E_{HC}$ denote the EEG segments under the two conditions (PD and HC) of size ch × N, where ch denotes the number of channels and N denotes the number of samples per channel in each segment. $E^{T}$ is the transpose of $E$, and $\mathrm{trace}(E E^{T})$ is the sum of the diagonal elements of $E E^{T}$. Then, the averaged normalized covariances $\bar{C}_{PD}$ and $\bar{C}_{HC}$ are calculated by averaging over all of the segments of each class. The overall composite spatial covariance is given by

$$C = \bar{C}_{PD} + \bar{C}_{HC}$$

This covariance matrix is factorized into eigenvalues and eigenvectors as follows

$$C = U_{C} \lambda_{C} U_{C}^{T}$$

where $U_{C}$ is the matrix of the eigenvectors and $\lambda_{C}$ is the diagonal matrix of the eigenvalues arranged in descending order. Subsequently, the whitening transformation $P$ is obtained by computing

$$P = \lambda_{C}^{-1/2} U_{C}^{T}$$

This is used to transform the covariance matrices of the two classes into

$$S_{PD} = P \bar{C}_{PD} P^{T} \quad \text{and} \quad S_{HC} = P \bar{C}_{HC} P^{T}$$

$S_{PD}$ and $S_{HC}$ share the same eigenvectors, and the sum of their corresponding eigenvalue matrices is the identity matrix, i.e.,

$$S_{PD} = B \lambda_{PD} B^{T}, \quad S_{HC} = B \lambda_{HC} B^{T}, \quad \lambda_{PD} + \lambda_{HC} = I$$

where $B$ is any orthonormal matrix that satisfies

$$B^{T} (S_{PD} + S_{HC}) B = I$$

The eigenvector corresponding to the largest eigenvalue of $S_{PD}$ has the smallest eigenvalue for $S_{HC}$, and vice versa. This shows that maximizing the eigenvalue of one class along a given direction corresponds to minimizing the eigenvalue of the other class along the same direction. Thus, the difference in variance between the two classes is maximized. The projection matrix $W_{CSP}$ is defined by

$$W_{CSP} = B^{T} P$$

which is composed of a set of CSP filters $w_1, w_2, \ldots, w_{ch}$ (the rows of $W_{CSP}$).
The first CSP filter $w_1$ corresponds to the maximum variance of the PD class, while the last CSP filter $w_{ch}$ provides the maximum variance of the HC class. For dimensionality reduction, only the first and last m filters are used, such that $W_{CSP}$ is redefined as follows

$$W_{CSP} = [w_1, \ldots, w_m, w_{ch-m+1}, \ldots, w_{ch}]^{T} \in \mathbb{R}^{d \times ch} \tag{11}$$

where d = 2m is the reduction number, i.e., the number of spatially filtered channels that remain after reduction. The process of feature extraction starts by filtering each EEG segment $E$ using $W_{CSP}$; the filtered segment $S$, of size d × N, is given by

$$S = W_{CSP} E \tag{12}$$
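As an illustration of the procedure above, the following NumPy sketch computes a reduced CSP projection matrix from two lists of segments. The function names, the synthetic data, and the use of `numpy.linalg.eigh` are assumptions for this example and not the authors' code. The sketch also assumes that the composite covariance has full rank; average-referenced EEG is rank deficient and would need a rank reduction step first.

```python
import numpy as np


def normalized_cov(segment):
    """Normalized spatial covariance E E^T / trace(E E^T) of one (ch x N) segment."""
    c = segment @ segment.T
    return c / np.trace(c)


def fit_csp(segments_pd, segments_hc, m):
    """Return the reduced (2m x ch) CSP projection matrix."""
    c_pd = np.mean([normalized_cov(e) for e in segments_pd], axis=0)
    c_hc = np.mean([normalized_cov(e) for e in segments_hc], axis=0)

    # Eigendecomposition of the composite covariance, eigenvalues descending.
    evals, evecs = np.linalg.eigh(c_pd + c_hc)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]

    # Whitening transformation P = lambda^(-1/2) U^T (assumes evals > 0).
    p = np.diag(evals ** -0.5) @ evecs.T

    # Eigenvectors B of the whitened PD covariance, sorted so that the first
    # filter has the largest PD variance and the last the largest HC variance.
    b_vals, b = np.linalg.eigh(p @ c_pd @ p.T)
    b = b[:, np.argsort(b_vals)[::-1]]

    w = b.T @ p                          # full (ch x ch) projection matrix
    return np.vstack([w[:m], w[-m:]])    # keep the first and last m filters


# Synthetic example: 4 channels; PD is strong on channel 0, HC on channel 3.
rng = np.random.default_rng(0)


def make_segments(scale, count=20, n=500):
    return [rng.standard_normal((4, n)) * scale[:, None] for _ in range(count)]


pd_segments = make_segments(np.array([3.0, 1.0, 1.0, 1.0]))
hc_segments = make_segments(np.array([1.0, 1.0, 1.0, 3.0]))

w_csp = fit_csp(pd_segments, hc_segments, m=1)   # d = 2m = 2
s_pd = w_csp @ pd_segments[0]                    # filtered segment, d x N
s_hc = w_csp @ hc_segments[0]
print(np.var(s_pd, axis=1), np.var(s_hc, axis=1))
```

On this toy data, the first row of the filtered segment has a larger variance for the PD segment and the last row has a larger variance for the HC segment, which is the property the later feature extraction relies on.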

Feature Extraction (FE)
In conventional CSP, the variance of each row of the filtered segment is used to calculate the feature vector. In the present study, the use of several additional metrics, namely, band power (BP) [40], energy (Eng) [40], and entropy, is investigated. Entropy is a metric that is commonly used for evaluating the complexity, regularity, and statistical quantification of time series. Multiple studies have shown that entropy may be used to analyze and establish biomarkers for a number of diseases, including epilepsy [41], attention deficit hyperactivity disorder [42], and autism [43]. This motivates us to look at using entropy as a method for identifying Parkinson's disease. In the present study, instead of computing entropy directly from EEG data, it is proposed to compute it from the spatially filtered segment $S_{2m \times N}$, which may aid in the development of appropriate biomarkers for PD identification.
Several types of entropy [44] are investigated in this work: Shannon entropy (ShEn), norm entropy (NoEn), threshold entropy (ThEn), sure entropy (SuEn), and log energy entropy (LogEn). In the definitions of the entropy features $f_j$, k is the number of unique values in the discrete signal $s_j(n)$ and $x_i$ is the probability frequency of the ith unique value. The threshold entropy is given by

$$f_j(\mathrm{ThEn}) = \#\{ i \ \text{such that} \ |x_i| > \alpha \} \tag{16}$$

that is, ThEn is the number of time instants for which the signal is greater than a threshold α. The threshold is set to 0.2 based on trial and error to obtain the best accuracy. For the norm entropy, p is the power of the entropy and must satisfy p ≥ 1; in this study, it is selected to be 1.1. For the sure entropy, q is the threshold value, usually greater than 2; in the present study, it is selected to be 3.
A feature vector f of length d is extracted from each EEG segment by filtering it using CSP and then computing one of the above metrics. The size of the resulting feature matrix is M × d, where M denotes the number of segments and d denotes the reduction number. Later on, the effects of M and d on the classification accuracy of PD vs. HC are investigated. In the following subsection, the classification methods and cross-validation stages are described.
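The following sketch shows how one feature per row could be computed from a filtered segment for each metric. The formulas are assumptions: the variance feature uses the log-normalized form commonly paired with CSP, band power is taken as the mean squared amplitude, and the entropies follow the widely used wavelet-entropy style definitions applied directly to the samples. The paper's exact definitions may differ, for example in the logarithm base or in applying them to value frequencies rather than samples. The function name `csp_features` is hypothetical; the defaults for α, p, and q are the values stated above.

```python
import numpy as np


def csp_features(s, metric, alpha=0.2, p=1.1, q=3.0):
    """Return one feature per row of a spatially filtered (d x N) segment."""
    sq = s ** 2
    # Guard against log(0) for exactly-zero samples.
    log_sq = np.log(np.maximum(sq, np.finfo(float).tiny))

    if metric == "var":      # log of the variance normalized over the d rows
        v = np.var(s, axis=1)
        return np.log(v / v.sum())
    if metric == "bp":       # band power: mean squared amplitude (assumed form)
        return sq.mean(axis=1)
    if metric == "eng":      # energy: sum of squared amplitudes
        return sq.sum(axis=1)
    if metric == "then":     # threshold entropy: count of |s| above alpha
        return (np.abs(s) > alpha).sum(axis=1).astype(float)
    if metric == "noen":     # norm entropy: sum of |s|^p, with p >= 1
        return (np.abs(s) ** p).sum(axis=1)
    if metric == "suen":     # sure entropy with threshold q
        below = (np.abs(s) <= q).sum(axis=1)
        return s.shape[1] - below + np.minimum(sq, q ** 2).sum(axis=1)
    if metric == "logen":    # log energy entropy: sum of log(s^2)
        return log_sq.sum(axis=1)
    if metric == "shen":     # Shannon entropy: -sum of s^2 log(s^2)
        return -(sq * log_sq).sum(axis=1)
    raise ValueError(f"unknown metric: {metric}")


# One filtered segment with d = 2 rows and N = 4 samples (toy values).
s = np.array([[1.0, -2.0, 0.5, 4.0],
              [0.1, 0.1, -0.3, 0.0]])
for name in ["var", "bp", "eng", "then", "noen", "suen", "logen", "shen"]:
    print(name, csp_features(s, name))
```

Stacking the returned vectors for all M segments gives the M × d feature matrix described above.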

Classification and Problem Formulations
In this study, a number of commonly used classification approaches are applied to distinguish between PD and HC features: bagging-based RF, LDA, QDA, quadratic kernel-based SVM, and KNN (with kn = 3 neighbors). The goal is to compare them and see which one produces the best outcomes in terms of off-PD/on-PD versus HC classification. A detailed description of these classification methods can be found in [45-48].
The primary goal of this research is to detect Parkinson's disease in individuals who are in an off-medication state and to distinguish them from those in the healthy control group. Because the datasets differ in the conditions under which they were obtained, several classification problem formulations are considered; they are summarized in Table 3.

Table 3. Summary of the addressed classification problem formulations in the present study.

| Classification Problem | Used Dataset | Problem Description |
|---|---|---|
| Close-eyes off-PD vs. HC | UNM | When the eyes are closed, differentiate off-medication PD patients from the healthy control group. |
| Close-eyes on-PD vs. HC | UNM | When the eyes are closed, differentiate on-medication PD patients from the healthy control group. |
| Close-eyes off-PD vs. on-PD | UNM | When the eyes are closed, differentiate off-medication PD patients from on-medication PD patients. |

Performance Evaluation
In this study, several measures are used to evaluate the performance of the developed classification models: classification accuracy, sensitivity, specificity, F-score, and the receiver operating characteristic (ROC) curve. The classification accuracy (CA) is given by

$$CA = \frac{N_{correct}}{N_{total}} \tag{21}$$

where $N_{total}$ denotes the total number of feature vectors to be classified, and $N_{correct}$ denotes the number of feature vectors that are classified correctly. For binary classification, the accuracy can also be calculated in terms of the number of positive and negative predictions:

$$CA = \frac{TP + TN}{TP + TN + FP + FN} \tag{22}$$

where TP = #True Positives, FP = #False Positives, TN = #True Negatives, and FN = #False Negatives. The sensitivity, also called recall or true positive rate (TPR), indicates the ability of a classification model to correctly identify patients with the disease. The specificity, also called the true negative rate (TNR), indicates the ability of a classification model to correctly identify people without the disease. The sensitivity and specificity are defined by [49]:

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{23}$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{24}$$

The precision quantifies the proportion of positive predictions that are correct. In this study, the F-score is adopted since it combines precision and sensitivity into a single measure. The F-score is defined as follows

$$F\text{-score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Sensitivity}}{\mathrm{Precision} + \mathrm{Sensitivity}} \tag{25}$$

where Precision is calculated as

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{26}$$

In addition to the above performance metrics, the ROC curve is also evaluated. The ROC curve plots the TPR (sensitivity) of a test against its FPR (1 − specificity) as the decision threshold varies. The AUC (area under the ROC curve) is a commonly used metric for assessing the detection performance. Good classifiers are characterized by AUC values that are close to 1. More information on ROC-AUC curves can be found in [50].
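As a small worked example of these measures, the sketch below computes them from confusion-matrix counts. The function name `binary_metrics` and the counts are hypothetical and chosen only for illustration; they are not results from the paper. The function assumes that every denominator is nonzero.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, precision, and F-score from counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": f_score,
    }


# Hypothetical test fold: 30 PD segments (28 detected) and 30 HC segments
# (29 correctly rejected).
metrics = binary_metrics(tp=28, fp=1, tn=29, fn=2)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

With these counts the accuracy is 57/60 = 95%, and the F-score equals 2TP / (2TP + FP + FN) = 56/59, which is an equivalent way of writing the F-score definition.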
A k-fold cross-validation technique is implemented to obtain a reliable performance evaluation for the proposed classification models. k = 10 is used in all of our experiments, which divides the dataset into ten equal subsets, one for validation (test) and the other nine for training [51]. The cross-validation procedure is repeated ten times (10-fold), each time with a different subset held out for testing. Equations (21)-(26) are used to evaluate the classification performance at each round, and each performance measure is averaged over the ten rounds to produce a single classification measure. Figure 4 depicts the stages in which the EEGs of Parkinson's patients and healthy people are processed during the training and test phases. As previously discussed, the data are initially separated into two parts: 90% for training and 10% for testing. The training phase starts with band-pass filtering (BPF) of the training data. The filtered signals are then split into M equal segments, each with a size of ch × N. The number of segments is inversely related to the length of each segment: the longer the segment, the smaller M, and vice versa. After the signals are divided, CSP is applied to all of the segments (both PD and HC) using Equations (1)-(10) to produce the projection matrix $W_{ch \times ch}$, which contains a set of CSP filters. The dimensionality of this matrix is then reduced by picking the first and last m filters to obtain the projection matrix $W_{d \times ch}$, as described in Equation (11). $W_{d \times ch}$ is then used to filter (multiply) each segment, as described in Equation (12). As a result, the size of each filtered segment is d × N. The next step is to create one feature vector f from each filtered segment, so that the number of feature vectors is equal to the number of segments M. The number of elements in each feature vector is d. The elements of this vector are calculated using variance, energy, BP, or entropy according to Equations (13)-(20).
The final step in the training phase is to train a classifier (RF, LDA, QDA, SVM, or KNN) using the feature vectors derived from the previous step. This concludes the training phase. In the testing phase, the test data subset is filtered with the same BPF and segmented in the same way as the training data. The difference here is that the CSP filters and their projection matrix are not recomputed. Instead, the same $W_{d \times ch}$ matrix created during the training phase is reused in the testing phase. The feature vectors are then created in a similar fashion to the training phase. The final step in the testing phase is to classify the test feature vectors using the classifiers trained in the training phase to predict whether each vector belongs to PD or HC. The classification performance is then computed using Equations (21)-(26), with the cross-validation technique.
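The training and testing phases can be sketched end to end as follows. This is a simplified illustration on synthetic data, not the authors' pipeline: it uses a log energy entropy feature in an assumed form (sum of log squared samples), a plain NumPy 3-nearest-neighbor classifier, and folds formed by shuffling segments, which is one possible reading of the 90%/10% split. All function names are hypothetical. The key point shown is that the CSP matrix is fitted on the training fold only and reused unchanged on the test fold.

```python
import numpy as np


def fit_csp(x, y, m):
    """Reduced (2m x ch) CSP matrix from segments x (M x ch x N); y: 1 = PD, 0 = HC."""
    def mean_cov(segs):
        return np.mean([e @ e.T / np.trace(e @ e.T) for e in segs], axis=0)

    c_pd, c_hc = mean_cov(x[y == 1]), mean_cov(x[y == 0])
    evals, evecs = np.linalg.eigh(c_pd + c_hc)
    p = np.diag(evals ** -0.5) @ evecs.T           # whitening (assumes full rank)
    vals, b = np.linalg.eigh(p @ c_pd @ p.T)
    w = b[:, np.argsort(vals)[::-1]].T @ p         # filters sorted by PD variance
    return np.vstack([w[:m], w[-m:]])


def log_energy_features(x, w):
    """Filter every segment with w and return an (M x d) feature matrix."""
    filtered = np.einsum("dc,mcn->mdn", w, x)
    return np.log(np.maximum(filtered ** 2, np.finfo(float).tiny)).sum(axis=2)


def knn_predict(train_f, train_y, test_f, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    dists = np.linalg.norm(test_f[:, None, :] - train_f[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return (train_y[nearest].mean(axis=1) > 0.5).astype(int)


def cross_validate(x, y, m, folds=10, seed=0):
    """Mean and standard deviation of the accuracy over the folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    accuracies = []
    for test_idx in np.array_split(idx, folds):
        train_idx = np.setdiff1d(idx, test_idx)
        w = fit_csp(x[train_idx], y[train_idx], m)  # fitted on training data only
        train_f = log_energy_features(x[train_idx], w)
        test_f = log_energy_features(x[test_idx], w)  # same w reused for testing
        pred = knn_predict(train_f, y[train_idx], test_f)
        accuracies.append(np.mean(pred == y[test_idx]))
    return float(np.mean(accuracies)), float(np.std(accuracies))


# Synthetic data: 100 segments, 4 channels, 200 samples; classes differ in
# which channel carries the most variance.
rng = np.random.default_rng(1)
y = np.repeat([1, 0], 50)
scales = np.where(y[:, None] == 1, [3.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 3.0])
x = rng.standard_normal((100, 4, 200)) * scales[:, :, None]

mean_acc, std_acc = cross_validate(x, y, m=1)
print(f"accuracy: {100 * mean_acc:.2f}% ± {100 * std_acc:.2f}%")
```

Forming the folds by subject rather than by segment would be a different design choice that keeps all segments of one person on the same side of the split; the sketch does not attempt to decide which of the two the study used.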

Results and Discussion
To verify the methods proposed here, two datasets from two different sources are used: SanDiego and UNM datasets. Because the datasets contain different states and conditions, as well as the diversity of the proposed methods, the results are presented in two separate subsections: SanDiego dataset-based results and UNM dataset-based results.

SanDiego Dataset Results
With this dataset, three classification problems are addressed: off-medication patients versus the healthy control group, on-medication patients versus the healthy control group, and off-medication patients versus on-medication patients, when the eyes are open.

Off-Medication PD vs. Healthy Control
As the main problem for PD testing, the classification results of off-medication PD patients against the healthy control group are provided and discussed in this part. Each channel's signal is fed into a 0.5-32 Hz BPF before being split into 606 non-overlapping 10-s segments (M = 606). A total of 300 segments are obtained from PD patients, while 306 segments are obtained from HC subjects. Each segment is transformed into a feature vector of length 32 (d = 32) using the proposed FE methods. This results in a feature matrix of size 606 × 32 that is then sent to the KNN classifier. The eight FE methods are listed in Tables 4 and 5 together with their corresponding classification accuracy, sensitivity, specificity, and F-score results. Table 5 presents the findings for features extracted from CSP-filtered signals, whereas Table 4 shows the results when CSP is not applied. For each feature extraction method, ten values of each measure (classification accuracy, sensitivity, specificity, and F-score) are generated using 10-fold cross-validation. For each method, the average of the ten values is calculated, along with their standard deviation (mean ± SD). The results show a great improvement when CSP is applied. When adopting CSP, the average classification accuracy of the ShEn technique, for example, rises from 75.27% to 91.91%. The CSP+Var and CSP+LogEn methods produce the best performance, with average classification accuracies of 96.37% and 94.22%, respectively. Other methods, such as CSP+Eng, CSP+LBP, and CSP+NoEn, have a classification accuracy above 93%. Compared with Table 4, the standard deviation values in Table 5 are lower. Table 5 shows that the CSP+ThEn and CSP+SuEn feature extraction methods have the worst performance. As a result, neither of these methods is investigated further. For further examination, four more classification algorithms are used in addition to KNN.
Figure 5 compares the classification accuracy of RF, LDA, QDA, and SVM techniques applied to features extracted using all of the proposed methods. With all the FE methods, the KNN and RF classifiers achieve the highest classification accuracy and lowest standard deviation. Figure 6 presents ROC curves and AUC for the five classifiers. The KNN and RF classifiers have the highest AUC for all FE methods, whereas the LDA, QDA, and SVM have the lowest AUC. These results indicate that KNN consistently outperforms the other classifiers in terms of AUC, ROC, and accuracy.

Investigation of Frequency Bands
EEG signals have a frequency range of 0 to 100 Hz, which is typically decomposed into five sub-bands: delta (<4 Hz), theta (4-8 Hz), alpha (8-13 Hz), beta (13-30 Hz), and gamma (>30 Hz). In this subsection, the aim is to find the sub-bands that lead to the highest PD classification performance. The gamma range has been excluded from this study due to artifacts that may adversely affect the classification accuracy. Figure 7 depicts the classification accuracy obtained when applying all of the proposed methods to different EEG sub-bands. According to the results shown in the figure (dark blue and yellow), the highest accuracies are obtained from the beta and alpha sub-bands. Because important information may not be concentrated in a single sub-band, the effect of combining two or more sub-bands on the classification performance is also investigated. It can be seen from Table 6 that the frequency bands formed from both the alpha and beta sub-bands lead to the highest classification accuracy. It is important to mention here that the CSP+LogEn method leads to higher classification accuracies than the other FE methods. According to the results, the highest accuracy is obtained when the EEG signals are filtered using a 10-30-Hz band-pass filter.

Investigation of Reduction Number
As previously discussed, the dimensionality of the ch × ch CSP projection matrix is reduced by picking only the first m and last m CSP filters, resulting in a matrix W_CSP with reduced dimension d × ch, where the reduction number is d = 2m. Reducing d lowers the complexity, since the size of each feature vector is equal to d. On the other hand, choosing a very small d may lead to poor classification performance. Figure 8 presents the classification accuracies of off-PD versus HC signals filtered at 8-30 Hz with various d values. It is clear from the figure that the results are influenced greatly by the choice of d. In the case of the CSP+Var+KNN method, as d increases from 2 to 10, the classification accuracy rises sharply from 66% to 96.38% and then stabilizes or increases only slightly. Similarly, in the case of CSP+LogEn+KNN, the classification accuracy rises sharply from 77% to 98.35% at d = 10 and then stabilizes or increases slightly thereafter. The optimal value of d, which leads to the highest classification accuracy, depends on several factors such as frequency band, FE method, classifier type, and others. In Figure 8, the highest classification accuracies for CSP+Var and CSP+LogEn are 98.19% and 99.17%, obtained at d = 30 and d = 24, respectively, with the KNN classifier.
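The reduction step described above can be sketched as follows. This is only an illustration: the function name `select_csp_filters` is hypothetical, and the rows of the full matrix are assumed to be already sorted by their CSP eigenvalues, so that the first and last rows are the most discriminative filters.

```python
def select_csp_filters(w_full, m):
    """Keep the first m and last m rows of a ch x ch CSP matrix (d = 2m rows)."""
    ch = len(w_full)
    if not 1 <= m <= ch // 2:
        raise ValueError("m must satisfy 1 <= m <= ch // 2")
    return w_full[:m] + w_full[ch - m:]


# Hypothetical 6-channel example: each row stands in for one spatial filter.
ch = 6
w_full = [[float(r * ch + c) for c in range(ch)] for r in range(ch)]
w_csp = select_csp_filters(w_full, m=2)
print(len(w_csp), len(w_csp[0]))  # 4 6
```

With m = 2 the reduced matrix has d = 4 rows, so each segment would yield a feature vector of length 4.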

Investigation of Segment Length (SL)
Thus far, all of the signals have been split into 10-s segments. In this section, the effect of the segment length on the classification results is investigated. Because the highest classification accuracy is obtained by the CSP+LogEn FE method, it is used in this investigation. The BPF is also set to 10-30 Hz throughout all experiments. Table 7 presents the effect of the segment length along with the reduction number on the KNN classification performance. The table contains 72 outcomes, where the segment length is increased from 2 to 12 s while d is increased from 10 to 32. Decreasing the segment length increases the number of segments M, so that a larger number of feature vectors is presented to the classifier for training and validation. For example, M equals 3032 when the segment length is 2 s and decreases to 505 when the segment length increases to 12 s. According to the results in Table 7, there is a small improvement in the classification accuracy with decreasing segment length, especially at higher values of d. For example, at d = 32, the average classification accuracy increases from 98.42% to 99.41% when the segment length decreases from 12 to 2 s. At a segment length of 2 s and d = 30, the highest classification accuracy of 99.47% is obtained by the combination CSP+LogEn+KNN. Table 8 shows the classification performance of RF, QDA, SVM, and KNN at a segment length of 2 s and d = 32. The table demonstrates that the KNN classifier still outperforms the other classifiers in terms of classification accuracy, sensitivity, specificity, and F-score.
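The relationship between segment length and the number of segments can be illustrated with a minimal sketch of non-overlapping segmentation. The sampling rate, the recording duration, the 2-s step between segment lengths, and the choice to drop trailing samples are all assumptions made for illustration and are not taken from the datasets.

```python
def segment(signal, fs, seg_len_s):
    """Split a 1-D sample list into non-overlapping windows of seg_len_s seconds."""
    n = int(round(fs * seg_len_s))
    count = len(signal) // n  # trailing samples that do not fill a window are dropped
    return [signal[i * n:(i + 1) * n] for i in range(count)]


fs = 100                         # hypothetical sampling rate in Hz
recording = [0.0] * (fs * 180)   # hypothetical 3-minute single-channel recording
for seg_len in (2, 4, 6, 8, 10, 12):
    print(seg_len, len(segment(recording, fs, seg_len)))
```

For this hypothetical recording, 2-s windows give six times as many segments as 12-s windows, which matches the ratio between the reported M = 3032 and M = 505 (3032/505 ≈ 6.0).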

On-Medication PD vs. Healthy Control
This subsection presents and discusses the classification performance results of the on-medication patients vs. the healthy control group. For consistency and to facilitate comparisons with the results of Section 3.1.1 (off-medication PD vs. healthy control), the reduction number is set to 32, the frequency band to alpha and beta, and the segment length to 10 s. The number of segments retrieved from on-medication patients is 297, whereas the healthy control group has 306 segments. As a result, there are a total of 603 feature vectors. Table 9 shows the same eight feature extraction methods that were used in Table 5 and their classification accuracy, sensitivity, specificity, and F-score as obtained with the KNN classifier. As before, the average performance and standard deviation are presented for each method. Table 9 shows that the CSP+Var and CSP+LogEn FE methods achieve the best performance, with average classification accuracies of 92.87% and 92.85%, respectively. These two methods are the most effective for distinguishing between EEGs of on-PD patients and EEGs of the healthy control group. The results of these FE methods are bolded in Table 9. Similar to the off-PD vs. HC classification, the CSP+ThEn and CSP+SuEn FE methods provide low classification accuracy and high standard deviation. Consequently, these two methods are also excluded from further investigation.
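For reference, the four reported performance measures can be computed from confusion-matrix counts as in the minimal sketch below. It assumes that PD is treated as the positive class and that the F-score is the standard F1 measure; the counts in the example are hypothetical and are not results from this study.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, specificity, and F-score from counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)   # recall on the positive (PD) class
    specificity = tn / (tn + fp)   # recall on the negative (HC) class
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, specificity, f_score


# Hypothetical counts for one validation fold.
acc, sens, spec, f1 = binary_metrics(tp=90, fn=10, tn=95, fp=5)
print(round(acc, 4), round(sens, 4), round(spec, 4), round(f1, 4))
```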
Figure 9 shows the average classification accuracy using RF, QDA, SVM, and KNN. Unlike Figure 5, there is no single classifier that outperforms the other classifiers across all FE methods. For example, KNN outperforms the other classifiers with the CSP+Var FE method, QDA is the best when coupled with the CSP+LogEn method, and SVM outperforms the rest with the CSP+Eng/LBP method. Figure 10 shows ROC curves along with AUC for the four classifiers. Except for CSP+LogEn, the KNN and RF classifiers deliver the highest AUC over all FE methods. In the case of CSP+LogEn, SVM and QDA achieve the highest AUC.
In terms of the frequency band, Figure 11 shows the classification accuracy for all proposed methods when applied to different EEG sub-bands. Similar to the off-PD vs. HC classification, the features extracted from the beta band are classified more accurately than the others. The alpha and theta bands rank second. Features extracted from the delta band lead to the worst classification accuracy. As a result, the delta band is excluded from further investigation. Table 10 shows the KNN classification performance of features extracted from several frequency bands. It can also be noted from the table that the presence of the beta band within any wider frequency band improves the classification performance. For example, the features extracted from the 4-30 Hz frequency band are more accurately classified than those extracted from 4-13 Hz. Over all frequency bands, it can be seen that the QDA and SVM classifiers deliver higher classification accuracy than KNN, especially with the CSP+LogEn FE method.
Next, the effect of changing the segment length for the on-PD vs. HC classification problem is considered. The BPF is set to 10-32 Hz for this investigation. Table 11 shows the classification results of features extracted by the CSP+LogEn method and classified by RF, QDA, SVM, and KNN. Results show that the effect of changing the segment length in this classification problem is small. For example, when decreasing the segment length from 12 s to 2 s, the accuracy changes from 91.46% to 92.25% and from 93.24% to 93.38% for RF and KNN, respectively. At a segment length of 8 s, CSP+LogEn+SVM delivers the highest classification accuracy of 95.76%.

Off-PD vs. On-PD
In this section, the classification performance results of off-medication versus on-medication patients are discussed. The purpose of this classification is to assess the effectiveness of the methods proposed in this study. Table 12 shows the KNN classification performance with the following settings: 13-30 Hz, d = 32, and a segment length of 2 s. A total of 2988 segments are extracted. It can be seen that, similar to off-PD vs. HC and on-PD vs. HC, the CSP+LogEn FE method outperforms the other methods, with an average classification accuracy of 97.52% and a standard deviation of 0.95.
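Since the variance, energy, and log energy entropy features recur throughout these results, a minimal sketch of how one feature per CSP-filtered row could be computed is given here. The definitions are assumptions of this sketch rather than the study's exact formulas: variance is the population variance, and log energy entropy follows the common form of summing log(x_i^2) over the samples, with a small `eps` added to avoid log(0).

```python
import math


def variance(x):
    """Population variance of a sample list."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)


def log_energy_entropy(x, eps=1e-12):
    """Sum of log(x_i^2); eps guards against log(0) and is an assumption."""
    return sum(math.log(v * v + eps) for v in x)


# Hypothetical spatially filtered segment with d = 2 rows of 4 samples each.
filtered = [[1.0, -1.0, 2.0, -2.0], [0.5, 0.5, -0.5, -0.5]]
features = [log_energy_entropy(row) for row in filtered]
print(features)  # one feature per CSP-filtered row, so the vector has length d
```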

UNM-Based Results
The proposed methods are also tested and validated using the UNM dataset in this section. Based on our findings using the SanDiego dataset, the frequency band is set to 10-30 Hz, the reduction number to 32, and the segment length to 2 s for all UNM-based experiments. The CSP+Var, CSP+Eng, CSP+LBP, and CSP+LogEn FE methods are used with two states: open-eyes and closed-eyes.
In the case of off-medication PD versus healthy control classification, the total number of feature vectors is 1620 (810 PD + 810 HC) for the open-eyes state and 1593 (810 PD + 783 HC) for the closed-eyes state. Table 13 includes the results of off-medication PD patients versus healthy control group classification using the RF, SVM, and KNN classifiers. Two observations can be made from this table. The first is that the CSP+LogEn method achieves the highest classification accuracy, in both the open-eyes and closed-eyes states, compared to the other FE methods. The highest classification accuracy of off-PD versus HC in the open-eyes state is 99.01%, which is obtained by the CSP+LogEn+KNN approach. In the closed-eyes state, the highest classification accuracy is 98.81%, obtained by the same approach. The second observation is that the classification accuracies differ only slightly between the open-eyes and closed-eyes states.
In the case of on-medication PD versus healthy control classification, the total number of feature vectors is 1650 (840 PD + 810 HC) for the open-eyes state and 1623 (840 PD + 783 HC) for the closed-eyes state. Table 14 Table 15. As can be seen from the table, our proposed methods achieve good performance.
Finally, the significance of this study can be evaluated by comparing the outcomes of the proposed methods to those of earlier studies. Table 16 compares our results to those of prior studies on Parkinson's disease detection in the resting state. As seen in the table, the proposed methods achieve good performance while being computationally efficient compared with the methods in previous studies. The main advantages of our methods can be summarized as follows:

• The proposed methods are simple and computationally efficient, which makes their hardware implementation easier and faster in practice.

• The proposed methods are robust, as they have been evaluated using ten-fold cross-validation (CV).

• The proposed methods achieve good classification accuracy and have been validated using two datasets from two different sources.

• To the best of our knowledge, we are the first group to present CSP-based methods for the detection of PD.

Limitations and Future Studies
Although the proposed methods are uncomplicated and perform well, there are some issues that need to be discussed.

• Channel selection: In the present study, the signals from all channels are used, and CSP is applied to spatially filter the signals and reduce the number of features by decreasing the value of d. Selecting only those channels that carry information important for the detection of Parkinson's disease before applying signal processing was not explored in this study. Future studies should use heuristic optimization methods to investigate the minimum number of channels that yields the maximum classification accuracy. PD detection using a few channels will be more practical and easier to use.

• Classification robustness: k-fold cross-validation is one of the most important techniques used to validate classification robustness. This technique was employed in all of the previous studies shown in Table 16. In the present study, as in previous studies, k-fold cross-validation is also used to evaluate our proposed methods and to compare their results with those of previous studies. One of the disadvantages of this technique is that it may lead to biased classification results due to data leakage. Therefore, future work includes the use of leave-one-subject-out cross-validation along with k-fold.

• Source of data: One of the shortcomings of these types of studies is the use of different datasets, which makes the comparison of studies' results unfair. A standard should be specified for evaluating the methods proposed by researchers, including the use of public datasets. In the present study, two public datasets are used in order to compare the results of this study with those of studies that used the same datasets.

The authors also plan to test and confirm the proposed methods for additional brain disorders such as autism and Alzheimer's disease.
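The leave-one-subject-out scheme mentioned under classification robustness can be sketched as follows. This is a minimal illustration under stated assumptions, not the evaluation protocol used in this study: the function name `leave_one_subject_out` and the subject labels are hypothetical. The idea is that all segments of one subject are held out together, so no segment from the test subject appears in the training set, which is what segment-level k-fold cannot guarantee.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out, train_idx, test_idx) with each subject held out once."""
    for held_out in sorted(set(subject_ids)):
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical layout: 3 subjects with 2 segments each.
subject_ids = ["s1", "s1", "s2", "s2", "s3", "s3"]
for held_out, train_idx, test_idx in leave_one_subject_out(subject_ids):
    train_subjects = {subject_ids[i] for i in train_idx}
    # No segment of the held-out subject leaks into the training set.
    assert held_out not in train_subjects
    print(held_out, train_idx, test_idx)
```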

Conclusions
In recent years, EEG signal-analysis techniques have been used to diagnose brain abnormalities. This study focuses on the detection of Parkinson's disease (PD) through the analysis and processing of EEG signals. Here, efficient common spatial pattern (CSP)-based methods for detecting Parkinson's disease in two cases, namely, off/on-medication PD vs. healthy control group, are introduced. The extraction of the features from spatially filtered signals using different metrics, namely, band power, energy, and several types of entropies, is proposed, and the obtained results are compared with those of conventional CSP.
This study also examines how frequency bands and reduction numbers influence classification performance. Several classification algorithms are investigated to classify the extracted features. Two EEG datasets are used to evaluate the proposed methods: the SanDiego dataset (31 participants, 93 min) and the UNM dataset (54 participants, 54 min). Results demonstrate that the combination of CSP and log energy entropy outperforms the other FE methods, including conventional CSP. When compared to methods in the literature, the results show that the proposed method is able to achieve comparable classification performance. The results in terms of classification accuracy, sensitivity, specificity, and F-score for off-medication PD detection are 99.41%, 99.47%, 99.35%, and 99.40%, respectively. In the case of on-medication PD, performance results range from 95% to 98%. The findings also show that features extracted from the alpha and beta bands provide a higher classification accuracy. Figure 12 depicts a diagram of the entire procedure that produced the best results. These strategies produce outcomes that are encouraging and comparable to those found in earlier studies. In addition, our proposed method is completely portable and can be used in real-time PD diagnosis using EEG signals.