Automatic Recognition of High-Density Epileptic EEG Using Support Vector Machine and Gradient-Boosting Decision Tree

Background: Epilepsy (Ep) is a chronic neural disease. The diagnosis of epilepsy depends on detailed seizure history and scalp electroencephalogram (EEG) examinations. The automatic recognition of epileptic EEG is an artificial intelligence application developed from machine learning (ML). Purpose: This study compares the classification effects of two kinds of classifiers by controlling the EEG data source and characteristic values. Method: All EEG data were collected by GSN HydroCel 256 leads and high-density EEG from Xiangya Third Hospital. This study used time-domain features (mean, kurtosis and skewness processed by empirical mode decomposition (EMD) and three IMFs), a frequency-domain feature (power spectrum density, PSD) and a non-linear feature (Shannon entropy). Support vector machine (SVM) and gradient-boosting decision tree (GBDT) classifiers were used to recognize epileptic EEG. Result: The result of the SVM classifier showed an accuracy of 72.00%, precision of 73.98%, and an F1_score of 82.28%. Meanwhile, the result of the GBDT classifier showed a sensitivity of 98.57%, precision of 89.13%, F1_score of 93.40%, and an AUC of 0.9119. Conclusion: The comparison of GBDT and SVM by controlling the variables of the feature values and parameters of a classifier is presented. GBDT obtained the better classification accuracy (90.00%) and F1_score (93.40%).


Introduction
Epilepsy is a chronic neural disease. According to the International League Against Epilepsy (ILAE), a seizure does not necessarily mean that a person has epilepsy, unless criteria for the diagnosis of epilepsy are met. The diagnosis of epilepsy depends on many factors, such as detailed and accurate seizure history and some assistant examinations, particularly electroencephalograms (EEGs), including normal EEGs, video EEGs (VEEGs) and ambulatory EEGs (AEEGs) [1]. The EEG is a powerful and important tool for the diagnosis and classification of seizure and epilepsy [2,3]. Interictal epileptic EEG is essential for the diagnosis of epilepsy, as epileptic EEG features may be obscured by artifacts [4].
Epilepsy (Ep) is a chronic neural disease with recurrent, persistent and episodic characteristics. There are about five million epilepsy patients in the world, and about two million new cases every year, with an incidence of about 0.7% [5]. Epilepsy is caused by the hyper-synchronous discharge of neurons, an abnormal state accompanied by the formation of abnormal epileptic brain networks. During a seizure, neurons show extremely active discharge activity, which leads to a series of seizure symptoms and consequences such as falls, fractures, and coma. Almost all forms of epilepsy can be controlled by drug therapy. Thus, early diagnosis is the most important step in the treatment of epilepsy.
Scalp EEG is a significant auxiliary examination in the diagnosis of epilepsy. An electroencephalogram (EEG) is a waveform record of the spontaneous bioelectrical activity of the brain, picked up through electrode leads and amplified. Because EEG is non-invasive and simple, it is chosen as the first diagnostic method. In clinical work, VEEG and AEEG are commonly used in the diagnosis of epilepsy because long-term monitoring increases the chance of detecting seizures. However, manually recognizing epileptic EEG in such long recordings takes a lot of time and energy and can easily lead to human error. Artificial intelligence may help solve this difficulty. With the development of technology, the number of electrodes used in clinical EEG examination ranges from 16 to 256; the more electrodes there are, the more detail can be observed and the more data are obtained, which gives physicians a greater chance of reaching a diagnosis [6].
The automatic recognition of epileptic EEG is an artificial intelligence application developed from machine learning (ML). Because it is an algorithm that distinguishes epileptic EEG from non-epileptic EEG, it is a binary classification, and the result is "yes or no". Slevakumari et al. [7] obtained a sensitivity of 95.75%, specificity of 96.55%, and accuracy of 95.63% through SVM to distinguish epileptic EEG; Rizal et al. [8] used SVM classification to obtain an accuracy of 97.70%; and Jaiswal et al. [9] obtained an accuracy of 97.50% through SVM classification. Li et al. [10] obtained a sensitivity of 95.50%, a specificity of 98.00%, and an accuracy of 94.00% by the random forest method. Hu et al. [11] obtained a highest accuracy of 92.0% using GBDT classification to distinguish EEG databases; another study [12] in 2019 obtained an accuracy of 84.22% with GBDT classification. Different feature values can affect the classification results, which makes it impossible to compare the classifiers directly by their reported results.
In this paper, we compare different EEG classifiers based on the same clinical high-lead EEG data and the same feature values to find a more suitable EEG classifier for epilepsy.

Method
The experiment is an EEG classification experiment. This study used the same characteristic values in order to compare the classification effects of different classifiers. The technology roadmap is shown in Figure 1.

EEG Data
This study included 21 participants (15 epileptic patients and 6 healthy participants) and a total of 105 EEG recordings. All participants in this study were from the Department of Neurology of Xiangya Third Hospital. The inclusion and exclusion criteria were as follows. Inclusion criteria: (1) a diagnosis meeting the epilepsy diagnostic standard of the International League Against Epilepsy (ILAE); (2) age ≥ 15 years. Exclusion criteria: (1) a history of other brain-related diseases (trauma, infection, and so on); (2) inability to complete EEG tasks independently; (3) inability to tolerate long-term EEG examination.
Brain Sci. 2022, 12, 1197
All EEG data were collected with a 256-lead GSN HydroCel high-density EEG system (EGI company, from Shanghai Nohe Medical Company, LTD, Shanghai, China). Then, we completed the preprocessing of the EEG, including filtering and independent component analysis (ICA), with the EEGLAB toolbox [13] (2021.1, Arnaud Delorme and Scott Makeig, CA, USA) and Matlab software (2017b, MathWorks Company, Natick, MA, USA). In this study, there were 105 256-lead EEG recordings, each lasting 60 s, with a frequency band of 0-80 Hz.

Feature Extraction
This study used PSD, Shannon entropy, mean, kurtosis and skewness as characteristic values, and mean, kurtosis and skewness were processed by EMD.

Power Spectral Density (PSD)
PSD, also known as the power spectrum, represents the signal power within a unit frequency band. The PSD shows how signal power changes with frequency, that is, the power distribution of the signal in the frequency domain. The basic definition of PSD is given in Equation (1), where P represents the average power of the power signal f(t) over the time period [−T/2, T/2]. The unit of PSD is V²/Hz. In order to reduce the bias during PSD analysis, Welch's method [14] (pwelch) was used in the experiment.
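The averaging idea behind Welch's method can be sketched in plain Python as follows. This is a minimal illustration, not the pwelch implementation used in the study: the segment length, the 50% overlap, the Hann window, and the 256 Hz sampling rate of the test signal are all assumptions chosen for the example.

```python
import cmath
import math


def welch_psd(signal, fs, seg_len=128, overlap=0.5):
    """One-sided PSD estimate by averaging Hann-windowed periodograms."""
    step = max(1, int(seg_len * (1.0 - overlap)))
    # A Hann window reduces spectral leakage within each segment.
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / seg_len) for n in range(seg_len)]
    win_power = sum(w * w for w in window)
    n_bins = seg_len // 2 + 1
    psd = [0.0] * n_bins
    n_segments = 0
    for start in range(0, len(signal) - seg_len + 1, step):
        seg = signal[start:start + seg_len]
        seg_mean = sum(seg) / seg_len
        tapered = [(x - seg_mean) * w for x, w in zip(seg, window)]
        for k in range(n_bins):
            # Naive DFT coefficient of the tapered segment at bin k.
            coeff = sum(v * cmath.exp(-2j * math.pi * k * n / seg_len)
                        for n, v in enumerate(tapered))
            # DC and Nyquist bins are not doubled in a one-sided spectrum.
            is_edge = k == 0 or (seg_len % 2 == 0 and k == seg_len // 2)
            scale = 1.0 if is_edge else 2.0
            psd[k] += scale * abs(coeff) ** 2 / (fs * win_power)
        n_segments += 1
    if n_segments == 0:
        raise ValueError("signal is shorter than one segment")
    freqs = [k * fs / seg_len for k in range(n_bins)]
    return freqs, [p / n_segments for p in psd]


# Hypothetical test signal: a unit-amplitude 10 Hz sine sampled at 256 Hz.
fs = 256.0
samples = [math.sin(2.0 * math.pi * 10.0 * n / fs) for n in range(1024)]
freqs, psd = welch_psd(samples, fs)
peak_hz = freqs[max(range(len(psd)), key=lambda k: psd[k])]
total_power = sum(psd) * (fs / 128)  # integrate the PSD over frequency
```

If the input is in volts, the returned values are in V²/Hz, and integrating them over frequency recovers the mean signal power (0.5 for a unit-amplitude sine). A naive DFT is used only to keep the sketch dependency-free; a practical implementation would use an FFT.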

Shannon Entropy
Shannon entropy, also known as information entropy, was proposed by Claude Shannon [15] in his 1948 paper "A Mathematical Theory of Communication". Shannon pointed out that information is what eliminates random uncertainty. The definition of Shannon entropy [16] is:
$$H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i \qquad (2)$$
In Equation (2), H(X) is the entropy of a variable X with n possible events, and p_1, p_2, ..., p_n are the probabilities of these events. By convention, 0 log 0 = 0. The unit of Shannon entropy is bits. Shannon entropy can be used to describe the complexity of a system. The more complex a system is, the more different kinds of situations may occur, and the larger its Shannon entropy is. The simpler a system is, the fewer different kinds of situations may occur, and the smaller its Shannon entropy is; the entropy can be zero if the system is simple enough.
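As a small worked illustration of the definition, the sketch below computes the entropy in bits from a list of probabilities. The second function estimates the probabilities of a sampled signal from an amplitude histogram; that binning step and the bin count are assumptions made for the example, since the way probabilities were estimated from the EEG is not specified here.

```python
import math
from collections import Counter


def shannon_entropy(probabilities):
    """Entropy in bits; terms with p = 0 are skipped (0 log 0 = 0)."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0.0)


def signal_entropy(samples, n_bins=16):
    """Entropy of a signal's amplitude histogram (binning is an assumption)."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return 0.0  # a constant signal carries no uncertainty
    width = (hi - lo) / n_bins
    # Map each sample to a bin index; the maximum falls into the last bin.
    counts = Counter(min(int((x - lo) / width), n_bins - 1) for x in samples)
    total = len(samples)
    return shannon_entropy(c / total for c in counts.values())


fair_coin = shannon_entropy([0.5, 0.5])
four_way = shannon_entropy([0.25, 0.25, 0.25, 0.25])
certain = shannon_entropy([1.0, 0.0])
ramp = signal_entropy([float(v) for v in range(16)])
```

A fair coin gives 1 bit, four equally likely events give 2 bits, and a certain outcome gives 0 bits, which matches the statement that simpler systems have smaller entropy.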

Empirical Mode Decomposition (EMD)
Empirical mode decomposition (EMD) is a signal processing method proposed by Huang et al. at NASA [17]. Its key point is that, through the mode decomposition algorithm, a complex signal can be decomposed into intrinsic mode functions (IMFs). In this way, EMD transforms non-stationary signals into stationary signals, which makes the instantaneous signals meaningful and yields more accurate EEG signals.
In this experiment, we used the EMD method to obtain three IMFs from each channel. Then, we calculated the time-domain values of the EEG signals from the IMFs. Mean, kurtosis and skewness were used to characterize the EEG. They are calculated as follows:
$$\mu = \frac{1}{N}\sum_{t=1}^{N} X(t) \qquad (3)$$
$$\mathrm{Kurtosis} = \frac{1}{N}\sum_{t=1}^{N}\left(\frac{X(t)-\mu}{\sigma}\right)^{4} \qquad (4)$$
$$\mathrm{Skewness} = \frac{1}{N}\sum_{t=1}^{N}\left(\frac{X(t)-\mu}{\sigma}\right)^{3} \qquad (5)$$
In Equation (3), X(t) represents the signal in the time domain, and N is the number of sampling points used in the calculation. As N → ∞, Equation (3) gives the mean of the whole signal. In Equations (4) and (5), X(t) also represents the signal in the time domain, µ is the mean of the time signal, and σ is the standard deviation of the time signal, with $\sigma = \sqrt{\frac{1}{N}\sum_{t=1}^{N}(X(t)-\mu)^{2}}$.
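The three time-domain features can be sketched in plain Python as follows. The sketch assumes population (1/N) moments and non-excess kurtosis (a Gaussian signal gives 3), and the "IMFs" are short placeholder lists: the EMD step itself is not reproduced here, and the function names are illustrative.

```python
import math


def mean(x):
    return sum(x) / len(x)


def std(x):
    """Population standard deviation (1/N normalization, an assumption)."""
    mu = mean(x)
    return math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))


def skewness(x):
    """Third standardized moment."""
    mu, sigma = mean(x), std(x)
    return sum((v - mu) ** 3 for v in x) / (len(x) * sigma ** 3)


def kurtosis(x):
    """Fourth standardized moment (non-excess)."""
    mu, sigma = mean(x), std(x)
    return sum((v - mu) ** 4 for v in x) / (len(x) * sigma ** 4)


def imf_features(imfs):
    """Return [mean, kurtosis, skewness] for each IMF of one channel."""
    return [[mean(imf), kurtosis(imf), skewness(imf)] for imf in imfs]


# Placeholder sequences standing in for three IMFs of one channel.
toy_imfs = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [0.0, 1.0, 0.0, -1.0, 0.0],
    [2.0, 2.0, 2.0, 2.0, 10.0],
]
features = imf_features(toy_imfs)
```

With three IMFs per channel and three statistics per IMF, each channel contributes nine time-domain values under this layout.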

Classifier
Our study used support vector machine (SVM) and gradient-boosting decision tree (GBDT) classifiers to distinguish epileptic EEG from non-epileptic EEG.

Support Vector Machine (SVM)
SVM is a binary classification model. It can be divided into linear and nonlinear models according to the type of input data [18]. In EEG classification, the linearly separable SVM is more commonly used. In our experiment, the EEG data were divided into a training set and a testing set at a ratio of 7:3 by random stratified sampling. Then, we normalized the selected characteristic values and trained the SVM classifier with the RBF kernel [19]. A successive grid search was used to find the optimal model parameters C and gamma: C was searched over 2^j, with j ranging from −4 to 4 in steps of 1, and gamma over 2^i, with i ranging from −4 to 4 in steps of 1. The best classifier model was then selected by 5-fold cross-validation. Finally, we obtained the classification results from the best model. The SVM classification process is shown in Figure 2.
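The data split and the search grid described above can be sketched without any ML library as follows. The class counts and the random seed are hypothetical, chosen only for illustration, and the classifier itself is left out: in practice each (C, gamma) pair would be scored by 5-fold cross-validation with an RBF-kernel SVM from a library such as scikit-learn (`SVC`, `GridSearchCV`), which is an assumption about tooling rather than a statement of what the study used.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction, seed=0):
    """Return (train_idx, test_idx), keeping class proportions in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Candidate values 2**j for j = -4..4, for both C and gamma.
exponents = range(-4, 5)
param_grid = [(2.0 ** j, 2.0 ** i) for j in exponents for i in exponents]

# Hypothetical labels: 1 = epileptic, 0 = non-epileptic.
labels = [1] * 80 + [0] * 20
train_idx, test_idx = stratified_split(labels, test_fraction=0.3)
```

The grid contains 9 × 9 = 81 parameter pairs, with both C and gamma ranging from 0.0625 to 16.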

GBDT Classifier
The gradient-boosting decision tree (GBDT) is a boosting algorithm based on decision trees, proposed by Friedman [20] in 2001. GBDT fits its trees with a gradient algorithm, which reduces the over-fitting problems of the traditional decision tree and makes the classification more accurate and precise. Commonly, the classification and regression tree (CART) serves as the weak classifier; in each iteration, a new weak classifier is trained on the basis of the previous ones, fitted by the gradient algorithm [21]. In our study, the EEG database was divided into a training set and a testing set at a ratio of 8:2 by random stratified sampling. Then, we selected characteristic values by t-test and normalization. The best GBDT classifier model was then obtained by 5-fold cross-validation. Finally, we obtained the classification results from the best model. The GBDT classification process is shown in Figure 3.
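The idea that each weak learner is fitted to the gradient left by the previous ones can be illustrated with a deliberately small sketch. It boosts one-split regression stumps on a single feature under the log loss; real GBDT implementations use deeper CART trees, many features, and per-leaf step sizes, so this is a simplification and not the model used in the study. The toy data, the number of rounds, and the learning rate are all assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, targets):
    """Least-squares stump on one feature; assumes at least two distinct values."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((t - left_value) ** 2 for t in left)
                 + sum((t - right_value) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1], best[2], best[3]


def fit_boosted_stumps(xs, ys, n_rounds=20, learning_rate=0.5):
    """Boost stumps on the negative gradient of the log loss."""
    prior = sum(ys) / len(ys)
    bias = math.log(prior / (1.0 - prior))  # initial log-odds
    scores = [bias] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log loss with respect to the score.
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        scores = [s + learning_rate * (left_value if x <= threshold else right_value)
                  for x, s in zip(xs, scores)]
    return bias, learning_rate, stumps


def predict_proba(model, x):
    bias, learning_rate, stumps = model
    score = bias + learning_rate * sum(lv if x <= t else rv for t, lv, rv in stumps)
    return sigmoid(score)


# Hypothetical one-feature toy data: low values are class 0, high values class 1.
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_boosted_stumps(xs, ys)
p_low = predict_proba(model, 0.15)
p_high = predict_proba(model, 0.85)
```

Each round moves the scores a little further in the direction that reduces the loss, so the predicted probabilities for the two groups drift towards 0 and 1 as rounds accumulate.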

Statistical Evaluation
The experimental result is dichotomous, because the classification yields one of two outcomes: "the EEG is epileptic EEG" or "the EEG is non-epileptic EEG". An "epileptic EEG" sample is the positive sample, and a "non-epileptic EEG" sample is the negative sample. We used the confusion matrix to evaluate the dichotomous data in the experiment.
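The usual quantities derived from a binary confusion matrix can be sketched as follows, with epileptic EEG coded as 1 (positive) and non-epileptic EEG as 0 (negative). The label vectors are made up for illustration and are not data from the study.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for labels coded 1 = positive, 0 = negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def binary_metrics(tp, fp, tn, fn):
    """Standard metrics; assumes every denominator is non-zero."""
    sensitivity = tp / (tp + fn)          # recall on epileptic EEG
    specificity = tn / (tn + fp)          # recall on non-epileptic EEG
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f1": f1,
    }


# Illustrative labels only.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
metrics = binary_metrics(*counts)
```

In this toy case one positive and one negative are misclassified, so sensitivity and precision are both 5/6, specificity is 0.75, and accuracy is 0.8.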

Participant Information
Our study contained 21 participants, including 15 epileptic patients and 6 healthy participants. There were 6 males and 15 females in our study, as shown in Table 1. The numbers 1-15 were the people with epilepsy, and the numbers 16-21 were the healthy people. We used the nonparametric test of significance to evaluate the age of the two groups, which suggested that there was no significant difference between the age of the two groups (p > 0.05).
Table 1. Participant information (only the rows for participants 16-21 are complete; the preceding rows are truncated, ending in "Dizzy").
No.  Age  Sex     Other fields
16   21   Female  ---
17   22   Female  ---
18   22   Male    ---
19   21   Male    ---
20   22   Female  ---
21   54   Female  ---

Classification Result
In our study, we used five kinds of characteristic values to express the information of the EEG data. Then, we classified the EEG data with two classifiers: SVM and GBDT. In order to evaluate the classifiers, we calculated the sensitivity, specificity, accuracy, precision, F1_score, and AUC value, and obtained the results shown in Table 2 and Figure 4. The SVM classifier showed a sensitivity of 92.86%, specificity of 23.33%, accuracy of 72.00%, precision of 73.98%, F1_score of 82.28%, and AUC of 0.7500. Meanwhile, the GBDT classifier showed a sensitivity of 98.57%, specificity of 70.00%, accuracy of 90.00%, precision of 89.13%, F1_score of 93.40%, and AUC of 0.9119. In a direct comparison, the sensitivities were close (92.86% vs. 98.57%), but the specificities were very different (23.33% vs. 70.00%), with GBDT far better than SVM. The comparison of accuracy, precision, AUC, and the overall evaluation index shows that GBDT performs much better than SVM.
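Unlike the other metrics, AUC is computed from classifier scores rather than from hard decisions. One way to compute it, sketched below, uses the rank interpretation: the AUC equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The scores are invented for illustration, and how the AUC values above were actually computed is not specified here.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative scores only: higher means "more likely epileptic".
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2]
auc = roc_auc(y_true, scores)
perfect = roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])
chance = roc_auc([1, 0], [0.5, 0.5])
```

Here five of the six positive-negative pairs are ordered correctly, giving an AUC of 5/6; a value of 0.5 corresponds to chance-level ranking.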

Discussion
In this paper, a comparison of GBDT and SVM by controlling the variables of the feature values and parameters of the classifier is presented. The EEG signals were acquired from 15 epileptic patients and 6 healthy volunteers and recorded with the 256-channel GSN HydroCel. Then, finite impulse response (FIR) filters and the ICA method were applied to the EEG signals for preprocessing. The five feature values (PSD, Shannon entropy, mean, kurtosis and skewness, where mean, kurtosis and skewness were processed by EMD and three IMFs) were used to describe the EEG information. Finally, two classifiers (GBDT and SVM) were applied to distinguish epileptic EEG from non-epileptic EEG. GBDT obtained the better classification accuracy (90.00%) and F1_score (93.40%).
At present, SVM is still the mainstream choice in the field of EEG classification, and the mainstream classifiers perform well in epileptic EEG classification in published studies. GBDT is a newer ML classifier that is rarely applied to the recognition of epileptic EEG. Recent studies show that GBDT performs well in the classification of epileptic EEG. However, there are still no comparisons between SVM and GBDT. This study addresses this question and finds that GBDT, as a new classifier in the field of the automatic recognition of epileptic EEG, has a better classification effect.
GBDT is an emerging classifier, published in 2001, that is rarely used in EEG classification. Only 11 records can be found in PubMed and Embase with the search terms "GBDT" AND "EEG", and only six of these are from the last 3 years. In the study by Huang et al. [22] in 2021, the GBDT classifier showed a sensitivity of 85.9%, specificity of 84.0%, and accuracy of 87.4% in children's EEG classification. SVM, as a traditional classifier, is commonly used in epileptic classification with great performance. In Table 3, GBDT and SVM both have great performance. However, we cannot conclude from these results which classifier has the better classification performance.
Therefore, in order to compare the classification performance of GBDT and SVM, controlling variables is important. This study used the same feature values and the same classifier parameters in order to control variables. Then, we used different classifiers to distinguish epileptic EEG from non-epileptic EEG. As a result, the comparison between the classifiers is meaningful.
According to the results of this study, all statistical evaluations suggest that GBDT, a rising classifier, has a better classification performance than SVM. GBDT has great sensitivity, accuracy, and F1_score in epileptic EEG recognition. Compared with the data in Table 3, the classification results of GBDT in this paper are not the best. However, as mentioned above, the selection of feature values can affect classification results, and we cannot directly evaluate a classifier by its accuracy. In this study, GBDT showed a better classification performance than SVM.
A limitation of this study is the restricted number of participants. While the sample may seem small, the method shows good classification performance, with an AUC of 0.9119. The recognition accuracy may be improved with more EEG signals and participants. Furthermore, feature extraction is a major step in our EEG methodology. We used time-domain features (mean, kurtosis and skewness), a frequency-domain feature (PSD), and a non-linear feature (Shannon entropy) to capture the information in the EEG signals. The classification performance may be improved by calculating more feature values; however, adding more features from 256 channels would increase the complexity and computational burden of the proposed method. This makes it difficult to apply our method in real-time clinical applications in the future.

Conclusions
In this paper, a comparison of GBDT and SVM for the automatic recognition of epileptic EEG, controlling the variables of the feature values and parameters of the classifier, was presented. After a preprocessing and feature extraction stage, the method was able to classify EEG recordings using the GBDT classifier. Our ambition is to apply this method efficiently in clinical epilepsy diagnosis to reduce the workload of physicians, increase the efficiency of epilepsy diagnosis, and benefit people with epilepsy.

Institutional Review Board Statement: The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board of the Third Xiangya Hospital, CSU (September 2022).
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study. Written informed consent has been obtained from the patients to publish this paper.