EEG Microstate Features as an Automatic Recognition Model of High-Density Epileptic EEG Using Support Vector Machine

Epilepsy is one of the most serious diseases of the nervous system, and it can be diagnosed accurately with video electroencephalography. In this study, we analyzed the microstates of epileptic electroencephalogram (EEG) recordings to aid in the diagnosis and identification of epilepsy. We recruited patients with focal epilepsy and healthy participants from the Third Xiangya Hospital and recorded their resting-state EEG. The EEG data were analyzed with microstate analysis, and a support vector machine (SVM) classifier was used to classify epileptic EEG automatically based on features of the EEG microstate series, including microstate parameters (duration, occurrence, and coverage), linear features (median, second quartile, mean, kurtosis, and skewness), and nonlinear features (Petrosian fractal dimension, approximate entropy, sample entropy, fuzzy entropy, and Lempel–Ziv complexity). The microstate parameters from the gamma sub-band gave the best model for recognizing interictal epilepsy, with an accuracy of 87.18%, recall of 70.59%, and area under the curve of 94.52%. The features extracted from the EEG microstate series in the 4~45 Hz band also recognized interictal epilepsy, with an accuracy of 79.55%. With the SVM classifier, both microstate parameters and EEG features can effectively classify epileptic EEG, and microstate parameters classify epileptic EEG better than EEG features do.


Introduction
Epilepsy is one of the most common neurological diseases and is characterized by recurrent, persistent, and episodic seizures [1]. Seizures are transient, stereotyped, recurrent, and sudden; they are short-lived events caused by abnormal electrical activity in the brain [2]. According to the International League Against Epilepsy (ILAE), epilepsy can be diagnosed after at least two unprovoked seizures occurring more than 24 h apart. Epilepsy is a state of abnormal neural activity caused by the hypersynchronous discharge of neurons; during a seizure, neurons discharge excessively, producing symptoms such as loss of consciousness and convulsions. Epilepsy affects 8% of the world's population, directly or indirectly [2], and patients with epilepsy (PWEs) rarely notice or predict the onset of a seizure, which increases their risk of physical harm. Early diagnosis and subsequent evaluation of epilepsy are therefore crucial: because of delayed diagnosis and untimely or inappropriate treatment, many patients develop serious complications, such as cognitive and emotional disorders. During a seizure, ictal epileptiform discharges (IEDs) are produced, and these are a necessary condition for the diagnosis of epilepsy [3]. However, a short-term EEG often fails to capture IEDs, so in clinical practice PWEs are diagnosed with video electroencephalography (VEEG). During the interictal period, brain electrical activity in PWEs remains relatively normal and usually lacks epileptiform discharges, which makes diagnosis more difficult [4]. Moreover, long-term EEG is relatively expensive and labor-intensive.

Brain Sci. 2022, 12, x FOR PEER REVIEW 3 of 18

epilepsy was carried out in order to obtain a model that can effectively classify the interictal EEG of PWEs and assist in the diagnosis of epilepsy.

Materials and Methods
This study was an EEG classification experiment based on EEG microstate analysis. It was based on the support vector machine (SVM) classifier, which was used to compare the effects of different feature extraction methods on EEG classification. The technology roadmap is shown in Figure 1.

Subjects and EEG Recording and Preprocessing
This was a single-center study approved by the ethical review of the Third Xiangya Hospital (ID: 22187). EEG recordings were obtained from 32 PWEs and 20 healthy subjects recruited from the Third Xiangya Hospital. The inclusion criteria were: (1) diagnosis of focal epilepsy by two professional doctors according to the 2017 ILAE standard; (2) age ≥ 15 years or a head circumference that fit the electrode cap. The exclusion criteria were: (1) a history of other brain-related conditions (trauma, infection, etc.); (2) inability to complete EEG tasks independently; (3) inability to consent to EEG examination. Each participant underwent a 5-min, 128-lead EEG recording with the GSN system at a sampling rate of 1000 Hz. From each recording, five 45-s EEG segments were extracted to minimize noise and artifacts.
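As a quick check of the segment arithmetic, the sketch below cuts one channel of a 5-min, 1000 Hz recording into five 45-s windows (45 × 1000 = 45,000 samples each, 225 s in total). The text does not say where in the recording the segments were taken, so the consecutive, non-overlapping placement and the helper name `split_into_segments` are hypothetical.

```python
FS_HZ = 1000      # sampling rate stated in the text
SEGMENT_S = 45    # segment length stated in the text (seconds)
N_SEGMENTS = 5    # segments per recording stated in the text


def split_into_segments(samples, fs=FS_HZ, seg_s=SEGMENT_S, n_segments=N_SEGMENTS):
    """Return n_segments consecutive, non-overlapping windows of seg_s seconds.

    Hypothetical placement: the first window starts at the first sample.
    """
    seg_len = fs * seg_s              # 45 s x 1000 Hz = 45,000 samples
    needed = seg_len * n_segments     # 225,000 samples, i.e. 225 s of the 300 s
    if len(samples) < needed:
        raise ValueError(f"need {needed} samples, got {len(samples)}")
    return [samples[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]
```

For a 5-min channel stored as a list of 300,000 samples, `split_into_segments(channel)` returns five lists of 45,000 samples each.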
Normally distributed measurement data were compared with t-tests; measurement data that were not normally distributed, as well as count data, were compared with Mann–Whitney U nonparametric tests.
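A minimal pure-Python sketch of the Mann–Whitney U test follows, assuming that tied values receive average ranks and that the two-sided p-value comes from the normal approximation without tie or continuity correction. In practice a statistics package (e.g., `scipy.stats.mannwhitneyu`) would normally be used; the function names here are illustrative.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1   # average of ranks i+1 .. j+1
        for idx in range(i, j + 1):
            ranks[order[idx]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided p) via the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # no tie correction
    z = (u1 - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p
```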
We preprocessed all data with EEGLAB (2020b), a toolbox for MATLAB (R2020b), to remove artifacts and noise. Preprocessing included band-pass filtering into sub-bands (0.5~4 Hz, 4~8 Hz, 8~13 Hz, 13~30 Hz, 30~45 Hz, 45~80 Hz, and 4~45 Hz), independent component analysis (ICA), removal of bad channels, and selection of data free of IEDs. Because of interference and patient-related factors, such as sudden seizures or hyperactivity, some recordings contained artifacts that could not be removed; we finally obtained 135 epileptic and 83 healthy EEG segments, each 45 s long.
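To illustrate what splitting a signal into these sub-bands means, the sketch below applies an ideal (brick-wall) band-pass filter in the frequency domain using a plain O(n²) discrete Fourier transform. This is only an illustration under that assumption: it is not the filter the authors applied in EEGLAB, and the band names in `SUB_BANDS_HZ` are our own labels.

```python
import cmath
import math

# Sub-band edges (Hz) listed in the text; the dictionary keys are illustrative labels.
SUB_BANDS_HZ = {
    "delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30),
    "gamma": (30, 45), "high": (45, 80), "broad": (4, 45),
}


def ideal_bandpass(signal, fs, low, high):
    """Zero every DFT bin whose |frequency| lies outside [low, high] Hz.

    Plain O(n^2) DFT for illustration only.
    """
    n = len(signal)
    spectrum = [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    for k in range(n):
        freq = min(k, n - k) * fs / n      # |frequency| represented by bin k
        if not (low <= freq <= high):
            spectrum[k] = 0
    # Inverse DFT; the imaginary part is only rounding error for a real input.
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]
```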

Microstate Analysis
For microstate analysis, we followed the standard microstate segmentation procedure proposed by Murray et al., carried out in MATLAB (R2020b) with the EEGLAB toolbox. First, we calculated the field strength at each moment, the global field power (GFP), defined as

$$\mathrm{GFP}(t)=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(v_{i}(t)-\bar{v}(t)\right)^{2}}$$

where v(t) is the vector of electrode voltages (in µV) at time t, n is the number of electrodes, $v_i(t)$ is the voltage of the ith electrode, and $\bar{v}(t)$ is the mean voltage across electrodes. Topographies recorded at high GFP tend to have a high signal-to-noise ratio, whereas those recorded at low GFP tend to have a low signal-to-noise ratio. We therefore selected the topographic maps at GFP peaks as the original topographic maps representing the surrounding EEG signal, which reduces the redundancy of the EEG data and the computational load. Four microstate classes were selected as the original model topographic maps through cross-validation [13]. The original topographic maps were then spatially clustered to obtain the final model topographic maps, maximizing the similarity between the EEG samples and their assigned microstate prototypes. In this study, we applied K-means spatial clustering. The flow chart of the EEG microstate analysis is shown in Figure 2.
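The GFP definition and the peak selection described above can be sketched as follows. Treating a GFP peak as a sample strictly larger than both of its neighbours is an assumption, since the text does not specify the peak criterion; the function names are illustrative.

```python
import math


def global_field_power(frame):
    """GFP of one time point: standard deviation of the voltages across electrodes."""
    n = len(frame)
    mean = sum(frame) / n
    return math.sqrt(sum((v - mean) ** 2 for v in frame) / n)


def gfp_peaks(frames):
    """Indices of local GFP maxima (strictly greater than both neighbours)."""
    gfp = [global_field_power(f) for f in frames]
    return [t for t in range(1, len(gfp) - 1) if gfp[t - 1] < gfp[t] > gfp[t + 1]]
```

Only the maps at the returned indices would then be passed on to the clustering step.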

K-Means Clustering
In this study, we adopted a spatial clustering algorithm based on K-means clustering, as shown in Figure 3. The global map dissimilarity (GMD) provides a field-strength-independent measure of the topographic difference between two electric field maps. It is defined as

$$\mathrm{GMD}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\frac{u_{i}-\bar{u}}{\sqrt{\frac{1}{N}\sum_{j=1}^{N}\left(u_{j}-\bar{u}\right)^{2}}}-\frac{v_{i}-\bar{v}}{\sqrt{\frac{1}{N}\sum_{j=1}^{N}\left(v_{j}-\bar{v}\right)^{2}}}\right)^{2}}$$

where $u_i$ is the voltage at the ith electrode of the first topographic map, $v_i$ is the voltage at the ith electrode of the second topographic map, $\bar{u}$ and $\bar{v}$ are the corresponding mean voltages across electrodes, and N is the number of electrodes. GMD values range from 0 to 2: 0 means that the two maps are identical, and 2 means that they have the same topography with inverted polarity. The K-means clustering algorithm follows the work of Koenig et al. [12]. For each model topographic map, we obtained a time series of spatial correlation coefficients, and each original topographic map was assigned to the model topographic map with the lowest GMD.
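The GMD formula can be checked numerically with the short sketch below (function names are illustrative). Because both maps are average-referenced and scaled to unit GFP, GMD is unaffected by field strength and equals √(2(1 − r)), where r is the spatial correlation between the two maps; identical maps give 0 and polarity-inverted maps give 2.

```python
import math


def _normalise(u):
    """Average-reference a map and scale it to unit GFP (assumes the map is not flat)."""
    n = len(u)
    mean = sum(u) / n
    centred = [x - mean for x in u]
    gfp = math.sqrt(sum(x * x for x in centred) / n)
    return [x / gfp for x in centred]


def gmd(u, v):
    """Global map dissimilarity between two topographic maps (range 0 to 2)."""
    a, b = _normalise(u), _normalise(v)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def spatial_correlation(u, v):
    """Pearson correlation across electrodes between two maps."""
    a, b = _normalise(u), _normalise(v)
    return sum(x * y for x, y in zip(a, b)) / len(a)
```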
Based on these results, we calculated the global explained variance (GEV, the proportion of the variance of all original topographic maps that is explained by the model topographic maps) for the four selected model topographic maps. We matched each original topographic map to the model map with the smallest GMD, labeled it accordingly, and averaged (superimposed) all original maps sharing the same label to obtain four new topographic patterns. These steps were repeated until the GEV no longer increased (i.e., the GEV reached its maximum), yielding the final four model topographic maps. The original scalp potential topography of each subject at each time point was then compared with the four clustered microstates (model topographic maps), labeled A, B, C, and D, to obtain each subject's microstate sequence, from which the corresponding EEG microstate time-series parameters were calculated. From this microstate analysis, we extracted microstate parameters and EEG signal features and used each as a model to classify epileptic EEG with the SVM classifier.
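The iterative procedure above can be sketched as a polarity-invariant ("modified") k-means, with GEV computed as the GFP-weighted squared spatial correlation between each map and its assigned template, GEV = Σ_t (GFP_t · corr_t)² / Σ_t GFP_t². This is a simplified illustration under stated assumptions, not the EEGLAB microstate plugin or the exact algorithm of Koenig et al. [12]: templates are seeded deterministically with maximally dissimilar maps, updated as polarity-aligned sums of their member maps, and iteration stops as soon as the GEV fails to increase.

```python
import math


def _centre(m):
    """Average-reference a map (subtract the mean across electrodes)."""
    mean = sum(m) / len(m)
    return [x - mean for x in m]


def _gfp(m):
    c = _centre(m)
    return math.sqrt(sum(x * x for x in c) / len(c))


def _corr(a, b):
    """Spatial (Pearson) correlation between two maps."""
    ca, cb = _centre(a), _centre(b)
    norm = math.sqrt(sum(x * x for x in ca) * sum(y * y for y in cb))
    return sum(x * y for x, y in zip(ca, cb)) / norm


def gev(maps, templates, labels):
    """Global explained variance of the labelled maps."""
    explained = sum((_gfp(m) * _corr(m, templates[lab])) ** 2 for m, lab in zip(maps, labels))
    return explained / sum(_gfp(m) ** 2 for m in maps)


def modified_kmeans(maps, k=4, max_iter=100):
    """Polarity-invariant k-means; returns (templates, labels, gev) of the best iteration."""
    # Farthest-point seeding (an assumption): start from the first map, then add
    # the map least correlated, in absolute value, with every template so far.
    templates = [_centre(maps[0])]
    while len(templates) < k:
        nxt = min(maps, key=lambda m: max(abs(_corr(m, t)) for t in templates))
        templates.append(_centre(nxt))
    best = (None, None, -1.0)
    for _ in range(max_iter):
        # Largest |correlation| is equivalent to the smallest GMD of either polarity.
        labels = [max(range(k), key=lambda j: abs(_corr(m, templates[j]))) for m in maps]
        current = gev(maps, templates, labels)
        if current <= best[2] + 1e-12:      # GEV stopped increasing
            break
        best = ([list(t) for t in templates], labels, current)
        # Update each template as the polarity-aligned sum of its member maps
        # (same direction as their average); empty clusters keep their template.
        new_templates = []
        for j, tmpl in enumerate(templates):
            members = [m for m, lab in zip(maps, labels) if lab == j]
            acc = [0.0] * len(tmpl)
            for m in members:
                sign = 1.0 if _corr(m, tmpl) >= 0 else -1.0
                acc = [a + sign * x for a, x in zip(acc, _centre(m))]
            new_templates.append(acc if members else tmpl)
        templates = new_templates
    return best
```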

EEG Microstate Parameters
The basic temporal dynamics of microstates are described by occurrence, duration, and coverage. Occurrence is the average number of times per second that a given microstate appears. Duration is the average length (in milliseconds) of the periods during which a given microstate is present. Coverage is the fraction of the total recording time during which a given microstate is active. These parameters were extracted directly with the algorithm in EEGLAB. In this study, the parameters (occurrence, duration, and coverage) were used as a model to classify epileptic EEG signals.
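Given a microstate label for every sample, the three parameters can be computed as in the sketch below. It assumes that labels are assigned per sample (e.g., after back-fitting the model maps) and it counts every run, including partial runs at the start and end of the segment, which some toolboxes discard; the function name is illustrative.

```python
def microstate_parameters(labels, fs):
    """Duration (ms), occurrence (per s) and coverage (fraction) for each microstate class.

    labels: one microstate label per sample; fs: sampling rate in Hz.
    """
    total_s = len(labels) / fs
    runs = []                                   # [label, run length in samples]
    for lab in labels:
        if runs and runs[-1][0] == lab:
            runs[-1][1] += 1
        else:
            runs.append([lab, 1])
    params = {}
    for cls in sorted(set(labels)):
        lengths = [n for lab, n in runs if lab == cls]
        params[cls] = {
            "duration_ms": 1000 * sum(lengths) / len(lengths) / fs,
            "occurrence_per_s": len(lengths) / total_s,
            "coverage": sum(lengths) / len(labels),
        }
    return params
```

For example, the sequence `AAABBAAAAC` at 10 Hz gives microstate A two runs (3 and 4 samples), i.e., a mean duration of 350 ms, an occurrence of 2 per second, and a coverage of 0.7.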

Feature Extraction
Because EEG signals are nonlinear, nonlinear features are commonly extracted to classify them accurately. In this study, five linear features and five nonlinear features were extracted from the EEG microstate series in the 4~45 Hz band, which is commonly used in EEG analysis [18]. The linear features were the mean, second quartile, median, skewness, and kurtosis. The nonlinear features were the Petrosian fractal dimension (PFD), Lempel-Ziv complexity (LZC), and three entropies (approximate entropy, sample entropy, and fuzzy entropy).

Linear Feature Extraction
The linear features are time-domain statistics of the microstate series; their definitions are given in Table 1.

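Table 1.
Median: $\mathrm{Median}=x_{\left((N+1)/2\right)}$ if N is odd; $\mathrm{Median}=\frac{1}{2}\left(x_{(N/2)}+x_{(N/2+1)}\right)$ if N is even, where $x_{(k)}$ is the kth smallest of the N values.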
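The five linear features can be computed with Python's standard `statistics` module, as sketched below. Population moments and non-excess kurtosis (m₄/m₂²) are assumptions, because the text does not state which conventions were used. Note that the second quartile returned by `statistics.quantiles(x, n=4)` coincides with the median.

```python
import statistics


def linear_features(x):
    """Mean, median, second quartile, skewness and kurtosis of a 1-D series (len >= 2).

    Skewness = m3 / m2**1.5 and kurtosis = m4 / m2**2 use population central
    moments; kurtosis is not reduced by 3 (an assumed convention).
    """
    n = len(x)
    mean = statistics.fmean(x)
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    _, q2, _ = statistics.quantiles(x, n=4)   # quartile cut points
    return {
        "mean": mean,
        "median": statistics.median(x),
        "second_quartile": q2,                 # equals the median
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }
```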

Nonlinear Feature Extraction
PFD
The Petrosian fractal dimension (PFD) is a fast method for estimating the fractal dimension, and thus the complexity, of an EEG signal [19]. It converts the signal into a binary sequence and is computed as

$$\mathrm{PFD}=\frac{\log_{10}k}{\log_{10}k+\log_{10}\left(\frac{k}{k+0.4N_{\delta}}\right)}$$

Here, k denotes the number of samples in the signal, and $N_{\delta}$ denotes the number of sign changes in the binary sequence (i.e., in the first derivative of the signal).
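A direct transcription of the PFD formula is sketched below, assuming (as is usual for PFD) that $N_{\delta}$ counts the sign changes of the first difference of the signal. A monotonic series has no sign changes and therefore a PFD of exactly 1.

```python
import math


def petrosian_fd(x):
    """Petrosian fractal dimension of a 1-D series.

    k is the number of samples and n_delta the number of sign changes in the
    first difference of the series (the usual binarisation).
    """
    k = len(x)
    diff = [b - a for a, b in zip(x, x[1:])]
    n_delta = sum(1 for a, b in zip(diff, diff[1:]) if a * b < 0)
    return math.log10(k) / (math.log10(k) + math.log10(k / (k + 0.4 * n_delta)))
```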

LZC
Based on the binary coarse-graining algorithm, we used the mean of the signal as a threshold, compared each amplitude with the threshold, and converted the original signal into a 0-1 sequence [20]. The number of distinct patterns c(n) in this sequence was then counted as a measure of signal complexity and normalized as

$$C(n)=\frac{c(n)}{b(n)},\qquad b(n)=\frac{n}{\log_{i}n}$$

where n is the length of the time series and i is the coarse-graining degree of the time series (i = 2 for a binary sequence).
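The coarse-graining and counting steps can be sketched as below. The mean threshold follows the text; the phrase-counting routine is the Kaspar–Schuster implementation of Lempel–Ziv (1976) complexity, which is an assumption about how c(n) was counted, and the normalization uses b(n) = n / log₂ n for a binary sequence. Function names are illustrative.

```python
import math


def binarise(x):
    """Coarse-grain to a 0-1 sequence using the series mean as the threshold."""
    threshold = sum(x) / len(x)
    return [1 if v > threshold else 0 for v in x]


def lz76_phrase_count(s):
    """Number of phrases c(n) in the Lempel-Ziv (1976) parsing of s (Kaspar-Schuster)."""
    n = len(s)
    if n < 2:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:           # the last phrase runs to the end of s
                c += 1
                break
        else:
            k_max = max(k, k_max)
            i += 1
            if i == l:              # no earlier copy extends further: a phrase ends here
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def lempel_ziv_complexity(x):
    """Normalised LZC: C(n) = c(n) / b(n), with b(n) = n / log2(n)."""
    s = binarise(x)
    n = len(s)
    return lz76_phrase_count(s) * math.log2(n) / n
```

The classic example sequence 0001101001000101 parses as 0·001·10·100·1000·101, giving c(n) = 6.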

Entropy
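Entropy measures the disorder within a system and is useful for quantifying the irregularity and variability of signals. In this study, three entropies were studied: approximate entropy (ApEn) [21], sample entropy (SampEn) [22], and fuzzy entropy [23]. ApEn is defined as

$$\mathrm{ApEn}(m,r,N)=\Phi^{m}(r)-\Phi^{m+1}(r),\qquad \Phi^{m}(r)=\frac{1}{N-m+1}\sum_{i=1}^{N-m+1}\ln C_{i}^{m}(r)$$

where $C_{i}^{m}(r)$ is the fraction of vectors $X_m(j)$ whose distance from $X_m(i)$ is within r, N is the length of the time series, r is the similarity tolerance, and m is the embedding dimension. SampEn is defined as

$$\mathrm{SampEn}(m,r,N)=-\ln\frac{A^{m}(r)}{B^{m}(r)}$$

where $B^{m}(r)$ is the probability that the distance between the vectors $X_m(j)$ and $X_m(i)$ ($i\neq j$) is less than r, $A^{m}(r)$ is the corresponding probability for vectors of length m + 1, N is the length of the time series, r is the similarity tolerance, and m is the embedding dimension. Fuzzy entropy is defined as

$$\mathrm{FuzzyEn}(m,r,N)=\ln\phi^{m}(r)-\ln\phi^{m+1}(r)$$

where $\phi^{m}(r)$ is the average fuzzy similarity between pairs of baseline-removed vectors of length m, N is the length of the time series, r is the similarity tolerance, and m is the embedding dimension.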
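ApEn and SampEn can be sketched directly from these definitions, using the Chebyshev (maximum) distance between embedding vectors. The tolerance r is passed as an absolute value (commonly 0.2 times the signal's standard deviation), matches are counted with d ≤ r, and fuzzy entropy is omitted for brevity; these are assumptions rather than the authors' settings.

```python
import math


def _embed(x, m):
    """All length-m embedding vectors X_m(i) of the series x."""
    return [x[i:i + m] for i in range(len(x) - m + 1)]


def _chebyshev(a, b):
    return max(abs(p - q) for p, q in zip(a, b))


def approximate_entropy(x, m=2, r=0.2):
    """ApEn = Phi^m(r) - Phi^{m+1}(r); self-matches are counted (Pincus)."""
    def phi(mm):
        t = _embed(x, mm)
        n = len(t)
        return sum(math.log(sum(_chebyshev(a, b) <= r for b in t) / n) for a in t) / n
    return phi(m) - phi(m + 1)


def sample_entropy(x, m=2, r=0.2):
    """SampEn = -ln(A / B); self-matches excluded (Richman and Moorman)."""
    n = len(x)

    def matches(mm):
        # Same N - m starting points for both template lengths.
        t = [x[i:i + mm] for i in range(n - m)]
        return sum(_chebyshev(t[i], t[j]) <= r
                   for i in range(len(t)) for j in range(i + 1, len(t)))

    b, a = matches(m), matches(m + 1)
    return -math.log(a / b)   # math.log raises ValueError if a == 0
```

A strictly periodic series yields a SampEn of 0, because every match of length m extends to a match of length m + 1.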

Training/Test Set Split
To avoid overfitting, we used cross-validation for the classification experiments. However, because the data were augmented from a limited number of patients (several segments per participant), the data had to be split by participant to prevent the classifier from learning the patterns of individual patients. To this end, we randomly selected one PWE and one healthy participant and placed all five segments of each, with their labels, in the test set. The data of the other 26 PWE and 17 healthy participants (225 subjects in total) were used as the training set in the cross-validation experiment.
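The participant-wise split can be sketched as follows; the tuple layout `(subject_id, label, features)` and the function name are hypothetical. Every segment of the two held-out participants goes to the test set, so no participant contributes segments to both sets.

```python
import random


def subject_wise_split(segments, seed=None):
    """Hold out every segment of one random PWE and one random healthy participant.

    segments: list of (subject_id, label, features), label 1 = PWE, 0 = healthy.
    Returns (train, test) with no subject in both.
    """
    rng = random.Random(seed)
    test_subjects = set()
    for label in (1, 0):
        ids = sorted({sid for sid, lab, _ in segments if lab == label})
        test_subjects.add(rng.choice(ids))
    train = [s for s in segments if s[0] not in test_subjects]
    test = [s for s in segments if s[0] in test_subjects]
    return train, test
```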

EEG Signal Classification
SVM is a binary classification model. Depending on the input data, it can be implemented as a linear or a nonlinear model [24]. In our study, the EEG data were divided into a test set containing one epileptic EEG data point and one healthy EEG data point, and a training set. We then used leave-one-out cross-validation to perform the epileptic EEG classification task [25][26][27].
The extracted microstate parameter set and feature set were input into the SVM classifier separately. SVM generalizes well and is relatively resistant to overfitting. Compared with other classifiers, SVM is robust and broadly applicable to non-stationary signals, and it is widely used in machine learning analysis.
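The text does not state the kernel or solver, so the sketch below assumes a linear soft-margin SVM trained with the Pegasos stochastic sub-gradient method, wrapped in leave-one-out cross-validation. It is a pure-Python illustration with labels in {−1, +1}; in practice a library SVM implementation, usually with feature standardization, would be used.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear soft-margin SVM trained with Pegasos (labels y in {-1, +1}).

    A constant 1.0 feature is appended so the bias is learned (and, as a
    simplification, regularised) together with the weights.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)                       # Pegasos step size
            margin = t * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]         # regularisation shrink
            if margin < 1:                                  # hinge-loss sub-gradient
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])) >= 0 else -1


def leave_one_out_accuracy(X, y, **svm_kwargs):
    """Train on all samples but one, test on the held-out one, for every sample."""
    correct = 0
    for i in range(len(X)):
        w = train_linear_svm(X[:i] + X[i + 1:], y[:i] + y[i + 1:], **svm_kwargs)
        correct += predict(w, X[i]) == y[i]
    return correct / len(X)
```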

Evaluation of Classifier
SVM is a binary classifier whose output is either "Yes" or "No". We defined the EEG of epileptic patients as "positive EEG" and the EEG of healthy subjects as "negative EEG". We used accuracy, recall (sensitivity), and specificity to evaluate the classification performance of the SVM classifier for epileptic EEG. The receiver operating characteristic (ROC) curve plots recall on the y-axis against 1 − specificity (the false positive rate) on the x-axis. The area under the ROC curve (AUC) summarizes model performance; in practice, an AUC above 70% is considered desirable [28].
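The evaluation metrics can be sketched as below, with 1 denoting epileptic (positive) and 0 healthy (negative) EEG. The AUC is computed from its rank interpretation: the probability that a randomly chosen positive receives a higher classifier score than a randomly chosen negative, with ties counted as one half. Function names are illustrative.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, recall (sensitivity) and specificity; 1 = epileptic, 0 = healthy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```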

Participants' Information
Our study recruited 44 participants, including 27 PWEs and 17 healthy participants. Their details are presented in Table 2. The clinical details of the PWEs are shown in Table 3. There was no significant difference in age, gender, and education level between PWEs and healthy participants (p > 0.05), as shown in Table 2. In this section, the classification results of epilepsy and healthy states, in the absence of IEDs, from 128-lead EEG data are presented based on the EEG microstate parameters and the features of EEG signals.
Microstate maps across all participants from all sub-bands are shown in Figure 4. To identify any resemblance in the topographies from different sub-bands, topographic analysis was performed to compare microstate maps among sub-bands. No significant difference was observed in any microstate map in any sub-band.

EEG Microstates Parameters' Classification
In this section, the classification results based on the EEG microstate parameters are presented. Table 4 and Figure 5 show the performance of the SVM classifier using the microstate parameters from each EEG frequency sub-band, in terms of accuracy, recall, and specificity. Classification performance differed across frequency sub-bands. The EEG microstate parameters were well suited to classifying epileptic EEG in the α, β, and γ sub-bands, with the best results (accuracy of 87.18%, recall of 70.59%, and specificity of 100.00%) in the γ band. In contrast, the δ band and the high band (45~80 Hz) appear to carry little information useful for microstate analysis, because their accuracy, recall, specificity, and AUC were much lower than those of the other sub-bands.

Table 4. Four criteria were used to evaluate the classifier model. The classification performance of the classifier model in the γ band was better than in the other sub-bands.
The ROC curves for the different sub-bands are shown in Figure 6. Comparing the AUC across frequency bands, the EEG microstate parameters from the gamma band gave the best subject classification, with a higher AUC than any other sub-band; the alpha and beta bands also performed relatively well. The results are presented in Table 4.

EEG Feature Set Classification
In this section, we present the classification results for the EEG signal features extracted from the EEG microstate time series of each single channel in the 4~45 Hz band. Table 5 shows the accuracy, recall, and specificity of these features in the SVM classifier. The SVM classifier using the feature set (median, mean, second quartile, kurtosis, skewness, fuzzy entropy, PFD, approximate entropy, sample entropy, and LZC) classified epileptic EEG signals effectively, with an accuracy of 79.55%, recall of 81.84%, and specificity of 76.47%.

Discussion
In this study, we analyzed epileptic EEG microstate sequences and automatically classified epileptic EEG using features of the microstate sequence. We analyzed the EEG microstates of 27 PWEs and 17 healthy participants. From the microstate analysis, we extracted the microstate parameters and a microstate-sequence feature set and input each into the SVM classifier to classify the epileptic EEG signals. With the SVM classifier, both the microstate parameters (duration, occurrence, and coverage; accuracy of 87.18%) and the feature set extracted from the EEG microstate sequence (median, mean, second quartile, kurtosis, skewness, fuzzy entropy, PFD, approximate entropy, sample entropy, and LZC; accuracy of 79.55%) could serve as classification models for epileptic EEG signals.
In 2018, Kiran et al. used machine learning to classify EEG microstate parameters from the EEG of PWEs, reaching a classification accuracy of over 76.1%, and described the microstate changes of intractable epilepsy [8]. These results suggest that PWEs show large-scale EEG microstate changes that could serve as a bioelectrical marker for the automatic identification of epilepsy [23]. In this study, the SVM classifier used the EEG microstate parameters as a classification model to distinguish the interictal EEG signals of PWEs from the resting EEG signals of healthy subjects. The classification model was most accurate in the γ (30~45 Hz) band, reaching 87.18%. This result differs from that of Ahmedi et al. [23], who classified interictal EEG from patients with epilepsy and patients with psychogenic non-epileptic seizures and found that the β (13~30 Hz) band gave the best classification. Our experimental group consisted of patients with focal epilepsy, whereas Ahmedi et al. [23] did not clearly define the epilepsy categories; because seizure mechanisms differ between epilepsy types and locations, this may explain the different results of the two studies. Together, these findings [8,23] show that machine learning based on EEG microstate parameters can efficiently distinguish the short-term EEG of PWEs from that of healthy participants, even in the absence of epileptiform discharges.
EEG signals have both linear and nonlinear characteristics [20,29]. In recent years, various models have been proposed for different EEG data, and feature extraction and feature classification are widely used in the diagnosis and monitoring of diseases such as neuropsychiatric disorders [30]. Using amplitude-integrated electroencephalography (aEEG) and color density spectral array (CDSA), Abend et al. [31] identified epileptic seizures on EEG; the sensitivity was 88% for long-duration epileptic discharges but only 40% for short-duration ones [31]. Other researchers [32] used short-term indices to classify EEG signals, revealing pathological brain electrical activity, and achieved 99.6% accuracy in distinguishing healthy from pathological EEG. These studies indicate that EEG features can effectively identify disease-related EEG signals. In this study, we likewise exploited the high temporal resolution of EEG microstates, extracting EEG signal features from the microstate time series and feeding them to an SVM classifier to distinguish PWEs from healthy subjects. Compared with previous studies that used interictal epileptic EEG, our model achieved relatively high classification accuracy, and using short-term interictal EEG signals may make the model more widely applicable. However, compared with previous studies that used ictal EEG [23,33], the classification performance and accuracy of our model were lower. Based on the materials and methods of those studies, the main reasons for the differences may be different EEG recording methods and the different evolution of EEG amplitude and frequency in the different epilepsy periods selected.
Multivariable analysis based on machine learning provides an opportunity to understand a system by analyzing many features simultaneously [34], which can reduce type I errors and yield an optimized model. In other words, features extracted from EEG data can be combined with a variety of analyses to develop more reliable and effective models. Previous studies have reported that EEG features can be used to identify a variety of neuropsychiatric diseases, such as epilepsy [23] and schizophrenia [13]. To verify the usefulness of EEG microstate parameters for classifying epileptic EEG, we extracted an EEG feature set from the EEG microstate series and compared its performance with that of the microstate parameters. The microstate parameters (accuracy of 87.18%) outperformed the EEG features (accuracy of 79.55%). This suggests that the microstate parameters contain information that is difficult to obtain through traditional EEG analysis and that they could serve as a diagnostic biomarker for classifying epilepsy. Compared with previous studies [23,35], this research has some limitations: (1) it was a single-center study with a limited sample; (2) because we used interictal EEG, the results may not be representative of all EEG signals of PWEs; (3) because of limitations in EEG recording quality and processing, the EEG microstate topographic maps we obtained were not completely consistent with those of previous research, and noise interference remained, which greatly limits our follow-up research. Despite these limitations, the current results describe microstate changes in epilepsy: even in the absence of epileptic discharges, the classification accuracy of the microstate parameters reached 87.18%.
We anticipate a follow-up study of large-sample, multi-center EEG microstates for epilepsy identification.

Conclusions
In this study, we explored whether short-term, high-density interictal EEG could be used for the automatic classification of epilepsy. Based on microstate analysis, we extracted microstate parameters and EEG signal features and classified epileptic EEG signals using the SVM classifier. Both the microstate parameters (accuracy of 87.18%) and the EEG features extracted from the microstate sequences (accuracy of 79.55%) classified epileptic EEG signals effectively, and the microstate parameters performed better than the EEG features. However, this conclusion has some limitations, and we expect it to be verified by large-sample, multi-center studies in the future. In addition, we plan to apply a variety of classifiers to epileptic EEG to study how the choice of classifier affects the automatic classification of epilepsy.

Informed Consent Statement:
Informed consent was obtained from all the subjects involved in the study.

Data Availability Statement:
The data are not publicly available due to patients' privacy protection.

Conflicts of Interest:
The authors declare no conflict of interest.