Classification of EEG Signals for Prediction of Epileptic Seizures

Abstract: Epilepsy is a common brain disorder in which patients can experience multiple seizures in a single day. Around 65 million people are affected by epilepsy worldwide. Patients with focal epilepsy can be treated with surgery, whereas generalized epileptic seizures can be managed with medication. In more than 30% of cases, however, medication fails to control the seizures, which leads to accidents and limits the patient's life. Predicting a seizure before its onset is therefore critical for such patients, so that it can be treated with preventive medicine before it occurs. Electroencephalogram (EEG) signals, which are recorded to observe the brain's electrical activity, can be quite helpful in predicting seizures. Researchers have proposed machine learning and deep learning methods to predict epileptic seizures from scalp EEG signals; however, predicting seizures with high accuracy is still a challenge. We therefore propose a three-step approach. First, scalp EEG signals are preprocessed with the PREP pipeline, a more sophisticated alternative to basic notch filtering that uses a regression-based technique to enhance the SNR. Second, handcrafted statistical features (temporal mean, variance, and skewness) are combined with automated features extracted using a CNN. Third, interictal and preictal segments are classified using an LSTM to predict seizures. We train and validate the proposed technique on the CHB-MIT scalp EEG dataset and achieve an accuracy of 94%, a sensitivity of 93.8%, and a specificity of 91.2%. The proposed technique achieves better sensitivity and specificity than existing methods.


Introduction
In epilepsy, patients experience seizures due to disruption in the functioning of neurons in the brain. Around 65 million people worldwide are affected by epilepsy [1]. Conventional treatment relies on medication and surgery; however, in around 30% of patients, recurrent seizures cannot be controlled using existing treatments [2]. It is therefore very important to predict upcoming seizures in time. If detected early, an upcoming seizure can be stopped, avoiding serious damage that in some cases can be fatal. Such patients are mostly monitored and examined using electroencephalogram (EEG) recordings [3,4]. These recordings are then visually analyzed by doctors for a more comprehensive understanding of the patient's seizures. This procedure is time consuming, subjective, and prone to human error. Hence the need for an automatic seizure detection system that accelerates the analysis, makes it more accurate, and eases the workload of neurologists. EEG signals are recorded in two ways: scalp EEG, in which electrodes are placed on the scalp of the subject, and intracranial EEG, in which electrodes are placed invasively on the brain tissue [5].
EEG signals of an epileptic patient can be broadly categorized into four states [6]: the preictal state [7], which spans a few minutes before the seizure occurs; the ictal state [8], during which the seizure is actually occurring; the postictal state [9], which follows the seizure; and the interictal state, which lies between two consecutive seizures and can also be called the normal state. Figure 1 presents a multichannel plot of one hour of EEG signals. The preictal state is the one used for detecting and forecasting seizures: because it begins a couple of minutes before the seizure actually occurs, it provides valuable information about seizure onset [6]. Predicting the ictal state by classifying interictal and preictal segments can help avert seizures and the damage they cause by allowing timely administration of medicine. Figure 1 illustrates the change in the electrical activity of the patient's brain between the interictal and preictal states; both frequency and amplitude increase observably in the preictal state [10].
Prediction of epileptic seizures involves preprocessing for noise reduction, feature extraction, and classification of interictal and preictal segments. Inter-electrode interference, powerline noise, movement-related cortical potentials, and ECG signals all introduce noise into EEG signals during recording. Using machine learning and deep learning approaches, researchers have developed various ways to extract time and frequency domain features, both manually and automatically. Automated features have been derived from many types of CNNs, whereas handcrafted features include time and frequency domain univariate and multivariate features.
Machine learning classifiers such as SVM, decision tree, and MLP, and deep learning-based classifiers such as CNN, RNN, and LSTM, have been used to classify interictal and preictal segments. Prediction accuracy remains a challenge because it requires effective preprocessing, features with large inter-class variance, and improved classification. In this research, we present a technique based on a convolutional neural network and an LSTM for classifying EEG signals to predict epileptic seizures.

Related Work
Researchers [36-55] have proposed several techniques to predict seizures, including traditional machine learning approaches and deep learning techniques. EEG signals are generally susceptible to noise, especially scalp EEG, where the electrodes that acquire the signals are placed far from the source, i.e., on the scalp. Multiple types of noise affect the signal-to-noise ratio (SNR) of EEG signals, including powerline noise [56] at 50 or 60 Hz, baseline noise [57] caused by electrical interference between electrodes, and artifacts generated by human activity such as eye blinks and pulse. Researchers have proposed preprocessing techniques to increase the SNR of EEG signals. The authors of [36,46,58] removed noise using bandpass filters. The authors of [38] transformed the time domain signals to the frequency domain using the fast Fourier transform (FFT). Researchers [39,59] have also used the short time Fourier transform (STFT) to preprocess EEG signals; STFT is well suited to preprocessing because EEG signals are non-stationary. The authors of [45] decomposed the signals into multiple intrinsic mode functions (IMFs) on the basis of their frequency components using empirical mode decomposition (EMD) [60,61], and also used the discrete wavelet transform (DWT) [62] for preprocessing. In [41], the authors proposed DWT for preprocessing. Other ways of removing noise from EEGs include making a surrogate channel [63], using a common spatial pattern (CSP) [64], local mean decomposition (LMD) [60], and adaptive filtering [65].
Data after preprocessing are usually large in quantity and high in dimension. In this form, EEG signals are not suitable to be passed to a classifier. A feature set is therefore required: a lower-dimensional, non-redundant representation of the data. The process of converting data into a feature set is called feature extraction, and its goal is to obtain distinctive features for classification. Both handcrafted and automated features have been extracted in existing methods. Handcrafted features of EEGs usually include univariate [66] and multivariate features [67]. Time domain features include Lyapunov exponent PCA [68], Hjorth parameters [69], approximate entropy [70], and statistical moments [71]. Statistical measures include the mean, also known as the average; variance, which measures the spread around the mean; skewness, which measures asymmetry; kurtosis, which describes the sharpness of the peak; and entropy, which measures randomness in the data [72]. The Hjorth parameters are activity, mobility, and complexity, and they are helpful in the classification of EEG signals. The variance of the EEG signal through time is called Hjorth activity. The following equations give the mathematical representation of activity, mobility, and complexity, respectively.
$$\mathrm{Activity}(y(t)) = \mathrm{var}(y(t)) \tag{1}$$
$$\mathrm{Mobility}(y(t)) = \sqrt{\frac{\mathrm{Activity}\left(\frac{dy(t)}{dt}\right)}{\mathrm{Activity}(y(t))}} \tag{2}$$
$$\mathrm{Complexity}(y(t)) = \frac{\mathrm{Mobility}\left(\frac{dy(t)}{dt}\right)}{\mathrm{Mobility}(y(t))} \tag{3}$$
Average frequency is given by mobility, whereas variation in frequency is given by complexity. Spectral features, which are frequency domain features, include spectral moments, spectral skewness, spectral centroid, variational coefficients, and power spectral density. Handcrafted features such as zero-crossing intervals [36], phase-locking values [45,46], bag of waves [37], and common spatial pattern filtering [44] have been extracted. Convolutional neural networks (CNNs) are used by [41,73] for feature extraction. A convolutional neural network extracts features in such a way that it keeps the class information; by doing this, features that have high inter-class variance are extracted. Ref. [74] used Hilbert vibration decomposition to extract amplitude modulation/frequency modulation subcomponents of non-stationary signals.
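As an illustration of the Hjorth parameters described above, the following minimal sketch computes activity, mobility, and complexity with NumPy. It assumes the time derivative is approximated by a first difference, so mobility comes out in radians per sample. The function name `hjorth_parameters` and the 10 Hz test sine are illustrative choices, not taken from the cited works.

```python
import numpy as np

def hjorth_parameters(y):
    """Return (activity, mobility, complexity) of a 1-D signal."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)     # first difference approximates dy/dt (per sample)
    ddy = np.diff(dy)   # second difference approximates d2y/dt2
    activity = np.var(y)
    mobility = np.sqrt(np.var(dy) / activity)
    mobility_of_derivative = np.sqrt(np.var(ddy) / np.var(dy))
    complexity = mobility_of_derivative / mobility
    return activity, mobility, complexity

# Illustrative input: a pure 10 Hz sine sampled at 256 Hz for 4 s.
fs = 256
t = np.arange(0, 4, 1 / fs)
sine = np.sin(2 * np.pi * 10 * t)

activity, mobility, complexity = hjorth_parameters(sine)
# Mobility is in radians per sample; convert it to an approximate frequency in Hz.
est_freq_hz = mobility * fs / (2 * np.pi)
```

For a pure sine, the estimated frequency should land close to 10 Hz and the complexity close to 1, which matches the reading of mobility as average frequency and complexity as variation in frequency.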
Multiple traditional machine learning methods and a few cutting-edge deep learning methods have been employed for classification after feature extraction. Machine learning classifiers include k-nearest neighbors (KNN) [69], naive Bayes [68], decision tree, and SVM. Deep learning-based classifiers include CNN, recurrent neural network (RNN), and long short-term memory (LSTM). Table 1 compares the state-of-the-art methods proposed in recent years, along with their performance metrics. Preprocessing plays a vital role in achieving increased sensitivity when predicting an epileptic seizure. Moreover, extraction of multivariate features helps in getting better prediction results. Neural network-based classifiers and support vector machines seem to perform better than other classifiers. Average anticipation time and sensitivity are interrelated: a prediction method with increased sensitivity also tends to have an increased average anticipation time. However, a poor choice of classification technique can lead to more false positives and reduced sensitivity, so the selection of the classifier affects overall performance. These studies show that damage due to epileptic seizures can be avoided by predicting the seizure through identifying the beginning of the preictal state. Effective preprocessing techniques are needed to remove the noise introduced during acquisition of the EEG signal. Extracting and selecting features has also proven to be a major challenge.

Overview of Proposed Method
We propose a patient-specific method that predicts a seizure by detecting the start of the preictal state. A flowchart of the proposed method is shown in Figure 2. The dataset used in this study is the widely used, publicly available CHB-MIT scalp EEG dataset [75]. It contains recordings of 22 subjects with 23-channel signals sampled at 256 Hz. The PREP pipeline [76] is used to remove powerline noise. After noise removal, the signal is divided into 30 s windows, and the short time Fourier transform (STFT) is applied to further enhance the SNR and to convert the signal from the time domain to the frequency domain. Overlapping windows (50% overlap) are used only for the preictal data, in order to overcome the class imbalance between the interictal and preictal classes. As stated earlier, the interictal state is the normal state of the brain, whereas the preictal state covers only a few minutes before a seizure, so there is an inherent imbalance in the amount of data available for the two states. The overlapping window therefore oversamples the preictal state; it is applied to all channels and to every occurrence of the preictal state. The interictal data are converted to the frequency domain using a 30 s non-overlapping window. Figure 3 shows a visual representation of the overlapping and non-overlapping windows. Statistical moments are extracted as handcrafted features, and a CNN is applied for automated feature extraction. A custom three-layer CNN is used that takes an input of size 65 × 117 × 23. Batch normalization is used to reduce internal covariate shift. The dense layers of the CNN architecture are removed. A feature vector containing both handcrafted and automated features is created and passed to the LSTM for classification.
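The windowing scheme described above can be sketched as follows. This is a minimal illustration, not the authors' code: the helper `segment`, the zero-filled placeholder recordings, and the durations (20 min interictal, 2 min preictal) are assumptions chosen only to show how a 15 s overlap changes the number of preictal windows.

```python
import numpy as np

def segment(signal, fs, win_s, overlap_s=0.0):
    """Split a (channels, samples) array into windows of win_s seconds.

    Returns an array of shape (n_windows, channels, win_samples).
    """
    win = int(win_s * fs)
    hop = int((win_s - overlap_s) * fs)
    starts = range(0, signal.shape[1] - win + 1, hop)
    return np.stack([signal[:, s:s + win] for s in starts])

fs = 256          # sampling frequency of the CHB-MIT recordings
n_channels = 23

# Placeholder data (assumption): 20 min interictal and 2 min preictal.
interictal = np.zeros((n_channels, 20 * 60 * fs), dtype=np.float32)
preictal = np.zeros((n_channels, 2 * 60 * fs), dtype=np.float32)

interictal_windows = segment(interictal, fs, win_s=30)              # non-overlapping
preictal_plain = segment(preictal, fs, win_s=30)                    # non-overlapping
preictal_windows = segment(preictal, fs, win_s=30, overlap_s=15)    # 50% overlap

print(len(interictal_windows), len(preictal_plain), len(preictal_windows))
```

With these placeholder durations, the non-overlapping split gives 40 interictal and 4 preictal windows (10:1), while the 15 s overlap raises the preictal count to 7, roughly halving the imbalance.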

Preprocessing
The PREP pipeline can be summarized as: (1) line-noise removal without restricting to a single filtering technique; (2) robust referencing of the signal relative to an approximation of the "true" average reference; (3) detection and interpolation of bad channels; and (4) retention of sufficient data to allow users to use another method or to reverse the interpolation of any channel. In theory, line noise is presumed to be at 60 Hz, but in practice the exact line frequency is unknown and variable. A regression model is applied across a range of frequencies centered on each potential line-noise frequency, and the line-noise frequency that maximizes the SNR is selected. This approach is advantageous over notch filtering because it removes only deterministic line components and preserves the remaining spectral energy. Extensive testing showed that the line-noise removal algorithms did not yield good results without high-pass filtering [76]. Thus, a high-pass filter at 1 Hz was applied before removing line noise.
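To make the idea of regression-based line-noise removal concrete, the sketch below fits a sinusoid by ordinary least squares at each candidate frequency around 60 Hz, keeps the best-fitting one, and subtracts it. This is a deliberately simplified stand-in and not the PREP pipeline's actual algorithm. The function `remove_line_noise`, the ±1 Hz search range, the 0.05 Hz step, and the synthetic test signal are all assumptions made for illustration.

```python
import numpy as np

def remove_line_noise(x, fs, center_hz=60.0, half_width_hz=1.0, step_hz=0.05):
    """Subtract the best-fitting sinusoid near center_hz from a 1-D signal.

    Simplified illustration only: one stationary sinusoid is assumed.
    Returns (cleaned_signal, estimated_line_frequency_hz).
    """
    t = np.arange(x.size) / fs
    candidates = np.arange(center_hz - half_width_hz,
                           center_hz + half_width_hz + step_hz / 2, step_hz)
    best_freq, best_power, best_fit = None, -np.inf, None
    for f in candidates:
        # Regress the signal on a sine/cosine pair at the candidate frequency.
        basis = np.column_stack([np.sin(2 * np.pi * f * t),
                                 np.cos(2 * np.pi * f * t)])
        coef, *_ = np.linalg.lstsq(basis, x, rcond=None)
        fit = basis @ coef
        power = np.sum(fit ** 2)   # energy explained by this frequency
        if power > best_power:
            best_freq, best_power, best_fit = f, power, fit
    return x - best_fit, best_freq

# Synthetic example (assumption): 5 Hz activity plus line noise drifted to 59.7 Hz.
fs = 256
t = np.arange(0, 10, 1 / fs)
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + 0.8 * np.sin(2 * np.pi * 59.7 * t + 0.3)

cleaned, f_hat = remove_line_noise(noisy, fs)
```

Unlike a fixed notch at 60 Hz, the fit follows the actual line frequency and removes only the deterministic component at that frequency, leaving the rest of the spectrum untouched.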
The short time Fourier transform (STFT) is used to transform EEG signals from the time domain to the frequency domain. EEG signals are non-stationary, so STFT gives better preprocessing results by capturing changes of short duration. In this study, we applied the STFT on 30 s windows. For the preictal class, consecutive windows overlapped by 15 s, whereas non-overlapping 30 s windows were used for the interictal class, in order to address the class imbalance problem. This conversion from the time domain to the frequency domain results in a spectrogram, as shown in Figure 3. This spectrogram is given as input to the CNN to extract features.
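The STFT step can be sketched with NumPy alone, as below. The segment length of 128 samples and hop of 64 samples are assumptions: the source does not state its STFT parameters, and these values were picked only because a 128-sample segment yields 65 frequency bins. The function `stft_magnitude` and the 20 Hz test tone are likewise illustrative.

```python
import numpy as np

def stft_magnitude(x, nperseg=128, hop=64):
    """Magnitude spectrogram of a 1-D signal, shape (freq_bins, frames)."""
    window = np.hanning(nperseg)
    n_frames = 1 + (x.size - nperseg) // hop
    # Slice the signal into overlapping frames and taper each one.
    frames = np.stack([x[i * hop:i * hop + nperseg] for i in range(n_frames)])
    spectrum = np.fft.rfft(frames * window, axis=1)
    return np.abs(spectrum).T

# One 30 s single-channel window at 256 Hz containing a 20 Hz tone (assumption).
fs = 256
t = np.arange(0, 30, 1 / fs)
x = np.sin(2 * np.pi * 20 * t)

spec = stft_magnitude(x)
freqs = np.fft.rfftfreq(128, d=1 / fs)        # bin centers in Hz
peak_hz = freqs[spec.mean(axis=1).argmax()]   # strongest frequency on average
```

Here `spec` has shape (65, 119), so the frame count differs from the 117 reported for the CNN input; the authors' exact settings evidently differ from this sketch. Stacking one such spectrogram per channel gives the three-dimensional input passed to the CNN.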

Feature Extraction
Statistical moments have been extracted from all 23 channels as handcrafted features. They include the mean [77], standard deviation, and skewness [78], and are given by Equations (4)-(6), respectively.
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i \tag{4}$$
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2} \tag{5}$$
$$\mathrm{Skewness} = \frac{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^3}{\sigma^3} \tag{6}$$
where $x_i$ denotes the $i$-th sample of the selected window of the EEG signal, and the total number of samples is given by $N$. After extraction of these handcrafted features, a custom CNN architecture was also used for feature extraction. CNNs are widely used for automated extraction of features and to classify time series and image data. Typical CNNs have multiple convolutional layers with different numbers of filters. The size of the resulting feature maps is then reduced by a pooling layer, whose output is fed to fully connected layers used for classification. The last layer of such systems has as many neurons as there are classes. In the weight update rule, W denotes the weight values, l the layer number, and B the bias, while x and m are regularization parameters. After convolution, an activation function is applied; commonly used activation functions are the rectified linear unit (ReLU), the sigmoid, and the softmax, which are computed as follows.
$$\mathrm{ReLU}(z) = \max(0, z)$$
$$\mathrm{sigmoid}(z) = \frac{1}{1 + e^{-z}}$$
$$\mathrm{softmax}(z)_j = \frac{e^{z_j}}{\sum_{k} e^{z_k}}$$
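The handcrafted features described above can be computed per channel as in the following sketch. It assumes the population (1/N) form of the standard deviation and the moment-based definition of skewness. The function name `statistical_features` and the tiny two-channel example are illustrative only.

```python
import numpy as np

def statistical_features(window):
    """Mean, standard deviation, and skewness for each channel.

    window: array of shape (channels, samples).
    Returns a 1-D vector of length 3 * channels: [means, stds, skews].
    """
    window = np.asarray(window, dtype=float)
    mean = window.mean(axis=1)
    std = window.std(axis=1)                  # population form (divides by N)
    centered = window - mean[:, None]
    skew = (centered ** 3).mean(axis=1) / std ** 3
    return np.concatenate([mean, std, skew])

# Two short illustrative channels: one right-skewed, one symmetric.
demo = np.array([[0.0, 0.0, 0.0, 4.0],
                 [1.0, 2.0, 3.0, 4.0]])
features = statistical_features(demo)
```

For a 23-channel window this yields 3 × 23 = 69 values, which can then be concatenated with the CNN features.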
A custom three-layer CNN was proposed to extract machine-learned features. The first convolutional layer consists of 16 filters of size 3 × 3, the second has 32 filters of size 3 × 3, and the final convolutional layer comprises 64 filters of size 3 × 3. Each of these layers is followed by a ReLU activation function and batch normalization. A flatten layer is then applied to obtain a machine-learned feature vector of size 7192. Figure 4 presents the proposed CNN architecture, which consists of three convolutional layers; the number of trainable parameters in each layer is listed in Table 2.

Classification
For classification, we propose an LSTM, a variant of the recurrent neural network [79]. After concatenating the statistical and CNN features, the combined feature set is transformed to a sequence of length 50 and fed into an LSTM for classification. The LSTM has several gates, including a forget gate and an input gate, for storing and forgetting prior cell information. The forget gate $f(t)$ and the input gate $i(t)$ are computed from the current input and the output of the previous LSTM step, $H_{t-1}$, using learned weights [80]; the cell state and the new output are then updated from these gate values. The proposed method uses an LSTM with 256 neurons at the input and 2 neurons at the output to classify preictal and interictal EEG patterns. For classification using the LSTM, the proposed approach includes 775,682 trainable parameters.
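The reported parameter count can be cross-checked with the standard formula for an LSTM layer. The sketch below assumes a single standard LSTM layer with 256 units (four gate blocks, each with input weights, recurrent weights, and a bias) followed by a dense output layer with 2 neurons and a bias. The per-step input width is not stated in the text, so the code solves for it; treat the resulting value as an inference rather than a figure from the source.

```python
def lstm_param_count(input_dim, hidden_units):
    """Trainable parameters of one standard LSTM layer."""
    # Four blocks (input, forget, cell candidate, output), each with
    # input weights, recurrent weights, and a bias vector.
    return 4 * (hidden_units * (input_dim + hidden_units) + hidden_units)

def dense_param_count(in_features, out_features):
    """Trainable parameters of a fully connected layer with bias."""
    return in_features * out_features + out_features

hidden_units = 256
n_classes = 2
reported_total = 775_682   # figure reported in the text

# Remove the output layer, then invert the LSTM formula for input_dim.
lstm_part = reported_total - dense_param_count(hidden_units, n_classes)
inferred_input_dim = lstm_part // (4 * hidden_units) - hidden_units - 1

total = (lstm_param_count(inferred_input_dim, hidden_units)
         + dense_param_count(hidden_units, n_classes))
print(inferred_input_dim, total)
```

Under these assumptions the reported total is reproduced exactly with a per-step input width of 500, which suggests (but does not confirm) how the combined feature vector was arranged into a sequence.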

Results and Discussion
This study proposes a patient-specific seizure prediction system using data from 22 subjects, including 17 males and 5 females. Scalp EEG signals were classified into two states: interictal, the normal EEG state, and preictal, the state a few minutes prior to the beginning of the seizure. Preictal samples were labeled as the positive class, so it is imperative to achieve a high true positive rate (TPR) with few false positives. Sensitivity and specificity have been used to validate the proposed method and are computed using the following equations:
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$
where TP stands for true positive, TN denotes true negative, FP is false positive, and FN represents false negative.
The first experiment was devised to monitor the effects of different preprocessing techniques. Multiple trials were performed to identify an optimal window size between 5 s and 120 s, and showed that a non-overlapping window of 30 s best characterizes the EEG signals. Therefore, in the first setting, a non-overlapping window (NOW) of 30 s was selected for both the interictal and preictal segments. We then applied the short time Fourier transform on the selected window, with no noise removal. Features were extracted using a CNN, and the fully connected (dense) layers at the end of the CNN were used for classification. This setting gave 64.5% sensitivity and 62.6% specificity. In the second setting, bandpass/bandstop filtering was applied to remove line noise during preprocessing, while the rest of the settings were kept the same as in the first. We applied Butterworth bandstop filters at 47-53 Hz and 109-112 Hz to remove line noise; Butterworth filters give a maximally flat passband response. A high-pass filter was also applied at 1 Hz to remove the DC component. In this setting, 72.4% sensitivity and 70.3% specificity were achieved.
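The sensitivity and specificity measures used for validation follow directly from the confusion-matrix counts, as in this short sketch. The counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from this study.

```python
def sensitivity(tp, fn):
    """True positive rate: fraction of preictal segments correctly flagged."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: fraction of interictal segments correctly passed."""
    return tn / (tn + fp)

def accuracy(tp, tn, fp, fn):
    """Fraction of all segments classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts with preictal as the positive class.
tp, fn = 90, 10     # 100 preictal segments
tn, fp = 450, 50    # 500 interictal segments

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
acc = accuracy(tp, tn, fp, fn)
```

Because the interictal class is much larger, accuracy is dominated by specificity, which is why sensitivity is reported separately for the preictal class.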
In the third setting of the first experiment, the filter configuration was kept the same, and the issue of class imbalance was targeted. The dataset is imbalanced, with a ratio of interictal to preictal samples of 10:1. By using an overlapping window (OW) with a 15 s overlap for preictal segments, this ratio can be reduced to 5:1. This oversampling greatly improved the results. NOW for the interictal class and OW for the preictal class (15 s overlap), bandpass filtering, CNN for feature extraction, and ANN for classification yielded the best results of this experiment, with 78.3% sensitivity and 76.1% specificity. These settings were kept constant in subsequent experiments. Table 3 shows the different experimental settings and the results achieved in each.
The second experiment was devised to select the best automated feature extraction model. In the first iteration, we kept the preprocessing settings the same as the best setting of the first experiment and selected the state-of-the-art ResNet-50, a deep neural network with 50 layers that uses skip connections for residual learning, for feature extraction. After feature extraction, classification was done using fully connected layers. The results of this approach were not comparable to the state of the art: a sensitivity of 67.4% and a specificity of 54.3% were achieved. In the second iteration, a much shallower network, VGG-16 (Visual Geometry Group), was used. The settings were the same as in the first iteration, the only difference being the use of VGG-16 in place of ResNet-50; ANN was again used for classification. The results improved drastically with this change, and a sensitivity of 87.3% and a specificity of 83.2% were achieved. These results were on par with state-of-the-art systems, but to further improve them and to analyze the effects of different feature extraction models, we designed a custom convolutional neural network with fewer layers.
Similar architectures were also found in the literature after thorough study. The details of this network are explained in the Feature Extraction section. The best results were achieved with this network, with a sensitivity of 89.3% and a specificity of 85.2%. This network was kept constant, along with the preprocessing, in the subsequent experiment. Table 4 shows the settings and the results achieved in each iteration.
The third experiment was devised to analyze the performance of different classifiers. In this experiment, we fixed the best settings from the previous two experiments, i.e., a non-overlapping window for the interictal class and an overlapping window with 15 s overlap for the preictal class, with STFT and bandpass/bandstop filtering, and the custom CNN for feature extraction. In the first iteration, we used ANN, which corresponds to the third iteration of Experiment 2. In the second iteration, we used a decision tree and achieved a sensitivity of 81.2% and a specificity of 76.4%. In the third iteration, a support vector machine was used after feature extraction. Finally, applying an LSTM for classification gave a sensitivity of 92.8% and a specificity of 90.7%. Table 5 shows the settings and the results achieved in each iteration.
Different deep learning approaches were tried for automated feature extraction, including ResNet-101, ResNet-50, and VGG-16. Table 6 compares these approaches with respect to sensitivity and specificity. Preprocessing was kept the same for all three experiments, with a non-overlapping window of 30 s for the interictal state, a window with 15 s overlap for the preictal state, and the PREP pipeline to remove noise; k-fold cross-validation with k = 10 was applied to validate the results. The proposed system achieved an average accuracy of 94%, a sensitivity of 93.8%, and a specificity of 91.2%, with standard deviations between 1% and 1.5% across the k folds for all performance measures. Table 7 shows the k-fold cross-validation results of the proposed method. An average prediction time of 19.5 min was achieved. Figure 5 and Table 8 compare the results of the proposed system with current state-of-the-art systems. Singh et al. [52] achieved better results in terms of accuracy; however, they did not report the average anticipation time, which is of prime importance in epilepsy prediction systems, since greater accuracy with less time available to control the seizure limits the usefulness of a method. Overall, the proposed system performs better in terms of both specificity and sensitivity.
The receiver operating characteristic (ROC) curves of state-of-the-art systems and the proposed technique are shown in Figure 6. In these curves, the sensitivity of each system is plotted against its false positive rate (FPR) to compare the overall performance of several state-of-the-art approaches. This allows us to determine whether or not the performance is satisfactory: if an increase in sensitivity does not result in an increase in false positive alarms, the system is said to be working well. The proposed system clearly outperforms the current state-of-the-art systems in terms of achieving a high TPR with a low FPR.
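The 10-fold cross-validation used for validation can be sketched as an index split, shown below. Only k = 10 comes from the text; shuffling the samples, the fixed seed, the helper name `kfold_indices`, and the sample count of 95 are assumptions made for illustration.

```python
import numpy as np

def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    folds = np.array_split(shuffled, k)   # fold sizes differ by at most one
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

# Illustrative use with 95 samples (hypothetical count).
splits = list(kfold_indices(95, k=10))
all_test = np.concatenate([test for _, test in splits])
```

Each sample is held out exactly once, so per-fold sensitivity and specificity can be averaged, and their standard deviation across folds reported, as in Table 7.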
It is concluded from the aforementioned evidence that the system proposed in this study is an effective seizure prediction method.

Conclusions and Future Work
This research proposes a method for seizure prediction in epileptic patients using scalp EEG. Patients with epilepsy can lead a much safer life if timely and accurate seizure prediction is ensured. In comparison to existing methods, the proposed method uses a regression-based alternative to notch filtering to increase the SNR, then combines automated CNN features with handcrafted features and performs classification with an LSTM to achieve better sensitivity and specificity. The model was trained and validated on the CHB-MIT scalp EEG dataset, and it achieved 94% accuracy, 93.8% sensitivity, and 91.2% specificity. In the future, intelligent algorithms such as CNN- and GAN-based denoising methods [49] can be used to preprocess the data to further increase the SNR. From the deep learning perspective, a large number of parameters must be learned; research can be done to make the algorithms more efficient by reducing the number of operations and learnable parameters, leading to less computationally intensive models. This study proposes patient-specific seizure prediction; further research is needed to develop a seizure prediction method that is not patient-specific.