Seizure Prediction Based on Transformer Using Scalp Electroencephalogram

Abstract: Epilepsy is a chronic and recurrent brain dysfunction disease. An acute epileptic attack interferes with a patient's normal behavior and consciousness and has a great impact on their life. The purpose of this study was to design a seizure prediction model to improve the quality of patients' lives and assist doctors in making diagnostic decisions. This paper presents a transformer-based seizure prediction model. Firstly, the time-frequency characteristics of electroencephalogram (EEG) signals were extracted by short-time Fourier transform (STFT). Secondly, a three-tower transformer model was used to fuse and classify the features of the EEG signals. Finally, by using the attention mechanism of transformer networks, the EEG signal was processed as a whole, which addresses the sequence-length limitations of other deep learning models. Experiments were conducted on the Children's Hospital Boston and Massachusetts Institute of Technology (CHB-MIT) database to evaluate the performance of the model. The experimental results show that, compared with previous EEG classification models, our model makes better use of the time, frequency, and channel information in EEG signals and improves the accuracy of seizure prediction.


Introduction
Epilepsy, a common chronic brain disease, is caused by sudden excessive discharge of brain nerve cells. Nearly 1% of the world's population is suffering from epilepsy. During seizures, patients have transient involuntary convulsions in one part of the body (partial seizures) or the entire body (generalized seizures), sometimes accompanied by a loss of consciousness and urinary and fecal incontinence [1], greatly affecting the patients' life quality. The electroencephalogram (EEG) is a representative signal containing information on brain electrical activity, which is used as a tool for clinical diagnosis and analysis of epilepsy [2].
Since the 1970s, researchers have carried out extensive research on seizure detection and prediction tasks [3][4][5]. Automatic epilepsy detection technology can help doctors improve the accuracy of epilepsy diagnosis, save a great deal of time, enhance the efficiency of diagnosis, and gain more time for rescue. Seizure detection comprises two main parts: feature extraction and classification of EEG signals.
Feature extraction methods can be roughly divided into linear and nonlinear analysis methods. Linear analysis methods mainly include time-domain analysis, frequency-domain analysis, and time-frequency domain analysis. Early EEG analysis directly extracted features from time-series signals, including time-domain features such as peak value, rhythm, duration, and sharpness [3]. After that, researchers also extracted characteristics such as spike rhythm, the relative amplitude of the EEG signal [4], fractional linear prediction error energy [6], and line length [7] to classify EEG signals. Although the time-domain waveform contains all of the information in the EEG signal, this approach lacks objectivity and is prone to large errors. Studies in the literature [8,9] transformed the original EEG signal from the time domain to the frequency domain and extracted the corresponding spectral components for frequency-domain analysis. The frequency domain is very important when dealing with epilepsy. In medicine, brainwaves can be divided into five main bands in the frequency domain, namely, delta (1-4 Hz), theta (4-8 Hz), alpha (8-13 Hz), beta (13-30 Hz), and gamma. After analyzing the above medical frequency bands, Perez et al. [10] further divided the beta band into four sub-bands (beta-1, 12-15 Hz; beta-2, 15-18 Hz; beta-3, 18-25 Hz; and hi-beta, 25-30 Hz), which had a positive effect on the analysis of EEG signals. Recent studies have shown that the sub-bands are also useful in the classification of epileptic EEG signals. Tsioura et al. [11] first proposed the systematic evaluation of frequency sub-bands for the classification accuracy of epileptic EEG. The results showed that additional frequency band analysis was conducive to the detection of epileptic EEG.
However, EEG signals are non-stationary and time-varying, whereas frequency-domain analysis presumes a stationary random signal, and features extracted in the frequency domain do not contain time information. Therefore, time-frequency analysis methods that combine the time domain and the frequency domain have gradually been developed. Truong et al. [12] used short-time Fourier transform (STFT) to extract time-frequency information from EEG signals and automatically generate optimized features for each patient. In addition, the wavelet transform [13][14][15][16] is also an effective tool for time-frequency feature extraction of EEG signals. Studies show that brain activity has complex dynamic characteristics, so it can be regarded as a nonlinear dynamic system. More and more researchers have become interested in nonlinear analysis methods for EEG signals. Li et al. [17] used a multiscale complexity measure to extract the nonlinear feature of the scale-related Lyapunov exponent for classification. Brari [18] proposed a novel EEG feature extraction approach for determining a correlation dimension to analyze the nonlinear characteristics of epileptic EEG signals.
Traditional machine learning algorithms and deep learning models have been successfully applied to the classification of time-series signals, especially EEG signals. In early research, the seizure detection task for EEG signals mainly used traditional machine learning algorithms such as decision tree classifier [19,20], support vector machine [21,22], k-nearest neighbor (KNN) [23], and random forest [24]. Recently, deep learning algorithms have also achieved remarkable results for EEG seizure detection. Because EEG signals are variable and long, recurrent neural networks (RNN), which are appropriate for time series information, have also been gradually applied to EEG signal processing. However, due to the short-term memory problem of RNN, the long short-term memory network (LSTM) [25,26], which combines short-term and long-term memory through a gated structure, has been more widely used in epileptic seizure tasks. In addition, the convolutional neural network (CNN) [27] has the ability of automatic feature learning, which not only simplifies the process of artificially constructing features but also significantly improves the performance of seizure detection.
Now, the frontier research on epilepsy EEG has gradually shifted from epilepsy detection to seizure prediction. Since seizure activity is unpredictable and drug-resistant epilepsy patients lack reliable treatment, there is an urgent need to develop an accurate and reliable seizure prediction model. Truong et al. [12] proposed a generalized retrospective and patient-specific seizure prediction method based on STFT and CNN. After that, Truong et al. [28] added the generative adversarial network as a feature extractor to train in an unsupervised way and connect an external classifier for classification to solve the difficulty of manual marking of seizure data and make better use of key data. Kostas et al. [29] introduced the LSTM network into EEG signal prediction of seizures, which significantly improved its performance. Jee et al. [30] proposed a patient-specific EEG channel selection optimization method based on permutation entropy, which combined the genetic algorithm and KNN to predict seizures. Alshebeili et al. [2] designed a seizure prediction framework based on statistical analysis and digital band-limiting filters, which can make robust decisions on signal activities and is suitable for long prediction horizons. Aung et al. [31] studied the advantages of multivariate multi-scale modified distribution entropy by using artificial neural networks and proposed an epilepsy prediction system.
However, these deep learning models can only achieve very limited performance in long sequences, and the attention mechanism can process the signal as a whole without being limited by the sequence length. Therefore, we consider applying it to our long-term monitoring of EEG signals. In recent years, the transformer model based on the attention mechanism proposed by Vaswani et al. [32] has achieved great success in the field of natural language processing, and it has been gradually applied to the tasks of computer vision and time series prediction and regression. Tao et al. [33] used the transformer to classify EEG data for brain vision and motor imagination to decode EEG signals from human brain activities.
In this paper, scalp EEG is used to predict the onset of epilepsy to provide enough treatment time for doctors and patients and prevent the occurrence of seizure events. The proposed seizure prediction model in this paper is based on transformer networks for extracting and fusing the three-dimensional features of EEG signals and has achieved remarkable results in seizure prediction. The main contributions of this paper are as follows:

1. Based on the transformer's self-attention coding layer and gating mechanism, a transformer network with a three-tower structure is established to extract and fuse the features of epileptic EEG signals from different dimensions, which improves the learning ability of the time, spectrum, and spatial information.
2. A feature engineering scheme for the EEG sequence prediction task is proposed that uses STFT to extract the hidden laws of EEG signals and improve the upper performance limit of the model.
3. A three-tower transformer network is proposed to deal with the seizure prediction task. The experimental results for the Children's Hospital Boston and Massachusetts Institute of Technology (CHB-MIT) dataset show that our method is superior to the existing ones.

EEG Database
In this study, a multi-channel scalp EEG database was used for the experiment. The EEG database we used is the CHB-MIT scalp EEG database collected by Children's Hospital Boston. The data, which were originally published in Shoeb's Ph.D. thesis [34] and can be accessed at https://physionet.org/physiobank/database/chbmit (accessed on 11 April 2022), contain 24 medical records from 23 patients with intractable epilepsy. The first 23 cases were from 22 patients (17 females, aged 1.5-19 years; 5 males, aged 3-22 years). Chb01 and Chb21 were obtained from the same female subject at an interval of 1.5 years, and Chb24 had no clear gender or age recorded. Table 1 shows the basic characteristics of the database. After the patients had stopped taking antiepileptic drugs for a period of time, Children's Hospital Boston monitored them for several days to assess their suitability for surgical intervention. Scalp EEG signals were sampled at 256 Hz with 16-bit resolution, and the electrode placement followed the international 10-20 system. The total duration of EEG recordings was nearly 983 h, with 198 seizures.

Preprocessing
Epileptic EEG signals can be divided into three stages, namely, interictal, preictal, and ictal. In our seizure prediction task, we mainly focused on the preictal and interictal periods. Therefore, we discarded the ictal EEG segments and transformed seizure prediction into a binary classification problem. At the same time, we rearranged the EEG data for each patient according to the electrode order. In this dataset, the interictal and preictal data were severely unbalanced since some patients had fewer seizures during the monitoring period. To overcome this problem, we used overlapping sliding windows in the training stage to obtain more preictal segments. To ensure that the ratio of the two types of training data was close to 1, we set the sliding step of the preictal window, $W_p$, as follows:

$$W_p = \frac{n_p}{n_i} \times W$$

where $n_p$ and $n_i$ represent the number of EEG segments in each patient's preictal and interictal stage, respectively, and $W$ represents the size of the EEG window.
STFT has been widely used in the field of signal processing, and many studies have shown that it has advantages in the analysis of time series [35,36]. Therefore, in this paper, STFT is used to convert the EEG signal into a two-dimensional matrix composed of the time domain and frequency domain, and the EEG signal is analyzed in the time-frequency domain. We selected the cosine analysis window to perform the STFT on each 5 s sample, and then used $\log_{10}$ to calculate the intensity value. The EEG recordings in the CHB-MIT dataset were contaminated by 60 Hz power line noise. Therefore, in the experiment, we removed the components in the frequency ranges of 57-63 Hz and 117-123 Hz (the power line frequency and its second harmonic), as well as the direct current (DC) component (0 Hz), which is a simple and effective way to suppress both sources of interference.
After STFT processing, the dimension of each 5 s EEG segment was (c, 114, 9), where c is the number of EEG channels, and 114 and 9 represent the number of frequency points and time steps, respectively.
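As a worked check of the numbers in this section, the sketch below recomputes the spectrogram size of one channel and the overlapping-window step used for balancing. The 256 Hz sampling rate, the 5 s segment, and the removed frequency ranges come from the text. The 1 s STFT window with 50% overlap is an assumption (the paper does not state these values); it is used here only because it reproduces the reported (114, 9) shape. The relation W_p = (n_p / n_i) × W and the helper names are likewise illustrative assumptions.

```python
FS = 256          # sampling rate in Hz (from the dataset description)
SEGMENT_S = 5     # segment length in seconds (from the text)
WIN_S = 1.0       # ASSUMPTION: STFT window length in seconds
OVERLAP = 0.5     # ASSUMPTION: fractional overlap between STFT windows


def stft_shape(fs=FS, segment_s=SEGMENT_S, win_s=WIN_S, overlap=OVERLAP):
    """Return (kept frequency bins, time steps) for one EEG channel."""
    n_win = int(fs * win_s)               # samples per STFT window
    hop = int(n_win * (1 - overlap))      # samples between window starts
    n_samples = fs * segment_s
    time_steps = (n_samples - n_win) // hop + 1
    # One-sided spectrum: bins at 0, df, ..., fs/2 with df = fs / n_win.
    df = fs / n_win
    freqs = [i * df for i in range(n_win // 2 + 1)]
    # Drop the DC bin and the two power-line bands named in the text.
    kept = [f for f in freqs
            if f != 0 and not (57 <= f <= 63) and not (117 <= f <= 123)]
    return len(kept), time_steps


def preictal_step(n_p, n_i, window):
    """ASSUMED balancing rule: step W_p = (n_p / n_i) * W."""
    return n_p / n_i * window


def count_windows(total, window, step):
    """Number of windows of length `window` taken every `step` over `total`."""
    if total < window:
        return 0
    return int((total - window) // step) + 1


print(stft_shape())   # (114, 9) under the assumed window and overlap

# Hypothetical patient: 30 min of preictal and 2 h of interictal data, W = 5 s.
n_p, n_i = 1800 // 5, 7200 // 5
w_p = preictal_step(n_p, n_i, 5)
print(w_p, count_windows(1800, 5, w_p), n_i)
```

With the hypothetical durations above, the overlapping windows yield 1437 preictal segments against 1440 interictal ones, so the class ratio is close to 1, which is the stated goal of the balancing step.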

Classification
Because of its good performance in natural language processing, the transformer has been applied to computer vision, time series classification, and prediction tasks. In recent years, it has also shown good results in the classification of epilepsy EEG [33,37]. EEG signals contain abundant time, frequency, and spatial information. In this paper, we used transformer networks to analyze and classify the characteristics of epileptic EEG signals in three dimensions. The overall architecture of the network model proposed in this paper is shown in Figure 1. The traditional transformer is used to process text sequences, and the text data (L, D) are generally two-dimensional, where L and D represent the length and dimension of the word vector, respectively. However, after time-frequency domain transformation, the EEG segment is a three-dimensional matrix (C, F, T), in which C, F, and T represent the channel, frequency, and time step of the EEG signal, respectively. We flatten the 3D matrix $x \in \mathbb{R}^{C \times F \times T}$ from three different dimensions into two dimensions to obtain $x_c \in \mathbb{R}^{C \times (F \cdot T)}$, $x_f \in \mathbb{R}^{F \times (C \cdot T)}$, and $x_s \in \mathbb{R}^{T \times (C \cdot F)}$. These three matrices are taken as the three inputs for the model. Because the three inputs are continuous, the embedding layer of the model is replaced by a fully connected layer. To the best of our knowledge, the order of the channels in the EEG sequence has no absolute or relative correlation, so we only added positional encoding to the frequency-wise and step-wise input embeddings. Then, three sets of encoders were used to capture the correlation of each dimension sequence from the aspects of step, frequency, and channel information. Each layer of the encoder has two operations, namely feed forward and multi-head attention composed of multiple self-attention.
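The three flattened views described above can be sketched in plain Python on a nested list indexed as x[c][f][t]. The function name and the ordering of elements inside each flattened row are assumptions made for illustration; the text only fixes the resulting shapes.

```python
def flatten_views(x):
    """Return (x_c, x_f, x_s) for a nested list x indexed as x[c][f][t].

    x_c has shape C x (F*T), x_f has shape F x (C*T), x_s has shape T x (C*F).
    """
    C, F, T = len(x), len(x[0]), len(x[0][0])
    # Channel-wise view: one row per channel.
    x_c = [[x[c][f][t] for f in range(F) for t in range(T)] for c in range(C)]
    # Frequency-wise view: one row per frequency bin.
    x_f = [[x[c][f][t] for c in range(C) for t in range(T)] for f in range(F)]
    # Step-wise view: one row per time step.
    x_s = [[x[c][f][t] for c in range(C) for f in range(F)] for t in range(T)]
    return x_c, x_f, x_s


# Toy tensor with C=2, F=3, T=4 whose values encode their own indices.
toy = [[[100 * c + 10 * f + t for t in range(4)] for f in range(3)]
       for c in range(2)]
x_c, x_f, x_s = flatten_views(toy)
print(len(x_c), len(x_c[0]))   # 2 12
print(len(x_f), len(x_f[0]))   # 3 8
print(len(x_s), len(x_s[0]))   # 4 6
```

For an 18-channel segment of shape (18, 114, 9), the same function would give rows of length 1026, 162, and 2052 for the channel-wise, frequency-wise, and step-wise towers, respectively.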
The inputs of the self-attention mechanism are Q (query), K (key), and V (value), and its output matrix is calculated as follows:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

where $d_k$ is the dimension of the key vectors. The multi-head attention mechanism further improves the self-attention layer, expands the ability of the model to focus on different positions, and gives multiple "representation subspaces" in the attention layer, which is expressed as follows:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_m)$$

$$\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V})$$

where m is the number of attention heads, and $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ are the learned projection matrices.
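A minimal sketch of scaled dot-product attention and of the head concatenation, written in plain Python on lists of rows, may help make the two operations concrete. It follows the standard formulation of Vaswani et al. [32]; the helper names are illustrative, the projection matrices are passed in by the caller rather than learned, and this is not the authors' PyTorch implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def attention(q, k, v):
    """Return (output, weights) of softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


def multi_head(q, k, v, heads):
    """Concatenate per-head outputs; `heads` is a list of (Wq, Wk, Wv)."""
    outs = [attention(matmul(q, wq), matmul(k, wk), matmul(v, wv))[0]
            for wq, wk, wv in heads]
    # Join the head outputs of each query row into one longer row.
    return [sum((o[i] for o in outs), []) for i in range(len(q))]


# With all-zero keys every score is equal, so each output row is the mean of V.
out, w = attention([[1.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]],
                   [[1.0, 2.0], [3.0, 4.0]])
print(out, w)
```

Because the attention weights are computed between every pair of positions, the cost grows with the square of the sequence length, but no position is out of reach of any other, which is the property the text relies on for long EEG sequences.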
The output matrix of the multi-head attention layer is passed to a feed-forward neural network to enhance feature extraction. Then, the model adds a gating mechanism to integrate the features along the time, frequency, and channel directions. We denote the outputs of the three towers as C, S, and F, concatenate them into one vector, obtain H through a linear projection layer, and then assign the gating weights $g_1$, $g_2$, and $g_3$ to the outputs through the softmax function. Finally, each gate weight scales the output of its corresponding tower, and the feature vector $y$ is obtained as follows:

$$H = W_H \cdot \mathrm{Concat}(C, S, F) + b_H$$

$$g_1, g_2, g_3 = \mathrm{Softmax}(H)$$

$$y = \mathrm{Concat}(C \cdot g_1, S \cdot g_2, F \cdot g_3)$$

where $W_H$ and $b_H$ are the weight and bias of the linear projection layer.
Finally, the feature vector y is passed through a fully connected linear layer, which maps it to a vector of shape (Batch_size, 2), from which the classification results of the EEG segments are obtained.
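The gating step can be illustrated with a short plain-Python sketch for a single sample. It assumes the form used in gated transformer networks: the three tower outputs are concatenated, projected to three logits, passed through a softmax, and each tower output is scaled by its gate before concatenation. The function name, the fixed weights, and the tiny vector sizes are hypothetical; in the model the projection is learned.

```python
import math


def gate_fuse(c, s, f, weight, bias):
    """Fuse three tower outputs (flat lists) with softmax gates.

    weight: 3 rows, each as long as the concatenation of c, s and f.
    bias:   3 values, one per gate.
    Returns (y, gates), where y is the gated concatenation.
    """
    concat = c + s + f
    # Linear projection of the concatenated vector to three gate logits.
    h = [sum(w * x for w, x in zip(row, concat)) + b
         for row, b in zip(weight, bias)]
    # Softmax over the three logits, so the gates are positive and sum to 1.
    m = max(h)
    exps = [math.exp(v - m) for v in h]
    total = sum(exps)
    g = [e / total for e in exps]
    # Scale each tower by its gate, then concatenate.
    y = ([g[0] * v for v in c] + [g[1] * v for v in s]
         + [g[2] * v for v in f])
    return y, g


c_out, s_out, f_out = [1.0, 2.0], [3.0], [4.0, 5.0]
zero_w = [[0.0] * 5 for _ in range(3)]
y, gates = gate_fuse(c_out, s_out, f_out, zero_w, [0.0, 0.0, 0.0])
print(gates)   # equal gates when all logits are equal
print(y)
```

Since the gates sum to 1, a tower that is uninformative for a given patient can be down-weighted without being removed from the fused vector.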

Performance Evaluation
In the experiment, prediction was achieved by learning and classifying the characteristics of preictal and interictal EEG segments. To evaluate the prediction system, we introduce several performance evaluation indexes. In this paper, Accuracy, Sensitivity, Specificity, Precision, Recall, and F1-Score are used as the evaluation indexes for the model. The calculation formulae are as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{Sensitivity} = \mathrm{Recall} = \frac{TP}{TP + FN}$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{F1\text{-}Score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where true positives (TP) and true negatives (TN) are segments correctly classified as preictal and interictal, respectively, and false positives (FP) and false negatives (FN) are segments incorrectly predicted as preictal and interictal, respectively.
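The evaluation indexes can be computed directly from the four confusion-matrix counts. The sketch below uses the standard definitions; the function name and the example counts are made up for illustration and are not results from this study. It assumes every denominator is non-zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return the evaluation indexes from confusion-matrix counts.

    Preictal is the positive class and interictal the negative class.
    Assumes every denominator is non-zero.
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)        # identical to recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "recall": sensitivity,
        "f1": f1,
    }


# Hypothetical counts: 100 preictal and 100 interictal test segments.
for name, value in classification_metrics(tp=90, tn=95, fp=5, fn=10).items():
    print(f"{name}: {value:.4f}")
```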
In real life, however, it is necessary to warn patients and doctors in advance of the impending seizure, so that the medical doctors can be prepared to properly manage the episode. In order to evaluate the performance of the seizure prediction model, we introduced seizure prediction horizon (SPH) and seizure occurrence period (SOP). SOP is defined as the time-period for predicting seizures, while SPH refers to the time-period from the alarm to the beginning of SOP, that is, the period of clinical intervention. The successful prediction of epilepsy means that seizures must occur after SPH and during SOP (Figure 2). If there is a seizure during SPH or no seizure in SOP, it is considered a false alarm. In clinical use, the SOP should not be set too long, otherwise it will increase the anxiety of patients and cause mental stress. The setting of SPH should provide doctors with enough time for clinical interventions [38]. In order to define an appropriate period for SPH and SOP, researchers have studied both. According to the survey in [39], the optimal warning time is 3-5 min. Therefore, we considered setting the SPH to 3-5 min. Nesaei [40] proposed that SPH + SOP should be more than 10 min and less than 90 min to provide treatment for patients and avoid undesirable anxiety. Furthermore, the SOP in many seizure prediction studies [12,38,41] is generally 30 min. Based on the above considerations, SPH and SOP in this paper are set to 3 min and 30 min, respectively, that is, when the model gives a correct alarm, the patient's seizure should occur between 3 min and 33 min later. However, the SOP and SPH are usually unknown clinically, and researchers usually chose values based on assumptions [41]. Studies have shown that the electrical changes that occur in the brain before seizures are difficult to capture with the human eye [42]. Furthermore, due to the specificity of seizures, the length of the pre-onset period will vary from a few minutes to a few hours [43]. 
This may cause our hypothesis for pre-seizure to deviate from the ground truth, resulting in the wrong label for individual EEG segments that do not have the characteristics of typical pre-seizure, thus affecting the accuracy of the prediction results. The model we proposed is primarily designed to distinguish between preictal and interictal EEG segments, where sporadic false positives during interictal periods are common and false alarms will appear. The false positive rate (FPR), defined as the number of false alarms per hour, is an important index for evaluating the prediction of seizures. In order to reduce the false-positive rate, we further post-processed the classification results of the model. In this paper, we adopt the k-of-n method put forward by [12] to post-process the seizure prediction task. In the process of our experiment, we set n = 30 and k = 24, that is, in the prediction of 30 consecutive EEG segments, at least 24 segments are predicted to be positive before the alarm will be sent out, and the whole process is regarded as an epilepsy prediction process.
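The post-processing and the alarm criterion can be sketched as follows. The values k = 24, n = 30, SPH = 3 min, and SOP = 30 min are taken from the text. Clearing the window after an alarm, treating the interval bounds as inclusive, and the function names are assumptions made for this illustration; the text does not specify how consecutive alarms are handled.

```python
from collections import deque


def k_of_n_alarms(predictions, k=24, n=30):
    """Return the indices at which an alarm is raised.

    An alarm fires when at least k of the last n segment predictions are
    positive (preictal).
    """
    window = deque(maxlen=n)
    alarms = []
    for i, p in enumerate(predictions):
        window.append(1 if p else 0)
        if len(window) == n and sum(window) >= k:
            alarms.append(i)
            window.clear()   # ASSUMPTION: restart counting after an alarm
    return alarms


def alarm_is_correct(alarm_t, onset_t, sph=3 * 60, sop=30 * 60):
    """True if the seizure starts after the SPH and within the SOP.

    Times are in seconds; with the defaults the onset must fall between
    3 min and 33 min after the alarm.
    """
    return alarm_t + sph <= onset_t <= alarm_t + sph + sop


def false_positive_rate(n_false_alarms, interictal_hours):
    """False alarms per hour of interictal recording."""
    return n_false_alarms / interictal_hours


sporadic = [1 if i % 5 == 0 else 0 for i in range(100)]  # isolated positives
print(k_of_n_alarms(sporadic))            # no alarm: at most 6 of any 30
print(k_of_n_alarms([1] * 24 + [0] * 6))  # alarm at the 30th segment
print(alarm_is_correct(0, 10 * 60))       # seizure 10 min after the alarm
```

The first example shows why the rule lowers the FPR: isolated positive segments during interictal periods never accumulate to 24 within a 30-segment window, so they do not trigger an alarm.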

Results
To verify the effectiveness of the proposed model, we conducted tests on the CHB-MIT dataset. In the experiment, the interictal phase was defined as the period at least 4 h before seizure onset and at least 4 h after the end of a seizure. The division of epileptic EEG signals is shown in Figure 3. According to the annotation files for the database, EEG channels were added or removed during the recording of some of the 24 cases. Therefore, we selected the EEG data from the following 18 channels, which were available in most cases, for analysis: FP1-F7, F7-T7, T7-P7, P7-O1, (Figure 4). In addition, the channels for cases 12 and 13 changed frequently during the recording process, so the EEG recordings may have been contaminated, and case 24 had frequent seizures, so we did not have enough interictal data for training. Therefore, in the experiment, we removed the EEG data for these three cases and applied the data for the remaining 21 patients to evaluate our model.
The workflow of the seizure prediction system is shown in Figure 5. First, the EEG signals in the preictal and interictal segments were extracted and labeled. Second, overlapping sliding windows were used to balance the number of EEG segments in each patient's preictal and interictal segments. Third, the time-frequency domain characteristic information from the above EEG segments was extracted by STFT to obtain the three-dimensional spectrogram matrices. The number of preictal and interictal spectrograms is summarized in Table 2. Fourth, the feature matrix was input to the transformer network model with a three-tower structure. The characteristics of epileptic EEG signals are learned by the step-wise, frequency-wise, and channel-wise encoders, and the classification of EEG signals is realized through the gating mechanism. Finally, the seizure prediction task is completed after post-processing.
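The first step of the workflow, labeling each time point of a recording, can be sketched as below. The 4 h interictal margin, SPH = 3 min, and SOP = 30 min are taken from the text. Defining the preictal interval as the 30 min that end 3 min before onset is an assumption (the exact division is given in Figure 3, which is not reproduced here), as are the function name and the `None` label for data that belong to neither class.

```python
def label_time(t, seizures, sph=3 * 60, sop=30 * 60, gap=4 * 3600):
    """Label time t (seconds) given a list of (onset, end) seizure times.

    Returns "ictal", "preictal", "interictal", or None for data that
    belong to none of the classes and are left out.
    """
    for onset, end in seizures:
        if onset <= t <= end:
            return "ictal"
    for onset, end in seizures:
        # ASSUMPTION: preictal is the SOP-long interval ending SPH before onset.
        if onset - sph - sop <= t < onset - sph:
            return "preictal"
    # Interictal: at least `gap` away from every seizure in the recording.
    if all(t < onset - gap or t > end + gap for onset, end in seizures):
        return "interictal"
    return None


# Hypothetical recording with one seizure from 20000 s to 20060 s.
seizures = [(20000, 20060)]
for t in (5000, 18500, 19900, 20030, 40000):
    print(t, label_time(t, seizures))
```

Only the "preictal" and "interictal" labels are used for training; ictal data are discarded, as stated in the preprocessing section.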
We divided the EEG segments from all patients into a training set and a test set at a ratio of 9:1. The model proposed in this paper was built in a Python 3.7 environment using PyTorch 1.9, and the code is available at https://github.com/xutianyu540/Transformernetwork-with-a-three-tower-structure (accessed on 11 April 2022). At the same time, the gated transformer networks (GTN) model [44] was established for comparative experiments. The classification results from the final experiments are shown in Table 3.

Table 3. Comparison of the performance of the GTN and our model.

In Table 3, we compare the sensitivity, specificity, and precision of our model with GTN. It can be seen that for all patients, the average sensitivity and specificity of our model were 96.01% and 96.23%, respectively, which were higher than the corresponding values of 92.11% and 91.72% for the GTN model. This indicates that our model achieved satisfactory results in the classification of both preictal and interictal states. Compared with the GTN model, the average precision of our method improved from 91.12% to 95.86%. Figure 6 shows a comparison of the classification accuracy and F1 score for each patient between the GTN model and our model. It can be seen that our model achieved higher accuracy and F1 score. The results show that the proposed model improves the utilization of temporal, spectral, and spatial information of EEG signals, and has a better effect on the classification of epileptic EEG signals. The classification results were subsequently post-processed to obtain the prediction results from our seizure prediction model. The FPR of our network alone and after post-processing is shown in Table 4. It can be seen that after post-processing, our FPR decreased, on average, from 0.19/h to 0.047/h. This effectively eliminated sporadic errors from the process of model classification and improved the performance of our epileptic seizure prediction system.

Discussion
In traditional methods, researchers usually use manually constructed features to extract the time-domain, frequency-domain, and nonlinear features of EEG signals. We proposed a new method that makes better use of temporal, spectral, and spatial information. First, the spectrum of the 18-channel EEG segments is obtained by STFT, and then the time, frequency, and channel information from the spectrum are analyzed by the step-wise, frequency-wise, and channel-wise encoders in the transformer towers, respectively. When combined with the gated unit, the features along the time, frequency, and spatial directions are fused to realize the classification of EEG signals. The results in Table 3 show that our model classifies EEG segments more accurately than the GTN baseline.
The CHB-MIT dataset used in this study has also been used for performance evaluation in other studies. Truong et al. [12] combined STFT and CNN to automatically extract and classify the time-frequency features of EEG signals, and in this instance, the prediction sensitivity was 81.2% and FPR was 0.16/h. Rukhsar et al. [45] extracted eight time-based features and predicted seizures through multivariate statistical process control (MSPC), and obtained 88.89% sensitivity and 0.39/h FPR. Xu et al. [46] proposed an end-to-end deep learning solution based on CNN. The overall sensitivity and FPR for scalp EEG data were 98.8% and 0.074/h, respectively. Tang et al. [47] proposed a novel framework of multiview convolutional gated recurrent network (Mv-CGRN) and embedded the attention mechanism in Mv-CGRN, and determined the best feature combination of each patient by adaptively adjusting the weight parameters, achieving an average sensitivity of 94.50% and an average FPR of 0.118/h. Zhao et al. [48] used a binary single-dimensional convolutional neural network (BSDCNN) to predict seizures with a sensitivity of 88.89% and an FPR of 0.39 per hour. Zhang et al. [49] used a simple CNN model to classify the correlation matrix obtained by calculating the Pearson correlation coefficient to distinguish the preictal states from the interictal ones and obtained a sensitivity of 92.9%. Table 5 shows a comparison of the results for our method with the classification algorithms from the above literature. From the comparative results, our method does not require the complex process of manually constructing features and it has a lower FPR and higher sensitivity. Although the CNN network proposed by [46] had higher sensitivity, this study only evaluated EEG data of seven patients without good generalization. Therefore, from the perspective of overall performance, our model is more effective at the epilepsy prediction task. In this work, the proposed model has several limitations. 
(1) After the feature information is fused by the gated unit, the model only uses the full connection layer to classify it, which may lead to redundant parameters and insufficient expression of spatial structure. (2) Since different patients have specific seizures, their physiological preictal periods are different. In this paper, if we set SPH as a fixed value, the defined preictal segments and the prototypical physiological preictal signature (the ground truth) do not match completely [41], which will lead to some training samples being labeled incorrectly. In the future, we will further optimize the network structure and explore more accurate classifiers to classify the characteristic information of an epileptic EEG. In addition, we also need to analyze the EEG signals of patients in the preictal period to set a more appropriate SPH value.

Conclusions
An effective seizure prediction method not only helps doctors with diagnosis and reduces the suffering of patients, it can also help patients avoid dangerous activities such as driving or swimming before the onset of seizures. In this paper, the time-domain, frequency-domain, and channel information of EEG signals were fused by combining a gating unit with a transformer model to classify EEG signals for the prediction of patients' seizures. The prediction sensitivity and FPR of our model were 96.01% and 0.047/h, respectively, which shows that it classifies epileptic EEG signals effectively and has good seizure prediction performance. In future work, we will further optimize our network performance and apply it to other datasets to achieve a more effective seizure prediction system.

Data Availability Statement:
The data presented in this study are openly available at https://physionet.org/physiobank/database/chbmit (accessed on 10 September 2021).